Single-Channel Speech Enhancement Algorithm Combining a Deep Convolutional Recurrent Neural Network and a Time-Frequency Attention Mechanism

Abstract: Speech enhancement aims to separate clean speech from noisy speech in order to improve speech quality and intelligibility. In recent years, supervised deep neural networks have become the mainstream approach to speech enhancement. The convolutional recurrent network (CRN) is a relatively new network architecture with three main modules: an encoder, a middle layer and a decoder. It has already achieved good results in speech enhancement. The time-frequency (T-F) attention mechanism is a simple network module made up of several connected convolutional layers joined by skip connections. During training, it computes non-local correlations across the feature maps of the speech magnitude spectrum, which helps the network attend to the harmonic structure of speech. In this paper, we introduce T-F attention into the encoder and decoder of a convolutional recurrent network. Experimental results show that, under different signal-to-noise ratio (SNR) conditions, the proposed method further improves speech quality and intelligibility compared with the baseline CRN. The enhanced speech also retains more spectral harmonic information and suffers less speech distortion.
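The abstract describes the T-F attention module only at a high level: convolutional layers with skip connections that compute non-local correlations over magnitude-spectrum feature maps. The following is a minimal NumPy sketch that assumes the module behaves like a generic non-local (self-attention) block. 1x1 convolutions, written here as channel-mixing matrices, produce queries, keys and values. A softmax over all T-F positions gives pairwise weights, and a skip connection adds the result back to the input. The names `tf_nonlocal_attention`, `w_theta`, `w_phi`, `w_g` and `w_out` are hypothetical. The paper's actual layer count, kernel sizes and exact placement in the encoder and decoder are not specified here.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def tf_nonlocal_attention(x, w_theta, w_phi, w_g, w_out):
    """Hypothetical non-local attention block over a T-F feature map.

    Assumption: modeled on a generic non-local block, not the paper's exact module.
    x:                 feature map of shape (C, T, F), e.g. an encoder output
                       computed from a magnitude spectrogram.
    w_theta, w_phi, w_g: (C_inner, C) weights; a 1x1 convolution is a
                       per-position linear map over channels.
    w_out:             (C, C_inner) weight projecting back to C channels.

    Returns (out, attn): out has the same shape as x; attn is the
    (T*F, T*F) matrix of pairwise weights between all T-F bins.
    """
    c, t, f = x.shape
    flat = x.reshape(c, t * f)          # (C, N) with N = T*F positions
    theta = w_theta @ flat              # (C_inner, N) queries
    phi = w_phi @ flat                  # (C_inner, N) keys
    g = w_g @ flat                      # (C_inner, N) values
    # Every T-F bin attends to every other bin (non-local correlation),
    # so bins far apart in time or frequency can still inform each other.
    attn = softmax(theta.T @ phi, axis=-1)  # (N, N); each row sums to 1
    y = g @ attn.T                      # (C_inner, N) attention-weighted values
    # Skip (residual) connection: the block adds a correction to x.
    out = x + (w_out @ y).reshape(c, t, f)
    return out, attn


# Usage on random data standing in for one encoder feature map.
rng = np.random.default_rng(0)
C, T, F, Ci = 8, 10, 16, 4
x = rng.standard_normal((C, T, F))
w_theta, w_phi, w_g = (rng.standard_normal((Ci, C)) * 0.1 for _ in range(3))
w_out = rng.standard_normal((C, Ci)) * 0.1
out, attn = tf_nonlocal_attention(x, w_theta, w_phi, w_g, w_out)
print(out.shape, attn.shape)  # (8, 10, 16) (160, 160)
```

Because the attention matrix has size N x N with N = T·F, its memory cost grows quadratically with the number of time-frequency bins. This is one reason such blocks are usually applied to downsampled encoder or decoder feature maps rather than to the full spectrogram. Setting `w_out` to zero makes the block an identity mapping, which the skip connection guarantees regardless of the attention weights.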
