Single Channel Speech Enhancement Based on a Deep Fully Convolutional Encoder-Decoder Neural Network

Abstract: Conventional deep neural networks do not fully exploit the time-frequency correlations of speech. To address this, a single-channel speech enhancement method based on a deep fully convolutional encoder-decoder neural network is proposed. At the encoder, convolution and pooling operations extract features from the time-frequency representation of the noisy speech stage by stage, yielding a high-level feature representation of the target speech while progressively suppressing background noise. The decoder mirrors the encoder in structure: at the decoding end, de-convolution and up-sampling operations reconstruct the target speech layer by layer from the high-level features produced by the encoder. Skip connections help alleviate the gradient-vanishing problem that arises when training very deep networks; in this paper, skip connections are introduced between corresponding encoder and decoder layers, passing low-level encoder feature maps, which contain the detail information of the speech, to the corresponding decoder layers. This helps the decoder recover the detailed features of the target speech. The effects on enhancement performance of two skip-connection forms, feature fusion and feature concatenation, and of training with L1 versus L2 loss functions are studied, and experimental results demonstrate the effectiveness of the proposed method.
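The two skip-connection variants and the two training losses compared in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation; the feature-map shapes, array names, and noise level are assumptions made for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encoder/decoder feature maps at the same resolution:
# (channels, frequency bins, time frames). Shapes are illustrative.
enc_feat = rng.standard_normal((16, 32, 32))   # low-level encoder features
dec_feat = rng.standard_normal((16, 32, 32))   # corresponding decoder features

# Skip connection, variant 1: feature fusion (element-wise addition).
# The channel count is unchanged.
fused = enc_feat + dec_feat                    # shape stays (16, 32, 32)

# Skip connection, variant 2: feature concatenation along the channel axis.
# The channel count doubles, and the next layer sees both feature sets.
concat = np.concatenate([enc_feat, dec_feat], axis=0)  # shape (32, 32, 32)

# The two training losses compared in the paper, applied to the enhanced
# and clean time-frequency representations (toy spectrogram-like arrays).
clean = rng.standard_normal((32, 32))
enhanced = clean + 0.1 * rng.standard_normal((32, 32))  # assumed small error

l1_loss = np.mean(np.abs(enhanced - clean))    # L1: mean absolute error
l2_loss = np.mean((enhanced - clean) ** 2)     # L2: mean squared error

print(fused.shape, concat.shape)
print(l1_loss, l2_loss)
```

Fusion keeps the decoder's channel width fixed, while concatenation preserves both feature sets at the cost of widening the following layer, which is the trade-off the paper's experiments evaluate.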

     
