Single-Channel Speech Enhancement Based on a Dual-Path Recurrent Neural Network

Abstract: In recent years, neural networks have markedly improved speech enhancement. For long speech sequences with strong temporal dependencies, however, a single network architecture may be unable to improve enhancement further because of its own performance limits. To push neural-network-based speech enhancement further, this paper applies a composite architecture, the dual-path recurrent neural network (DPRNN), to the speech enhancement task. The architecture combines a convolutional neural network (CNN) with long short-term memory (LSTM) networks; its core is the dual-path recurrent neural network block (DPRNN Block), which is built from two LSTMs. DPRNN splits a long speech sequence into overlapping chunks of frames and uses DPRNN Blocks to perform intra-chunk and inter-chunk computation on these chunks, thereby modeling the data both locally and globally. Experimental results show that, compared with single-network architectures, DPRNN achieves the best results under both trained and untrained noise conditions.
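The chunking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the zero-padding of the tail, and the chunk length and hop size used below are all assumptions chosen for illustration, and each frame is reduced to a single number where a real model would carry a feature vector. The sketch only shows which sequences the two paths would see: the intra-chunk path runs along each chunk (local context), and the inter-chunk path runs across chunks at a fixed within-chunk position (global context).

```python
from typing import List


def segment(frames: List[float], chunk_len: int, hop: int) -> List[List[float]]:
    """Split a sequence into overlapping chunks, zero-padding the tail.

    chunk_len and hop are hypothetical parameters; the abstract only says
    that the chunks overlap, not by how much.
    """
    if chunk_len <= 0 or hop <= 0 or hop > chunk_len:
        raise ValueError("require 0 < hop <= chunk_len")
    n = len(frames)
    # Smallest number of chunks that covers every frame (ceiling division).
    num_chunks = 1 if n <= chunk_len else 1 + -(-(n - chunk_len) // hop)
    padded_len = chunk_len + (num_chunks - 1) * hop
    padded = list(frames) + [0.0] * (padded_len - n)
    return [padded[i * hop: i * hop + chunk_len] for i in range(num_chunks)]


def intra_chunk_sequences(chunks: List[List[float]]) -> List[List[float]]:
    """Sequences for the intra-chunk path: one per chunk (local modeling)."""
    return [list(chunk) for chunk in chunks]


def inter_chunk_sequences(chunks: List[List[float]]) -> List[List[float]]:
    """Sequences for the inter-chunk path: one per within-chunk position,
    running across all chunks (global modeling)."""
    return [list(column) for column in zip(*chunks)]


if __name__ == "__main__":
    frames = [float(i) for i in range(10)]
    chunks = segment(frames, chunk_len=4, hop=2)  # 50% overlap, an assumption
    print(intra_chunk_sequences(chunks))
    print(inter_chunk_sequences(chunks))
```

In a DPRNN Block, each of these two sets of sequences would be fed to its own LSTM; with a chunk length on the order of the square root of the sequence length, both LSTMs process sequences far shorter than the original input.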

     
