Abstract:
In recent years, speech enhancement has improved significantly through the application of neural networks. However, for long speech sequences with strong contextual dependencies, a single network structure may be unable to improve the enhancement effect further because of its inherent performance limitations. To further improve the effect of neural networks on speech enhancement, this paper applies a composite network structure called the dual-path recurrent neural network (DPRNN) to speech enhancement tasks. The composite structure consists of a convolutional neural network (CNN) and long short-term memory (LSTM) networks; its core is the dual-path recurrent neural network block (DPRNN Block), which is composed of two LSTMs. DPRNN splits the long sequence of speech frames into overlapping chunks and uses DPRNN Blocks to perform intra-chunk and inter-chunk calculations on these chunks, thereby modeling the data both locally and globally. Experimental results show that, compared with single network structures, DPRNN achieves the best results under both trained and untrained noise conditions.
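The chunking step described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names, the `chunk_size` and `hop` parameters, the zero-padding of the last chunk, and the 50% overlap in the example are all assumptions made for illustration, and the two LSTMs are omitted. The sketch only shows which sequences an intra-chunk pass (local) and an inter-chunk pass (global) would each traverse.

```python
from typing import List


def split_into_chunks(frames: List[float], chunk_size: int, hop: int) -> List[List[float]]:
    """Split a frame sequence into overlapping chunks (hypothetical helper).

    The tail is zero-padded so that every chunk has exactly chunk_size entries.
    """
    if chunk_size <= 0 or hop <= 0 or hop > chunk_size:
        raise ValueError("require 0 < hop <= chunk_size")
    n = len(frames)
    if n <= chunk_size:
        num_chunks = 1
    else:
        # Ceiling division: number of extra hops needed to cover the sequence.
        num_chunks = 1 + -(-(n - chunk_size) // hop)
    padded_len = chunk_size + (num_chunks - 1) * hop
    padded = list(frames) + [0.0] * (padded_len - n)
    return [padded[i * hop : i * hop + chunk_size] for i in range(num_chunks)]


def intra_chunk_views(chunks: List[List[float]]) -> List[List[float]]:
    """Sequences for the intra-chunk pass: one per chunk (local context)."""
    return [list(chunk) for chunk in chunks]


def inter_chunk_views(chunks: List[List[float]]) -> List[List[float]]:
    """Sequences for the inter-chunk pass: one per within-chunk position,
    running across all chunks (global context)."""
    return [list(column) for column in zip(*chunks)]


# Example: 10 frames, chunks of 4 frames with a hop of 2 (50% overlap, assumed).
example_frames = [float(i) for i in range(10)]
example_chunks = split_into_chunks(example_frames, chunk_size=4, hop=2)
example_local = intra_chunk_views(example_chunks)
example_global = inter_chunk_views(example_chunks)
```

In this example the 10 frames yield four chunks of four frames each. An intra-chunk model would process each chunk as a short sequence, while an inter-chunk model would process the sequence formed by the same position in every chunk, so each of the two recurrent passes sees a much shorter sequence than the original input.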