Abstract:
Speech enhancement aims to separate the clean speech signal from speech corrupted by additive noise, improving speech quality and intelligibility. In recent years, supervised deep neural networks have become a popular approach to speech enhancement. The convolutional recurrent neural network is a novel network structure consisting of an encoder, a middle layer, and a decoder. The time-frequency (T-F) attention mechanism is a simple network module composed of several convolutional layers with skip connections. During training, it can compute the non-local correlations of the speech magnitude. In this paper, we apply the T-F attention module to a convolutional neural network. The experimental results show that, under different signal-to-noise ratio conditions, the proposed method further improves speech quality and intelligibility, preserves more speech harmonic information, and achieves lower speech distortion.