Abstract:
In order to solve the problem that it is difficult for Convolution Encoder-and-Decoder (CED) network to capture temporal related contexts of speech, a speech enhancement method based on gated residuals convolution encoder-and-decoder network is proposed. Based on CED, this proposed method introduces the gating mechanism, dilated convolution and residual connection to the network: The gating mechanism can well handle the relevant contexts of sequence; Dilated convolution makes the convolution process obtain larger receptive field and extract more abundant global information; Residual connection can prevent vanishing gradient and exploding gradient and improve network accuracy. In addition, the combined optimization strategy of frequency-domain loss function and time-domain evaluation index is adopted to train the network to further improve the enhancement effect of propose network. Experimental results show that, compared with the baseline CED and other comparison methods, the proposed method achieves higher PESQ, STOI and SI-SDR under matched noise and mismatched noise, and it has a good recovery effect on the voiceless and voiced sounds of speech and has strong generalization ability.