Single-channel Speech Enhancement Method Based on Gated Residual Convolution Encoder-and-Decoder Network

  • Abstract: Convolution encoder-and-decoder (CED) networks have difficulty capturing the temporal context of speech. To address this, this paper proposes a speech enhancement method based on a gated residual convolution encoder-and-decoder network. Building on the CED, the method introduces a gating mechanism, dilated convolution, and residual connections: the gating mechanism handles dependencies between earlier and later parts of a sequence well; dilated convolution enlarges the receptive field so that richer global information is extracted; and residual connections help prevent vanishing and exploding gradients and improve network accuracy. In addition, the network is trained with a joint optimization strategy that combines a frequency-domain loss function with a time-domain evaluation metric, which further improves enhancement performance. Experiments show that, under both matched and mismatched noise, the proposed method achieves higher PESQ, STOI, and SI-SDR than the baseline CED and the other comparison methods, recovers both voiced and unvoiced speech well, and generalizes well.
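To make the three ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of one gated residual block built on a 1-D dilated convolution. It is an illustration under stated assumptions, not the network from the paper: the abstract does not give the gating formula, the layer sizes, or the dilation schedule, so the GLU-style gate (feature times sigmoid of gate), the single-channel signal, and the helper names `dilated_conv1d`, `gated_residual_block`, and `receptive_field` are all hypothetical choices made here.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D dilated cross-correlation with zero padding.

    Assumes an odd kernel length so the kernel is centred on each sample.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - half) * dilation  # taps are `dilation` samples apart
            if 0 <= idx < len(x):            # outside the signal counts as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual_block(x, w_feat, w_gate, dilation):
    """Gated dilated convolution plus an identity (residual) connection.

    Assumption: GLU-style gating, i.e. the feature branch is scaled
    element-wise by a sigmoid gate computed from the same input.
    """
    feat = dilated_conv1d(x, w_feat, dilation)
    gate = dilated_conv1d(x, w_gate, dilation)
    return [xi + f * sigmoid(g) for xi, f, g in zip(x, feat, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(dilated_conv1d(signal, [1.0, 2.0, 3.0], dilation=2))
    print(gated_residual_block(signal, [0.5, 0.0, -0.5], [0.1, 0.2, 0.1], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With a kernel of size 3 and the hypothetical dilation schedule 1, 2, 4, 8, four stacked layers see 31 input samples, whereas four undilated layers would see only 9. This is the sense in which dilation widens the context without adding parameters. The residual term `xi` passes the input through unchanged, so a block whose feature weights are all zero reduces to the identity.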

     
