Single-channel Speech Enhancement Method Based on Gated Residual Convolution Encoder-and-Decoder Network
Abstract: Convolution encoder-and-decoder (CED) networks have difficulty capturing the temporal context of speech. To address this problem, this paper proposes a speech enhancement method based on a gated residual convolution encoder-and-decoder network. Building on the CED, the method introduces a gating mechanism, dilated convolution, and residual connections: the gating mechanism handles contextual dependencies across the sequence; dilated convolution enlarges the receptive field so that richer global information is extracted; and residual connections mitigate vanishing and exploding gradients and improve network accuracy. In addition, the network is trained with a joint optimization strategy that combines a frequency-domain loss function with a time-domain evaluation metric, which further improves the enhancement performance. Experiments show that, under both matched and mismatched noise, the proposed method achieves higher PESQ, STOI, and SI-SDR than the baseline CED and the other comparison methods, recovers both unvoiced and voiced speech well, and generalizes well.
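To make the three ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of a single gated residual block built on a 1-D dilated convolution. It is an illustration under stated assumptions, not the network from the paper: the function names (`dilated_conv1d`, `gated_residual_block`, `receptive_field`), the GLU-style gate (feature branch multiplied by a sigmoid gate branch), the zero padding, and the single-channel 1-D setting are all hypothetical choices made for readability.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation).

    Assumes an odd kernel length so the kernel is centred on each sample.
    """
    k = len(kernel)
    centre = (k - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - centre) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):              # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual_block(x, w_feat, w_gate, dilation):
    """Assumed GLU-style gating plus a residual (skip) connection.

    output[t] = x[t] + feat[t] * sigmoid(gate[t])
    """
    feat = dilated_conv1d(x, w_feat, dilation)  # feature branch
    gate = dilated_conv1d(x, w_gate, dilation)  # gate branch, squashed to (0, 1)
    return [xi + f * sigmoid(g) for xi, f, g in zip(x, feat, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(gated_residual_block(impulse, [1.0, 0.0, 1.0], [0.0, 0.0, 0.0], dilation=2))
    print(receptive_field(3, [1, 1, 1, 1]), receptive_field(3, [1, 2, 4, 8]))
```

In this sketch, four stacked layers with kernel size 3 cover 9 samples without dilation but 31 samples with dilations 1, 2, 4, 8, which is the sense in which dilation enlarges the receptive field at no extra parameter cost. The residual term `x[t]` passes the input through unchanged whenever the gated branch outputs zero.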