Research on a Speech Enhancement Method Based on Multi-Objective Joint Optimization

谢福仕; 康迂勇; 施明月; 郑能恒

doi:10.16798/j.issn.1003-0530.2021.10.024

Speech Enhancement Method Based on Multi-Objective Joint Optimization

Abstract

Speech enhancement aims to extract target speech from speech corrupted by noise. Neural-network (NN) based speech enhancement methods have proven effective at improving speech quality and intelligibility. Training with multi-objective joint optimization, which exploits the complementarity between different features, can further improve the performance of these methods. However, existing multi-objective speech enhancement methods usually compute the loss function separately for each output target during network optimization. The targets are treated in parallel, and possible associations between them are not fully used. To strengthen the association between output targets during training, this paper builds a dual-output framework from long short-term memory (LSTM) networks and designs a multi-objective loss computation strategy for network training, such that a balance between the global and local optima can be achieved. The framework estimates the target speech and the noise, derives an estimate of the noisy speech from these two outputs, and then jointly optimizes all three parts. Experimental results show that the proposed method improves the noise suppression ability of the network and yields enhanced speech with higher quality and less residual noise.
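The joint optimization of the three parts described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything specific here is an assumption made for illustration: the mean squared error as the per-target loss, the additive combination of the speech and noise estimates into the noisy-speech estimate, the equal default weights, and the hypothetical function names `mse` and `joint_loss`.

```python
from typing import Sequence, Tuple


def mse(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty feature vectors."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def joint_loss(
    speech_est: Sequence[float],
    noise_est: Sequence[float],
    speech_ref: Sequence[float],
    noise_ref: Sequence[float],
    noisy_ref: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of three losses: speech, noise, and reconstructed noisy speech.

    Assumption (not stated in the abstract): the noisy-speech estimate is the
    element-wise sum of the two network outputs, and each term is an MSE.
    """
    # Tie the two outputs together through the reconstructed mixture.
    noisy_est = [s + n for s, n in zip(speech_est, noise_est)]
    w_speech, w_noise, w_noisy = weights
    return (
        w_speech * mse(speech_est, speech_ref)
        + w_noise * mse(noise_est, noise_ref)
        + w_noisy * mse(noisy_est, noisy_ref)
    )
```

In this sketch the third term is what couples the two outputs: an error in either the speech or the noise estimate also changes the reconstructed mixture, so the targets are no longer optimized strictly in parallel.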
