End-to-End Speech Translation Based on Adversarial Training

  • Abstract: To make further use of source-text data for improving speech translation, this paper proposes an end-to-end speech translation algorithm based on a generative adversarial network. A discriminator network is added to judge the authenticity (real or fake) of the speech feature sequences and the text feature sequences. It guides the generator to learn the distribution of the real text sequences, so that the feature distribution of the speech sequences moves closer to that of the text feature sequences. Because speech and text feature sequences differ in length, Wasserstein GAN (WGAN) is introduced: the discriminator maps each sequence to a scalar likelihood value, and the Earth-Mover (EM) distance is computed between these values. The whole model is trained under both multi-task learning and adversarial learning criteria. Experiments on the How2 dataset and the MuST-C English-Chinese dataset verify the effectiveness of the proposed algorithm, which significantly improves translation quality.
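The following is a minimal sketch of the length-mismatch idea described in the abstract, not the paper's implementation. It assumes a hypothetical critic that mean-pools a variable-length feature sequence and applies a linear layer, so every sequence becomes one scalar regardless of its length. The names `critic_score` and `wgan_losses`, the pooling choice, and the toy weights are all illustrative assumptions. Text features play the role of "real" samples and speech features the role of "generated" samples.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def critic_score(seq: Sequence[Vector], weights: Vector, bias: float = 0.0) -> float:
    """Map a variable-length feature sequence to a single scalar.

    Hypothetical critic (an assumption, not the paper's architecture):
    mean-pool over time steps, then apply one linear layer.
    """
    dim = len(weights)
    pooled = [sum(frame[d] for frame in seq) / len(seq) for d in range(dim)]
    return sum(w * p for w, p in zip(weights, pooled)) + bias


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def wgan_losses(
    text_batch: Sequence[Sequence[Vector]],
    speech_batch: Sequence[Sequence[Vector]],
    weights: Vector,
    bias: float = 0.0,
) -> Tuple[float, float]:
    """Return (critic_loss, generator_loss) in the standard WGAN form.

    The critic maximizes E[f(text)] - E[f(speech)], which estimates the
    Earth-Mover distance only if f is kept 1-Lipschitz (for example by
    weight clipping, which is omitted here).
    """
    text_scores = [critic_score(s, weights, bias) for s in text_batch]
    speech_scores = [critic_score(s, weights, bias) for s in speech_batch]
    em_estimate = _mean(text_scores) - _mean(speech_scores)
    critic_loss = -em_estimate              # the critic minimizes the negative estimate
    generator_loss = -_mean(speech_scores)  # the speech encoder tries to raise its score
    return critic_loss, generator_loss


# Toy data: one text sequence of length 2 and one speech sequence of length 3.
text_batch = [[[1.0, 0.0], [1.0, 0.0]]]
speech_batch = [[[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]]
critic_loss, generator_loss = wgan_losses(text_batch, speech_batch, [0.5, -0.5])
print(critic_loss, generator_loss)
```

Because the two distributions are compared only through scalar critic outputs, the speech and text sequences never need to be aligned frame by frame, which is the property the abstract relies on.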

     
