End-to-end Speech Translation based on Adversarial Training

HE Wenlong; GAO Changfeng; LI Ta; LIU Jian

doi:10.16798/j.issn.1003-0530.2021.05.024

HE Wenlong, GAO Changfeng, LI Ta, LIU Jian. End-to-end Speech Translation based on Adversarial Training[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(5): 893-901. DOI: 10.16798/j.issn.1003-0530.2021.05.024

Abstract

To make further use of source-text data in speech translation, this paper proposes an end-to-end speech translation algorithm based on a generative adversarial network. A discriminator network is added to judge the authenticity of the speech feature sequences and the text feature sequences. It guides the generator to learn the distribution of the real text sequences, so that the feature distribution of the speech sequences moves closer to that of the text feature sequences. Because speech and text feature sequences differ in length, Wasserstein GAN (WGAN) is introduced: the discriminator maps each sequence to a scalar likelihood value, and the Earth-Mover (EM) distance is computed between the values obtained for speech and for text. The entire model is trained under a combined multi-task and adversarial learning criterion. Experiments on the How2 dataset and the MuST-C English-Chinese dataset verify the effectiveness of the proposed algorithm, which significantly improves translation quality.
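The length-mismatch idea can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-pooling step, the linear critic, and the names `critic_score`, `wgan_losses`, and `clip_weights` are assumptions made here for illustration. The only points taken from the abstract are that the discriminator reduces each sequence to one scalar and that a WGAN-style objective compares the scalars for speech and text. Weight clipping follows the standard WGAN recipe.

```python
def critic_score(seq, w, b=0.0):
    """Map a variable-length sequence of d-dim frames to ONE scalar.

    Assumption: mean-pool over time, then apply a linear layer (w, b).
    Any sequence length gives a single number, so speech and text
    sequences of different lengths become directly comparable.
    """
    d = len(w)
    pooled = [sum(frame[i] for frame in seq) / len(seq) for i in range(d)]
    return sum(wi * pi for wi, pi in zip(w, pooled)) + b


def wgan_losses(speech_batch, text_batch, w, b=0.0):
    """Return (critic_loss, encoder_loss) in the usual WGAN form.

    Text features play the role of "real" samples and speech features
    the role of "generated" samples. Minimising critic_loss maximises
    the estimated EM distance; minimising encoder_loss pulls the speech
    features toward the text distribution.
    """
    speech_scores = [critic_score(s, w, b) for s in speech_batch]
    text_scores = [critic_score(t, w, b) for t in text_batch]
    speech_mean = sum(speech_scores) / len(speech_scores)
    text_mean = sum(text_scores) / len(text_scores)
    critic_loss = speech_mean - text_mean
    encoder_loss = -speech_mean
    return critic_loss, encoder_loss


def clip_weights(w, c=0.01):
    """Standard WGAN weight clipping to keep the critic roughly Lipschitz."""
    return [max(-c, min(c, wi)) for wi in w]


# Toy usage: one speech sequence (3 frames) and one text sequence (1 frame).
speech = [[[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]]
text = [[[0.0, 2.0]]]
critic_loss, encoder_loss = wgan_losses(speech, text, w=[1.0, -1.0])
```

In a full system these adversarial terms would be added, with some weighting, to the translation loss of the multi-task model. The abstract does not give the weighting, so it is left out here.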
