HE Wenlong, GAO Changfeng, LI Ta, LIU Jian. End-to-end Speech Translation based on Adversarial Training[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(5): 893-901. DOI: 10.16798/j.issn.1003-0530.2021.05.024

End-to-end Speech Translation based on Adversarial Training

  • To make further use of source-language text data in speech translation, this paper proposes an end-to-end speech translation algorithm based on a generative adversarial network. A discriminator network is added to judge the authenticity of speech feature sequences and text feature sequences. It guides the generator to learn the distribution of real text feature sequences, so that the distribution of speech feature sequences moves closer to that of text feature sequences. Because speech and text feature sequences differ in length, Wasserstein GAN (WGAN) is introduced: the discriminator maps each sequence to a scalar likelihood value, and the Earth-Mover (EM) distance is computed between the scalar values obtained for speech and for text. The entire model is trained under both multi-task learning and adversarial learning criteria. The proposed algorithm is evaluated on the How2 dataset and the MuST-C English-Chinese dataset, where it significantly improves translation quality.
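The length-mismatch idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the mean-pooling critic, the linear scoring layer, and the names `critic_score`, `em_distance_estimate`, and `clip_weights` are all assumptions made for illustration. The only points taken from the abstract are that each sequence is reduced to a scalar and that the EM distance is estimated from those scalars, so sequences of different lengths can be compared. Weight clipping is the Lipschitz constraint used in the original WGAN formulation; whether the paper uses it is not stated here.

```python
def critic_score(sequence, weights, bias=0.0):
    """Map a variable-length sequence of feature vectors to one scalar.

    Assumption: mean-pool over time, then apply a single linear layer.
    Any length T >= 1 yields exactly one score.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one frame")
    dim = len(weights)
    pooled = [sum(frame[d] for frame in sequence) / len(sequence)
              for d in range(dim)]
    return sum(w * x for w, x in zip(weights, pooled)) + bias


def em_distance_estimate(text_batch, speech_batch, weights, bias=0.0):
    """WGAN-style estimate: mean critic score on text minus mean on speech.

    The critic is trained to maximize this value; the speech encoder
    (the generator) is trained to minimize it.
    """
    text_mean = sum(critic_score(s, weights, bias)
                    for s in text_batch) / len(text_batch)
    speech_mean = sum(critic_score(s, weights, bias)
                      for s in speech_batch) / len(speech_batch)
    return text_mean - speech_mean


def clip_weights(weights, c=0.01):
    """Clip critic weights to [-c, c], as in the original WGAN."""
    return [max(-c, min(c, w)) for w in weights]


# Toy usage: a 2-frame text sequence and a 3-frame speech sequence.
text_batch = [[[1.0, 2.0], [3.0, 4.0]]]
speech_batch = [[[0.0, 0.0], [0.0, 0.0], [3.0, 3.0]]]
estimate = em_distance_estimate(text_batch, speech_batch, [0.5, -0.5])
```

Because the critic emits one scalar per sequence, the two batches never need to be aligned frame by frame. In a multi-task setup, a term like this would be added to the translation loss with some weighting factor, which is likewise a hypothetical detail here.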
