SUBI Aiyiti, NURMEMET Yolwas, HUANG Hao, WUSHOUR Silamu. End to End Uyghur Speech Recognition Based on Multi Task Learning[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(10): 1852-1859. DOI: 10.16798/j.issn.1003-0530.2021.10.008

End to End Uyghur Speech Recognition Based on Multi Task Learning

  • Uyghur is an agglutinative language with a large vocabulary, which makes out-of-vocabulary (OOV) words a frequent problem. It is also a low-resource language, so end-to-end speech recognition models for it perform poorly. To address these problems, this paper proposes an end-to-end Uyghur speech recognition model based on multi-task learning. The encoder uses a Conformer coupled with connectionist temporal classification (CTC). BPE-dropout is introduced to produce more robust modeling units. With sub-words and characters as modeling units, multi-task training and decoding are then carried out jointly. Experimental results suggest that sub-word modeling units effectively alleviate the OOV problem, and that the multi-task model makes fuller use of the data in a low-resource setting and learns richer temporal speech features, further improving recognition performance. On the public Uyghur speech dataset THUYG-20, the sub-word and character error rates are reduced by 7.3% and 3.8%, respectively, compared with the baseline.
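The BPE-dropout step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bpe_segment`, the toy merge table, and the example word are all hypothetical and chosen only for illustration. The idea is that, during segmentation, each candidate merge is randomly skipped with probability `dropout`, so the same word can be split into different sub-word sequences across training passes.

```python
import random


def bpe_segment(word, merges, dropout=0.0, rng=None):
    """Segment `word` into sub-words with ranked BPE merges (illustrative only).

    merges:  list of (left, right) pairs in priority order (hypothetical toy table).
    dropout: probability of skipping each candidate merge at every step.
    """
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Keep adjacent pairs found in the merge table; drop each one
        # independently with probability `dropout`.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge at this step
        _, i = min(candidates)  # highest-priority merge, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table, assumed for this example only.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]

print(bpe_segment("lower", toy_merges, dropout=0.0))  # deterministic BPE: ['low', 'er']
print(bpe_segment("lower", toy_merges, dropout=1.0))  # all merges dropped: characters only
print(bpe_segment("lower", toy_merges, dropout=0.3, rng=random.Random(0)))  # one random segmentation
```

With `dropout=0.0` the function reduces to ordinary greedy BPE, and with `dropout=1.0` it falls back to pure characters, which matches the two modeling units (sub-words and characters) named in the abstract. Intermediate values yield a mix of segmentations for the same word.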
