End to End Uyghur Speech Recognition Based on Multi Task Learning

SUBI Aiyiti; NURMEMET Yolwas; HUANG Hao; WUSHOUR Silamu

doi:10.16798/j.issn.1003-0530.2021.10.008

SUBI Aiyiti, NURMEMET Yolwas, HUANG Hao, WUSHOUR Silamu. End to End Uyghur Speech Recognition Based on Multi Task Learning[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(10): 1852-1859. DOI: 10.16798/j.issn.1003-0530.2021.10.008

Abstract

Uyghur is an agglutinative language with a large vocabulary, which makes out-of-vocabulary (unregistered) words a frequent problem. It is also a low-resource language, so end-to-end speech recognition models for it perform poorly. To address these problems, this paper proposes an end-to-end Uyghur speech recognition model based on multi-task learning. The encoder is a Conformer linked to connectionist temporal classification (CTC). BPE-dropout is introduced to create more robust modeling units. Then, with both sub-words and characters as modeling units, multi-task training and decoding are carried out simultaneously. The experimental results suggest that using sub-words as modeling units effectively alleviates the out-of-vocabulary problem, and that the multi-task learning model makes full use of the data in a low-resource setting and learns rich time-series speech features, further improving recognition performance. On the public Uyghur speech dataset THUYG-20, the sub-word and character error rates are reduced by 7.3% and 3.8%, respectively, compared with the baseline.
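The abstract does not say how BPE-dropout is applied, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a merge table has already been learned; the function name `bpe_dropout_segment`, the table `TOY_MERGES`, and the Latin-script example word are all hypothetical and chosen for illustration. At each step, every applicable merge is skipped with probability `p`, so the same word can be split into different sub-word sequences across training passes.

```python
import random

# Hypothetical toy merge table, highest priority first (an assumption,
# not taken from the paper).
TOY_MERGES = [
    ("l", "a"), ("la", "r"), ("k", "i"),
    ("t", "a"), ("ta", "b"), ("ki", "tab"),
]


def bpe_dropout_segment(word, merges, p=0.1, rng=None):
    """Segment `word` with BPE, dropping each candidate merge with probability p."""
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Collect adjacent pairs that have a merge rule and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= p
        ]
        if not candidates:
            break  # nothing left to merge in this pass
        _, i = min(candidates)  # apply the highest-priority surviving merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# p=0.0 is plain BPE; p=1.0 falls back to characters.
print(bpe_dropout_segment("kitablar", TOY_MERGES, p=0.0))  # ['kitab', 'lar']
print(bpe_dropout_segment("kitablar", TOY_MERGES, p=1.0))
# Intermediate p gives a segmentation that varies from call to call.
print(bpe_dropout_segment("kitablar", TOY_MERGES, p=0.3))
```

With `p` between 0 and 1, the output mixes sub-word and character-level units, which is one plausible reading of why sub-word and character targets can complement each other in a multi-task setup.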
