Abstract:
End-to-end speech recognition models do not require pronunciation dictionaries during training and can therefore significantly reduce the burden of developing speech recognition systems for new languages. This paper exploits this advantage to build a language-independent, end-to-end multilingual speech recognition system. The model follows a character-based approach, in which a multilingual output symbol set is constructed to cover the characters of every target language. Training yields a single model whose network parameters are shared by all languages. On the 10-language dataset provided by the Oriental Language Recognition (OLR) Challenge, the proposed multilingual system outperforms the monolingual speech recognition systems on all languages.
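
As an illustration of the shared output symbol set, the following minimal sketch builds one character inventory from the transcripts of several languages and maps text to output indices. It assumes the set is the plain union of characters seen in the training transcripts; the function names (`build_symbol_set`, `encode`) and the toy transcripts are hypothetical and are not taken from the paper or the OLR data.

```python
from typing import Dict, List


def build_symbol_set(transcripts_by_language: Dict[str, List[str]]) -> List[str]:
    """Return the sorted union of characters seen in any language's transcripts."""
    symbols = set()
    for transcripts in transcripts_by_language.values():
        for text in transcripts:
            symbols.update(text)  # add every character of this transcript
    return sorted(symbols)


def encode(text: str, symbol_to_id: Dict[str, int]) -> List[int]:
    """Map a transcript to the indices of its characters in the shared set."""
    return [symbol_to_id[ch] for ch in text]


# Toy transcripts (hypothetical placeholders for per-language training text).
toy_transcripts = {
    "language_a": ["abc", "cab"],
    "language_b": ["bcd"],
}

symbols = build_symbol_set(toy_transcripts)
symbol_to_id = {symbol: index for index, symbol in enumerate(symbols)}

print(symbols)
print(encode("bad", symbol_to_id))
```

Because every language indexes into the same inventory, a single output layer of size `len(symbols)` can serve all languages, which is what allows the network parameters to be shared.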