Speech Recognition Model in Civil Aviation's Radiotelephony Communication Based on BiLSTM Neural Networks
Abstract: Radiotelephony communication is crucial for flight safety in civil aviation. Because it follows a special grammatical structure and pronunciation, acoustic models built for everyday speech cannot be applied effectively to civil aviation radiotelephony speech. To address this special context, this paper proposes a speech recognition method for civil aviation radiotelephony communication based on bidirectional long short-term memory (BiLSTM) neural networks. First, FBANK features extracted from civil aviation radiotelephony speech are used as input, and a multi-layer BiLSTM network is trained with connectionist temporal classification (CTC) as the objective function, yielding a BiLSTM/CTC acoustic model. Then, the acoustic model, a language model, and a radiotelephony lexicon are combined to perform automatic speech recognition of civil aviation radiotelephony communication, and the model is further trained with a combination of data augmentation and data migration to improve recognition performance. Experimental results show that the proposed method is suitable for speech recognition of civil aviation radiotelephony communication, and that the model trained with data augmentation effectively reduces the word error rate.
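The abstract relies on two standard ideas: a CTC-trained network emits one label per frame, including a blank label, and recognition quality is reported as word error rate (WER). The sketch below is illustrative only and is not the paper's implementation. It assumes word-level labels for readability (a real CTC acoustic model would more typically emit phones or characters), an assumed blank symbol `_`, and a hypothetical radiotelephony utterance. It shows the best-path CTC collapse rule (merge repeats, then drop blanks) and WER as word-level edit distance divided by the reference length.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Best-path CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical frame-level output for a short utterance (not from the paper).
frames = ["_", "climb", "climb", "_", "to", "_", "_",
          "eight", "eight", "_", "thousand"]
hypothesis = " ".join(ctc_collapse(frames))
reference = "climb to niner thousand"  # hypothetical reference transcript
wer = word_error_rate(reference, hypothesis)
print(hypothesis, wer)
```

In this toy case the collapsed hypothesis differs from the reference by one substituted word out of four, so the WER is 0.25. A full system would decode with a lexicon and language model rather than taking the best path directly.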