An Improved Attention Based Acoustics Model with Minimal Gate Unit

LONG Xing-yan, QU Dan, ZHANG Wen-lin, XU Si-ying

Citation: LONG Xing-yan, QU Dan, ZHANG Wen-lin, XU Si-ying. An Improved Attention Based Acoustics Model with Minimal Gate Unit[J]. JOURNAL OF SIGNAL PROCESSING, 2018, 34(6): 739-748. DOI: 10.16798/j.issn.1003-0530.2018.06.013


Funding: National Natural Science Foundation of China (61673395, 61403415); Natural Science Foundation of Henan Province (162300410331)
Details
  • CLC number: TN912.3

An Improved Attention Based Acoustics Model with Minimal Gate Unit

  • Abstract: Attention-based acoustic models built on the "encoder-decoder" architecture suffer from a large parameter scale, slow convergence, and inaccurate alignment between phonemes and acoustic features in noisy conditions. To address these problems, this paper first introduces the Minimal Gated Unit to reduce the number of model parameters and shorten training time. It then applies an adaptive-width window function and adds a pooling layer to the convolutional neural network that computes the attention-coefficient features, further improving the accuracy of the alignment between phonemes and acoustic features and thereby the recognition accuracy. Experiments on English and Czech corpora show that the improved model reduces both the parameter scale and the phone error rate, and that its recognition performance surpasses acoustic models based on hidden Markov models and on Connectionist Temporal Classification.
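The Minimal Gated Unit named in the abstract replaces the LSTM's three gates with a single forget gate that both resets the previous hidden state and interpolates between it and the candidate state. A rough sketch of one recurrent step, following the MGU formulation of Zhou et al. (2016), is shown below; the NumPy code, toy dimensions, and weight initialization are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, Wf, bf, Wh, bh):
    """One step of a Minimal Gated Unit.

    A single forget gate f_t both gates the previous state inside the
    candidate computation and interpolates old state vs. candidate.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ z + bf)                    # forget gate
    z_cand = np.concatenate([f_t * h_prev, x_t])
    h_cand = np.tanh(Wh @ z_cand + bh)            # candidate state
    return (1.0 - f_t) * h_prev + f_t * h_cand    # new hidden state

# toy dimensions: 4-dim acoustic features, 3-dim hidden state
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
Wf = rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
bf = np.zeros(n_hid)
Wh = rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
bh = np.zeros(n_hid)

h = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # 5 feature frames
    h = mgu_step(x, h, Wf, bf, Wh, bh)
print(h.shape)  # (3,)
```

Because the MGU has only two weight matrices where the LSTM has four, its recurrent layers carry roughly half the parameters at the same hidden size, which is the source of the parameter and training-time reduction claimed in the abstract.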
  • Cited by journal articles (1)

    1. YU Jianqiang, YAN Yan, LIU Wei, SUN Yiming. Research on an acoustic model for speech recognition based on an improved gated-unit neural network. Journal of Changchun University of Science and Technology (Natural Science Edition), 2020(01): 104-111.

    Other citation types: 1

Publication history
  • Received:  2017-10-22
  • Revised:  2018-03-30
  • Published:  2018-06-24
