Mispronunciation Detection with an Extended Recognition Network Constrained by Initials and Finals

  • Abstract: Mispronunciation detection is an important component of computer-aided pronunciation training (CAPT). To improve performance on machine-assisted corpus annotation and on mispronunciation detection when annotated data are scarce, this paper proposes a method that combines a deep neural network with an extended recognition network constrained by initials and finals (cERN) at decoding time. The method replaces the free grammar loop used in conventional speech recognition decoding with an extended recognition network that enforces alternation of initials and finals together with a limit on the number of characters (syllables). It can detect errors on all phones and produces no insertion or deletion errors. Unlike a conventional extended recognition network, the constrained network does not require large amounts of corpus annotation and analysis. Unlike the conventional Goodness of Pronunciation (GOP) measure, it not only judges whether a second-language learner's pronunciation is correct but also gives specific error feedback. Experimental results show that cERN outperforms GOP on the mispronunciation detection task, with a false acceptance rate of 29.2%, a false rejection rate of 22.9%, and a diagnostic accuracy of 76.6%, a relative improvement of 15.5% over the diagnostic accuracy of GOP. The model also detects more mispronunciations than native Chinese speakers without annotation experience.
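
The constraint described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the phone inventories are truncated, the per-slot scores stand in for real acoustic scores, every syllable is assumed to have exactly one initial and one final (zero-initial syllables are ignored), and all function names are hypothetical. It only shows why a network with a fixed number of alternating initial and final slots yields substitutions but never insertions or deletions, and how the recognized phone doubles as error feedback.

```python
# Hypothetical, truncated phone inventories (illustration only).
INITIALS = ["b", "p", "zh", "z", "sh", "s"]
FINALS = ["a", "an", "ang", "i", "u"]


def build_constrained_slots(num_syllables):
    """Return one candidate list per slot: initial, final, initial, final, ..."""
    slots = []
    for _ in range(num_syllables):
        slots.append(list(INITIALS))  # slot for the syllable's initial
        slots.append(list(FINALS))    # slot for the syllable's final
    return slots


def decode(slots, slot_scores):
    """Pick the best-scoring allowed phone in each slot.

    slot_scores[i] maps phone -> score for slot i. This is a stand-in for
    a real decoder; phones missing from the map score -inf.
    """
    return [
        max(candidates, key=lambda p: scores.get(p, float("-inf")))
        for candidates, scores in zip(slots, slot_scores)
    ]


def diagnose(canonical, recognized):
    """List (slot index, expected phone, recognized phone) for each mismatch."""
    # The slot structure guarantees equal length, so only substitutions occur.
    assert len(canonical) == len(recognized)
    return [
        (i, c, r)
        for i, (c, r) in enumerate(zip(canonical, recognized))
        if c != r
    ]


# Two-syllable prompt with made-up scores: the learner says "z" for "zh".
canonical = ["zh", "ang", "s", "an"]
slots = build_constrained_slots(2)
scores = [
    {"zh": 0.2, "z": 0.7},
    {"ang": 0.9, "an": 0.1},
    {"s": 0.8, "sh": 0.1},
    {"an": 0.6, "ang": 0.3},
]
recognized = decode(slots, scores)
errors = diagnose(canonical, recognized)
print(recognized, errors)
```

Because every slot admits the full initial or final inventory, no learner-specific error patterns need to be collected in advance, which is the contrast the abstract draws with a conventional extended recognition network.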

     
