Abstract:
Mispronunciation detection is an important part of computer-aided pronunciation training (CAPT). To improve the performance of machine-aided corpus annotation and of error detection without an annotated corpus, we propose a method that combines a deep neural network with an initials-and-finals constrained extended recognition network (cERN), which replaces the free grammar loop of traditional ASR decoding. The proposed model can detect errors in all kinds of phones, without insertion or deletion errors. Unlike the traditional extended recognition network, the constrained extended recognition network does not require extensive corpus annotation and analysis. Unlike Goodness of Pronunciation (GOP), it not only detects whether a pronunciation is correct but also gives learners details of the error. Experimental results show that cERN outperforms GOP on the mispronunciation detection task. The false accept rate is 29.2% and the false reject rate is 22.9%. The accuracy of cERN is 76.6%, an improvement of 15.5% over GOP and better than the results of untrained annotators.
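The abstract reports a false accept rate, a false reject rate, and an accuracy without defining them. The sketch below shows one common way such rates are computed in mispronunciation detection, assuming a false accept is a mispronounced phone judged correct and a false reject is a correctly pronounced phone judged wrong. These definitions, the `DetectionCounts` type, and the `detection_metrics` function are illustrative assumptions, not taken from the paper, and the counts in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Phone-level outcome counts (hypothetical container, not from the paper)."""
    true_accept: int   # correctly pronounced phone, judged correct
    false_reject: int  # correctly pronounced phone, judged wrong
    false_accept: int  # mispronounced phone, judged correct
    true_reject: int   # mispronounced phone, judged wrong


def detection_metrics(c: DetectionCounts) -> tuple[float, float, float]:
    """Return (false accept rate, false reject rate, accuracy).

    Assumed definitions:
      FAR = false accepts / all mispronounced phones
      FRR = false rejects / all correctly pronounced phones
      accuracy = correct decisions / all phones
    """
    n_correct = c.true_accept + c.false_reject
    n_wrong = c.false_accept + c.true_reject
    if n_correct == 0 or n_wrong == 0:
        raise ValueError("need at least one correct and one mispronounced phone")
    far = c.false_accept / n_wrong
    frr = c.false_reject / n_correct
    acc = (c.true_accept + c.true_reject) / (n_correct + n_wrong)
    return far, frr, acc


# Usage with made-up counts (not the paper's data).
counts = DetectionCounts(true_accept=70, false_reject=10,
                         false_accept=5, true_reject=15)
far, frr, acc = detection_metrics(counts)
print(f"FAR={far:.3f} FRR={frr:.3f} accuracy={acc:.3f}")
```

Under these definitions the accuracy depends on the proportion of mispronounced phones in the test set as well as on the two error rates, so the three figures cannot be derived from one another without that proportion.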