Abstract:
The performance of a speech recognition system is affected by many factors, such as speaker variability, speaking style, and ambient noise. One important way to make such systems more accurate and robust against these factors is to seek better, more robust representations of the acoustic signal that are grounded in human perceptual features. The human internal acoustic representation has previously been investigated with the 3-Dimensional Deep Search (3DDS) method, which has proven successful in locating the perceptual cues of plosive and fricative consonants in natural speech. In this paper, the method is extended to predict the perceptual cues of the nasal consonants /m, n/. Based on the analysis of results from three experiments, the redundant cue and the secondary perceptual cue are defined. The perceptual cue of /m/ is the speech component lying roughly in the 363~1250 Hz range, and the perceptual cue of /n/ is the speech component lying roughly in the 939~2826 Hz range.