Abstract:
A new multi-resolution feature extraction algorithm is proposed for speaker-independent and text-independent age recognition. The input speech is decomposed by the wavelet packet transform, and the wavelet packet coefficients of each effective frequency band are concatenated to form an intermediate signal. The Mel-frequency cepstrum coefficients of this intermediate signal are then computed; the resulting feature is called the Wavelet Packet Mel-Frequency Cepstrum Coefficient (WPMFC). Speakers are divided into four age groups (children, youths, adults, and older speakers), and a total of eight Gaussian mixture models are trained, one for each combination of age group and gender. The recognition decision for test speech is based on the maximum likelihood criterion. Experimental results show that age recognition based on the proposed feature extraction algorithm compares favorably with traditional short-time spectral statistical analysis methods: the average age recognition rate for out-of-set speakers reached 65.17%. Moreover, changes in the speaker's pronunciation characteristics affect recognition performance more than changes in the speech content.
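The decision step described above (eight Gaussian mixture models, maximum likelihood criterion) can be illustrated with a minimal sketch. It is not the trained system from this work: the diagonal covariances, the two-dimensional toy features standing in for WPMFC vectors, the model parameters, and the names `diag_gmm_log_likelihood` and `classify` are all assumptions made for illustration.

```python
import math

def diag_gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        # Log of each weighted component density for this frame.
        comp = []
        for w, mu, var in zip(weights, means, variances):
            log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
            log_exp = -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var))
            comp.append(math.log(w) + log_norm + log_exp)
        # Log-sum-exp over components for numerical stability.
        top = max(comp)
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total

def classify(frames, models):
    """Return the (age_group, gender) key whose GMM gives the highest likelihood."""
    return max(models, key=lambda k: diag_gmm_log_likelihood(frames, *models[k]))

# Hypothetical toy models: one two-component GMM per (age group, gender) pair,
# eight in total. Real models would be trained on WPMFC feature vectors.
AGE_GROUPS = ["child", "youth", "adult", "older"]
GENDERS = ["female", "male"]
models = {}
for i, age in enumerate(AGE_GROUPS):
    for j, gender in enumerate(GENDERS):
        models[(age, gender)] = (
            [0.5, 0.5],                                        # mixture weights
            [[i - 0.1, float(j)], [i + 0.1, float(j)]],        # component means
            [[0.05, 0.05], [0.05, 0.05]],                      # diagonal variances
        )

# Toy test frames lying near the hypothetical ("adult", "male") model.
test_frames = [[2.0, 1.0], [2.1, 0.9], [1.9, 1.1]]
predicted = classify(test_frames, models)
print(predicted)  # ('adult', 'male')
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification when scoring an utterance against a GMM.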