Research on Multifractal Spectrum Cluster and Its Application in Speaker Recognition

  • Abstract: Speech is a complex nonlinear signal, which makes it difficult to further improve the performance of traditional speaker recognition techniques built on linear system theory. This paper proposes a multifractal spectrum cluster analysis method for analyzing the nonlinear characteristics of speech signals and applies it to speaker recognition on short utterances (2 seconds). Simulation experiments on Cantor sets show that different scaling ranges, including the sub-scaling ranges neglected by the traditional multifractal method, reflect the growth pattern of a system at different growth stages. The fractal characteristics of a system can therefore be described level by level with a set of continuously varying multifractal spectra; this is the multifractal spectrum cluster analysis method. Based on the fractal characteristics of speech signals, an extraction method for a speech multifractal spectrum cluster feature (MSCF) is then proposed; the feature can be combined effectively with short-term spectral features at the feature level. Finally, several nonlinear features are combined with short-term spectral features for speaker recognition. Experiments on 50 speakers from the TIMIT database show that the nonlinear features and the short-term spectral features are highly complementary; in particular, combining MSCF with MFCC and LPC features reduces the error rate of the system to 0.8%.
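For readers unfamiliar with multifractal spectra, the following is a minimal sketch of the classical partition-function estimate of f(α) for a weighted Cantor set. It is not the paper's spectrum cluster method or its MSCF extraction. The weight `p = 0.3`, the function names, and the choice of levels are all assumptions made for illustration; only the use of a Cantor set and of scaling ranges comes from the abstract.

```python
import numpy as np

def cantor_measure(level, p=0.3):
    """Masses of the 2**level intervals of a middle-third Cantor set.

    Assumption for illustration: at every step the left child receives
    weight p and the right child receives weight 1 - p.
    """
    mu = np.array([1.0])
    for _ in range(level):
        # Split every interval into a left and a right child, in spatial order.
        mu = (mu[:, None] * np.array([p, 1.0 - p])).ravel()
    return mu

def tau_from_scaling(q_values, levels, p=0.3):
    """Estimate tau(q) as the slope of log Z(q, eps) against log eps.

    Z(q, eps) is the sum of mu_i**q over intervals of size eps = 3**(-level).
    The fit uses only the construction levels given in `levels`, i.e. one
    chosen scaling range.
    """
    levels = list(levels)
    log_eps = -np.log(3.0) * np.array(levels, dtype=float)
    measures = [cantor_measure(n, p) for n in levels]
    tau = np.empty(len(q_values))
    for i, q in enumerate(q_values):
        log_z = np.array([np.log(np.sum(mu ** q)) for mu in measures])
        tau[i] = np.polyfit(log_eps, log_z, 1)[0]
    return tau

def legendre_spectrum(q_values, tau):
    """Legendre transform: alpha = d tau / d q, f(alpha) = q * alpha - tau."""
    alpha = np.gradient(tau, q_values)
    f = q_values * alpha - tau
    return alpha, f

q = np.linspace(-5.0, 5.0, 41)
tau = tau_from_scaling(q, levels=range(4, 10))
alpha, f = legendre_spectrum(q, tau)

# The peak of f(alpha) should be close to the box dimension log 2 / log 3.
print(f.max(), np.log(2.0) / np.log(3.0))
```

Calling `tau_from_scaling` with different `levels` windows yields one spectrum per scaling range. For this ideal self-similar measure those spectra coincide; the abstract's argument concerns systems in which they differ from one scaling range to another.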

     
