Research on a Voice Conversion Method Based on the Separation of Speaker-Specific Information in Speech

A speech conversion method based on the separation of speaker-specific characteristics

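  • Abstract (translated from the Chinese): Building on an in-depth study of how speaker-specific information in speech can be represented effectively, this paper proposes, from the viewpoint of information separation, a new voice conversion method that separates and replaces the speaker-specific information. The method relies mainly on the sparsity of speech and on K-means singular value decomposition (K-SVD). Because K-SVD dictionary training preserves the speaker-specific information in a speech signal well, it can be used to separate out that information and replace it; the target speech is then reconstructed from the replaced information together with the linguistic content and other information. Compared with traditional methods, the proposed method makes better use of the sparsity of speech to preserve speaker-specific information, and so overcomes the low speaker similarity and degraded speech quality that parameter mapping introduces into converted speech. Simulation experiments and subjective evaluation show that, compared with voice conversion methods based on Gaussian mixture models and artificial neural networks, the proposed method gives better converted speech quality, higher conversion similarity, and better noise robustness.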

     

    Abstract: This paper studies how to characterize speaker-specific voice characteristics independently and completely. On that basis, and from the viewpoint of information separation, we propose a method that separates the voice characteristics from the linguistic content of speech and use it to perform voice conversion. The method builds on the K-SVD algorithm, which can train a dictionary that captures both the personal characteristics and the inter-frame correlation of a voice. A dictionary containing the personal characteristics is extracted from the training data with K-SVD, and the target speech is then reconstructed from the trained dictionary together with the remaining content information. Compared with traditional methods, the proposed method exploits the sparsity of speech to better preserve personal characteristics; it thereby avoids the problems that arise in feature-mapping methods and is expected to improve voice conversion. Subjective evaluations show that the proposed method outperforms methods based on Gaussian mixture models and artificial neural networks in both speech quality and conversion similarity to the target.
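The dictionary-training idea in the abstract can be illustrated with a minimal K-SVD sketch. This is not the authors' implementation: the function names (`omp`, `ksvd`, `fit_target_dictionary`, `convert`), the use of orthogonal matching pursuit for sparse coding, and the least-squares fit of a target dictionary from time-aligned parallel frames are all assumptions made for illustration. The abstract does not specify the features, the alignment, or how the dictionary is replaced.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y using at most k atoms of D."""
    residual = y.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def ksvd(Y, n_atoms, k, n_iter=10, seed=0):
    """Learn a dictionary D and k-sparse codes X such that Y is close to D @ X.

    Y holds one feature frame per column; n_atoms must not exceed the
    number of frames, because atoms are initialised from the data.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(Y.shape[1], size=n_atoms, replace=False)
    D = Y[:, idx].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        # Sparse coding stage: code every frame against the current dictionary.
        X = np.column_stack([omp(D, Y[:, i], k) for i in range(Y.shape[1])])
        # Dictionary update stage: refine one atom at a time.
        for j in range(n_atoms):
            users = np.nonzero(X[j, :])[0]
            if users.size == 0:
                continue  # atom unused in this round; leave it unchanged
            # Error without atom j's contribution, on the frames that use it.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]               # best rank-1 atom (unit norm)
            X[j, users] = s[0] * Vt[0, :]   # matching coefficients
    return D, X


def fit_target_dictionary(Y_tgt, X):
    """Assumed replacement step: least-squares target dictionary that
    reproduces time-aligned target frames from the source codes X."""
    return Y_tgt @ np.linalg.pinv(X)


def convert(frames_src, D_src, D_tgt, k):
    """Code source frames on D_src, then reconstruct them with D_tgt."""
    codes = np.column_stack(
        [omp(D_src, frames_src[:, i], k) for i in range(frames_src.shape[1])]
    )
    return D_tgt @ codes
```

Under these assumptions, one would call `ksvd` on the source speaker's feature frames, pass the resulting codes and the aligned target frames to `fit_target_dictionary`, and then call `convert` on new source frames. The sparse codes stand in for the content shared between speakers, while the dictionary carries the speaker-specific part that gets swapped.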

     
