Voice Conversion Method Based on Model Adaptation

  • Abstract: To address voice conversion with non-parallel corpora, this paper proposes an effective voice conversion method based on model adaptation. First, the source and target speaker models are each obtained from a background model by Maximum A Posteriori (MAP) adaptation. Then, a conversion function for spectral features is trained from the mean vectors of the adapted speaker models. This conversion function is further combined with the conventional INCA conversion algorithm, yielding a model-adaptation-based INCA method that transforms the spectral features of the source speaker toward those of the target speaker. The proposed method is evaluated with objective tests and subjective listening tests. The results show that, compared with the INCA method, it achieves lower cepstral distortion, higher perceptual speech quality, and higher similarity to the target speaker. Although it uses a non-parallel corpus, it also comes closer than INCA to the performance of the conventional Gaussian Mixture Model (GMM) based voice conversion method trained on a parallel corpus.
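The adaptation step can be illustrated with a minimal sketch. The abstract does not give the adaptation formulas, so the code below assumes the widely used mean-only relevance-MAP update for a diagonal-covariance GMM; the function names and the `relevance` parameter are illustrative assumptions, not the paper's implementation. It shows how adapting one shared background model to each speaker yields mean vectors that stay aligned by component index, which is what would make them usable as training pairs for a conversion function.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a background GMM (assumed update rule).

    Weights and variances are left unchanged; only the means move toward
    the speaker's data, by an amount that grows with the soft frame count.
    """
    k, dim = len(means), len(means[0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in frames:
        for i, p in enumerate(posteriors(x, weights, means, variances)):
            counts[i] += p
            for d in range(dim):
                sums[i][d] += p * x[d]

    adapted = []
    for i in range(k):
        if counts[i] <= 0.0:
            # No data reached this component: keep the background mean.
            adapted.append(list(means[i]))
            continue
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = [s / counts[i] for s in sums[i]]
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, means[i])]
        )
    return adapted


# Toy 1-D example with a two-component background model (made-up numbers).
bg_weights = [0.5, 0.5]
bg_means = [[0.0], [10.0]]
bg_vars = [[1.0], [1.0]]
source_frames = [[1.0], [1.0], [11.0], [11.0]]
target_frames = [[-1.0], [-1.0], [9.0], [9.0]]

source_means = map_adapt_means(source_frames, bg_weights, bg_means, bg_vars)
target_means = map_adapt_means(target_frames, bg_weights, bg_means, bg_vars)

# Component i of the source model corresponds to component i of the target
# model, so the means form aligned (source, target) pairs.
mean_pairs = list(zip(source_means, target_means))
```

Because both speaker models start from the same background model, no frame-level alignment between source and target utterances is needed to obtain these pairs, which fits the non-parallel setting described above.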

     
