Voice Conversion Using Short-Time Spectrum with Suprasegmental Prosody Features

  • Abstract: Conventional voice conversion methods concentrate on converting the vocal tract (spectral) features and the fundamental frequency (F0) of speech, while ignoring other suprasegmental prosodic features. As a result, the converted speech does not move clearly enough toward the target speaker, the synthesized speech sounds unnatural, and the speaker's individual characteristics are poorly reproduced, especially when the speakers have strong speaking styles. This paper builds on short-time spectral envelope conversion and additionally converts several suprasegmental prosodic features, namely F0 (pitch contour), speaking rate, pause, and stress, to improve conversion performance. The pitch contour is first described by a pitch target model, and its conversion rule is then trained with a Gaussian mixture model (GMM); the conversion rules for speaking rate, pause, and stress are trained by maximum likelihood estimation based on single-Gaussian statistical analysis. Experimental results show that, compared with a conventional system, adding suprasegmental prosody conversion clearly improves both the target tendency (similarity to the target speaker) and the naturalness of the converted speech.
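The abstract does not state the exact form of the GMM conversion rule for the pitch contour. A common choice in GMM-based voice conversion is joint-density regression: fit a GMM to paired source/target features, then map a source value x to the posterior-weighted conditional mean, y_hat = sum_i p(i | x) * (mu_i^y + sigma_i^yx / sigma_i^xx * (x - mu_i^x)). The Python sketch below assumes, for simplicity, that each syllable's pitch target is reduced to a single scalar (for example, target height) and that the mixture parameters were already estimated by EM on aligned source/target data. `JointComponent`, `convert_pitch_target`, and all numbers are hypothetical illustrations, not the paper's parameterization.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointComponent:
    """One component of a hypothetical joint source/target GMM (scalar features)."""
    weight: float   # mixture weight w_i
    mu_x: float     # mean of the source pitch-target parameter
    mu_y: float     # mean of the target pitch-target parameter
    var_xx: float   # source variance sigma_i^xx (must be > 0)
    cov_yx: float   # target/source cross-covariance sigma_i^yx


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def convert_pitch_target(x: float, comps: List[JointComponent]) -> float:
    """Map a source value x to the GMM conditional mean E[y | x]."""
    # log(w_i * N(x; mu_x_i, var_xx_i)) for every component.
    logs = [math.log(c.weight) + log_gaussian(x, c.mu_x, c.var_xx) for c in comps]
    # Subtracting the maximum (log-sum-exp trick) keeps the posteriors from underflowing.
    peak = max(logs)
    unnorm = [math.exp(l - peak) for l in logs]
    total = sum(unnorm)
    y = 0.0
    for c, u in zip(comps, unnorm):
        posterior = u / total
        # Per-component linear regression from the source to the target parameter.
        y += posterior * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
    return y


if __name__ == "__main__":
    # Made-up two-component model (values could be pitch-target heights in Hz).
    model = [
        JointComponent(weight=0.5, mu_x=100.0, mu_y=150.0, var_xx=25.0, cov_yx=25.0),
        JointComponent(weight=0.5, mu_x=200.0, mu_y=260.0, var_xx=25.0, cov_yx=25.0),
    ]
    for x in (100.0, 150.0, 200.0):
        print(x, round(convert_pitch_target(x, model), 2))
```

With vector-valued pitch targets (for example slope and height together), the scalar ratio cov_yx / var_xx becomes the matrix product of the cross-covariance and the inverse source covariance; the soft posterior weighting is what lets the mapping change smoothly between pitch regions instead of switching abruptly.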
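For speaking rate, pause, and stress, the abstract says only that the conversion rules come from maximum likelihood estimation under a single-Gaussian model. One plausible reading, sketched below purely as an assumption, is mean-variance matching: fit one Gaussian to the source values and one to the target values of a feature (syllable duration, pause length, or a stress measure such as syllable energy), then map x to mu_y + (sigma_y / sigma_x) * (x - mu_x). The function names `fit_gaussian_ml` and `make_prosody_converter` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def fit_gaussian_ml(values: List[float]) -> Tuple[float, float]:
    """ML estimates (mean, standard deviation) of a single Gaussian.

    The ML variance divides by N rather than N - 1.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one training value")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def make_prosody_converter(source: List[float], target: List[float]) -> Callable[[float], float]:
    """Build the assumed rule y = mu_y + (sd_y / sd_x) * (x - mu_x)."""
    mu_x, sd_x = fit_gaussian_ml(source)
    mu_y, sd_y = fit_gaussian_ml(target)
    if sd_x == 0.0:
        raise ValueError("source feature has zero variance")
    scale = sd_y / sd_x

    def convert(x: float) -> float:
        # Standardize under the source Gaussian, then rescale to the target Gaussian.
        return mu_y + scale * (x - mu_x)

    return convert


if __name__ == "__main__":
    # Illustrative, made-up pause lengths in seconds.
    convert_pause = make_prosody_converter([0.2, 0.3, 0.4], [0.3, 0.5, 0.7])
    print(round(convert_pause(0.35), 3))
```

With these illustrative values the converter maps a 0.35 s source pause to about 0.6 s: the target mean is 0.2 s larger and the target spread is twice as wide, so the 0.05 s offset from the source mean becomes 0.1 s.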

     
