Non-parallel Mongolian Voice Conversion Method Based on Fine-grained Prosody Modeling and Conditional CycleGAN

Abstract: Voice conversion transforms the timbre of a source speaker's speech into that of a target speaker while leaving the linguistic content unchanged. Mongolian voice conversion currently faces problems such as scarce speech corpora and rich prosodic variation in the pronunciation of Mongolian words. To address these problems, this paper proposes a non-parallel Mongolian voice conversion method based on fine-grained prosody modeling and a conditional CycleGAN. The method first extracts fine-grained prosodic features with the continuous wavelet transform (CWT), then adds speaker identity vectors to a CycleGAN to build a conditional CycleGAN, and finally uses the conditional CycleGAN to obtain stable prosody conversion between the source and target speakers. Experimental results show that, compared with a conventional CycleGAN-based voice conversion method, the proposed method effectively improves Mongolian voice conversion, raising the mean opinion score (MOS) by 0.1 for speech naturalness and by 0.2 for speaker similarity.
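
The abstract does not detail the CWT step. A common way to obtain fine-grained prosodic features is to decompose the log-F0 contour into several time scales with a Mexican hat wavelet. The sketch below does this under explicitly assumed choices that are not taken from the paper: unvoiced frames are filled by linear interpolation, log-F0 is z-normalized, and the scales are dyadic (2**i frames for i = 0..9). The function names `interpolate_unvoiced` and `cwt_decompose_f0` are hypothetical.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet with unit L2 energy."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def interpolate_unvoiced(f0):
    """Fill unvoiced frames (F0 == 0) by linear interpolation between voiced frames."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("F0 contour has no voiced frames")
    frames = np.arange(len(f0))
    return np.interp(frames, frames[voiced], f0[voiced])


def cwt_decompose_f0(f0, num_scales=10):
    """Decompose an F0 contour into `num_scales` CWT components (assumed setup).

    Returns an array of shape (num_scales, len(f0)); row i is the response at
    scale 2**i frames, so fine local detail comes first and slow trends last.
    """
    log_f0 = np.log(interpolate_unvoiced(f0))
    # z-normalize log-F0 so components are comparable across speakers
    z = log_f0 - log_f0.mean()
    std = z.std()
    if std > 1e-8:  # leave a (near-)constant contour unscaled
        z = z / std
    coeffs = np.empty((num_scales, len(z)))
    for i in range(num_scales):
        scale = 2.0 ** i
        half = int(np.ceil(4 * scale))  # truncate the wavelet at +/- 4 scale units
        t = np.arange(-half, half + 1) / scale
        kernel = mexican_hat(t) / np.sqrt(scale)
        # edge padding limits artifacts at the utterance boundaries;
        # 'valid' convolution then returns exactly len(z) samples
        padded = np.pad(z, half, mode="edge")
        coeffs[i] = np.convolve(padded, kernel, mode="valid")
    return coeffs
```

If frames are taken every 5 ms (again an assumption), the ten rows span roughly 5 ms to 2.6 s, so low rows capture fast, local pitch movements and high rows capture phrase-level trends; a conversion model can then map each scale separately.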

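The abstract states that speaker vectors are added to the CycleGAN to make it conditional, but it does not specify the architecture. The sketch below assumes a single generator shared by both conversion directions and conditioned on a one-hot speaker code; `one_hot`, `ToyConditionalGenerator`, and `cycle_consistency_loss` are hypothetical names, and the linear layer stands in for a trained network. It shows how the speaker vector is attached to every frame and how the cycle-consistency loss (source to target and back should reconstruct the input) is computed, which is what lets CycleGAN train without parallel data.

```python
import numpy as np


def one_hot(speaker_id, num_speakers):
    """Speaker identity vector as a one-hot code (the paper's vector type may differ)."""
    v = np.zeros(num_speakers)
    v[speaker_id] = 1.0
    return v


class ToyConditionalGenerator:
    """Hypothetical stand-in for a conditional CycleGAN generator.

    A single linear layer maps each frame of [features ; speaker vector] back to
    the feature dimension. A real generator would be a deep network, but the
    conditioning idea is the same: the target speaker's vector is appended to
    every input frame.
    """

    def __init__(self, feat_dim, num_speakers, rng):
        self.num_speakers = num_speakers
        self.W = rng.normal(scale=0.1, size=(feat_dim + num_speakers, feat_dim))
        self.b = np.zeros(feat_dim)

    def __call__(self, x, speaker_id):
        # x: (frames, feat_dim); repeat the speaker vector for every frame
        c = np.tile(one_hot(speaker_id, self.num_speakers), (x.shape[0], 1))
        return np.concatenate([x, c], axis=1) @ self.W + self.b


def cycle_consistency_loss(G, x, src_id, tgt_id):
    """L1 cycle loss: convert source -> target, convert back, compare to the input."""
    x_converted = G(x, tgt_id)
    x_reconstructed = G(x_converted, src_id)
    return float(np.mean(np.abs(x_reconstructed - x)))
```

A full training loop would add adversarial losses from speaker-aware discriminators (and possibly an identity-mapping loss); they are omitted here to keep the conditioning and cycle terms visible.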
     
