Research on an Energy Equalization Method for Speech Database Samples in Speech Synthesis Systems

Voice Energy Balance Method for Text to Speech Database

  • Abstract: The quality of the speech database is a major factor in the output quality of a Text to Speech (TTS) system. Building a TTS speech database takes about six months, and throughout that period the speaker's recording state must stay consistent, meaning that neither timbre nor energy may vary much, which is difficult for the speaker. This paper therefore presents a speech energy equalization method, comprising a time-domain envelope fluctuation detection algorithm and a frame energy averaging algorithm, to address energy inconsistency in a TTS speech database after recording. First, the energy and fluctuation parameters of a standard utterance are analyzed and taken as a template. Second, the time-domain envelope fluctuation detection algorithm checks whether each speech sample awaiting adjustment qualifies. Finally, following the frame energy averaging criterion, the time-domain amplitude of every qualified sample is adjusted so that the overall energy of the database is as consistent as possible. Experimental results show that the proposed method effectively improves the quality of a TTS speech database and is of practical engineering value.
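The three steps named in the abstract (template analysis, envelope fluctuation check, frame-energy-based amplitude adjustment) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formulas, so everything here is an assumption made for illustration: the non-overlapping frame length of 256 samples, the use of the coefficient of variation of per-frame peak amplitude as the fluctuation measure, the `tolerance` threshold, and all function names.

```python
import math


def _frames(samples, frame_len):
    """Split a signal into full, non-overlapping frames (assumed framing)."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    return frames


def mean_frame_energy(samples, frame_len=256):
    """Average of the per-frame mean-square energies."""
    energies = [sum(x * x for x in f) / frame_len
                for f in _frames(samples, frame_len)]
    return sum(energies) / len(energies)


def envelope_fluctuation(samples, frame_len=256):
    """Hypothetical fluctuation measure: coefficient of variation of the
    per-frame peak amplitude. The paper's own measure is not specified."""
    peaks = [max(abs(x) for x in f) for f in _frames(samples, frame_len)]
    mean = sum(peaks) / len(peaks)
    if mean == 0:
        return 0.0
    variance = sum((p - mean) ** 2 for p in peaks) / len(peaks)
    return math.sqrt(variance) / mean


def is_qualified(samples, template, tolerance=0.5, frame_len=256):
    """Accept a sample whose fluctuation is within an assumed tolerance
    of the template's fluctuation."""
    diff = (envelope_fluctuation(samples, frame_len)
            - envelope_fluctuation(template, frame_len))
    return abs(diff) <= tolerance


def equalize(samples, template, frame_len=256):
    """Scale the sample in the time domain so that its mean frame energy
    matches the template's mean frame energy."""
    target = mean_frame_energy(template, frame_len)
    current = mean_frame_energy(samples, frame_len)
    if current == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = math.sqrt(target / current)  # energy scales with gain squared
    return [x * gain for x in samples]
```

In this sketch a single global gain is applied per sample, which changes loudness without altering the shape of the envelope; a per-frame gain would equalize more aggressively but would also distort the natural dynamics of the utterance.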

     
