Voice Energy Balance Method for Text to Speech Database
Abstract: The quality of the speech database is a major factor in determining the output quality of text-to-speech (TTS) synthesis. Producing a TTS speech database takes about six months, and throughout that period the speaker's recording condition must stay consistent: neither timbre nor energy may vary much, which is difficult for the speaker to sustain. This paper therefore presents a voice energy balance method, comprising a time-domain envelope fluctuation detection algorithm and a frame energy averaging algorithm, that addresses energy inconsistency in a TTS speech database after recording. First, the energy and fluctuation parameters of a standard utterance are analyzed and stored as a template. Second, the time-domain envelope fluctuation detection algorithm checks whether each speech sample awaiting adjustment is acceptable. Finally, following the frame energy averaging criterion, the time-domain amplitude of every accepted sample is adjusted so that the overall energy of the database is as consistent as possible. Experimental results show that the proposed method effectively improves the quality of a TTS speech database and is of practical engineering value.
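The abstract names the three stages but gives no formulas, so the following is only a rough sketch of how such a pipeline might look, not the authors' algorithm. The frame length, hop size, fluctuation measure (coefficient of variation of per-frame peak amplitude), tolerance, and all function names are assumptions introduced for illustration.

```python
import math

def _frame_starts(n_samples, frame_len, hop):
    # Start indices of full frames; a single short frame if the signal is shorter than frame_len.
    return range(0, max(n_samples - frame_len + 1, 1), hop)

def mean_frame_energy(samples, frame_len=400, hop=200):
    """Average over frames of the mean squared amplitude (frame sizes are assumed values)."""
    energies = []
    for start in _frame_starts(len(samples), frame_len, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    return sum(energies) / len(energies)

def envelope_fluctuation(samples, frame_len=400, hop=200):
    """Hypothetical fluctuation measure: coefficient of variation of per-frame peak amplitude."""
    peaks = [max(abs(x) for x in samples[s:s + frame_len])
             for s in _frame_starts(len(samples), frame_len, hop)]
    mean = sum(peaks) / len(peaks)
    if mean == 0:
        return 0.0
    variance = sum((p - mean) ** 2 for p in peaks) / len(peaks)
    return math.sqrt(variance) / mean

def balance(samples, template_energy, template_fluct, tolerance=0.5):
    """Rescale a non-empty sample list to the template energy.

    Returns None when the envelope fluctuation differs from the template's
    by more than `tolerance` (an assumed acceptance rule).
    """
    if abs(envelope_fluctuation(samples) - template_fluct) > tolerance:
        return None
    current = mean_frame_energy(samples)
    if current == 0:
        return list(samples)  # silence: nothing to rescale
    gain = math.sqrt(template_energy / current)  # energy scales with gain squared
    return [x * gain for x in samples]

# Usage: a quiet copy of the template is brought back to the template energy.
template = [0.5 * math.sin(2 * math.pi * 100 * n / 16000) for n in range(1600)]
quiet = [0.2 * x for x in template]
balanced = balance(quiet, mean_frame_energy(template), envelope_fluctuation(template))
```

A single gain per utterance is used here because energy grows with the square of amplitude, so the square root of the energy ratio matches the mean frame energy to the template without changing the envelope shape.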
Keywords:
- text to speech
- energy balance
- time-domain envelope detection