Abstract:
The quality of the speech library is an important factor in determining the performance of a Text-to-Speech (TTS) system. Producing a TTS speech database takes about six months. Throughout this period the recorded voice must remain consistent, that is, the tone and energy must not vary greatly, which is difficult for the speaker to maintain. This paper therefore presents a voice energy balance method, comprising a time-domain envelope detection algorithm and a frame energy average algorithm, to address the inconsistency that appears in a TTS speech database after recording. First, the energy parameters and waveform parameters of a standard speech sample are obtained as a template. Second, the time-domain envelope fluctuation detection algorithm is used to check the speech samples before adjustment. Finally, the time-domain amplitude of every qualified sample is adjusted according to the frame energy average criterion, so as to maximize the energy consistency of the overall speech database. The experimental results show that the proposed method can effectively improve the quality of the TTS speech database and has practical engineering significance.
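The three steps can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the frame length, the tolerance, and the peak-based envelope check are all assumptions chosen for illustration, and the envelope test in particular is a simple stand-in for the envelope fluctuation detection algorithm.

```python
import math

def frame_peaks(samples, frame_len):
    """Crude time-domain envelope: peak absolute amplitude of each full frame."""
    return [
        max(abs(x) for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def mean_frame_energy(samples, frame_len):
    """Average over all full frames of the mean squared amplitude per frame."""
    energies = [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return sum(energies) / len(energies)

def passes_envelope_check(samples, template_peak, frame_len, tolerance):
    """Assumed criterion: the sample's largest envelope value must lie
    within `tolerance` (relative) of the template's largest envelope value."""
    peak = max(frame_peaks(samples, frame_len))
    return abs(peak - template_peak) <= tolerance * template_peak

def balance_energy(samples, target_energy, frame_len):
    """Scale the amplitude so the mean frame energy equals target_energy.
    Energy grows with the square of the gain, hence the square root."""
    current = mean_frame_energy(samples, frame_len)
    if current == 0:
        raise ValueError("cannot balance a silent sample")
    gain = math.sqrt(target_energy / current)
    return [gain * x for x in samples]
```

In use, the template's mean frame energy and envelope peak would be computed once, each recorded sample would be passed through `passes_envelope_check`, and only the qualified samples would be rescaled with `balance_energy`. A single gain per sample preserves the waveform shape and changes only its level.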