Abstract:
This paper aims to study independent and complete characterization of speaker-specific voice characteristics. Based on this, from the point of information separation, we will conduct a method on the separation between voice characteristics and linguistic content in speech, and carry out voice conversion. In this paper, we take full account of the K-SVD algorithm which can train the dictionary contains the personal characteristics and inter-frame correlation of voice. With this feature, the dictionary which contains the personal characteristics is extracted from training data through the K-SVD algorithm. Then we use the trained dictionary and other content information to reconstruct the target speech. Compared to traditional methods, the personal characteristics can be better preserved based on the proposed method through the sparse nature of voice and can easily solve the problems encountered in feature mapping methods and the voice conversion improvements are to be expected. Experimental results using subjective evaluations show that the proposed method outperforms the Gaussian Mixture Model and Artificial Neural Network based methods in the view of both speech quality and conversion similarity to the target.