Abstract:
To perform voice conversion with a non-parallel corpus, this paper proposes an efficient voice conversion method based on model adaptation. First, the source and target speaker models are each adapted from a background model using the Maximum a Posteriori (MAP) adaptation algorithm. Next, a conversion function is trained on the mean vectors of the adapted speaker models. To improve conversion performance, this function is then combined with the INCA conversion algorithm, yielding a model-adaptation-based INCA method. The proposed method efficiently transforms spectral features from the source speaker to the target speaker. Subjective and objective experiments were carried out to evaluate it. The results show that the proposed method achieves lower cepstral distortion, higher perceptual quality and higher similarity to the target speaker than the INCA method. Moreover, when trained on a non-parallel corpus, the proposed method comes closer than INCA to the performance of a Gaussian Mixture Model (GMM)-based voice conversion method trained on a parallel corpus.
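The first two steps can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM as the background model, relevance-MAP adaptation of the means only (the relevance factor of 16 is a common default, assumed here), and an affine conversion function fitted by least squares on the paired component means. The abstract does not specify the form of the conversion function, so the affine form and all function names (`map_adapt_means`, `fit_affine_conversion`, `convert`) are hypothetical, and the combination with INCA is not shown. The key point is that both speaker models inherit the background model's component indexing, so mean k of the source model can be paired with mean k of the target model without any frame alignment.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log-density of each frame under each diagonal Gaussian; returns (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]               # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)            # (K,)
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det[None, :] + mahal)


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the background model's means only (assumed variant)."""
    # Component posteriors, normalised stably in the log domain.
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)              # (T, K)

    n_k = gamma.sum(axis=0)                                # soft counts, (K,)
    data_means = (gamma.T @ X) / np.maximum(n_k, 1e-10)[:, None]

    # Components with many assigned frames move toward the data;
    # rarely used components stay close to the background means.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * means


def fit_affine_conversion(src_means, tgt_means):
    """Least-squares W such that [mu_src, 1] @ W approximates mu_tgt (hypothetical form)."""
    A = np.hstack([src_means, np.ones((src_means.shape[0], 1))])
    W, *_ = np.linalg.lstsq(A, tgt_means, rcond=None)
    return W                                               # (D + 1, D)


def convert(X, W):
    """Apply the affine conversion frame by frame."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D = 16, 4
    # Toy background model standing in for one trained on many speakers.
    ubm_w = np.full(K, 1.0 / K)
    ubm_mu = rng.normal(size=(K, D))
    ubm_var = np.ones((K, D))

    # Non-parallel data: the two speakers' frames are unrelated draws.
    src_frames = rng.normal(size=(500, D))
    tgt_frames = rng.normal(loc=0.5, scale=1.2, size=(400, D))

    mu_src = map_adapt_means(src_frames, ubm_w, ubm_mu, ubm_var)
    mu_tgt = map_adapt_means(tgt_frames, ubm_w, ubm_mu, ubm_var)

    # Shared component indexing pairs the means; no parallel utterances needed.
    W = fit_affine_conversion(mu_src, mu_tgt)
    converted = convert(src_frames, W)
    print(converted.shape)  # (500, 4)
```

The affine map is only one choice; with K components and D-dimensional features it is well determined when K is at least D + 1, and a richer mapping (for example, one per mixture component) would need correspondingly more components.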