Phoneme-centric Acoustic Factor Analysis for Speaker Verification
Abstract: In speaker verification, acoustic factor analysis (AFA) applies a mixture of probabilistic principal component analyzers (MPPCA) to reduce the dimensionality of the acoustic features separately on each Gaussian component of a universal background model (UBM). This suppresses text, channel, and noise variability in the features and yields enhanced speaker information that improves verification performance. However, the UBM is trained by unsupervised clustering, so its Gaussian components have no clear physical meaning and cannot distinguish between different speakers producing different phonemes. To address this problem, this paper replaces the conventional UBM with the deep neural network (DNN) acoustic model used in automatic speech recognition (ASR) and combines it with AFA, so that features are reduced in dimensionality separately for each phoneme to extract speaker information, from which a DNN i-vector is then extracted for speaker verification. Experiments on Part III of the RSR2015 database show that, compared with UBM-based AFA, the proposed method reduces the equal error rate (EER) by 13.49% and 22.43% on the male and female test sets, respectively.
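The class-dependent dimensionality reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard closed-form PPCA solution (Tipping and Bishop) fitted on posterior-weighted statistics. The function names `fit_ppca`, `class_statistics`, and `project_frames`, the latent dimension `q`, and the posterior matrix `post` are hypothetical names introduced for illustration only. The posteriors may come from UBM Gaussians or from DNN outputs, and the rest of the computation is unchanged.

```python
import numpy as np

def fit_ppca(mean, cov, q):
    """Closed-form PPCA for one class; requires q < feature dimension."""
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                 # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                       # noise variance: mean of discarded eigenvalues
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    M = W.T @ W + sigma2 * np.eye(q)
    A = np.linalg.solve(M, W.T)                       # (q, d): maps (x - mean) to E[z | x]
    return mean, A

def class_statistics(X, post):
    """Posterior-weighted mean and covariance for every class."""
    counts = post.sum(axis=0)                         # (C,)
    means = (post.T @ X) / counts[:, None]            # (C, d)
    covs = []
    for c in range(post.shape[1]):
        Xc = X - means[c]
        covs.append((post[:, c, None] * Xc).T @ Xc / counts[c])
    return means, np.stack(covs)

def project_frames(X, post, q):
    """Project each frame into the q-dimensional latent space of every class.

    X:    (T, d) acoustic features
    post: (T, C) frame posteriors (assumed to come from a UBM or a DNN)
    Returns an array of shape (T, C, q).
    """
    means, covs = class_statistics(X, post)
    models = [fit_ppca(m, S, q) for m, S in zip(means, covs)]
    return np.stack([(X - m) @ A.T for m, A in models], axis=1)
```

Under these assumptions, moving from the UBM-based to the phoneme-centric variant only changes where `post` comes from: Gaussian occupation probabilities in the first case, DNN output posteriors in the second. Accumulating statistics from the projected features and extracting i-vectors are omitted here.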