Application of an Improved TV Method Using DBN in Language Identification

Improved Total Variability Modeling Method using Deep Bottleneck Network for Language Identification

  • Abstract: In recent years, the total variability (TV) modeling method based on deep neural networks (DNN) has been widely studied in the field of language identification. This paper proposes an improved DNN-based TV method that both exploits the DNN's phoneme-state alignment of the data and fully accounts for the specificity of the language identification task. The method first uses a deep neural network with a narrow bottleneck layer (deep bottleneck network, DBN) to cluster the language data features by phoneme state, yielding a task-related universal background model (UBM); it then performs TV modeling with this UBM in combination with deep bottleneck features (DBF). Experiments show that, compared with the classical TV method, the proposed method significantly improves both system performance and efficiency, and that fusion yields further performance gains.
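The DBF extraction step described above can be sketched as a forward pass through a trained DNN that is truncated at the narrow bottleneck layer. The layer sizes, the random placeholder weights, and the `extract_dbf` helper below are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Minimal sketch of deep-bottleneck-feature (DBF) extraction: forward
# acoustic frames through a DNN with a narrow bottleneck layer and take
# the bottleneck activations as features. Weights are random placeholders
# standing in for a trained network; sizes are illustrative only.

rng = np.random.default_rng(0)

layer_sizes = [39, 1024, 1024, 40, 1024, 3000]  # bottleneck: 40 units (index 3)
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def extract_dbf(frames, bottleneck_idx=3):
    """Forward frames through the DBN; return the bottleneck activations."""
    h = frames
    for i, (W, b) in enumerate(zip(weights, biases), start=1):
        h = np.tanh(h @ W + b)      # hidden-layer nonlinearity
        if i == bottleneck_idx:     # stop at the bottleneck layer
            return h
    return h

frames = rng.standard_normal((100, 39))  # 100 frames of 39-dim acoustic features
dbf = extract_dbf(frames)
print(dbf.shape)  # (100, 40)
```

In a real system the final softmax layer would predict phoneme (senone) states, so the bottleneck activations carry phonetically discriminative information in a compact, fixed dimension.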

     

    Abstract: Recently, total variability (TV) modeling using a deep neural network (DNN) to compute mixture occupation probabilities has been applied to language identification (LID) and yields significant performance gains. However, the DNN is trained on a large corpus of a specific language that is not directly related to the LID task. In this paper, we propose an improved TV modeling method based on a well-trained DNN for language identification, which makes full use of the DNN's phoneme-state clustering while accounting for the specificity of the LID task. Specifically, we use a structured DNN with a narrow bottleneck layer, termed a deep bottleneck network (DBN), to extract deep bottleneck features (DBF), and then cluster the training data according to the corresponding DBN outputs to obtain the Gaussian mixture model. Evaluations on the Arabic dialect task of NIST LRE2011 and a six-language subset of NIST LRE2009 show that the proposed method outperforms state-of-the-art DNN-based TV LID systems. Further performance improvement is achieved by system fusion.
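The TV step the abstract refers to can be sketched as standard i-vector extraction in which the per-frame component occupation probabilities come from the DNN/DBN outputs rather than a GMM. All dimensions, the placeholder T matrix, and the `extract_ivector` helper below are illustrative assumptions under the usual diagonal-covariance i-vector formulation, not the paper's exact configuration:

```python
import numpy as np

# Hedged sketch of TV (i-vector) extraction with DNN-supplied posteriors:
# accumulate zeroth/first-order Baum-Welch statistics, then solve for the
# posterior-mean i-vector. Dimensions are toy-sized placeholders.

rng = np.random.default_rng(1)
C, F, R = 8, 40, 10                 # components (senones), feature dim, i-vector dim

means = rng.standard_normal((C, F))          # UBM component means
sigma = np.ones((C, F))                      # diagonal covariances
T = rng.standard_normal((C * F, R)) * 0.1    # total variability matrix (placeholder)

def extract_ivector(feats, posteriors):
    """feats: (N, F) DBF frames; posteriors: (N, C) DNN occupation probs."""
    N_c = posteriors.sum(axis=0)                        # zeroth-order stats
    F_c = posteriors.T @ feats - N_c[:, None] * means   # centered first-order stats
    L = np.eye(R)                                       # posterior precision
    b = np.zeros(R)
    for c in range(C):
        Tc = T[c * F:(c + 1) * F]                       # (F, R) block for component c
        inv_sig = 1.0 / sigma[c]
        L += N_c[c] * Tc.T @ (inv_sig[:, None] * Tc)
        b += Tc.T @ (inv_sig * F_c[c])
    return np.linalg.solve(L, b)                        # posterior-mean i-vector

feats = rng.standard_normal((200, F))
post = rng.random((200, C))
post /= post.sum(axis=1, keepdims=True)                 # normalize per frame
w = extract_ivector(feats, post)
print(w.shape)  # (10,)
```

The resulting fixed-length i-vector is then scored by a back-end classifier; the paper's contribution is making the clustering that defines the components (and hence the posteriors) task-related via the DBN.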

     
