Abstract:
Recently, total variability (TV) modeling with a deep neural network (DNN) computing the mixture occupation probabilities has been applied to language identification (LID) and yields significant performance gains. However, the DNN is trained on a large corpus of a specific language, which is not directly related to the LID task. In this paper, we propose an improved TV modeling method based on a well-trained DNN for LID, which makes full use of the DNN's phoneme state clustering while accounting for the specificity of the LID task. Specifically, we use a structured DNN with a narrow bottleneck layer, called a deep bottleneck network (DBN), to extract deep bottleneck features (DBFs), and then cluster the training data according to the corresponding DBN outputs to obtain the Gaussian mixture model. Evaluations on the Arabic dialect task of NIST LRE2011 and a six-language subset of NIST LRE2009 show that the proposed method outperforms state-of-the-art DNN-based TV LID systems. Further performance improvement can be achieved by system fusion.
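The two-step pipeline summarized above (extract DBFs with a bottleneck network, then cluster the training frames so that each cluster defines one Gaussian of the mixture) can be sketched as follows. This is a minimal illustrative sketch only: the random-weight two-layer network standing in for a trained DBN, the k-means-style clustering standing in for the paper's actual GMM training, and all dimensions (39-dim input frames, 8-dim bottleneck, 4 mixtures) are assumptions for illustration, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_dbf(frames, w1, w2):
    """Toy DBN forward pass: wide hidden layer -> narrow bottleneck.

    A real DBN would be trained on transcribed speech; here random
    weights merely illustrate the dimensionality reduction."""
    h = np.tanh(frames @ w1)      # hidden layer activations
    return np.tanh(h @ w2)        # narrow bottleneck output (the DBF)

def cluster_gmm(dbf, n_comp=4, iters=20):
    """K-means-style clustering of DBFs; each cluster yields one
    Gaussian (weight, mean, diagonal variance) of the mixture."""
    centers = dbf[rng.choice(len(dbf), n_comp, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((dbf[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_comp):
            if np.any(labels == k):
                centers[k] = dbf[labels == k].mean(axis=0)
    weights = np.bincount(labels, minlength=n_comp) / len(dbf)
    variances = np.array([
        dbf[labels == k].var(axis=0) + 1e-6 if np.any(labels == k)
        else np.ones(dbf.shape[1])
        for k in range(n_comp)
    ])
    return weights, centers, variances

# Illustrative data: 500 frames of 39-dim acoustic features (e.g. MFCCs).
frames = rng.normal(size=(500, 39))
w1 = rng.normal(size=(39, 64)) * 0.1   # input -> hidden
w2 = rng.normal(size=(64, 8)) * 0.1    # hidden -> 8-dim bottleneck
dbf = extract_dbf(frames, w1, w2)
weights, means, variances = cluster_gmm(dbf)
print(dbf.shape, means.shape)
```

The resulting mixture would then serve as the UBM for Baum-Welch statistics collection in TV modeling; that downstream step is omitted here.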