Abstract:
We propose a length-normalized maximum a posteriori (MAP) algorithm that can be applied to the Cross Likelihood Ratio (CLR) and T-test distance metrics in speaker diarization. Because the shift away from the Universal Background Model (UBM) during adaptation is based on statistics computed against the UBM, the model parameters obtained with classical MAP are positively correlated with the length of the speech segment. When the similarity of two segments of different lengths is measured, classical MAP therefore introduces length-dependent variability into the speaker models, which distorts the distance metric used in speaker diarization. We propose applying length normalization to the relevance factor before adapting the parameters of the speaker model. As a result, the model parameters no longer depend on the length of the speech segment and better reflect the speaker's identity. In a speaker diarization task on a Chinese multi-speaker TV talk show, the proposed normalized MAP method reduces the diarization error rate, compared with classical MAP, by 35% with the CLR clustering method and by 107% with the T-test clustering method.
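The length dependence described above can be illustrated with a minimal sketch of mean-only MAP adaptation. The abstract does not state the exact normalization formula, so the scaling used here (relevance factor multiplied by the segment length over a reference length) is an assumption made purely for illustration; the function name `map_adapt_means`, the parameter `ref_frames`, and all numeric values are hypothetical.

```python
def map_adapt_means(ubm_means, occupancies, data_means, relevance,
                    num_frames=None, ref_frames=100.0):
    """Mean-only MAP adaptation of a GMM (scalar means for brevity).

    occupancies[k] is the soft frame count n_k of component k, and
    data_means[k] is the posterior-weighted mean of the segment's frames.
    If num_frames is given, the relevance factor is scaled by
    num_frames / ref_frames. This scaling is an ASSUMED form of length
    normalization, not necessarily the one used in the paper.
    """
    r = relevance
    if num_frames is not None:
        r = relevance * num_frames / ref_frames
    adapted = []
    for mu, n_k, e_k in zip(ubm_means, occupancies, data_means):
        # Classical adaptation coefficient: grows toward 1 as n_k grows.
        alpha = n_k / (n_k + r)
        adapted.append(alpha * e_k + (1.0 - alpha) * mu)
    return adapted


# Hypothetical statistics: same speaker, two segment lengths.
ubm_means = [0.0, 4.0]
data_means = [1.0, 5.0]
short_occ = [60.0, 40.0]     # 100 frames
long_occ = [600.0, 400.0]    # 1000 frames, same per-component proportions

classical_short = map_adapt_means(ubm_means, short_occ, data_means, 16.0)
classical_long = map_adapt_means(ubm_means, long_occ, data_means, 16.0)
normalized_short = map_adapt_means(ubm_means, short_occ, data_means, 16.0,
                                   num_frames=100)
normalized_long = map_adapt_means(ubm_means, long_occ, data_means, 16.0,
                                  num_frames=1000)
```

Under this assumed scaling, the adaptation coefficient depends only on the fraction of frames assigned to each component, so the short and long segments yield the same adapted means, whereas classical MAP moves the long segment's means further from the UBM.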