Speaker Diarization Based on Length-Normalized MAP

Abstract: This paper proposes, for the first time, a length-normalized maximum a posteriori (MAP) adaptation method and applies it to two distance metrics used in speaker diarization: the cross likelihood ratio (CLR) and the TTest. Classical MAP computes statistics against a universal background model (UBM) and then shifts the model parameters away from the UBM, so the size of the shift is positively correlated with the length of the speech segment. When the similarity of two segments of different lengths is measured, classical MAP therefore produces inaccurate speaker models, which degrades the distance metric. We instead normalize the relevance factor according to the segment length before adapting the model parameters, so that the parameters are independent of segment length and better reflect the speaker's identity. On a diarization evaluation task built from Chinese multi-speaker TV talk-show data, the length-normalized MAP method clearly outperforms classical MAP under both metrics, reducing the diarization error rate by 35% relative with the CLR metric and by 107% relative with the TTest metric.
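The abstract says that classical MAP shifts grow with segment length and that the relevance factor is normalized by length, but it does not give the exact normalization. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses mean-only MAP adaptation of a diagonal-covariance GMM, where classical MAP sets alpha_c = n_c / (n_c + r). The "normalized" variant rescales the relevance factor to r * T / t_ref, so alpha_c depends only on the occupancy fraction n_c / T. That rescaling rule, the parameter `t_ref`, the default r = 16, and the helper names (`GMM`, `map_adapt_means`, `clr`) are all hypothetical. The `clr` function follows the usual cross likelihood ratio definition. The TTest metric is not sketched.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]


@dataclass
class GMM:
    """Diagonal-covariance Gaussian mixture model."""
    weights: List[float]
    means: List[Vector]
    variances: List[Vector]


def _log_gauss(x: Vector, mean: Vector, var: Vector) -> float:
    # Log density of a diagonal-covariance Gaussian.
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def _frame_stats(gmm: GMM, x: Vector) -> Tuple[List[float], float]:
    # Component posteriors and the frame log-likelihood, via log-sum-exp.
    logs = [math.log(w) + _log_gauss(x, m, v)
            for w, m, v in zip(gmm.weights, gmm.means, gmm.variances)]
    top = max(logs)
    probs = [math.exp(lv - top) for lv in logs]
    total = sum(probs)
    return [p / total for p in probs], top + math.log(total)


def log_likelihood(gmm: GMM, frames: List[Vector]) -> float:
    return sum(_frame_stats(gmm, x)[1] for x in frames)


def map_adapt_means(ubm: GMM, frames: List[Vector], r: float = 16.0,
                    t_ref: Optional[float] = None) -> Tuple[GMM, List[float]]:
    """Mean-only MAP adaptation of a UBM to one segment.

    t_ref=None: classical MAP, alpha_c = n_c / (n_c + r), so alpha_c (and the
    shift from the UBM) grows with the frame count T.
    t_ref set: the relevance factor becomes r * T / t_ref (a HYPOTHETICAL
    normalization), so alpha_c depends only on the occupancy fraction n_c / T.
    """
    dim = len(ubm.means[0])
    n = [0.0] * len(ubm.weights)            # zeroth-order stats n_c
    f = [[0.0] * dim for _ in ubm.weights]  # first-order stats sum_t g_c(t) x_t
    for x in frames:
        post, _ = _frame_stats(ubm, x)
        for c, g in enumerate(post):
            n[c] += g
            for d in range(dim):
                f[c][d] += g * x[d]
    r_eff = r if t_ref is None else r * len(frames) / t_ref
    alphas, means = [], []
    for c, mu in enumerate(ubm.means):
        alpha = n[c] / (n[c] + r_eff)
        # Posterior-weighted data mean; fall back to the UBM mean if unused.
        ex = [f[c][d] / n[c] for d in range(dim)] if n[c] > 0 else list(mu)
        means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(ex, mu)])
        alphas.append(alpha)
    return GMM(ubm.weights, means, ubm.variances), alphas


def clr(ubm: GMM, seg_a: List[Vector], seg_b: List[Vector],
        r: float = 16.0, t_ref: Optional[float] = None) -> float:
    """Cross likelihood ratio; larger values suggest the same speaker."""
    model_a, _ = map_adapt_means(ubm, seg_a, r, t_ref)
    model_b, _ = map_adapt_means(ubm, seg_b, r, t_ref)
    score_a = (log_likelihood(model_b, seg_a) - log_likelihood(ubm, seg_a)) / len(seg_a)
    score_b = (log_likelihood(model_a, seg_b) - log_likelihood(ubm, seg_b)) / len(seg_b)
    return score_a + score_b


# Usage: the same frames at two lengths (toy 2-D data).
ubm = GMM([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
seg = [[0.5, 0.2], [2.8, 3.1], [0.1, -0.3], [3.5, 2.9]]
longer = seg * 2
print(map_adapt_means(ubm, seg)[1], map_adapt_means(ubm, longer)[1])
print(map_adapt_means(ubm, seg, t_ref=100.0)[1],
      map_adapt_means(ubm, longer, t_ref=100.0)[1])
```

With classical MAP, the second list of adaptation weights printed is larger than the first, because repeating the frames doubles every n_c while r stays fixed. With the hypothetical normalization, the two lists agree up to floating-point rounding. This is the length independence that the abstract describes, and it carries over to the adapted means and therefore to the CLR score.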
