Double Additive Angular Margin Loss for Speaker Verification

Abstract: Speaker verification is a critical task in deep learning that aims to verify a speaker's identity from their voice. Additive Margin (AM) loss and Additive Angular Margin (AAM) loss are commonly used loss functions in speaker verification. By introducing a margin term, both methods substantially improve the compactness of samples from the same class. However, they only consider the intra-class distance and neglect the relationships between different classes, which limits their ability to discriminate between different speakers. To address this limitation, we propose a novel loss function called the Double Additive Angular Margin (DAAM) loss. The DAAM loss introduces margin constants for both intra-class and inter-class angles, effectively accounting for intra-class cohesion and inter-class separation. By simultaneously compressing intra-class distances and expanding inter-class distances, the DAAM loss enhances the discriminative power of the speaker embeddings and improves verification performance. To validate the effectiveness of the DAAM loss, we employ the speaker verification models Xvector and Extended-Xvector (Evector), supervised by different losses, to extract speaker embeddings, and conduct extensive experiments on the widely used Voxceleb1 and Voxceleb2 datasets. Furthermore, we analyze the average intra-class and inter-class distances of the speaker embeddings obtained by models supervised by different loss functions. The experimental results demonstrate that the proposed DAAM loss outperforms both the AM and AAM losses in terms of equal error rate (EER), minimum Detection Cost Function (minDCF), and the ratio of inter-class to intra-class distance. The DAAM loss effectively maximizes the distance between different speakers while keeping samples of the same speaker compact, thereby enhancing the discriminative power of the speaker embeddings. In summary, this paper introduces the DAAM loss, which considers both intra-class and inter-class relationships, enhances class separability, improves verification performance, and enables speaker verification models to learn more discriminative and robust speaker representations. Extensive experiments on the Voxceleb1 and Voxceleb2 datasets using the Xvector and Evector models validate the effectiveness of the DAAM loss and showcase its potential to advance the state of the art in speaker verification. The proposed DAAM loss contributes to the development of more accurate and robust speaker verification systems, paving the way for future research in this field.
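The abstract does not give the exact formulation of the DAAM loss, but its description (margin constants applied to both the intra-class, i.e. target, angle and the inter-class, i.e. non-target, angles) suggests an AAM-softmax-style objective. The sketch below is a minimal, illustrative PyTorch implementation under that assumption; the class name DAAMLossSketch, the parameters scale, m_intra, and m_inter, and their default values are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAAMLossSketch(nn.Module):
    """Illustrative double-angular-margin softmax loss (not the authors' code)."""

    def __init__(self, embed_dim, num_speakers, scale=30.0, m_intra=0.2, m_inter=0.1):
        super().__init__()
        # Learnable class centres, one row per speaker.
        self.weight = nn.Parameter(torch.empty(num_speakers, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale        # logit scaling factor (assumed value)
        self.m_intra = m_intra    # margin added to the intra-class (target) angle
        self.m_inter = m_inter    # margin subtracted from inter-class (non-target) angles

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class centres.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))

        target = F.one_hot(labels, num_classes=self.weight.size(0)).bool()
        # Enlarge the target angle and shrink the non-target angles, so training
        # must both compress intra-class distances and expand inter-class ones.
        theta_adj = torch.where(
            target,
            theta + self.m_intra,
            (theta - self.m_inter).clamp(min=0.0),
        )
        logits = self.scale * torch.cos(theta_adj)
        return F.cross_entropy(logits, labels)


# Example (shapes only; 5,994 speakers matches the VoxCeleb2 dev set):
# loss_fn = DAAMLossSketch(embed_dim=512, num_speakers=5994)
# loss = loss_fn(torch.randn(32, 512), torch.randint(0, 5994, (32,)))
```

In such a setup, the module would replace the classification head used to train an Xvector or Evector encoder, with the normalised rows of the weight matrix acting as speaker centres. The second margin on the non-target angles is what would distinguish this from a plain AAM/ArcFace head: non-target centres are treated as m_inter closer than they really are, so the model is only rewarded once inter-class angles grow beyond that margin, mirroring the abstract's goal of expanding inter-class distances while compressing intra-class ones.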
