ZHANG Xiyu, CHEN Xianhong. Double additive angular margin loss for speaker verification[J]. Journal of Signal Processing, 2025, 41(1): 174-182. DOI: 10.12466/xhcl.2025.01.014.

Double Additive Angular Margin Loss for Speaker Verification

  • Speaker verification is a critical deep-learning task that authenticates a speaker's identity from their voice. Additive Margin (AM) loss and Additive Angular Margin (AAM) loss are loss functions commonly used in speaker verification. By introducing a margin term, both methods greatly improve the compactness of samples from the same class. However, they consider only intra-class distance and neglect the relationships between different classes, which limits their ability to discriminate between speakers. To address this limitation, we propose a novel loss function, the Double Additive Angular Margin (DAAM) loss, which introduces margin constants into both the intra-class and the inter-class angles and thereby accounts for both intra-class cohesion and inter-class separation. By simultaneously compressing intra-class distances and expanding inter-class distances, the DAAM loss enhances the discriminative power of speaker embeddings and improves verification performance. To validate its effectiveness, we train the speaker verification models Xvector and Extended-Xvector (Evector) under supervision of the different losses, use them to extract speaker embeddings, and conduct extensive experiments on the widely used VoxCeleb1 and VoxCeleb2 datasets. We further analyze the average intra-class and inter-class distances of the speaker embeddings produced by models supervised by each loss function. The experimental results show that the proposed DAAM loss outperforms both the AM and AAM losses in equal error rate (EER), minimum detection cost function (minDCF), and the ratio of inter-class to intra-class distance. The DAAM loss effectively maximizes the distance between different speakers while keeping the clusters of each speaker compact, which makes the speaker embeddings more discriminative. In summary, this paper introduces the DAAM loss, which considers both intra-class and inter-class relationships, enhances class separability, improves verification performance, and enables speaker verification models to learn more discriminative and robust speaker representations. Extensive experiments with the Xvector and Evector models on VoxCeleb1 and VoxCeleb2 validate the effectiveness of the DAAM loss and show its potential to advance speaker verification. The proposed loss contributes to more accurate and robust speaker authentication systems and paves the way for future research in this field.
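The abstract describes the three losses only in words, so the following NumPy sketch contrasts them on cosine logits. The AM and AAM branches follow their standard forms (target logit cos θ_y − m and cos(θ_y + m), both scaled by s). The DAAM branch is an assumption, not the paper's formula: the abstract says only that margins are added to both intra-class and inter-class angles, so here the non-target angles are reduced by a hypothetical margin `m_inter`, which raises the non-target logits and makes the loss push embeddings further from other speakers. The function names and the values of `s`, `m`, and `m_inter` are illustrative.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def cosine_logits(embeddings, class_weights):
    """cos(theta_ij) between each embedding (N, D) and each class weight (C, D)."""
    return l2_normalize(embeddings) @ l2_normalize(class_weights).T


def margin_softmax_loss(cos, labels, kind="aam", s=30.0, m=0.2, m_inter=0.1):
    """Mean cross-entropy over scaled, margin-adjusted cosine logits.

    kind="none": plain normalized softmax, logit = cos(theta)
    kind="am":   target logit cos(theta_y) - m
    kind="aam":  target logit cos(theta_y + m)
    kind="daam": HYPOTHETICAL reading of DAAM (not the paper's formula):
                 target logit cos(theta_y + m) and
                 non-target logit cos(theta_j - m_inter)
    """
    cos = np.clip(cos, -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(cos.shape[0])
    is_target = np.zeros(cos.shape, dtype=bool)
    is_target[rows, labels] = True

    logits = cos.copy()
    if kind == "am":
        logits[is_target] = cos[is_target] - m
    elif kind in ("aam", "daam"):
        # Real AAM code usually guards the case theta_y + m > pi, where
        # cos(theta_y + m) stops decreasing; that guard is omitted here.
        logits[is_target] = np.cos(theta[is_target] + m)
    elif kind != "none":
        raise ValueError(f"unknown kind: {kind}")

    if kind == "daam":
        # Shrinking non-target angles raises their logits, so minimizing the
        # loss must push embeddings further away from other speakers' weights.
        logits[~is_target] = np.cos(np.maximum(theta[~is_target] - m_inter, 0.0))

    z = s * logits
    z = z - z.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

On the same cosine logits, each added margin makes the objective strictly harder (AM and AAM above the no-margin loss, the hypothetical DAAM variant above AAM), and with all margins set to zero every variant reduces to the plain normalized softmax, which is a quick sanity check.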
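The abstract evaluates the losses by EER and by the ratio of inter-class to intra-class distance. A minimal NumPy sketch of both measures follows. EER is approximated by sweeping thresholds over the trial scores and taking the point where the false acceptance and false rejection rates are closest. The distance ratio assumes cosine distance (1 − cos): the intra-class term averages each embedding's distance to its own speaker's mean embedding, and the inter-class term averages distances between distinct speaker means. The abstract does not give the paper's exact definitions, so this definition and both function names are hypothetical. minDCF is left out because it depends on cost and prior parameters the abstract does not state.

```python
import numpy as np


def equal_error_rate(scores, is_target):
    """Approximate EER: sweep thresholds, return mean of FAR and FRR where they are closest."""
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    n_tar, n_non = is_target.sum(), (~is_target).sum()
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.sum((scores >= t) & ~is_target) / n_non  # impostor trials accepted
        frr = np.sum((scores < t) & is_target) / n_tar    # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def inter_intra_ratio(embeddings, labels):
    """Mean cosine distance between speaker centers divided by the mean cosine
    distance of each embedding to its own speaker center (assumed definition)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = np.asarray(labels)
    speakers = np.unique(labels)  # sorted, so searchsorted maps label -> row
    centers = np.stack([e[labels == spk].mean(axis=0) for spk in speakers])
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)

    own_center = centers[np.searchsorted(speakers, labels)]
    intra = np.mean(1.0 - np.sum(e * own_center, axis=1))

    sim = centers @ centers.T
    upper = np.triu_indices(len(speakers), k=1)  # each speaker pair counted once
    inter = np.mean(1.0 - sim[upper])
    return inter / intra
```

A larger ratio means speakers are spread further apart relative to the spread within each speaker, which is the property the abstract attributes to the DAAM loss.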