LU Huaqing, GE Zirui, WANG Tianlang, et al. Speech anti-spoofing method based on graph attention mechanism and adversarial training[J]. Journal of Signal Processing, 2025, 41(1): 161-173. DOI: 10.12466/xhcl.2025.01.013.

Speech Anti-Spoofing Method Based on Graph Attention Mechanism and Adversarial Training

Speech anti-spoofing aims to strengthen the security of speech systems by designing network architectures and learning algorithms that distinguish genuine speech from fake speech. This paper presents a speech anti-spoofing method that combines a graph attention mechanism with adversarial training. The proposed method is based on the speaker attractor multi-center one-class (SAMO) learning algorithm and uses graph signal processing (GSP) theory. First, a speaker feature representation graph is constructed using GSP theory, in which each node corresponds to a feature representation of the speaker. A graph attention network (GAT) is then introduced to extract speaker attractor centers: an attention mechanism consolidates the speaker feature representations, and aggregation yields a more representative speaker attractor center, improving the capacity of the system to discriminate between genuine and fake speech. Furthermore, learning artifact features that are specific to known fake types may limit the effectiveness of the network in practical scenarios that involve unknown types of spoofing attacks. To address this, the anti-spoofing network is augmented with an adversarial fake type classification network, so that the network learns feature representations for the speech authenticity classification and fake type classification tasks simultaneously. A gradient reversal layer (GRL) is placed between the fake type classification auxiliary network and the feature representation learning module; through adversarial training, it prevents the network from accurately distinguishing between different types of fake speech. This prompts the network to learn fake artifact features that are shared across different types of fake speech. Consequently, the speech authenticity classification task becomes more adaptable to unknown inputs: the network can recognize the artifact features of unknown fake speech types, which improves the detection of unknown fake speech in real tests.
To evaluate the proposed combination of GAT and adversarial training, experiments were conducted on the ASVspoof 2019 LA, CFAD, and ASVspoof 2021 LA datasets. Evaluated with common anti-spoofing metrics, the proposed method outperforms both the SAMO baseline system and other advanced comparison systems. Two visualization techniques, t-distributed stochastic neighbor embedding (t-SNE) and similarity matrix heat maps, are also used to illustrate system performance. The t-SNE visualization shows distinct clustering of genuine and fake speech samples, illustrating how well the proposed method separates the two. The similarity matrix heat map uses color shades to represent the degree of similarity between the feature representations of different types of fake speech. Comparing the results obtained from different systems shows that the proposed system is better at learning common features of fake artifacts, which indicates that it effectively leverages adversarial training for this purpose.
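
The gradient reversal step described in the abstract can be illustrated with a minimal sketch. It assumes the standard GRL behavior: the layer is an identity in the forward pass, and in the backward pass it flips the sign of the gradient coming from the fake type classifier before it reaches the shared feature encoder. The class name `GradientReversal`, the scaling factor `lam`, and the helper `encoder_gradient` are hypothetical names chosen for illustration; the abstract does not specify the paper's implementation, which would normally live inside an automatic differentiation framework.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is an assumed scaling factor for the reversed gradient.
        self.lam = lam

    def forward(self, features):
        # Features reach the fake type classifier unchanged.
        return list(features)

    def backward(self, grad_from_classifier):
        # The gradient of the fake type loss is sign-flipped (and scaled),
        # so the encoder is updated to make fake types harder to tell apart.
        return [-self.lam * g for g in grad_from_classifier]


def encoder_gradient(grad_authenticity, grad_fake_type, grl):
    """Sum the gradients that reach the shared encoder from both heads.

    grad_authenticity: gradient of the genuine/fake loss w.r.t. the features.
    grad_fake_type: gradient of the fake type loss w.r.t. the features.
    """
    reversed_grad = grl.backward(grad_fake_type)
    return [a + r for a, r in zip(grad_authenticity, reversed_grad)]


if __name__ == "__main__":
    grl = GradientReversal(lam=0.5)
    features = [0.2, -1.0, 0.7]
    # Forward pass: the classifier sees the same features as the encoder produced.
    print(grl.forward(features))
    # Backward pass: the authenticity gradient is kept, the fake type gradient is reversed.
    print(encoder_gradient([0.1, 0.0, -0.3], [0.4, -0.2, 0.6], grl))
```

In this sketch the fake type classifier itself would still be trained with the unreversed gradient, while the encoder receives the reversed one. That opposition is what makes the training adversarial and pushes the encoder toward artifact features shared across fake types.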