YUAN Shuang, YANG Lidong, GUO Yong, NIU Dawei, ZHANG Dandan. Audio Scene Classification Based on Audio Spectrogram Transformer[J]. JOURNAL OF SIGNAL PROCESSING, 2023, 39(4): 730-736. DOI: 10.16798/j.issn.1003-0530.2023.04.014

Audio Scene Classification Based on Audio Spectrogram Transformer

Audio scene classification is an important part of scene understanding. Learning the characteristics of audio scenes and classifying them accurately strengthens the interaction between machines and their environment, a capability whose importance is self-evident in the age of big data. Because the performance of a classification model depends on the size of its training set, while real-world tasks face a serious shortage of labeled data, this paper proposes a data-augmentation and network pre-training strategy that applies the Audio Spectrogram Transformer (AST) model to the audio scene classification task. First, the log-Mel energy spectrogram of the audio signal is extracted as the model input; then the model's dynamic attention mechanism strengthens the spatial relationships within the audio sequence; finally, classification is completed from the class-token vector. The proposed method is evaluated on the public DCASE2019 Task1 and DCASE2020 Task1 datasets, achieving classification accuracies of 96.489% and 93.227% respectively, a significant improvement over existing algorithms. These results indicate that the method is suitable for high-precision audio scene classification, laying a foundation for intelligent devices that perceive environmental content and detect environmental dynamics.
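The first stage of the pipeline described above, extracting a log-Mel energy spectrogram as the transformer's input, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the sample rate, window length, hop size, and number of Mel bins below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=64):
    # Frame the signal, apply a Hann window, and take the power spectrum
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the Mel filterbank and apply log compression
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)  # small floor avoids log(0)

# One second of audio -> a (time frames, Mel bins) matrix; in AST this matrix
# is split into patches, embedded, and prefixed with a class token whose
# output vector drives the final classification layer.
x = np.random.default_rng(0).standard_normal(16000)
feats = log_mel_spectrogram(x)
print(feats.shape)  # (98, 64)
```

The resulting two-dimensional feature matrix is what the transformer treats as an image-like input: the attention layers then model relationships between patches across both time and frequency.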
