Audio Scene Classification Based on Audio Spectrogram Transformer
Abstract
Audio scene classification is an important part of scene understanding. Learning the characteristics of audio scenes and classifying them accurately can strengthen the interaction between machines and their environment, and its importance is self-evident in the age of big data. Because the performance of a classification task depends on the size of the dataset, while practical tasks face a serious shortage of data, this paper proposes a data-augmentation and network pre-training strategy that combines the audio spectrogram transformer model with the audio scene classification task. First, the log-Mel energy spectrum of the audio signal is extracted as the model input; then the spatial relationships within the audio sequence are strengthened through the dynamic interaction ability of the model; finally, classification is completed via the label vector. The proposed method is evaluated on the public DCASE2019 Task1 and DCASE2020 Task1 datasets, achieving classification accuracies of 96.489% and 93.227% respectively, a significant improvement over existing algorithms. This indicates that the method is suitable for high-precision audio scene classification and lays a foundation for intelligent devices to perceive environmental content and detect environmental dynamics.
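As context for the front end the abstract describes, the following is a minimal NumPy-only sketch of extracting log-Mel energies from a waveform. The sample rate, FFT size, hop length, and number of Mel bands here are illustrative assumptions, not the settings used in the paper, and a library such as librosa would normally be used instead.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_energies(signal, sr=16000, n_fft=400, hop=160, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    spec = np.array(frames)                       # (time, freq)
    mel = spec @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)                    # (time, n_mels)

# Toy input: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
feat = log_mel_energies(np.sin(2 * np.pi * 440.0 * t))
print(feat.shape)  # time frames x Mel bands
```

The resulting (time, n_mels) matrix is the 2-D input that a spectrogram transformer splits into patches and attends over.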