Audio Scene Recognition with Deep Neural Networks under Multiple Optimization Mechanisms

  • Abstract: With growing parallel computing power and ever larger volumes of audio data, audio scene recognition has become an important research topic in the field of scene understanding. To address the difficulty of modeling audio scenes and the low recognition accuracy that results, this paper proposes a parallel convolutional recurrent neural network model that integrates multiple optimization mechanisms. The audio signal is first preprocessed and converted into a Mel spectrogram of a given size, which is then fed into the network so that both spatial and temporal features can be learned before recognition is performed. To validate the model, its recognition performance was tested on the DCASE2019 audio scene dataset. The model reaches a recognition accuracy of 88.84%, outperforming traditional network models, which demonstrates its effectiveness for audio scene recognition.
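To illustrate the preprocessing step mentioned in the abstract, the following is a minimal, dependency-free sketch of turning a waveform into a log-Mel spectrogram. It is not the authors' implementation: the frame length, hop size, number of Mel bands, the HTK-style Mel formula, and all function names are assumptions chosen for illustration, and the abstract does not state the actual settings. A practical pipeline would use an FFT-based audio library rather than the naive DFT shown here.

```python
import cmath
import math


def hz_to_mel(hz):
    """Hz -> mel (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns n_mels rows, each with n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies (Hz) from 0 up to the Nyquist frequency.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive one-sided DFT power spectrum (O(n^2), fine for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    # Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n_fft)
              for t in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel])
    return frames


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
    spec = log_mel_spectrogram(tone, sr)
    print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The resulting two-dimensional array (time frames by Mel bands) is the kind of fixed-size, image-like input that a convolutional branch can treat spatially and a recurrent branch can read frame by frame.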

     
