Weakly labeled sound event detection based on improved pooling layer

Abstract: For the large-scale weakly labeled dataset provided by Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge, we built a multi-class sound event detection system based on mel filter bank features (Fbank), convolutional neural networks (CNN), and recurrent neural networks (RNN). We analyzed part of the back-propagation derivation for two existing, commonly used pooling layers, attention and linear softmax. Building on the linear softmax pooling layer, we propose a "power function softmax with a learnable exponent" pooling layer. Experimental results show that, compared with the first-place model in the DCASE competition, the detection system using the proposed pooling layer raises the clip-level F1 score of sound event prediction from 0.556 to 0.652, raises the frame-level F1 score from 0.518 to 0.583, and reduces the frame-level error rate (ER) from 0.730 to 0.667.
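The abstract does not state the pooling formulas, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used form of linear softmax pooling, in which each frame-level probability is weighted by itself, and it assumes that the proposed power-function softmax pooling with a learnable exponent generalizes this by weighting each frame with its probability raised to a power `n`. The function names and the parameter `n` are illustrative, not taken from the paper, and in a real system `n` would be a trainable parameter rather than a fixed argument.

```python
def linear_softmax_pool(frame_probs):
    """Pool frame-level probabilities into one clip-level probability.

    Assumed form: sum(p_i^2) / sum(p_i), i.e. each frame is weighted
    by its own probability.
    """
    total = sum(frame_probs)
    if total == 0.0:
        return 0.0
    return sum(p * p for p in frame_probs) / total


def power_softmax_pool(frame_probs, n):
    """Assumed generalization: sum(p_i^(n+1)) / sum(p_i^n), with n >= 0.

    n = 0 reduces to mean pooling, n = 1 to linear softmax pooling,
    and large n approaches max pooling. Here n is a plain number; in
    training it would be learned together with the network weights.
    """
    weights = [p ** n for p in frame_probs]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * p for w, p in zip(weights, frame_probs)) / total


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.9]  # hypothetical frame-level outputs for one class
    for n in (0, 1, 2, 50):
        print(n, power_softmax_pool(probs, n))
```

Under these assumptions, a single scalar exponent moves the pooling behavior continuously between averaging over all frames and selecting the most confident frame, which is one plausible reason to make it learnable.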

     
