Weakly labeled sound event detection based on improved pooling layer

LIU Miao; WANG Jing; DONG Guiguan; YI Weiming

doi:10.16798/j.issn.1003-0530.2021.10.014

LIU Miao, WANG Jing, DONG Guiguan, YI Weiming. Weakly labeled sound event detection based on improved pooling layer[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(10): 1907-1913. DOI: 10.16798/j.issn.1003-0530.2021.10.014


Abstract


Using the large-scale weakly labeled dataset provided by Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge, we build a multi-class sound event detection system based on mel filter bank (Fbank) features, convolutional neural networks (CNN), and recurrent neural networks (RNN). We derive the partial derivatives of two common pooling functions, attention and linear softmax, during neural network back-propagation. Building on the linear softmax pooling layer, we propose an "exponential learnable power function softmax" pooling layer. Experimental results show that, compared with the first-place model in the DCASE competition, the sound event detection system using the proposed "exponential learnable power function softmax" pooling function raises the clip-level F1 score of sound event prediction from 0.556 to 0.652 and the frame-level F1 score from 0.555 to 0.583, with a frame-level error rate (ER) of 0.667 versus 0.660.
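To make the pooling step concrete, the following is a minimal sketch of how frame-level probabilities for one event class can be aggregated into a clip-level probability. The linear softmax and attention forms are the commonly used definitions. The abstract does not state the formula of the proposed layer, so `power_softmax_pool` and its exponent `n` are assumptions made for illustration: a power-weighted generalization of linear softmax in which `n` would be a trainable parameter. All function names are hypothetical, and the sketch covers only the forward pass, not the gradient analysis.

```python
def attention_pool(frame_probs, weights):
    """Weighted average of frame probabilities: sum(y_i * w_i) / sum(w_i)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(y * w for y, w in zip(frame_probs, weights)) / total


def linear_softmax_pool(frame_probs):
    """Linear softmax pooling: each frame is weighted by its own probability,
    giving sum(y_i^2) / sum(y_i)."""
    return attention_pool(frame_probs, frame_probs)


def power_softmax_pool(frame_probs, n):
    """ASSUMED form of a power-function softmax (not taken from the paper):
    frames are weighted by y_i^n, giving sum(y_i^(n+1)) / sum(y_i^n).
    n is assumed non-negative; in a network it would be learned."""
    weights = [y ** n for y in frame_probs]
    return attention_pool(frame_probs, weights)


# Frame-level probabilities of one event class over three frames.
probs = [0.1, 0.2, 0.9]
clip_mean_like = power_softmax_pool(probs, 0)    # n = 0: plain average
clip_linear = power_softmax_pool(probs, 1)       # n = 1: linear softmax
clip_max_like = power_softmax_pool(probs, 50)    # large n: close to the max
```

Under this assumed form, the exponent interpolates between average pooling (`n = 0`), linear softmax pooling (`n = 1`), and max pooling (large `n`), which is one reason a learnable exponent could be useful for weak labels.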
