Deep Feature Fusion of Multi-Dimensional Neural Networks for Bird Call Recognition
Abstract: To further improve the accuracy of bird call monitoring during nocturnal migration, this paper proposes a bird call recognition algorithm based on deep feature fusion of multi-dimensional neural networks. First, log-scaled Mel spectrograms are extracted as the training features of a VGG Style model to enhance the energy distribution of the time-frequency spectrogram, and mixup is applied to generate virtual samples and reduce over-fitting. The pre-trained VGG Style network is then used as a feature extractor to produce deep features for each bird call. Given the complementarity of models operating on inputs of different dimensions, a 1D CNN-LSTM, a 2D VGG Style network, and a 3D DenseNet121 are employed as feature extractors to generate high-level features. For the 1D CNN-LSTM, wavelet decomposition is used as the pooling method: a 9-level wavelet decomposition is applied to each bird call in both the time and frequency domains, and multi-level LBP features are generated to capture richer time-frequency information. The fully connected layers of the CNN-LSTM and DenseNet121 are also optimized to reduce the number of model parameters and improve real-time performance. Finally, the deep features of the three models are fused and fed to a K-nearest-neighbor classifier. On the public CLO-43SD dataset of 5428 flight calls spanning 43 bird species, the proposed method achieves a balanced accuracy of 93.89%, exceeding the latest fusion of Mel-VGG and Subnet-CNN by 7.58%.
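The front end described in the abstract (log-scaled Mel spectrograms plus mixup augmentation) can be illustrated with a short Python sketch. This is not the authors' code: the sample rate, FFT size, Mel-band count, and mixup alpha below are assumed values chosen for illustration, using librosa and NumPy.

```python
# Minimal sketch of the preprocessing stage; all parameter values are assumptions.
import numpy as np
import librosa

def log_mel_spectrogram(path, sr=22050, n_fft=1024, hop_length=512, n_mels=128):
    """Load a bird call clip and compute its log-scaled Mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    # Log scaling enhances the energy distribution of the time-frequency spectrogram.
    return librosa.power_to_db(mel, ref=np.max)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two spectrogram/label pairs into one virtual training sample."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2   # labels assumed to be one-hot vectors
    return x, y
```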
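The fusion stage can likewise be sketched: deep features exported from the three networks are concatenated per clip and passed to a shallow K-nearest-neighbor classifier. The file names, feature shapes, neighbor count, and train/test split below are hypothetical placeholders, not the paper's actual setup.

```python
# Minimal sketch of deep feature fusion with a shallow KNN classifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

# Per-clip deep features from the 1D CNN-LSTM, 2D VGG Style, and 3D DenseNet121
# extractors, assumed to have been exported beforehand as NumPy arrays.
feats_1d = np.load("cnn_lstm_features.npy")     # shape: (n_samples, d1)
feats_2d = np.load("vgg_style_features.npy")    # shape: (n_samples, d2)
feats_3d = np.load("densenet121_features.npy")  # shape: (n_samples, d3)
labels   = np.load("labels.npy")                # shape: (n_samples,)

# Fuse by concatenating the three deep feature vectors of each sample.
fused = np.concatenate([feats_1d, feats_2d, feats_3d], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.2, stratify=labels, random_state=0)

# Shallow classifier on the fused representation, evaluated by balanced accuracy.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
print("balanced accuracy:", balanced_accuracy_score(y_test, clf.predict(X_test)))
```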