JI Xunsheng, JIANG Kun, XIE Jie. Deep Feature Fusion of Multi-Dimensional Neural Network for Bird Call Recognition[J]. JOURNAL OF SIGNAL PROCESSING, 2022, 38(4): 844-853. DOI: 10.16798/j.issn.1003-0530.2022.04.019

Deep Feature Fusion of Multi-Dimensional Neural Network for Bird Call Recognition

  • To improve the accuracy of bird sound monitoring during nocturnal migration, this paper proposes a bird sound classification system based on deep feature fusion of multi-dimensional neural networks. First, we propose an improved VGG Style model. It uses the log-scaled Mel spectrogram as its training feature to enhance the energy distribution of the spectrogram, and it generates virtual training data with mixup to reduce over-fitting. The pre-trained VGG Style model is then used to generate deep features for each bird sound. Because models of different dimensionality are complementary, a 1D CNN-LSTM, a 2D VGG Style network and a 3D DenseNet121 are employed as feature extractors to generate high-level features. For the 1D CNN-LSTM, wavelet decomposition is used as the pooling method to extract multi-level local binary pattern (LBP) features separately from the time domain and the frequency domain as training input, which yields richer time-frequency information. In addition, the fully connected layers of the CNN-LSTM and DenseNet121 are optimized to reduce the number of model parameters and improve real-time performance. Finally, the deep features of the three models are fused and fed to a K-nearest neighbor (KNN) classifier. On the public CLO-43SD dataset of 5428 flight calls spanning 43 species, the system achieves a balanced accuracy of 93.89%, exceeding the latest fusion of Mel-VGG and Subnet-CNN by 7.58%.
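Mixup creates virtual training data by blending pairs of examples. The following is a minimal Python sketch of the standard mixup formulation: a weight λ drawn from Beta(α, α) linearly combines both the inputs and their one-hot labels. It assumes spectrograms stored as nested lists (frames × Mel bins). The function name `mixup` and the default `alpha=0.2` are illustrative choices, not values reported by the paper.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # e.g. a log-Mel spectrogram as frames x Mel bins


def mixup(
    x1: Matrix,
    y1: List[float],
    x2: Matrix,
    y2: List[float],
    alpha: float = 0.2,  # hypothetical default, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[Matrix, List[float], float]:
    """Return a virtual example lam*x1 + (1-lam)*x2 with the matching soft label."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # Blend the two spectrograms element by element.
    x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(x1, x2)
    ]
    # Blend the one-hot labels with the same weight, giving a soft label.
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the labels are mixed with the same λ as the inputs, the soft label still sums to 1. A small α concentrates λ near 0 or 1, so most virtual examples stay close to one of the two originals.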
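The abstract does not specify how the wavelet pooling and the LBP features are combined for the 1D CNN-LSTM input. One plausible reading, sketched below purely as an assumption, computes a 1D LBP histogram at each level of a Haar wavelet approximation of the signal and then concatenates the histograms. The Haar wavelet, the neighborhood `radius=2`, `levels=3` and the helper names (`lbp_1d_histogram`, `haar_approximation`, `multilevel_lbp`) are all hypothetical. The same function could be applied to a waveform (time domain) or to a spectrum (frequency domain).

```python
import math
from typing import List


def lbp_1d_histogram(signal: List[float], radius: int = 2) -> List[float]:
    """Normalized histogram of 1D local binary pattern (LBP) codes.

    Each sample that has `radius` neighbors on both sides is compared with
    those neighbors; a neighbor >= the center sets one bit, so there are
    2 ** (2 * radius) possible codes.
    """
    hist = [0.0] * (2 ** (2 * radius))
    for i in range(radius, len(signal) - radius):
        center = signal[i]
        neighbors = signal[i - radius:i] + signal[i + 1:i + 1 + radius]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:
                code |= 1 << bit
        hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def haar_approximation(signal: List[float]) -> List[float]:
    """One level of the Haar DWT approximation (low-pass and downsample by 2)."""
    return [
        (signal[2 * k] + signal[2 * k + 1]) / math.sqrt(2.0)
        for k in range(len(signal) // 2)
    ]


def multilevel_lbp(signal: List[float], levels: int = 3, radius: int = 2) -> List[float]:
    """Concatenate LBP histograms computed at successive wavelet-pooled levels."""
    features: List[float] = []
    current = list(signal)
    for _ in range(levels):
        features.extend(lbp_1d_histogram(current, radius))
        current = haar_approximation(current)  # wavelet decomposition used as pooling
    return features
```

Each level halves the signal length, so coarser levels summarize longer time (or frequency) spans. With `radius=2` and three levels, the feature vector has 3 × 16 = 48 entries.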
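In the final stage, the deep features of the three models are fused and classified with KNN. Results are reported as balanced accuracy, the mean of per-class recall, which is less sensitive to class imbalance than plain accuracy. The sketch below assumes fusion by simple concatenation and Euclidean distance. The paper's exact fusion scheme, distance metric and value of k are not stated here, so `fuse`, `knn_predict` and `k=3` are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def fuse(*model_features: Sequence[float]) -> List[float]:
    """Fuse per-model deep features by concatenation (assumed fusion scheme)."""
    fused: List[float] = []
    for vec in model_features:
        fused.extend(vec)
    return fused


def knn_predict(
    train_x: List[List[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    k: int = 3,  # hypothetical choice of k
) -> Hashable:
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    # Sort by (distance, index) so labels never need to be compared directly.
    ranked = sorted(
        zip((math.dist(x, query) for x in train_x), range(len(train_x)))
    )
    votes = Counter(train_y[i] for _, i in ranked[:k])
    return votes.most_common(1)[0][0]


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall, so rare classes count as much as common ones."""
    total: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, y_true = ["A", "A", "A", "B"] and y_pred = ["A", "A", "B", "B"] give a plain accuracy of 0.75 but a balanced accuracy of (2/3 + 1)/2 ≈ 0.83. Because features from different networks can differ in scale, standardizing each model's features before concatenation is a common design choice for distance-based classifiers such as KNN.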