Abstract:
With a growing number of devices supporting binaural audio recording, binaural audio processing methods have become a promising direction for acoustic scene classification (ASC). We therefore investigate primary ambient extraction (PAE), a binaural audio processing method that decomposes a binaural audio sample into four channels using phase information. Features carrying binaural phase information are then extracted, and an ensemble of convolutional neural networks (CNNs) is adopted as the classifier. Compared with existing work, the ASC system proposed in this paper generates features with additional phase information and makes full use of the advantages of binaural audio. The evaluation results show that taking binaural phase information into account improves the performance of our ASC system. Our ASC system outperforms the baseline system provided by the 2019 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) by 18.3% in terms of classification accuracy.