ACOUSTIC SCENE CLASSIFICATION SYSTEM USING BINAURAL PHASE INFORMATION

    Abstract: As devices that record binaural audio become more common, binaural audio processing is a promising direction for acoustic scene classification (ASC). This paper proposes an ASC algorithm that preserves binaural phase information. In the preprocessing stage, primary ambient extraction (PAE), a binaural audio processing method, uses the phase information of the left and right channels to decompose each binaural audio sample into four channels, yielding a new four-channel feature. An ensemble of convolutional neural networks (CNNs) serves as the classifier. Whereas existing ASC systems use only the magnitude of the time-frequency spectrum, the proposed method retains the phase information of the binaural audio, so the acoustic features carry richer spatial information and the advantages of binaural recording are exploited. The evaluation results show that taking binaural phase information into account improves performance: on the acoustic scene classification task of the 2019 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), the proposed system outperforms the provided baseline system by 18.3% in classification accuracy.

     
