Research on Convolutional Neural Networks for Abnormal Sound Recognition

Research on Abnormal Audio Event Detection Based on Convolutional Neural Networks

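  • Abstract: Convolutional neural networks (CNNs) have been widely applied in speech recognition to address the weak robustness, poor real-time performance, and low recognition accuracy of traditional acoustic models. This paper studies the applicability and recognition performance of CNNs on the abnormal sound recognition task. Using samples of six abnormal sounds commonly encountered in daily life, it analyzes how the dimensionality of the sound features affects CNN recognition performance, and compares the CNN against a Gaussian mixture model (GMM) and a back-propagation (BP) neural network. The experimental results show that, without noise, one-dimensional features give the CNN an average recognition rate 2.91% higher in relative terms than two-dimensional features, with faster error convergence; with noise, however, two-dimensional features give an average recognition rate 3.41% higher in relative terms than one-dimensional features. The CNN also shows clear advantages over the other two models in noise robustness and error convergence speed.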

     

    Abstract: Convolutional neural networks (CNNs) have been widely used in speech recognition to compensate for the deficiencies of traditional acoustic models, such as weak robustness, poor real-time performance, and low recognition accuracy. In this paper, the applicability and recognition performance of CNNs for abnormal sound recognition are analyzed. Using six common abnormal sounds, we explore how the dimensionality of the sound signal features influences the performance of the CNN architecture; as reference methods, a Gaussian mixture model (GMM) and a back-propagation (BP) neural network are compared with the CNN. The experimental results reveal that, in the noiseless environment, 1D features yield faster error convergence and a higher average accuracy than 2D features, with a relative increase of 2.91%. In the noisy environment, however, 2D features perform better, with a relative increase of 3.41%. Moreover, the CNN has a distinct advantage over the other two approaches in noise robustness and error convergence speed.

     
