Speech emotion recognition based on Efficient Channel Attention

  • Abstract: Traditional speech processing segments each speech sample into fixed-length pieces, but this segmentation reduces the accuracy of speech emotion classification. This paper introduces a cyclic padding method to process variable-length log-Mel spectrograms. The method makes better use of temporal dynamics and reduces the interference of invalid padded data with model parameter learning. Because human emotion appears only at certain specific moments in speech, this paper constructs a speech emotion recognition model based on an efficient channel attention mechanism in order to find the key emotional features. The efficient channel attention mechanism computes the importance of each channel map and selectively emphasizes channel maps, improving the expression of specific emotions. Experiments were carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. On IEMOCAP, the cyclic padding method reaches a weighted accuracy (WA) of 73.2% and an unweighted accuracy (UA) of 70.9%, and the proposed model reaches a WA of 76.0% and a UA of 73.4%.
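
The two techniques named in the abstract, cyclic padding (filling) of a variable-length log-Mel spectrogram and efficient channel attention, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that cyclic padding means repeating an utterance's own frames until a target length is reached (with truncation of longer inputs, which the abstract does not specify), and that channel attention follows the commonly described recipe of global average pooling, a 1D convolution across channels, and a sigmoid gate. The function names `cyclic_pad` and `eca`, the array layouts, and the fixed `kernel` argument are illustrative choices; in a trained model the kernel weights would be learned.

```python
import numpy as np


def cyclic_pad(spec: np.ndarray, target_frames: int) -> np.ndarray:
    """Repeat a (n_mels, frames) spectrogram along time up to target_frames.

    Assumption: inputs longer than target_frames are simply truncated.
    """
    n_frames = spec.shape[1]
    if n_frames == 0:
        raise ValueError("spectrogram has no frames")
    repeats = -(-target_frames // n_frames)  # ceiling division
    # Tile the utterance end to end, then cut to the requested length,
    # so the padding is real signal rather than zeros.
    return np.tile(spec, (1, repeats))[:, :target_frames]


def eca(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Re-weight the channels of a (channels, height, width) feature map.

    kernel: odd-length 1D weights shared across channels (hypothetical
    stand-in for learned parameters).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pooled = features.mean(axis=(1, 2))            # one descriptor per channel
    padded = np.pad(pooled, len(kernel) // 2)      # zero-pad so output has C entries
    scores = np.correlate(padded, kernel, mode="valid")  # local cross-channel mixing
    weights = 1.0 / (1.0 + np.exp(-scores))        # sigmoid gate in (0, 1)
    return features * weights[:, None, None]       # emphasize or suppress channels


if __name__ == "__main__":
    short = np.arange(6, dtype=float).reshape(2, 3)    # 2 mel bands, 3 frames
    print(cyclic_pad(short, 7))
    fmap = np.random.default_rng(0).normal(size=(8, 4, 5))
    print(eca(fmap, np.array([0.2, 0.5, 0.2])).shape)
```

Because the padded region repeats the utterance itself, every frame the model sees carries speech content, which matches the abstract's point about reducing interference from invalid padded data. The attention step only rescales whole channel maps, so the output keeps the shape of its input.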

     
