A DBN-Based Feature Fusion Method for Cross-Corpus Speech Emotion Recognition

Feature Fusion Based on DBN for Cross-Corpus Speech Emotion Recognition

  • Abstract: In cross-corpus speech emotion recognition, fusing emotion features extracted at different scales is a current technical difficulty. Drawing on deep belief models from the field of deep learning, this paper proposes a feature-level fusion method based on deep belief networks (DBN), in which the emotional information hidden in the speech spectrogram is treated as image features and fused with traditional emotion features. First, the spectrogram is analysed with the STB/Itti model, and new spectrogram features are extracted from three perspectives: colour, brightness and orientation. A modified DBN model is then studied and used to perform feature-level fusion of the traditional acoustic features and the newly extracted spectrogram features, which enlarges the scale of the feature subset and strengthens its ability to characterise emotion. Experiments on the ABC database and several Chinese corpora show that, compared with traditional speech emotion features, the fused feature subset yields a clear improvement in cross-corpus recognition results.
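As a hedged illustration of the first step described above, the sketch below approximates the colour, brightness and orientation channels of an Itti-style analysis applied to a spectrogram image, using standard NumPy/SciPy/scikit-image operations. It assumes the spectrogram has already been rendered as an RGB image (e.g. with a pseudo-colour map); the function name, filter parameters and pooling statistics are illustrative assumptions, not the paper's exact STB/Itti implementation.

```python
# Illustrative sketch only: approximate colour / brightness / orientation
# channels of an Itti-style analysis on a spectrogram image. Not the paper's
# exact STB/Itti pipeline; names and parameters are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.filters import gabor

def spectrogram_saliency_features(rgb):
    """rgb: (H, W, 3) pseudo-coloured spectrogram image, values in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Brightness channel with a centre-surround (difference-of-Gaussians) contrast map
    intensity = rgb.mean(axis=-1)
    cs_intensity = gaussian_filter(intensity, sigma=1) - gaussian_filter(intensity, sigma=8)

    # Colour-opponent channels (red-green, blue-yellow), as in Itti's model
    rg = r - g
    by = b - (r + g) / 2.0

    # Orientation channels: Gabor responses of the brightness map at four angles
    orient = [gabor(intensity, frequency=0.25, theta=t)[0]
              for t in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]

    # Pool every map into simple statistics to obtain a fixed-length feature vector
    maps = [cs_intensity, rg, by] + orient
    return np.array([s for m in maps for s in (m.mean(), m.std(), m.max())])
```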

     

    Abstract: In cross-corpus speech emotion recognition (SER), fusing emotion features extracted at different scales is a current technical difficulty. Building on Deep Belief Nets (DBN) from the field of deep learning, a feature-level fusion method for cross-corpus SER is proposed. Following the preceding feature-abstraction research, the emotional traits hidden in the speech spectrogram are obtained as image features and fused with the traditional emotion features. First, the spectrogram is analysed with the STB/Itti model, and new spectrogram features are extracted from the color, brightness and orientation channels, respectively. A modified DBN is then used to fuse the traditional features with the spectrogram features, which increases the scale of the feature subset and its ability to characterize emotion. Experiments on the ABC database and a Chinese corpus show that, compared with traditional speech emotion features, the fused feature subset achieves a clear improvement in cross-corpus recognition results.
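As a rough illustration of the fusion step, the sketch below concatenates a traditional acoustic feature vector with the spectrogram features and passes the result through a small stack of RBMs (scikit-learn's BernoulliRBM as a crude stand-in for the paper's modified DBN) before an SVM classifier. The layer sizes, scaling, classifier and function names are assumptions for illustration, not the paper's configuration.

```python
# Illustrative feature-level fusion: concatenate hand-crafted acoustic features
# with the spectrogram features, learn a joint representation with stacked RBMs
# (a stand-in for the modified DBN), and classify with an SVM.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import BernoulliRBM
from sklearn.svm import SVC

def build_fusion_model():
    return Pipeline([
        ("scale", MinMaxScaler()),   # RBMs expect inputs in [0, 1]
        ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=30)),
        ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=30)),
        ("svm", SVC(kernel="rbf", C=1.0)),
    ])

# X_acoustic: (N, D1) traditional acoustic features (e.g. prosodic/spectral statistics)
# X_spec:     (N, D2) spectrogram features from the sketch above
# y:          (N,)    emotion labels; train on one corpus, test on another
def train_cross_corpus(X_acoustic, X_spec, y, X_acoustic_test, X_spec_test):
    X_train = np.hstack([X_acoustic, X_spec])    # feature-level fusion
    X_test = np.hstack([X_acoustic_test, X_spec_test])
    model = build_fusion_model()
    model.fit(X_train, y)
    return model.predict(X_test)
```

In a cross-corpus setting, the model would be fitted on one emotion database and evaluated on another; only the fused feature representation is shared between the two.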

     
