Abstract:
In cross-corpus speech emotion recognition (SER), fusing features across multiple scales remains a key technical difficulty. Building on Deep Belief Nets (DBNs) from the field of deep learning, this paper proposes a feature-level fusion method for cross-corpus SER. Following earlier research on feature abstraction, the emotional traits hidden in the speech spectrogram are extracted as image features and fused with traditional emotion features. First, the spectrogram is analyzed with the STB/Itti model, and new spectrogram features are extracted from its color, brightness, and orientation channels. Then, modified DBNs fuse the traditional and spectrogram features, which enlarges the feature subset and strengthens its ability to characterize emotion. Experiments on the ABC database and a Chinese corpus compare the new feature subset with traditional speech emotion features, and the cross-corpus recognition results show a clear improvement.
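As a rough illustration of feature-level fusion with a DBN, the sketch below concatenates a traditional feature vector and a spectrogram feature vector for each utterance, scales the result to [0, 1], and passes it through a stack of Bernoulli RBMs trained greedily with one-step contrastive divergence (CD-1). This is a plain DBN, not the modified DBN described above, whose changes the abstract does not specify. The function names (`fuse_features`, `RBM`), feature dimensions, layer sizes, epoch count, and learning rate are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train(self, data, epochs=20, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden probabilities driven by the data.
            h0 = self.hidden_probs(data)
            h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
            # Negative phase: one Gibbs step (reconstruct, then re-infer hidden).
            v1 = self.visible_probs(h0_sample)
            h1 = self.hidden_probs(v1)
            # CD-1 updates averaged over the batch.
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)


def fuse_features(traditional, spectrogram, layer_sizes, rng):
    # Feature-level fusion: concatenate the two per-utterance feature vectors.
    x = np.hstack([traditional, spectrogram])
    # Min-max scale each column to [0, 1] so Bernoulli visible units fit.
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    x = (x - x_min) / np.where(x_max > x_min, x_max - x_min, 1.0)
    rbms = []
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        rbm.train(x)  # greedy layer-wise pretraining
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # hidden activations feed the next layer
    return x, rbms


# Usage with random stand-in data (hypothetical dimensions).
rng = np.random.default_rng(0)
trad = rng.normal(size=(50, 384))  # traditional acoustic features
spec = rng.normal(size=(50, 64))   # spectrogram image features
fused, rbms = fuse_features(trad, spec, [128, 32], rng)
print(fused.shape)  # (50, 32)
```

The top-layer activations serve as the fused representation; in a full system a classifier would be trained on them (and the stack possibly fine-tuned), which this sketch omits.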