Emotion Recognition from Raw Speech Based on Sinc-Transformer Model

YU Jiajia; JIN Yun; MA Yong; JIANG Fangjiao; DAI Yanyan

doi:10.16798/j.issn.1003-0530.2021.10.011

YU Jiajia, JIN Yun, MA Yong, JIANG Fangjiao, DAI Yanyan. Emotion recognition from Raw Speech based on Sinc-Transformer model[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(10): 1880-1888. DOI: 10.16798/j.issn.1003-0530.2021.10.011

Abstract

Manual extraction of acoustic features is a complex step in traditional speech emotion recognition. To avoid it, this paper proposes the Sinc-Transformer (SincNet Transformer) model, which performs speech emotion recognition directly on raw speech. The model combines the advantages of SincNet and the Transformer encoder. SincNet filters first capture important narrow-band emotional features from the raw speech waveform, which guides the whole network during feature extraction and completes the shallow feature extraction of the raw speech signal. Two Transformer encoder layers then process these features a second time to extract deeper feature vectors that contain global context information. On the four-class speech emotion recognition task on the IEMOCAP database, the proposed Sinc-Transformer model achieves an accuracy of 64.14% and an unweighted average recall of 65.28%. Compared with the baseline model, the proposed model effectively improves speech emotion recognition performance.
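
To illustrate the kind of narrow-band filter a SincNet front end applies to the raw waveform, the following is a minimal NumPy sketch of a single windowed-sinc band-pass kernel. It is not the authors' implementation: the function names `sinc_bandpass` and `magnitude_response`, the 16 kHz sample rate, the kernel length of 251, and the 300 to 3400 Hz band are assumptions chosen for illustration. In SincNet the two cutoff frequencies of each filter are learned during training, whereas here they are fixed.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Build one windowed-sinc band-pass FIR kernel from two cutoff frequencies.

    All default values are illustrative assumptions, not settings from the paper.
    """
    if not 0 <= f_low_hz < f_high_hz <= sample_rate / 2:
        raise ValueError("cutoffs must satisfy 0 <= low < high <= Nyquist")

    # Sample indices centred on zero so that the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2

    # Cutoffs expressed as a fraction of the sample rate.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate

    # A band-pass filter is the difference of two ideal low-pass filters.
    # np.sinc is the normalised sinc: sin(pi * x) / (pi * x).
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)

    # A Hamming window smooths the truncation of the ideal (infinite) filter.
    return kernel * np.hamming(kernel_size)


def magnitude_response(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at a single frequency."""
    n = np.arange(len(kernel))
    return abs(np.sum(kernel * np.exp(-2j * np.pi * freq_hz * n / sample_rate)))


# Example: a filter that passes roughly 300 Hz to 3400 Hz.
kernel = sinc_bandpass(300.0, 3400.0)
in_band = magnitude_response(kernel, 1000.0)
out_of_band = magnitude_response(kernel, 6000.0)
```

Because each filter is fully described by its two cutoff frequencies, a bank of such filters has far fewer free parameters than a standard convolutional layer of the same kernel length, which is what makes the first layer easier to interpret as a set of frequency bands.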
