SUN Congshan, MA Lin, LI Haifeng. Speech Emotion Recognition Based on CM-OMEMD and Wavelet Scattering Network[J]. JOURNAL OF SIGNAL PROCESSING, 2023, 39(4): 688-697. DOI: 10.16798/j.issn.1003-0530.2023.04.010

Speech Emotion Recognition Based on CM-OMEMD and Wavelet Scattering Network

Speech emotion recognition (SER) is an essential part of human-computer interaction and has broad research and application value. Current SER still suffers from low recognition accuracy, which stems from the lack of a large-scale speech emotion dataset and the low robustness of speech emotion features. To address these problems, an SER method based on an improved empirical mode decomposition (EMD) and a wavelet scattering network (WSN) was proposed. First, a novel optimized masking EMD based on the constant-Q transform (CQT) and the marine predators algorithm (MPA), named CM-OMEMD, was proposed to address the mode mixing and residual noise problems that EMD and its improved variants exhibit in time-frequency analysis of emotional speech signals. CM-OMEMD was used to decompose the emotional speech signal into intrinsic mode functions (IMFs), and time-frequency features that characterize emotion were extracted from the IMFs as the first feature set. Then, scattering coefficients with translation invariance and deformation stability were extracted using the WSN as the second feature set. Finally, the two feature sets were fused, and a support vector machine (SVM) classifier was used for classification. The effectiveness of the proposed method was demonstrated by comparison experiments on the TESS dataset, which contains seven emotions. CM-OMEMD reduced mode mixing and improved the accuracy of time-frequency analysis of emotional speech signals, while the proposed SER method significantly improved SER performance.
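The fusion step described above can be illustrated with a minimal sketch. The abstract does not say how the two feature sets were fused, so the code assumes plain feature-level fusion: each set is standardized per dimension and the two vectors are concatenated for each utterance. The function names `zscore_columns` and `fuse_features` and the toy numbers are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant columns have zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def fuse_features(imf_features, scattering_features):
    """Assumed fusion rule: standardize each set, then concatenate per utterance."""
    if len(imf_features) != len(scattering_features):
        raise ValueError("both feature sets must cover the same utterances")
    imf_z = zscore_columns(imf_features)
    scat_z = zscore_columns(scattering_features)
    return [a + b for a, b in zip(imf_z, scat_z)]


# Toy placeholder values (one row per utterance), not data from the paper.
imf_features = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
scattering_features = [[0.2, 0.4, 0.6], [0.4, 0.4, 0.2], [0.6, 0.4, 0.1]]

fused = fuse_features(imf_features, scattering_features)
print(len(fused), len(fused[0]))
```

Each fused row would then be passed to the classifier; standardizing first keeps one feature set from dominating the other purely because of its numeric scale, which matters for margin-based classifiers such as an SVM.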