LI Hongwei, MA Lin, LI Haifeng. Speech Emotion Recognition Through Kernel Canonical Correlation Analysis[J]. JOURNAL OF SIGNAL PROCESSING, 2023, 39(4): 639-648. DOI: 10.16798/j.issn.1003-0530.2023.04.005

Speech Emotion Recognition Through Kernel Canonical Correlation Analysis

Speech is the most important tool humans use to express thoughts and emotions, and an important part of human culture. Speech affective recognition (SAR), an important task in affective computing, has become an international research hotspot attracting increasing attention. Neuroscience research has shown that the brain is the material basis for producing and regulating emotions. Therefore, the study of speech emotion should consider not only the speech signal itself but also integrate brain activity signals into speech affective recognition to achieve higher accuracy. Based on this idea, this paper proposes a speech feature projection method based on Kernel Canonical Correlation Analysis (KCCA). The method maps speech and electroencephalogram (EEG) features into a high-dimensional Hilbert space and computes the maximum correlation coefficient between them. KCCA projects the speech features onto the direction most correlated with the EEG features, yielding speech features that contain EEG information. Because this method incorporates EEG information related to speech emotion into speech emotion feature extraction, it can represent emotion more accurately. The method also has good transferability in theory: when the extracted EEG features are sufficiently accurate and representative, the projection vectors obtained from KCCA modeling generalize and can be applied directly to new speech emotion datasets without reacquiring and processing the corresponding EEG signals. Experimental results on the MSP-IMPROV dataset and a self-built speech emotion dataset show that the projected speech features outperform the original speech features and other speech feature extraction methods.
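For illustration only, the following minimal Python sketch shows one standard way to realise the projection described above: regularised KCCA between a speech feature matrix and an EEG feature matrix, with the speech features then projected onto the maximally EEG-correlated directions. The RBF kernel, the ridge regulariser, and all function and variable names (kcca_fit, kcca_project, gamma, reg, n_components) are assumptions made here, not details from the paper.

```python
# Minimal regularised-KCCA sketch (illustrative, not the paper's code).
import numpy as np
from scipy.linalg import solve
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between row-sample matrices A and B."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def center_kernel(K):
    """Double-centre a kernel matrix (removes the feature-space mean)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_fit(X, Y, gamma=1.0, reg=1e-3, n_components=10):
    """Fit regularised KCCA between speech features X and EEG features Y.

    Solves (Kx + reg*I)^-1 Ky (Ky + reg*I)^-1 Kx a = rho^2 a and keeps
    the eigenvectors a with the largest canonical correlations rho.
    """
    n = X.shape[0]
    Kx = center_kernel(rbf_kernel(X, X, gamma))
    Ky = center_kernel(rbf_kernel(Y, Y, gamma))
    I = np.eye(n)
    M = solve(Kx + reg * I, Ky) @ solve(Ky + reg * I, Kx)
    vals, vecs = np.linalg.eig(M)          # M is non-symmetric in general
    order = np.argsort(-vals.real)[:n_components]
    alpha = vecs[:, order].real            # projection coefficients
    return X, gamma, alpha

def kcca_project(X_train, gamma, alpha, X_new):
    """Project new speech samples onto the EEG-correlated directions
    (test-kernel centring omitted for brevity)."""
    Kx_new = rbf_kernel(X_new, X_train, gamma)
    return Kx_new @ alpha

# Toy usage: 200 utterances, 40-dim speech features, 32-dim EEG features.
rng = np.random.default_rng(0)
speech = rng.standard_normal((200, 40))
eeg = rng.standard_normal((200, 32))
X_train, gamma, alpha = kcca_fit(speech, eeg, gamma=0.05, reg=1e-2)
projected = kcca_project(X_train, gamma, alpha, speech)
print(projected.shape)  # (200, 10): EEG-informed speech features
```

Note that the learned coefficients alpha live entirely in the speech kernel space; once fitted, projecting a new utterance only requires its kernel values against the training speech samples, which is consistent with the transferability claim above that no new EEG acquisition is needed at deployment time.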