CAO Ronghe, WU Xiaolong, FENG Chang, ZHENG Fang, XU Mingxing. Wav2vec2.0 and Context Emotional Information Compensation Based Dialogue Speech Emotion Recognition[J]. JOURNAL OF SIGNAL PROCESSING, 2023, 39(4): 698-707. DOI: 10.16798/j.issn.1003-0530.2023.04.011

Wav2vec2.0 and Context Emotional Information Compensation Based Dialogue Speech Emotion Recognition

  • Emotions play an important role in human interaction. Utterances in everyday dialogue often carry weak emotion, span complex emotional categories, and are highly ambiguous, which makes dialogue speech emotion recognition a challenging task. To address this, existing work retrieves emotional information from the whole dialogue and uses it for prediction. However, the indiscriminate use of preceding emotional information can interfere with the prediction for the current utterance when the emotion changes sharply between preceding and subsequent utterances. This paper proposes a method based on Wav2vec2.0 and contextual emotional information compensation that selects the most relevant emotional information from the preceding utterances as compensation. First, a contextual information compensation module selects, from the preceding utterances, the prosodic information that is important to the current utterance and uses it to build a contextual emotion compensation representation through a long short-term memory (LSTM) network. Then the embedded representation of the current utterance is extracted with Wav2vec2.0 and concatenated with this contextual representation to form a new emotional representation. On the IEMOCAP dataset the method reaches a weighted accuracy (WA) of 69.0%, significantly outperforming the baseline model.
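
The select-then-concatenate idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how relevance is scored, so the cosine-similarity ranking, the top-k cutoff, and every function name here are assumptions made for illustration. The element-wise mean is only a stand-in for the LSTM, and plain Python lists stand in for real prosodic features and Wav2vec2.0 embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_context(current_prosody, preceding_prosody, k=2):
    """Indices of the k preceding utterances most similar to the current one.

    Assumption: relevance is scored by cosine similarity of prosodic vectors.
    The indices are returned in chronological order.
    """
    ranked = sorted(
        range(len(preceding_prosody)),
        key=lambda i: cosine(current_prosody, preceding_prosody[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def summarize(vectors, dim):
    """Stand-in for the LSTM: element-wise mean of the selected vectors."""
    if not vectors:
        return [0.0] * dim  # no usable context: compensate with zeros
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_representation(current_embedding, current_prosody, preceding_prosody, k=2):
    """Concatenate the current utterance embedding with the context summary."""
    chosen = select_context(current_prosody, preceding_prosody, k)
    context = summarize([preceding_prosody[i] for i in chosen], len(current_prosody))
    return list(current_embedding) + context
```

In a real system the summary would come from a trained LSTM run over the selected utterances, and the current embedding from a pretrained Wav2vec2.0 model. The sketch only shows how filtering the context before fusion keeps dissimilar preceding utterances out of the final representation.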