HUANG Cheng-Wei, JIN Yun, BAO Yong-Qiang, YU Hua, ZHAO Li. Whispered Speech Emotion Recognition Embedded with Markov Networks and Multi-Scale Decision Fusion[J]. JOURNAL OF SIGNAL PROCESSING, 2013, 29(1): 98-106.

Whispered Speech Emotion Recognition Embedded with Markov Networks and Multi-Scale Decision Fusion

In this paper we propose a multi-scale framework in the time domain that combines the Gaussian mixture model (GMM) with a Markov network, and apply it to whispered speech emotion recognition. Based on the GMM, emotion recognition is carried out on both long and short utterances extracted from continuous speech signals. According to the dimensional emotion model, emotion in whispered speech should vary continuously over time; we therefore model the context dependency in whispered speech with a Markov network. A spring model is adopted to capture the higher-order variation in the emotion dimensional space, and fuzzy entropy is used to compute the unary energy of the Markov network. Experimental results show that the recognition rate for anger reaches 64.3%. Compared with normal speech, happiness is more difficult to recognize in whispered speech, while anger and sadness are relatively easy to classify. This conclusion agrees with the listening experiments carried out by Cirillo and Todt. A minimal illustrative sketch of the GMM-plus-Markov-network decision fusion follows.
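The sketch below is not the authors' implementation; it only illustrates the general idea of fusing segment-level GMM scores with a chain-structured Markov network over time. The label set `EMOTIONS`, the GMM settings, and the Potts-style pairwise penalty (with weight `smooth`) are assumptions for illustration; the paper's spring model and fuzzy-entropy unary energy are replaced here by plain GMM log-likelihoods.

```python
# Illustrative sketch: per-emotion GMMs give unary scores for each speech
# segment, and a simple chain Markov network smooths decisions over time.
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]  # hypothetical label set


def train_gmms(features_by_emotion, n_components=4):
    """Fit one GMM per emotion on its training feature vectors."""
    gmms = {}
    for emo, feats in features_by_emotion.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(feats)
        gmms[emo] = gmm
    return gmms


def unary_costs(gmms, segments):
    """Negative average frame log-likelihood of each segment per emotion."""
    costs = np.zeros((len(segments), len(EMOTIONS)))
    for t, seg in enumerate(segments):
        for k, emo in enumerate(EMOTIONS):
            costs[t, k] = -gmms[emo].score_samples(seg).mean()
    return costs


def chain_mrf_decode(costs, smooth=1.0):
    """Viterbi-style decoding on a chain Markov network: the pairwise term
    penalises emotion changes between neighbouring segments."""
    T, K = costs.shape
    pairwise = smooth * (1.0 - np.eye(K))      # simple Potts-style penalty
    best = costs[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = best[:, None] + pairwise       # K x K transition costs
        back[t] = total.argmin(axis=0)
        best = total.min(axis=0) + costs[t]
    path = [int(best.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [EMOTIONS[k] for k in reversed(path)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 20-dim frame features standing in for acoustic features.
    train = {emo: rng.normal(i, 1.0, size=(200, 20))
             for i, emo in enumerate(EMOTIONS)}
    gmms = train_gmms(train)
    segments = [rng.normal(0, 1.0, size=(50, 20)) for _ in range(6)]
    print(chain_mrf_decode(unary_costs(gmms, segments), smooth=2.0))
```

In this toy setting the smoothing weight trades off per-segment GMM evidence against temporal continuity; a larger value discourages rapid emotion switches between adjacent segments, mirroring the paper's assumption that whispered speech emotion evolves continuously in time.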
