Spectrogram Speech Emotion Recognition Method Based on Auditory Attention Model

ZHANG Xin-ran, ZHA Cheng, SONG Peng, TAO Hua-wei, ZHAO Li

Citation: ZHANG Xin-ran, ZHA Cheng, SONG Peng, TAO Hua-wei, ZHAO Li. Spectrogram Speech Emotion Recognition Method Based on Auditory Attention Model[J]. JOURNAL OF SIGNAL PROCESSING, 2016, 32(9): 1117-1125. DOI: 10.16798/j.issn.1003-0530.2016.09.15


Funding: National Natural Science Foundation of China (61273266, 61375028); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20110092130004); Natural Science Foundation of Shandong Province (ZR2014FQ016)
  • CLC number: TN912.34


  • Abstract: In speech emotion recognition, noise conditions, speaking styles, and speaker traits can cause feature mismatch between experimental databases. Phonetically, this problem arises mostly in cross-corpus emotion recognition experiments: the mismatch between the trained acoustic models and the test utterances causes a drastic degradation in recognition performance. The selective auditory attention model studied here proves effective at detecting varying emotional features. The model is further improved with Chirplet time-frequency atoms so that it can extract salient gist features across speech databases for emotion recognition. Experimental results show that, with the proposed feature extraction method applied to cross-corpus emotion samples and a prototypical classifier, recognition accuracy improves by up to 9.6%, confirming that the method is more robust across different databases.
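As a rough illustration of the Chirplet time-frequency atoms mentioned in the abstract, the sketch below is a toy example, not the authors' implementation: it builds a Gaussian chirplet atom (Gaussian envelope modulating a linearly swept cosine) and shows that its projection onto a frequency-swept test signal is largest when the atom's chirp rate matches the sweep. All parameter values (sampling rate, sweep rate, atom width) are hypothetical.

```python
import numpy as np

def gaussian_chirplet(t, tc, sigma, f0, c):
    """Unit-norm Gaussian chirplet atom centered at time tc:
    envelope width sigma (s), center frequency f0 (Hz), chirp rate c (Hz/s)."""
    env = np.exp(-0.5 * ((t - tc) / sigma) ** 2)
    phase = 2 * np.pi * (f0 * (t - tc) + 0.5 * c * (t - tc) ** 2)
    atom = env * np.cos(phase)
    return atom / np.linalg.norm(atom)

fs = 8000                         # sampling rate (hypothetical)
t = np.arange(0, 0.5, 1 / fs)

# Test signal: a rising frequency sweep from 200 Hz at 400 Hz/s,
# standing in for a pitch glide that may carry emotional salience.
signal = np.cos(2 * np.pi * (200 * t + 0.5 * 400 * t ** 2))

# Salience score = |projection of the signal onto the atom|.
# At t = 0.25 s the sweep's instantaneous frequency is 200 + 400*0.25 = 300 Hz.
matched = abs(signal @ gaussian_chirplet(t, 0.25, 0.1, 300, 400))
mismatched = abs(signal @ gaussian_chirplet(t, 0.25, 0.1, 300, -400))

print(matched > mismatched)  # prints True: the matching chirp rate scores higher
```

Scanning a bank of such atoms over the spectrogram's time-frequency plane is one way a salience score of this kind could feed a selective-attention front end; the paper's actual feature extraction pipeline is not reproduced here.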
Publication history
  • Received: 2015-12-22
  • Revised: 2016-03-20
  • Published: 2016-09-24
