Cross-domain Speech Emotion Recognition Based on Transfer Discriminant Regression

宋鹏; 李绍凯; 张雯婧; 郑文明; 赵力

doi:10.16798/j.issn.1003-0530.2023.04.006

Transfer Discriminant Regression for Cross-domain Speech Emotion Recognition

Abstract: In practice, training and testing data often come from different corpora, which degrades recognition performance. To address this problem, we propose a transfer discriminant regression method for cross-domain speech emotion recognition. First, we use maximum mean discrepancy (MMD) and a graph Laplacian term as a joint distance measure between domains, which reduces the difference in probability distributions while preserving the local geometric structure of the data, so that a transferable common feature representation can be learned. Second, we adopt an energy preservation strategy to avoid losing target-domain information during transfer. Third, by introducing a discriminant regression term, we train a transferable discriminant regression model in the common subspace using the labeled source-domain samples. Finally, to give the learned model feature selection ability and robustness, we impose $L_{2,1}$-norm constraints on the projection matrix and the regression term, respectively. Experimental results on three public datasets show that the proposed algorithm achieves better recognition performance than several other transfer learning methods.
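The three regularization ingredients named in the abstract (MMD, the graph Laplacian term, and the $L_{2,1}$ norm) can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the linear-kernel MMD, the binary k-nearest-neighbor graph, the rows-as-samples layout, and all function names (`linear_mmd`, `knn_graph_laplacian`, `l21_norm`) are assumptions chosen for illustration, and the energy preservation and discriminant regression terms are not shown.

```python
import numpy as np


def linear_mmd(Xs, Xt, P):
    """Squared distance between projected domain means (linear-kernel MMD).

    Assumption: rows are samples, P maps d features to r dimensions.
    """
    diff = (Xs @ P).mean(axis=0) - (Xt @ P).mean(axis=0)
    return float(diff @ diff)


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric binary k-NN graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]     # k nearest neighbors per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # symmetrize the adjacency
    return np.diag(W.sum(axis=1)) - W


def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


# Hypothetical toy data: two domains with shifted feature means.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(40, 10))      # labeled source-domain features
Xt = rng.normal(0.5, 1.0, size=(30, 10))      # unlabeled target-domain features
P = rng.normal(size=(10, 3))                  # candidate projection matrix

X = np.vstack([Xs, Xt])
L = knn_graph_laplacian(X, k=5)
Z = X @ P                                     # samples in the common subspace

mmd_term = linear_mmd(Xs, Xt, P)              # distribution mismatch
graph_term = float(np.trace(Z.T @ L @ Z))     # local-structure smoothness
sparsity_term = l21_norm(P)                   # row sparsity of the projection
```

In a method of this kind, terms like these would be weighted and minimized jointly over `P`; penalizing the $L_{2,1}$ norm drives whole rows of `P` toward zero, which is what gives the projection its feature selection behavior.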
