Speech emotion recognition algorithm based on multi-task learning and recurrent neural networks

  • Abstract: With the rapid development of machine learning, many researchers use neural networks to address problems in speech recognition. However, for reasons such as limited training data, conventional neural network classifiers often generalize poorly. Multi-task learning, a form of transfer learning, has been introduced to address this problem. This paper proposes a speech emotion recognition algorithm based on multi-task learning and a recurrent neural network (MTL-RNN). Speaker emotion recognition is the main task, with gender recognition and speaker identity recognition as auxiliary tasks; the three tasks are trained in parallel within one network. The model shares network parameters and learns shared features through shared RNN layers, and learns task-specific features through attribute-dependent layers, with the aim of improving classification performance. Experimental results show that the proposed MTL-RNN algorithm achieves good recognition performance in both Chinese and Arabic, and in settings with both fewer and more speakers.
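The structure described in the abstract (shared RNN layers feeding one attribute-dependent branch per task) can be sketched as below. This is a minimal forward-pass illustration, not the authors' implementation: the class name `MTLRNNSketch`, the plain tanh recurrent cell, the layer sizes, the class counts, the 13-dimensional input frames, and the weighted-sum loss with its weights are all assumptions made for the example, since the abstract does not specify them.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; the initialisation scheme is an assumption."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class MTLRNNSketch:
    """Hypothetical model: one shared recurrent layer, one branch per task."""

    def __init__(self, n_feat, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        # Shared layer: these parameters are used by every task.
        self.W_in = rand_matrix(n_hidden, n_feat, rng)
        self.W_rec = rand_matrix(n_hidden, n_hidden, rng)
        # Attribute-dependent branches: (hidden layer, output layer) per task.
        self.heads = {
            task: (rand_matrix(n_hidden, n_hidden, rng),
                   rand_matrix(k, n_hidden, rng))
            for task, k in n_classes.items()
        }

    def shared(self, frames):
        """Run the shared recurrent layer over a sequence of feature frames."""
        h = [0.0] * len(self.W_rec)
        for x in frames:
            a = matvec(self.W_in, x)
            b = matvec(self.W_rec, h)
            h = [math.tanh(u + v) for u, v in zip(a, b)]
        return h  # final hidden state summarises the utterance

    def forward(self, frames):
        """Return one class-probability vector per task."""
        h = self.shared(frames)
        out = {}
        for task, (W_dep, W_out) in self.heads.items():
            d = [math.tanh(v) for v in matvec(W_dep, h)]
            out[task] = softmax(matvec(W_out, d))
        return out

def joint_loss(probs, labels, weights):
    """Weighted sum of per-task cross-entropy losses (weights are assumed)."""
    return sum(weights[t] * -math.log(probs[t][labels[t]]) for t in probs)

# Toy utterance: 20 frames of 13 random features (placeholder data only).
rng = random.Random(1)
frames = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]

model = MTLRNNSketch(13, 8, {"emotion": 6, "gender": 2, "speaker": 4})
probs = model.forward(frames)
loss = joint_loss(
    probs,
    labels={"emotion": 2, "gender": 0, "speaker": 3},
    weights={"emotion": 1.0, "gender": 0.3, "speaker": 0.3},
)
```

Because all three losses flow back through `W_in` and `W_rec`, training on the auxiliary labels also shapes the shared features used by the emotion branch, which is the mechanism the abstract relies on to improve generalization. A practical version would use a deep learning framework with automatic differentiation rather than this hand-written forward pass.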

     
