Abstract:
With the rapid development of machine learning, a growing number of researchers use neural networks to tackle problems in speech recognition. However, for reasons such as limited training data, most conventional neural network classifiers suffer from shortcomings such as poor generalization. To address this problem, multi-task learning, a branch of transfer learning, has been actively studied in recent years. Building on multi-task learning and recurrent neural networks (RNNs), this paper proposes a speech emotion recognition algorithm (MTL-RNN) that takes emotion recognition as the main task and gender and identity recognition as auxiliary tasks. The three tasks are trained simultaneously in a single network. To learn shared features and improve classification performance, the model shares network parameters across tasks through shared RNN layers and learns task-specific features through attribute-dependent layers. Experiments show that the proposed MTL-RNN algorithm achieves good recognition performance on both Chinese and Arabic speech. Furthermore, it performs well both in an experiment with only a few speakers and in one with relatively more speakers.