Abstract:
Traditional speech processing methods segment speech samples into fixed-length segments, but cutting samples in this way reduces the accuracy of speech emotion classification. In this paper, the cyclic filling method is introduced to process variable-length log-Mel spectrograms. The cyclic filling method makes better use of temporal dynamic information and reduces the interference of invalid padding data with the learning of model parameters. Because emotion may appear only at certain specific moments in speech, this paper constructs a speech emotion recognition model based on an efficient channel attention mechanism in order to capture the key emotion features. The efficient channel attention mechanism calculates the importance of each channel feature map and selectively emphasizes the informative ones, improving the expression of specific emotions. Experiments were carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The cyclic filling method achieves a weighted accuracy (WA) of 73.2% and an unweighted accuracy (UA) of 70.9%, and the proposed model achieves a WA of 76.0% and a UA of 73.4% on IEMOCAP.
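As an illustration only, the two ideas named in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names `cyclic_pad` and `eca`, the array layouts, the truncation of inputs longer than the target length, and the untrained kernel weights are all hypothetical. The channel attention follows the commonly described efficient channel attention pattern (global average pooling, a small 1D convolution across neighbouring channels, a sigmoid gate).

```python
import numpy as np


def cyclic_pad(spec, target_frames):
    """Repeat a (n_mels, T) log-Mel spectrogram along time up to target_frames.

    Assumption: inputs longer than target_frames are simply truncated.
    """
    n_frames = spec.shape[1]
    if n_frames == 0:
        raise ValueError("spectrogram has no frames")
    reps = -(-target_frames // n_frames)  # ceiling division
    # Tile the whole utterance, then cut to the requested length, so the
    # padding consists of real speech frames instead of zeros.
    return np.tile(spec, (1, reps))[:, :target_frames]


def eca(features, kernel):
    """Re-weight a (C, H, W) feature tensor with a channel attention gate.

    kernel: hypothetical 1D weights of odd length k, shared by all channels.
    """
    k = kernel.shape[0]
    pooled = features.mean(axis=(1, 2))   # global average pooling, shape (C,)
    padded = np.pad(pooled, k // 2)       # zero padding keeps C outputs
    # Local interaction across k neighbouring channels (1D cross-correlation).
    scores = np.array([padded[i:i + k] @ kernel for i in range(pooled.shape[0])])
    gate = 1.0 / (1.0 + np.exp(-scores))  # sigmoid, one weight per channel
    return features * gate[:, None, None]


# Example: pad a 3-frame spectrogram to 7 frames, then gate a small tensor.
padded_spec = cyclic_pad(np.arange(6).reshape(2, 3), 7)
gated = eca(np.ones((4, 2, 2)), np.array([0.0, 1.0, 0.0]))
```

In a trained model the kernel weights would be learned, and the gate would be applied to the feature maps of a convolutional layer rather than to a toy tensor.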