Abstract:
Most deep-neural-network-based speech separation methods are trained in the frequency domain and, during training, focus only on the features of the target speech while ignoring those of the interfering speech. To address this, a speech separation method based on the cooperative training of a generative adversarial network is proposed. The method takes the time-domain waveform as the network's input, preserving the phase information introduced by signal delay. Meanwhile, the generator and the discriminator are trained on the features of the target speech and the interfering speech, respectively, which improves separation performance. In the experiments, comparative tests are performed on the Aishell dataset. The results show that the proposed method achieves good separation under three SNR conditions and better recovers the high-frequency information of the target speech.