Fan Cunhang, Liu Bin, Tao Jianhua, Wen Zhengqi, Yi Jiangyan. An End-to-End Speech Separation Method Based on Convolutional Neural Network[J]. JOURNAL OF SIGNAL PROCESSING, 2019, 35(4): 542-548. DOI: 10.16798/j.issn.1003-0530.2019.04.003

An End-to-End Speech Separation Method Based on Convolutional Neural Network

Most speech separation systems enhance only the magnitude spectrum of the mixture, leaving the phase spectrum of the input signal's short-time Fourier transform (STFT) coefficients unchanged. However, recent studies suggest that phase is important for perceptual quality. To make full use of both magnitude and phase simultaneously, this work develops a novel end-to-end method for two-talker speech separation based on an encoder-decoder fully-convolutional structure. Unlike traditional speech separation systems, the deep neural network in this paper outputs each speaker's signal directly. We evaluate the proposed model on the TIMIT dataset. The experimental results show that the proposed method significantly outperforms the permutation invariant training (PIT) baseline, with a relative improvement of 16.06% in signal-to-distortion ratio (SDR).
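The evaluation metric above is the signal-to-distortion ratio. As a point of reference, the sketch below computes SDR in its simplest form, 10·log10 of the ratio between reference-signal energy and distortion energy; it is an illustrative stdlib-only implementation, not the paper's evaluation code (separation papers typically use the fuller BSS Eval decomposition, which also projects out interference and artifacts).

```python
import math

def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB:
    SDR = 10 * log10(||s||^2 / ||s - s_hat||^2),
    where s is the clean reference waveform and s_hat the separated estimate."""
    signal_energy = sum(s * s for s in reference)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / distortion_energy)

# Toy example: an estimate whose samples are each 10% too small
# has a distortion energy 1/100 of the signal energy, i.e. 20 dB SDR.
ref = [1.0, -1.0, 1.0, -1.0]
est = [0.9, -0.9, 0.9, -0.9]
print(round(sdr_db(ref, est), 6))
```

A "relative improvement of 16.06% in SDR" then means the proposed system's SDR (in dB) is 16.06% higher than the PIT baseline's SDR on the same test mixtures.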
