Fan Cunhang, Liu Bin, Tao Jianhua, Wen Zhengqi, Yi Jiangyan. An End-to-End Speech Separation Method Based on Convolutional Neural Network[J]. JOURNAL OF SIGNAL PROCESSING, 2019, 35(4): 542-548. DOI: 10.16798/j.issn.1003-0530.2019.04.003

An End-to-End Speech Separation Method Based on Convolutional Neural Network

Most speech separation systems enhance only the magnitude spectrum of the mixture, leaving the phase spectrum of the input signal's short-time Fourier transform (STFT) coefficients unchanged. However, recent studies suggest that phase is important for perceptual quality. To make full use of both magnitude and phase simultaneously, this work develops a novel end-to-end method for two-talker speech separation based on an encoder-decoder fully-convolutional structure. Unlike traditional speech separation systems, the deep neural network in this paper outputs each speaker's signal directly. We evaluate the proposed model on the TIMIT dataset. The experimental results show that the proposed method significantly outperforms the permutation invariant training (PIT) baseline, with a relative improvement of 16.06% in signal-to-distortion ratio (SDR).
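The evaluation metric above is the signal-to-distortion ratio. As a point of reference, the sketch below computes SDR in its simplest form, 10·log10 of the ratio between reference-signal energy and distortion energy; it is an illustrative stdlib-only implementation, not the paper's evaluation code (separation papers typically use the fuller BSS Eval decomposition, which also projects out interference and artifacts).

```python
import math

def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB:
    SDR = 10 * log10(||s||^2 / ||s - s_hat||^2),
    where s is the clean reference waveform and s_hat the separated estimate."""
    signal_energy = sum(s * s for s in reference)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / distortion_energy)

# Toy example: an estimate whose samples are each 10% too small
# has a distortion energy 1/100 of the signal energy, i.e. 20 dB SDR.
ref = [1.0, -1.0, 1.0, -1.0]
est = [0.9, -0.9, 0.9, -0.9]
print(round(sdr_db(ref, est), 6))
```

A "relative improvement of 16.06% in SDR" then means the proposed system's SDR (in dB) is 16.06% higher than the PIT baseline's SDR on the same test mixtures.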
