Speech Enhancement in the Joint Time-Frequency Domain Based on the Real-Valued Discrete Gabor Transform

  • Abstract: Conventional transform-domain speech enhancement methods, typically built on the short-time Fourier transform, assume that speech is short-time stationary. This assumption leads to inaccurate spectral estimates of both the speech and the noise, and consequently to speech distortion and residual noise in the enhanced signal. This paper proposes a speech enhancement method that operates in the joint time-frequency domain and does not require the short-time stationarity assumption. The noisy speech is first transformed to the joint time-frequency domain with the fast real-valued discrete Gabor transform (RDGT), using a Gaussian window as the transform kernel because of its superior local energy concentration. A minimum mean-square error (MMSE) estimator of the speech log-spectrum is then derived under speech presence uncertainty, assuming that the speech and noise spectral coefficients are statistically independent Gaussian random variables. The noise time-frequency spectrum is estimated with the improved minima controlled recursive averaging (IMCRA) algorithm. Finally, the clean speech estimate is obtained by applying the inverse RDGT to the estimated clean speech spectrum. Experiments show that, compared with frequency-domain algorithms, the proposed method removes more noise with less speech distortion, and that it is effective in avoiding musical residual noise and retaining weak speech components.
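The abstract does not give the closed form of the estimator, so the following is only a minimal sketch of what an MMSE log-spectral gain under speech presence uncertainty can look like. It assumes the widely used Ephraim-Malah log-spectral amplitude gain, combined with a speech presence probability in the OM-LSA style; the paper's actual derivation may differ. The function names, the gain floor `g_min`, and the inputs `xi` (a priori SNR), `gamma` (a posteriori SNR) and `p` (speech presence probability) are hypothetical and would be computed per time-frequency coefficient.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x: float) -> float:
    """E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k          # term now equals (-x)^k / k!
            total -= term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction evaluated with the modified Lentz method (x > 1).
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def lsa_gain(xi: float, gamma: float) -> float:
    """Log-spectral amplitude gain (assumed Ephraim-Malah form)."""
    v = gamma * xi / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp_integral_e1(v))


def gain_with_presence(xi: float, gamma: float, p: float,
                       g_min: float = 0.1) -> float:
    """Geometric mix of the LSA gain and a floor g_min (assumed value),
    weighted by the speech presence probability p in [0, 1]."""
    return (lsa_gain(xi, gamma) ** p) * (g_min ** (1.0 - p))
```

In this sketch, the enhanced coefficient would be the noisy RDGT coefficient multiplied by `gain_with_presence(xi, gamma, p)`, with the noise power behind `xi` and `gamma` supplied by a noise tracker such as IMCRA.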

     
