XU Feng, LI Ping. DVUGAN: DDSP Integrated Variational U-Net Speech Enhancement Based on STDCT[J]. JOURNAL OF SIGNAL PROCESSING, 2022, 38(3): 582-589. DOI: 10.16798/j.issn.1003-0530.2022.03.016

DVUGAN: DDSP Integrated Variational U-Net Speech Enhancement Based on STDCT

  • This paper proposes DVUGAN, a speech enhancement model built on a generative adversarial network. The model works in the transform domain and takes short-time discrete cosine transform (STDCT) features as input. Because STDCT coefficients are real-valued and encode phase implicitly, they can be learned by a real-valued network; this avoids complex-valued networks and processing in the complex frequency domain, and reduces model complexity while still making use of phase. The generator is a variational U-Net encoder-decoder that integrates differentiable digital signal processing (DDSP) components, whose strong inductive bias significantly improves the performance of the autoencoder. The variational probabilistic bottleneck improves the suppression of impulsive noise sources and increases robustness to unseen data distributions. The multi-scale spectral loss from DDSP is introduced to guide the generator toward better perceptual quality by exploiting the perceptual bias of the oscillator. The discriminator is optimized with an SI-SNR (scale-invariant signal-to-noise ratio) loss, which balances the two sides of the adversarial network and promotes stable training. On the DNS development dataset and the Voice Bank+DEMAND dataset, the model outperforms the baseline model and several recent studies, demonstrating the effectiveness of the proposed DVUGAN for transform-domain speech enhancement.
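
Two of the building blocks named in the abstract, the STDCT input feature and the SI-SNR measure, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`dct_ii`, `stdct`, `si_snr`), the frame length, the hop size, and the absence of a window function are all assumptions made for illustration, and the SI-SNR formula is the commonly used zero-mean definition, which may differ in detail from the loss used in the paper.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of one real frame (direct O(N^2) formula)."""
    n = len(frame)
    out = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                  for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def stdct(signal, frame_len=8, hop=4):
    """Short-time DCT: slide a frame over the signal and transform each one.

    frame_len and hop are illustrative values, not the paper's settings.
    No window function is applied, to keep the sketch minimal.
    The output is a list of real-valued coefficient vectors.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(dct_ii(signal[start:start + frame_len]))
    return frames


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sequences."""
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to get the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    s_target = [dot / tgt_energy * t for t in tgt]
    # Whatever is left after the projection counts as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(v * v for v in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


# Toy usage: a sine wave corrupted by a small alternating disturbance.
clean = [math.sin(0.3 * i) for i in range(32)]
noisy = [c + 0.1 * (-1) ** i for i, c in enumerate(clean)]
features = stdct(noisy)        # real-valued input features
score = si_snr(noisy, clean)   # higher is better; a loss would use -score
```

Every value in `features` is a plain real number, which is the property that lets a real-valued network consume the representation directly. A training loss would typically minimize the negative of `si_snr`; how the paper wires that into the discriminator is not specified in the abstract.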