ZHANG Yue, ZHANG Xiongwei, SUN Meng. Bone-Conducted Robust Speech Enhancement Based on Time-Frequency Domain Attention Mechanism and U-Net[J]. JOURNAL OF SIGNAL PROCESSING, 2022, 38(10): 2134-2143. DOI: 10.16798/j.issn.1003-0530.2022.10.014

Bone-Conducted Robust Speech Enhancement Based on Time-Frequency Domain Attention Mechanism and U-Net

In recent years, neural network based methods have been applied to bone-conducted (BC) speech enhancement. However, because BC speech datasets are small, BC speech lacks high-frequency components, and the degree of high-frequency distortion varies across speakers, it is difficult for neural networks to learn the spectral characteristics effectively. As a result, existing BC speech enhancement methods are neither effective nor robust enough for unseen speakers. To make full use of the time-frequency information of BC speech and guide the model to attend to the low-frequency spectral characteristics, this paper proposes a robust enhancement method based on a time-frequency domain attention mechanism and U-Net. The method introduces a time-frequency attention mechanism into the U-Net structure. Weights are first assigned automatically according to the importance of the feature information along the time and frequency directions; the weighted BC spectrum is then used as the input, with the corresponding air-conducted (AC) speech spectrum as the training target of the U-Net; finally, the trained speech enhancement model reconstructs full-band speech. Simulation and visualization results show that, on unseen-speaker datasets, the proposed method achieves higher objective PESQ and STOI scores and better speech intelligibility than the baseline U-Net structure and other attention mechanisms.
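A minimal PyTorch sketch of the pipeline described above: a time-frequency attention block re-weights a BC magnitude spectrogram along the frequency and time directions, and the weighted spectrum is passed through a small U-Net trained toward the paired AC spectrum. The specific attention structure, channel sizes, and U-Net depth here are illustrative assumptions, not the paper's published configuration.

```python
# Sketch only: module names and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeFreqAttention(nn.Module):
    """Assigns weights along the frequency and time directions of a spectrogram."""

    def __init__(self, n_freq, reduction=4):
        super().__init__()
        # Frequency attention: pool over time, learn one weight per frequency bin.
        self.freq_fc = nn.Sequential(
            nn.Linear(n_freq, n_freq // reduction), nn.ReLU(),
            nn.Linear(n_freq // reduction, n_freq), nn.Sigmoid())
        # Time attention: pool over frequency, learn one weight per frame via a 1-D conv.
        self.time_conv = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):                                   # x: (batch, 1, freq, time)
        b, c, f, t = x.shape
        freq_w = self.freq_fc(x.mean(dim=3).view(b, f))     # (b, f)
        time_w = self.time_conv(x.mean(dim=2))              # (b, 1, t)
        x = x * freq_w.view(b, 1, f, 1)                     # weight frequency bins
        x = x * time_w.view(b, 1, 1, t)                     # weight time frames
        return x


class SmallUNet(nn.Module):
    """A shallow encoder-decoder with a skip connection (illustrative depth)."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2)
        d1 = self.dec1(torch.cat([d2, e1], dim=1))          # skip connection
        return F.relu(d1)                                   # enhanced magnitude spectrum


if __name__ == "__main__":
    n_freq, n_frames = 256, 128
    bc_spec = torch.rand(4, 1, n_freq, n_frames)            # BC magnitude spectrograms
    ac_spec = torch.rand(4, 1, n_freq, n_frames)            # paired AC targets
    model = nn.Sequential(TimeFreqAttention(n_freq), SmallUNet())
    loss = F.mse_loss(model(bc_spec), ac_spec)              # train toward the AC spectrum
    loss.backward()
    print(loss.item())
```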
