Abstract:
To address the difficulty of extracting rich global speech-related information and the underutilization of intermediate-level features in deep neural networks for speech enhancement, this paper designs a novel convolutional codec network based on information refinement and aggregation of residual features, built on an attention U-Net with as few parameters as possible. In the codec part, a Two-Dimensional Hierarchical Refined Residual (HRR) module is proposed, which significantly reduces the number of training parameters and expands the receptive field to extract multi-scale contextual information at different levels. In the transmission layer, a lightweight One-Dimensional Channel-Dimension Adaptive Attention (1D-CAA) module is proposed, which combines a gating mechanism with parametric normalization to selectively deliver features and improve the expressive capability of the network. A Gating Residual Feature Aggregation (GRFA) network is further built with gated residual linear units, enhancing inter-layer information flow and making full use of intermediate-level feature details to obtain more temporally relevant information. In the experiments, the network was trained and tested in 21 noisy environments; with only 1.23×10⁶ parameters, it achieves better objective and subjective scores than the compared methods, showing strong enhancement performance, good generalization ability, and a favorable balance between model complexity and accuracy.
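The abstract only names the 1D-CAA module without giving its internals, but the general pattern it refers to (channel attention with a sigmoid gate) can be illustrated. The sketch below is an assumption about the typical form of such a gate, not the paper's exact design: each channel is summarized by global average pooling, a learned per-channel linear map produces a sigmoid gate, and the gate rescales the channel.

```python
import math

def channel_gate(features, weights, biases):
    """Minimal sketch of a 1-D channel attention gate (assumed form,
    not the paper's exact 1D-CAA): pool each channel to a scalar
    descriptor, gate it with a learned sigmoid, rescale the channel.

    features : list of channels, each a list of time samples
    weights, biases : one learned scalar pair per channel (hypothetical)
    """
    # Global average pooling: one descriptor per channel
    descriptors = [sum(ch) / len(ch) for ch in features]
    # Sigmoid gating on each descriptor
    gates = [1.0 / (1.0 + math.exp(-(w * d + b)))
             for d, w, b in zip(descriptors, weights, biases)]
    # Selectively deliver features by rescaling each channel
    return [[g * x for x in ch] for g, ch in zip(gates, features)]
```

With zero weights and biases every gate is 0.5, so all channels are uniformly halved; trained parameters would instead emphasize speech-dominant channels and suppress noise-dominant ones.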