XU Chundong, HUANG Qiaoyue, WANG Lei, XU Jinwu. Acoustic Echo Cancellation Algorithm Incorporating Dynamic Scene Perception and Attention Mechanisms[J]. JOURNAL OF SIGNAL PROCESSING, 2024, 40(2): 396-405. DOI: 10.16798/j.issn.1003-0530.2024.02.017

Acoustic Echo Cancellation Algorithm Incorporating Dynamic Scene Perception and Attention Mechanisms

  • Removing acoustic echo to recover clear speech is one of the most important challenges for real-time audio and video communication systems. Acoustic echo cancellation (AEC) aims to eliminate echo from such systems, improving voice quality during calls and the overall user experience. However, conventional echo cancellation systems suffer from incomplete echo removal, residual nonlinear echo, and an inability to process echoes in real time. To address these problems, an acoustic echo cancellation algorithm that combines a dynamic scene perception module (DSPM) with a global attention mechanism (GAM) is proposed. A convolutional recurrent network (CRN) was used as the baseline model to extract sequential features of the speech signals. First, the DSPM replaced the causal convolution in the CRN encoder; it dynamically allocated the number of convolutional kernels according to the scene, which made the model more adaptive. Second, a GAM was introduced in each of the last two encoder layers to amplify the spatial inter-channel relationships and coordinate global interactions, improving both the extraction of speech signal features and the echo-cancellation performance. Finally, the robustness of the model was further improved by linearly combining the MSE and Huber loss functions into a new loss function (MSE-HuberLoss). Experimental results showed that the proposed GAM-DSPM-CRN model cancelled echo more effectively and reconstructed a clearer speech signal than the baseline model. In double-talk scenarios, the proposed algorithm also provided a greater performance improvement than the other algorithms compared. On the Microsoft AEC Challenge dataset, the MOS, ERLE, and STOI scores reached 4.09, 57.43, and 0.78, respectively.
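The combined loss described in the abstract can be illustrated with a minimal sketch. The abstract states only that the MSE and Huber losses are added linearly; it does not give the weights or the Huber threshold, so the parameters `alpha`, `beta`, and `delta` below, along with all function names, are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def _check(pred: Sequence[float], target: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before averaging."""
    if len(pred) == 0 or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over paired samples."""
    _check(pred, target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def huber_loss(pred: Sequence[float], target: Sequence[float],
               delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    _check(pred, target)
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


def mse_huber_loss(pred: Sequence[float], target: Sequence[float],
                   alpha: float = 1.0, beta: float = 1.0,
                   delta: float = 1.0) -> float:
    """Linear combination of MSE and Huber loss.

    alpha, beta, and delta are assumed values; the abstract does not
    state the weights or threshold used by the authors.
    """
    return alpha * mse_loss(pred, target) + beta * huber_loss(pred, target, delta)


if __name__ == "__main__":
    estimate = [0.0, 2.0]   # toy model output
    reference = [0.0, 0.0]  # toy clean target
    print(mse_huber_loss(estimate, reference))
```

The quadratic MSE term penalizes large errors heavily, while the Huber term grows only linearly beyond `delta`, so the sum is less dominated by outliers than a doubled MSE would be. This is one plausible reading of why the combination is described as improving robustness.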
