Abstract:
In modern telecommunications, both echo and reverberation can significantly disturb communication and degrade speech intelligibility and quality. To overcome the negative impact of echo and reverberation simultaneously, we propose a two-stage, jointly trained deep-learning system for speech enhancement, in which echo cancellation and speech dereverberation are performed sequentially. In the first stage, a model based on the ideal ratio mask (IRM) cancels the acoustic echo, which is uncorrelated with the target signal. In the second stage, the reverberation, which is strongly correlated with the target signal, is removed by a spectrum mapping model combined with a hidden mask. The two stages are then trained jointly to further improve performance. Systematic experiments under a range of conditions indicate that the proposed system significantly improves echo cancellation and dereverberation performance and achieves better speech intelligibility and quality than the other methods evaluated.
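The first stage relies on the ideal ratio mask. The sketch below is a minimal illustration, not the authors' implementation. It assumes the common definition IRM = (|S|^2 / (|S|^2 + |E|^2))^beta with beta = 0.5, where S is the target (near-end) magnitude and E is the echo magnitude. The abstract does not state the exact form or exponent used. It also approximates the mixture magnitude as sqrt(|S|^2 + |E|^2), which holds only on average for uncorrelated signals. The function names `ideal_ratio_mask` and `apply_mask` and the toy spectrogram values are hypothetical.

```python
import numpy as np


def ideal_ratio_mask(target_mag, echo_mag, beta=0.5, eps=1e-8):
    """Per time-frequency bin IRM: (|S|^2 / (|S|^2 + |E|^2)) ** beta.

    Assumption: the echo is treated as the only interference, and beta = 0.5.
    eps avoids division by zero in bins where both signals are silent.
    """
    s_pow = np.square(target_mag)
    e_pow = np.square(echo_mag)
    return np.power(s_pow / (s_pow + e_pow + eps), beta)


def apply_mask(mixture_mag, mask):
    """Estimate the target magnitude by scaling each mixture bin by its mask value."""
    return mask * mixture_mag


# Hypothetical 2 x 3 magnitude spectrograms (frequency x time).
target = np.array([[3.0, 0.0, 1.0], [4.0, 2.0, 0.0]])
echo = np.array([[4.0, 1.0, 0.0], [3.0, 0.0, 2.0]])

# Assumed power additivity for uncorrelated signals: |Y|^2 ~ |S|^2 + |E|^2.
mixture = np.sqrt(np.square(target) + np.square(echo))

mask = ideal_ratio_mask(target, echo)
estimate = apply_mask(mixture, mask)
print(np.round(mask, 3))      # mask values lie in [0, 1]
print(np.round(estimate, 3))  # close to the target magnitudes
```

With beta = 0.5 and an additive-power mixture, the masked magnitude reduces to |S| in every bin. This is why a ratio mask works cleanly for an uncorrelated interferer such as echo. In a trained system, a network would predict the mask from the mixture, because the clean target and the echo are not available at inference time.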