Abstract:
As an important application of speech signal processing, speech enhancement aims to reduce the influence of background noise on speech signals. However, effectively separating the target speech in highly nonstationary noise environments remains a challenging problem. Speech enhancement based on nonnegative matrix factorization (NMF), which models the spectral subspaces of speech and noise using nonnegative basis matrices, is currently an advanced and effective technique for suppressing nonstationary noise. In this paper, the theory of NMF is first introduced in detail, including the NMF model, the definition of the cost functions, and the commonly used multiplicative update rules. Then, the basic principle of NMF-based speech enhancement is reviewed in detail, including the specific processes of the training and enhancement stages, and experiments are carried out. In addition, an NMF-based speech reconstruction experiment is used to verify the ability of the speech basis matrix to model speech spectra. Finally, the shortcomings of traditional NMF-based algorithms are summarized, and several existing NMF-based algorithms are briefly reviewed in terms of their innovations, advantages, and disadvantages. Moreover, several typical methods are analyzed and compared. This paper presents the continuous development of NMF-based speech enhancement methods from a historical perspective.
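For orientation, the NMF model and multiplicative update rules mentioned above can be illustrated with a minimal sketch. It assumes the squared Euclidean cost with the standard Lee-Seung updates; the abstract does not say which cost function the paper adopts, so this choice is an assumption. The function names (`nmf`, `frobenius_error`), the toy matrix `V_toy`, and all parameter values are hypothetical and chosen only for illustration. Plain Python is used so the sketch has no dependencies; a practical implementation would more likely use an array library.

```python
import random


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Euclidean cost ||V - WH||^2 summed over all entries."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix V (F x T) as W (F x rank) times H (rank x T).

    Assumption: squared Euclidean cost with multiplicative updates.
    W plays the role of the basis matrix, H the activation matrix.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    # Strictly positive random initialization keeps the updates well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Toy nonnegative matrix standing in for a magnitude spectrogram
# (rows: frequency bins, columns: time frames). Values are made up.
V_toy = [
    [1.0, 0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 4.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 3.0, 1.5],
    [0.0, 1.0, 0.0, 1.0, 0.5],
]

W_est, H_est = nmf(V_toy, rank=2)
print("reconstruction error:", frobenius_error(V_toy, W_est, H_est))
```

Because each update only multiplies the current entries by a ratio of nonnegative quantities, W and H stay nonnegative without any explicit projection step.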