Abstract:
Traditional speech endpoint detection methods rely on the difference between speech and noise in a single parameter to locate the start and end points of speech in a signal. However, in noise environments with a low signal-to-noise ratio (SNR), the performance of any single parameter varies from one noise type to another, so its robustness is poor. To overcome this problem, this paper proposes a speech endpoint detection method based on the fusion of four parameters: sub-band spectral variance, energy entropy ratio, MFCC cepstral distance, and likelihood ratio. The method adapts the threshold of each parameter and selects the voting mechanism by detecting the energy entropy ratio of the noise segment in real time; the resulting vote determines the speech endpoints. Experimental results show that, at low SNR, the proposed method achieves higher detection accuracy and robustness than conventional endpoint detection methods. The proposed method can serve as a reference for subsequent speech signal processing.
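The abstract does not give formulas, so the following Python sketch illustrates, under stated assumptions, three ingredients it names: a per-frame energy entropy ratio, a threshold that adapts to frames assumed to be noise, and a vote across per-parameter decisions. The energy entropy ratio formula sqrt(1 + |log(1 + E) / H|) is one common formulation and is assumed here. The mean + alpha * std threshold rule, the use of leading frames as noise, and the fixed `min_votes` parameter are hypothetical; the paper instead selects its voting mechanism from the noise segment's energy entropy ratio by a rule the abstract does not specify. The other three parameters are not implemented.

```python
import math
import random
import statistics
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power spectrum of one frame via a naive DFT (bins 0..N/2).

    O(N^2), acceptable only for short illustrative frames.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum


def energy_entropy_ratio(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Assumed formulation: sqrt(1 + |log(1 + E) / H|).

    E is the frame energy and H the spectral entropy of the normalized
    power spectrum; eps guards against division by zero.
    """
    spectrum = power_spectrum(frame)
    total = sum(spectrum) + eps
    probs = [p / total for p in spectrum]
    # Each p <= 1, so every term -p*log(p) is >= 0 and H is non-negative.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    log_energy = math.log(1.0 + sum(x * x for x in frame))
    return math.sqrt(1.0 + abs(log_energy / (entropy + eps)))


def adaptive_threshold(noise_values: Sequence[float], alpha: float = 3.0) -> float:
    """Hypothetical rule: mean + alpha * std of a feature over noise frames."""
    return statistics.mean(noise_values) + alpha * statistics.pstdev(noise_values)


def fuse_votes(decisions: Sequence[Sequence[bool]], min_votes: int) -> List[bool]:
    """Mark a frame as speech when at least `min_votes` detectors fire.

    `decisions[i]` holds one boolean per parameter for frame i.
    """
    return [sum(frame_votes) >= min_votes for frame_votes in decisions]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 64
    # Low-level Gaussian noise stands in for the leading non-speech frames.
    noise_frames = [[rng.gauss(0.0, 0.01) for _ in range(n)] for _ in range(10)]
    tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
    thr = adaptive_threshold([energy_entropy_ratio(f) for f in noise_frames])
    print(energy_entropy_ratio(tone) > thr)  # True
    print(fuse_votes([[True, True, False, False], [True, True, True, False]], 3))  # [False, True]
```

Estimating each threshold from the current noise frames, rather than fixing it in advance, is what lets a detector of this kind track changing noise levels; in a real system the vote would combine all four parameters rather than the placeholder booleans used here.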