Abstract:
Timbre is an important cue for music perception and speech recognition. Traditional feature extraction methods cannot achieve ideal temporal resolution and frequency resolution at the same time, and they do not fully exploit the non-stationary information in audio. To address these problems, this paper adopts the time varying filtering based empirical mode decomposition (TVF-EMD) method to extract the intrinsic mode functions of audio, applies the Hilbert transform to them, and constructs Hilbert spectrum distribution features and Hilbert contour features. For musical instrument classification, we combine these two kinds of features with Mel frequency cepstral coefficients (MFCCs) and build a time-sequence classifier based on Bi-directional Long Short-Term Memory (BiLSTM). The experiments were carried out on an open database of musical instrument performance audio. The results show that the proposed features supplement the non-linear, non-stationary information that traditional features such as MFCCs do not capture, and that they improve the adaptability and robustness of timbre features for complex audio.
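As a rough illustration of the Hilbert transform step, the sketch below computes the instantaneous amplitude and instantaneous frequency of a single component, which are the quantities a Hilbert spectrum is generally built from. It is a minimal sketch under stated assumptions, not the paper's implementation: a synthetic 50 Hz tone stands in for an intrinsic mode function, TVF-EMD itself is not implemented, and the names `analytic_signal` and `instantaneous_features` are hypothetical. A naive DFT is used so that the example depends only on the standard library.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(n^2) DFT (illustrative only)."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero out negative frequencies.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0
        elif k < (n + 1) // 2:
            gain = 2.0
        else:
            gain = 0.0
        spectrum[k] *= gain
    # Inverse DFT gives the complex analytic signal.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_features(x, fs):
    """Return (amplitude, frequency in Hz) of x, sampled at fs Hz."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase step between neighbouring samples, taken as the angle of
    # z[t+1] * conj(z[t]) so that no explicit phase unwrapping is needed.
    frequency = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return amplitude, frequency


# Assumption: a pure 50 Hz cosine stands in for one intrinsic mode function.
fs = 1000
tone = [math.cos(2 * math.pi * 50 * t / fs) for t in range(200)]
amp, freq = instantaneous_features(tone, fs)
```

For this stationary tone both outputs are essentially constant over time. For a real intrinsic mode function they would vary from sample to sample, and that variation is the non-stationary information the proposed features are meant to capture.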