Language Identification for Broadcasting Signal Based on Time-domain Gammatone Filtering Features

  • Abstract: To address broadcast language identification, a time-domain speech filtering method is proposed. The pre-processed speech signal is convolved with the gammatone time-domain function, then framed and windowed, and the log energy of each frame is computed to obtain the time-domain gammatone filterbank (GF) features. The feature parameters are rendered as images and classified with VGG19 and Resnet34 networks in language identification experiments. An automatic color scale algorithm is also applied to denoise the image representation of the features of noise-added speech, and the effects of feature dimensionality, noise type, and signal-to-noise ratio on identification accuracy are compared. The results show that the proposed features yield higher broadcast language identification accuracy than the traditional GFCC, GFCC-D-A, GFCC-SDC, and Fbank features, and that accuracy also improves to some extent across the tested noise types and signal-to-noise ratios in broadcast speech scenarios.
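The feature pipeline summarized in the abstract (gammatone convolution, framing and windowing, log energy, image rendering, automatic color scaling) can be sketched roughly as below. This is a minimal NumPy illustration and not the authors' implementation: the abstract gives no parameter values, so the filter order (4), bandwidth factor (1.019), Glasberg-Moore ERB formula, pre-emphasis coefficient (0.97), 25 ms Hamming frames with a 10 ms shift, 32 channels, and the 1% clipping in `auto_levels` are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def erb_space(f_low, f_high, n):
    """Return n center frequencies equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return from_erb(np.linspace(to_erb(f_low), to_erb(f_high), n))

def gammatone_ir(fc, fs, duration=0.025, order=4, b=1.019):
    """Gammatone impulse response t^(n-1) exp(-2*pi*b*ERB*t) cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))  # normalize to unit energy

def time_domain_gf(signal, fs, n_filters=32, frame_len=0.025, frame_shift=0.010):
    """Log frame energies of gammatone-filtered speech, shape (n_filters, n_frames)."""
    # Pre-processing is assumed here to be simple pre-emphasis.
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n, hop = int(frame_len * fs), int(frame_shift * fs)
    window = np.hamming(n)
    n_frames = 1 + (len(x) - n) // hop
    feats = np.empty((n_filters, n_frames))
    for k, fc in enumerate(erb_space(50.0, 0.45 * fs, n_filters)):
        # Time-domain filtering: causal convolution with the impulse response.
        y = np.convolve(x, gammatone_ir(fc, fs))[: len(x)]
        for m in range(n_frames):
            frame = y[m * hop : m * hop + n] * window
            feats[k, m] = np.log(np.sum(frame ** 2) + 1e-10)
    return feats

def to_image(feats):
    """Min-max scale the feature matrix to an 8-bit grayscale image."""
    scaled = (feats - feats.min()) / (feats.max() - feats.min())
    return np.round(scaled * 255).astype(np.uint8)

def auto_levels(img, clip=0.01):
    """Generic auto-levels stretch: clip the darkest and brightest pixels, then rescale."""
    lo, hi = np.percentile(img, [100 * clip, 100 * (1 - clip)])
    if hi <= lo:
        return img.copy()
    stretched = np.clip((img.astype(float) - lo) / (hi - lo), 0.0, 1.0)
    return np.round(stretched * 255).astype(np.uint8)

# Usage on a synthetic 1 s, 1 kHz tone sampled at 16 kHz (stand-in for speech).
fs = 16000
tone = np.sin(2 * np.pi * 1000.0 * np.arange(fs) / fs)
feats = time_domain_gf(tone, fs)
image = auto_levels(to_image(feats))
```

The resulting `image` array is the kind of 2-D input that could be resized and passed to an image classifier such as VGG19 or Resnet34; how the paper prepares the images for those networks is not stated in the abstract.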

     
