A small-footprint keyword spotting system for voice control

Abstract: Deep neural network based keyword spotting systems for resource-limited settings have made great progress in recent years, but these methods still need a large number of parameters to reach state-of-the-art accuracy. In this paper, we focus on the tradeoff between high detection accuracy and small model size, and propose applying the Squeeze-and-Excitation network and depthwise separable convolution to the keyword spotting task. Specifically, we first improve model accuracy by explicitly modelling the interdependencies between the channels of convolutional features with a Squeeze-and-Excitation network. We then replace the standard convolutions with depthwise separable convolutions, which greatly reduces the number of parameters. We compare the proposed method with two convolutional neural network based models on the Google Speech Commands dataset. Experimental results show that the proposed method significantly outperforms both comparison methods in terms of detection accuracy and model size: it achieves a detection accuracy of 96.16% with only 75.5K parameters.
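The two building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. It counts the weights of a standard convolution against a depthwise separable one and runs a toy Squeeze-and-Excitation step (global average pooling, two fully connected layers, channel-wise rescaling). The layer sizes, the reduction ratio of 16, and all function names are assumptions chosen for illustration; they are not taken from the paper's architecture and do not reproduce its 75.5K-parameter model.

```python
import math


def standard_conv_params(k_h, k_w, c_in, c_out):
    """Weights in a standard 2D convolution (bias omitted)."""
    return k_h * k_w * c_in * c_out


def depthwise_separable_params(k_h, k_w, c_in, c_out):
    """Weights in a depthwise conv (one k_h x k_w filter per input
    channel) followed by a 1x1 pointwise conv (bias omitted)."""
    depthwise = k_h * k_w * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def se_params(channels, reduction):
    """Weights in the two fully connected layers of an SE block
    (bias omitted). `reduction` is an assumed hyperparameter."""
    hidden = channels // reduction
    return 2 * channels * hidden


def squeeze_excite(feature_maps, w1, w2):
    """Toy SE forward pass on plain Python lists.

    feature_maps: list of C channels, each a 2D list of floats.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix.
    """
    # Squeeze: global average pooling, one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    h = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, h))))
         for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[v * gate for v in row] for row in ch]
            for ch, gate in zip(feature_maps, s)]


if __name__ == "__main__":
    # Illustrative layer: 3x3 kernel, 64 input and 64 output channels.
    std = standard_conv_params(3, 3, 64, 64)
    dws = depthwise_separable_params(3, 3, 64, 64)
    print("standard:", std)
    print("depthwise separable:", dws)
    print("SE block (reduction 16):", se_params(64, 16))
```

For this assumed 3x3, 64-to-64-channel layer, the depthwise separable form needs roughly an eighth of the weights of the standard convolution, while the SE block adds only a few hundred, which is the kind of tradeoff the abstract describes.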

     
