Compression Optimization Strategy for End-to-End ASR Model Based on Conformer
Abstract: With the rise of deep learning, end-to-end speech recognition models have received increasing attention. The recently proposed Conformer architecture has further improved the performance of end-to-end speech recognition and has been widely adopted in the field. However, because of their large memory and computation requirements, these models are difficult to deploy and run on resource-constrained devices. To reduce model size and computation as much as possible while keeping the accuracy loss small, this paper applies three compression and optimization strategies: model quantization, structured pruning based on weight channels, and singular value decomposition (SVD). It also improves the model quantization scheme. The effect of different degrees of compression on model accuracy is investigated. With the strategies combined and tested on different devices, inference is approximately 3 to 4 times faster than the baseline, while the word error rate differs from the baseline by less than 3%.
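To make the three strategies named in the abstract concrete, the following is a minimal, generic sketch in plain Python. It is not the authors' implementation: the abstract does not specify the quantization scheme, the pruning criterion, or the rank selection, so this sketch assumes symmetric per-tensor int8 quantization, an L1-norm criterion for choosing which output channels to keep, and a simple parameter count for a rank-r factorization. All function names are hypothetical.

```python
from typing import List, Tuple

# Assumption of this sketch: a weight matrix is a list of rows,
# and each row is one output channel.
Matrix = List[List[float]]


def quantize_int8(w: Matrix) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor int8 quantization, so that w is roughly scale * q."""
    max_abs = max(abs(x) for row in w for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the int8 range [-127, 127].
    q = [[max(-127, min(127, round(x / scale))) for x in row] for row in w]
    return q, scale


def dequantize(q: List[List[int]], scale: float) -> Matrix:
    """Map int8 codes back to approximate float weights."""
    return [[scale * x for x in row] for row in q]


def prune_channels(w: Matrix, keep_ratio: float) -> Tuple[Matrix, List[int]]:
    """Structured pruning: keep the output channels (rows) with the largest L1 norm."""
    n_keep = max(1, int(len(w) * keep_ratio))
    norms = [sum(abs(x) for x in row) for row in w]
    ranked = sorted(range(len(w)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original channel order
    return [w[i] for i in kept], kept


def low_rank_params(m: int, n: int, r: int) -> Tuple[int, int]:
    """Parameter counts before and after replacing an m x n matrix
    with the product of an m x r and an r x n factor (as a truncated SVD yields)."""
    return m * n, r * (m + n)


# Toy weight matrix with 4 output channels and 3 inputs (made-up values).
w = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.01], [2.0, 0.0, -0.5], [0.1, 0.1, 0.1]]
q, scale = quantize_int8(w)
pruned, kept = prune_channels(w, keep_ratio=0.5)
dense, factored = low_rank_params(256, 1024, 32)
```

In a real network, removing an output channel also requires removing the matching input channel of the following layer, and a low-rank factorization only saves parameters when r is smaller than m*n / (m + n). The sketch leaves both points out for brevity.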