Deep Neural Network Compression and Acceleration: An Overview
Abstract: In recent years, driven by rapid improvements in graphics processing unit (GPU) performance, deep neural networks (DNNs) have advanced greatly and achieved strong results on many artificial intelligence tasks. However, mainstream deep learning models suffer from high computational complexity, large memory footprints, and long running times, which makes them difficult to deploy on mobile devices with limited computing resources or in applications with strict latency requirements. Consequently, making models lightweight by compressing and accelerating DNNs, without significantly degrading model accuracy, has attracted growing attention from both academia and industry. This paper reviews DNN compression and acceleration techniques from recent years. These techniques fall into four categories: parameter quantization, model pruning, lightweight convolution kernel design, and knowledge distillation. For each category, the paper first analyzes its performance, development status, and existing shortcomings. It then summarizes the methods used to evaluate the performance of model compression and acceleration. Finally, it discusses the open challenges in the field and possible directions for future research.