Research on FPGA-Based Neural Network Hardware Accelerators: A Review

Abstract: With the widespread application of deep learning in fields such as computer vision, natural language processing, and autonomous driving, the complexity and scale of neural network models have grown explosively, posing severe challenges to hardware computing capability. Traditional general-purpose computing platforms such as the central processing unit (CPU) and graphics processing unit (GPU) increasingly fall short in energy efficiency, real-time performance, and flexibility, particularly in edge-computing and low-power scenarios, where their performance often fails to meet expectations; algorithm optimization and hardware acceleration for neural networks have therefore become prominent research topics. To address these challenges, the field-programmable gate array (FPGA), as a reconfigurable hardware platform, has demonstrated unique advantages for deep learning acceleration thanks to its parallelism, low power consumption, and flexible programmability. This paper systematically reviews FPGA-based neural network hardware acceleration technologies, covering recent research progress in computing-architecture optimization, hierarchical memory design, and model compression, and analyzes in detail the computational characteristics and hardware acceleration frameworks of mainstream neural network models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. It also outlines core FPGA acceleration techniques, including parallel computing architectures with double-buffering strategies, sparse matrix computation, and structured pruning. Finally, the paper discusses the challenges facing FPGA-based neural network accelerators, such as model optimization under resource constraints and poor adaptation between algorithms and hardware, proposes a series of feasible solutions, and explores future research directions.
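To make the structured-pruning technique mentioned in the abstract concrete, the sketch below shows filter-level (channel) pruning by L1-norm saliency, the common baseline form of structured pruning that maps well to FPGA hardware because it removes whole filters rather than scattered weights. This is a minimal illustration in NumPy, not the method of any specific accelerator surveyed here; the function name and keep-ratio parameter are illustrative.

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio):
    """Structured pruning sketch: drop whole output filters with the
    smallest L1 norms, keeping a fraction `keep_ratio` of them.

    weights: convolution kernel of shape (out_ch, in_ch, kH, kW)
    Returns the pruned kernel and the indices of the kept filters.
    """
    out_ch = weights.shape[0]
    n_keep = max(1, int(round(out_ch * keep_ratio)))
    # L1 norm of each output filter serves as its saliency score.
    scores = np.abs(weights).reshape(out_ch, -1).sum(axis=1)
    # Keep the n_keep highest-scoring filters, preserving channel order.
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))   # 8 output filters
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 3, 3, 3)
```

Because entire filters are removed, the pruned kernel stays a dense, regular array, so an FPGA datapath needs no per-weight index bookkeeping, unlike unstructured sparsity, which requires sparse-matrix formats and irregular memory access.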

     
