LVM-Empowered Image Semantic Communication via Next-Pixel Prediction: A Separate Source-Channel Coding Perspective
Abstract: As the vision for 6G unfolds, semantic communication is emerging as a core technology. The prevailing paradigm, deep learning-based joint source-channel coding (JSCC), performs well under specific conditions. However, its inherent limitations restrict wider adoption: poor compatibility with digital systems, weak generalization, and low design flexibility. To address these challenges, this study revisits the separate source-channel coding (SSCC) paradigm and proposes a large visual model-based separate source-channel coding framework (LVM-SSCC). On the source side, the framework uses a large vision model (e.g., ImageGPT) for autoregressive pixel prediction and combines this prediction with arithmetic coding to achieve efficient lossless compression. On the channel side, an error correction code transformer (ECCT) is introduced to make low-density parity-check (LDPC) decoding more robust. To ensure a fair comparison, this study proposes a unified energy consumption-based signal-to-noise ratio (SNR_unified) as the evaluation benchmark. Extensive simulations on the CIFAR-10 dataset show that, under both additive white Gaussian noise (AWGN) and Rayleigh fading channels, the proposed scheme significantly outperforms mainstream JSCC schemes such as DeepJSCC and SparseSBC in image reconstruction quality, measured by peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). The advantage is most pronounced in the mid-to-high SNR region, where the scheme achieves near-lossless, high-fidelity reconstruction while remaining fully compatible with digital communication systems. These results provide strong empirical evidence for applying the SSCC paradigm to future image semantic communication and highlight its combined advantages in performance, compatibility, and flexibility.
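The source-coding idea summarized above, next-pixel prediction driving an arithmetic coder, can be illustrated with a minimal sketch. It assumes 8-bit grayscale pixels coded in raster order. It also replaces the large vision model with a hypothetical toy_predictor that conditions only on the previous pixel. In LVM-SSCC the per-pixel probabilities would instead come from the model (e.g., ImageGPT). The integer coder follows the classic Witten-Neal-Cleary design. The encoder and decoder query the same model with the same context, so decoding is exactly lossless. The bit count also stays within about two bits of the model's ideal code length, which means a sharper predictor directly yields a shorter bitstream.

```python
import math

PRECISION = 32
FULL = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)


def toy_predictor(prev, levels=256):
    """HYPOTHETICAL stand-in for an LVM's next-pixel distribution.

    Returns integer frequencies peaked at the previous pixel value; integer
    counts keep the encoder and decoder bit-exact.
    """
    return [1 + 4096 // (1 + abs(v - prev)) for v in range(levels)]


def encode(pixels, levels=256):
    low, high, pending, bits = 0, FULL, 0, []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)  # flush deferred opposite bits
        pending = 0

    prev = 0
    for x in pixels:
        freqs = toy_predictor(prev, levels)
        total = sum(freqs)
        cum_low = sum(freqs[:x])
        cum_high = cum_low + freqs[x]
        span = high - low + 1
        # Narrow [low, high] to the sub-interval assigned to symbol x.
        high = low + span * cum_high // total - 1
        low = low + span * cum_low // total
        while True:  # renormalize, emitting leading bits once they are settled
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1  # interval straddles the midpoint: defer the bit
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
        prev = x
    pending += 1  # two final bits select a point inside [low, high]
    emit(0 if low < QUARTER else 1)
    return bits


def decode(bits, n, levels=256):
    pos = 0

    def next_bit():
        nonlocal pos
        bit = bits[pos] if pos < len(bits) else 0  # pad with zeros past the end
        pos += 1
        return bit

    low, high, value = 0, FULL, 0
    for _ in range(PRECISION):
        value = (value << 1) | next_bit()
    out, prev = [], 0
    for _ in range(n):
        freqs = toy_predictor(prev, levels)  # same model, same context as encoder
        total = sum(freqs)
        span = high - low + 1
        target = ((value - low + 1) * total - 1) // span
        s, cum_low = 0, 0
        while cum_low + freqs[s] <= target:  # locate the symbol's slot
            cum_low += freqs[s]
            s += 1
        high = low + span * (cum_low + freqs[s]) // total - 1
        low = low + span * cum_low // total
        while True:  # mirror the encoder's renormalization
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next_bit()
        out.append(s)
        prev = s
    return out


def ideal_code_length(pixels, levels=256):
    """Sum of -log2 p(x_i | x_{i-1}) under the toy predictor, in bits."""
    total_bits, prev = 0.0, 0
    for x in pixels:
        freqs = toy_predictor(prev, levels)
        total_bits -= math.log2(freqs[x] / sum(freqs))
        prev = x
    return total_bits
```

For example, encode(pixels) returns a list of bits, and decode(bits, len(pixels)) recovers the pixels exactly. ideal_code_length(pixels) gives the -log2 probability sum that the bit count approaches. Swapping in a real LVM would, under these assumptions, only change toy_predictor.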