Latent Space Residual Denoising for Video Semantic Communication
Abstract
With the proliferation of multimedia applications, visual data, such as video, has come to dominate network traffic, imposing increasingly stringent requirements on high-reliability, low-latency transmission. Conventional separate source and channel coding schemes face performance bottlenecks in dynamic channel environments. As a novel communication paradigm, semantic communication improves transmission efficiency and robustness by extracting and transmitting the semantic information of the source. However, existing latent-space denoising methods for visual semantic communication typically suffer from high computational complexity and insufficient semantic fidelity. To address these challenges, this paper proposes a video semantic communication framework based on latent-space residual denoising. The framework employs a Swin Transformer-based joint source-channel codec and incorporates an iterative semantic denoiser built on residual learning and similarity learning. The residual mapping directly predicts and removes channel noise, significantly improving denoising efficiency. In addition, a signal-to-noise ratio (SNR)-driven similarity score serves as a conditional input that dynamically adjusts the denoising intensity, and an adaptive denoising-step strategy balances performance and latency. Simulation results demonstrate that the proposed method effectively suppresses noise over additive white Gaussian noise (AWGN) channels and outperforms conventional separate coding and end-to-end semantic communication schemes across a range of distortion and perceptual video metrics, particularly under high-noise conditions. Furthermore, the proposed SNR-driven similarity score initialization closely approximates the empirical distribution, as validated by Monte Carlo simulations.
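As a hedged illustration of the SNR-driven initialization idea mentioned above, the sketch below assumes the similarity score is the cosine similarity between a clean latent vector and its AWGN-corrupted copy (the paper's exact score definition may differ). For a unit-power latent of dimension d with linear SNR = P/sigma^2, the expected cosine similarity is approximately sqrt(SNR / (1 + SNR)), which a small Monte Carlo run can verify:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096            # latent dimension (illustrative choice)
snr_db = 10.0       # channel SNR in dB
snr = 10 ** (snr_db / 10)  # linear SNR

# Monte Carlo: cosine similarity between a latent and its AWGN-corrupted copy
sims = []
for _ in range(200):
    x = rng.standard_normal(d)                 # unit-power latent (P = 1)
    n = rng.standard_normal(d) / np.sqrt(snr)  # AWGN with power 1/snr
    y = x + n                                  # received latent
    sims.append(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

predicted = np.sqrt(snr / (1 + snr))  # closed-form SNR-driven initialization
empirical = float(np.mean(sims))      # Monte Carlo estimate
print(predicted, empirical)
```

At 10 dB the closed-form value is sqrt(10/11) ≈ 0.95, and the empirical mean concentrates around it for large d; such an SNR-dependent score could then condition the denoiser's intensity as the abstract describes.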