Scalable Image Coding via Feature Decoupling for Human and Machine with Task-Adaptive Enhancement

    Abstract: Image compression is a key technology that aims to transmit as little information as possible while keeping the quality of the reconstructed image high. As computer vision advances, images are consumed not only by humans but increasingly by machines, so a compression method that serves both human and machine vision is of practical value. Existing learning-based image coding techniques have made notable progress in perceptual quality for human viewers, but methods driven by signal fidelity and methods driven by semantic fidelity optimize for diverging objectives, so they cannot meet the needs of machine vision and human viewing at the same time. This paper therefore proposes a scalable compression method based on feature decoupling with task-adaptive enhancement, which uses a single bitstream to support multiple vision tasks and reconstructs the image either selectively or completely, as required. Specifically, the method decouples image features into object features and background features, which are compressed and reconstructed separately. The reconstructed object image is used for downstream object detection and semantic segmentation, while the high-quality, fully reconstructed image is intended for human viewing. Vision tasks therefore do not require reconstructing the full image, which improves compression efficiency, and the differing needs of human viewers can still be met. In addition, to address the imbalance in task performance caused by differences in the importance of object regions, the method includes a pluggable (plug-and-play) task-adaptive unit embedded in the object feature decoder. This unit adjusts the features according to the specific task to improve the analysis performance of the reconstructed object image, without retraining the entire network. Experimental results show that the proposed method achieves better task performance and rate-distortion (Rate-Distortion) performance than the other codecs it is compared with. These findings underscore the potential of the method to advance scalable image compression for both human and machine vision applications.
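The selective versus complete reconstruction described in the abstract can be illustrated with a toy sketch. It is not the authors' network: the binary `object_mask`, the uniform quantizer standing in for the learned codec, and all function names are assumptions made for illustration only, since the abstract does not say how the decoupling is performed. The sketch only shows the scalable idea, in which one stream carries an object layer and a background layer, a machine-vision consumer decodes the object layer alone, and a human viewer decodes both.

```python
import numpy as np

def decouple(features, object_mask):
    """Split a (C, H, W) feature map into object and background parts.

    object_mask is a hypothetical (H, W) binary map; the abstract does not
    specify how the split is made, so a mask is assumed here.
    """
    mask = object_mask.astype(features.dtype)[None, :, :]
    return features * mask, features * (1.0 - mask)

def quantize(x, step=0.5):
    # Uniform scalar quantization stands in for the learned codec (assumption).
    return np.round(x / step).astype(np.int32)

def dequantize(q, step=0.5):
    return q.astype(np.float32) * step

def encode(features, object_mask, step=0.5):
    """Produce a single 'stream' holding two independently coded layers."""
    obj, bg = decouple(features, object_mask)
    return {"object": quantize(obj, step), "background": quantize(bg, step)}

def decode(stream, full=False, step=0.5):
    """Decode the object layer only, or both layers when full=True."""
    out = dequantize(stream["object"], step)
    if full:
        out = out + dequantize(stream["background"], step)
    return out
```

A machine-vision task would call `decode(stream)` and never touch the background layer, while a viewer would call `decode(stream, full=True)`. In this toy version the full reconstruction differs from the input by at most half a quantization step per element.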

     
