Citation: AN Ping, SHA Liya, WU Ying, et al. Scalable image coding via feature decoupling for human and machine with task-adaptive enhancement[J]. Journal of Signal Processing, 2025, 41(2): 399-408. DOI: 10.12466/xhcl.2025.02.017.
Image compression reduces information redundancy for transmission while preserving the quality of the reconstructed image. With advances in computer vision, images are increasingly consumed by machines as well as by humans, which calls for compression methods that serve both human and machine vision. Current learning-based image coding techniques have significantly improved reconstruction quality for human viewers, but they struggle to balance signal fidelity against semantic fidelity, which limits their ability to serve both audiences. To address this limitation, this study proposes a task-adaptive, feature-decoupling scalable compression method. The approach supports multiple machine vision tasks with a single bitstream and allows either partial or complete image reconstruction, depending on the requirement. The method decouples image features into object features and background features, which are compressed and reconstructed independently. The reconstructed object features are used for tasks such as object detection and semantic segmentation, whereas the fully reconstructed image serves human visual perception. Because machine vision tasks no longer require reconstructing the entire image, compression efficiency improves, while the full reconstruction remains available to meet the demands of human perception. Furthermore, to address performance imbalances that arise because different tasks assign different importance to object regions, a plug-and-play task-adaptive unit is integrated into the object feature decoder. This unit makes task-specific adjustments that improve analysis performance on the reconstructed object image without retraining the entire network. Experimental results demonstrate that the proposed method outperforms conventional codecs in task performance while achieving better rate-distortion efficiency. These findings underscore the potential of the method to advance scalable image compression for both human and machine vision applications.
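The two-layer idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' network: the binary `object_mask`, the uniform quantizer standing in for a learned codec, and every function name below are assumptions made purely for illustration. The sketch shows only that an object layer can be decoded alone for machine tasks, and that adding the background layer recovers the full feature map for human viewing.

```python
import numpy as np

def decouple(features, object_mask):
    """Split a (C, H, W) feature map into object and background parts.

    `object_mask` is a hypothetical (H, W) boolean map; how the paper
    actually obtains the object/background split is not stated here.
    """
    mask = object_mask.astype(features.dtype)[None, :, :]
    return features * mask, features * (1.0 - mask)

def quantize(x, step=0.5):
    """Uniform quantizer, a stand-in for a learned encoder (assumption)."""
    return np.round(x / step).astype(np.int32)

def dequantize(q, step=0.5):
    """Inverse of `quantize`, a stand-in for a learned decoder (assumption)."""
    return q.astype(np.float32) * step

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8)).astype(np.float32)
object_mask = np.zeros((8, 8), dtype=bool)
object_mask[2:6, 2:6] = True  # toy object region

obj, bg = decouple(features, object_mask)
obj_layer = quantize(obj)  # layer decoded for machine vision tasks
bg_layer = quantize(bg)    # extra layer decoded only for full reconstruction

# Machine path: decode the object layer alone.
machine_input = dequantize(obj_layer)
# Human path: decode both layers and combine them.
human_input = dequantize(obj_layer) + dequantize(bg_layer)
```

In this toy setting the machine path never touches `bg_layer`, which mirrors the abstract's claim that vision tasks do not need the entire image to be reconstructed.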