Scalable Image Coding via Feature Decoupling for Human and Machine with Task-Adaptive Enhancement

AN Ping; SHA Liya; WU Ying; YANG Chao; HUANG Xinpeng

doi:10.12466/xhcl.2025.02.017

Volume 41 Issue 2

Feb. 2025


JOURNAL OF SIGNAL PROCESSING > 2025 > 41(2): 399-408. > DOI: 10.12466/xhcl.2025.02.017

Citation: AN Ping, SHA Liya, WU Ying, et al. Scalable image coding via feature decoupling for human and machine with task-adaptive enhancement[J]. Journal of Signal Processing, 2025, 41(2): 399-408. DOI: 10.12466/xhcl.2025.02.017.

AN Ping^{1, 2},
SHA Liya^{1, 2},
WU Ying^{1, 2},
YANG Chao^{1, 2},
HUANG Xinpeng^{1, 2}

1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
2. Key Laboratory of Special Fiber Optics and Optical Access Networks, Shanghai University, Shanghai 200444, China

Funds:

The National Natural Science Foundation of China 62071287

The National Natural Science Foundation of China 62020106011

The National Natural Science Foundation of China 62371279

The National Natural Science Foundation of China 62371278

Science and Technology Commission of Shanghai Municipality 22ZR1424300

Corresponding author:
AN Ping, anping@shu.edu.cn
Received Date: October 31, 2024

Abstract

Image compression is a critical technology that reduces information redundancy during transmission while preserving the quality of the reconstructed image. With advances in computer vision, images are increasingly consumed by machines as well as by humans, which calls for compression methods that serve both human and machine vision. Although current learning-based image coding techniques have significantly improved reconstruction quality for human viewing, they struggle to balance signal fidelity and semantic fidelity, which limits their ability to serve both audiences effectively. To address this limitation, this study proposes a task-adaptive, feature-decoupling scalable compression method. The approach supports multiple machine vision tasks with a single bitstream and enables partial or complete image reconstruction as required. The method decouples image features into object and background features, which are compressed and reconstructed independently. Reconstructed object features are used for tasks such as object detection and semantic segmentation, whereas the fully reconstructed image serves human visual perception. Because machine vision tasks no longer require reconstruction of the entire image, compression efficiency improves, while the full reconstruction remains available to meet the distinct demands of human perception. Furthermore, to address performance imbalances arising from variations in the importance of target regions, a plug-and-play task-adaptive unit is integrated into the decoder for the object (target) features. This unit enables task-specific adjustments that improve the analysis performance of reconstructed target images without retraining the entire network. Experimental results demonstrate that the proposed method outperforms conventional codecs in task performance while achieving better rate-distortion efficiency. These findings underscore the potential of the method to advance scalable image compression for both human and machine vision applications.
Keywords:
- image compression
- human-machine collaborative
- feature decoupling
- task-adaptive enhancement
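
The decoupling idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' network: the binary object mask, the uniform quantizer standing in for a learned codec, and all function names and step sizes below are assumptions made purely for illustration. The sketch shows only the scalable structure, in which an object layer can be decoded on its own for machine tasks and is combined with a separately coded background layer for human viewing.

```python
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def decouple(image: Image, mask: Mask) -> Tuple[Image, Image]:
    """Split an image into object and background layers (hypothetical helper).

    mask[i][j] == 1 marks an object pixel, 0 marks background.
    """
    obj = [[p if m else 0.0 for p, m in zip(row, mrow)]
           for row, mrow in zip(image, mask)]
    bg = [[0.0 if m else p for p, m in zip(row, mrow)]
          for row, mrow in zip(image, mask)]
    return obj, bg


def quantize(layer: Image, step: float) -> List[List[int]]:
    """Uniform quantization, a toy stand-in for a learned encoder."""
    return [[round(p / step) for p in row] for row in layer]


def dequantize(symbols: List[List[int]], step: float) -> Image:
    """Inverse of quantize, a toy stand-in for a learned decoder."""
    return [[s * step for s in row] for row in symbols]


def reconstruct_full(obj_rec: Image, bg_rec: Image, mask: Mask) -> Image:
    """Merge the two decoded layers into a full image for human viewing."""
    return [[o if m else b for o, b, m in zip(orow, brow, mrow)]
            for orow, brow, mrow in zip(obj_rec, bg_rec, mask)]


def code_scalable(image: Image, mask: Mask,
                  obj_step: float = 2.0, bg_step: float = 16.0):
    """Code the object layer finely and the background layer coarsely.

    Returns (object-only reconstruction, full reconstruction).
    The step sizes are arbitrary illustrative values.
    """
    obj, bg = decouple(image, mask)
    obj_rec = dequantize(quantize(obj, obj_step), obj_step)
    bg_rec = dequantize(quantize(bg, bg_step), bg_step)
    return obj_rec, reconstruct_full(obj_rec, bg_rec, mask)
```

In this toy setting, a machine-vision consumer would decode only the object layer, while a human viewer would decode both layers and merge them; the coarser background step mimics spending fewer bits on regions that matter less to the analysis task.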

