Asymmetric Image Retrieval: A Survey

谢懿; 王子文; 朱建清

doi:10.12466/xhcl.2026.04.012

Abstract: As the representational power of deep neural networks continues to grow, content-based image retrieval (CBIR) has become markedly more accurate. However, growing model size and computational complexity make traditional symmetric retrieval architectures difficult to deploy at large scale and in resource-constrained settings. To reduce computational and communication overhead while preserving retrieval performance, asymmetric image retrieval, which uses models of different complexity and input resolution on the query side and the gallery side, has emerged as an important research direction that balances accuracy and efficiency. Nevertheless, mismatches in model capacity and input scale often shift the embedding spaces of the different networks relative to one another, degrading matching accuracy and stability. Focusing on this core problem, this paper systematically reviews representative recent work on asymmetric image retrieval and divides existing methods into two categories according to whether they rely on knowledge distillation: knowledge-distillation-based and non-knowledge-distillation-based. For knowledge-distillation-based methods, we organize prior work at two levels: single-gallery embedding space distillation and fusion embedding space distillation. The former designs distillation strategies that improve embedding alignment between the query network and the gallery network, while the latter builds high-performance gallery embeddings by fusing multiple source embedding spaces. For non-knowledge-distillation methods, we summarize the design ideas and engineering characteristics of backward-compatible networks, neural architecture search, and network pruning for modeling cross-network feature compatibility. Finally, we outline future research directions, including multi-scale embedding compatibility, structured pruning compatibility for asymmetric retrieval networks, embedding alignment between quantized and non-quantized models under edge-cloud collaboration, and adaptive retrieval strategies for dynamic scenarios, as a reference for further research and practical system design.
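To make the setting in the abstract concrete, the following is a minimal NumPy sketch, not taken from any surveyed method. It assumes the gallery has been indexed offline with embeddings from a large network and that queries are embedded by a lightweight network. The function names (`embedding_distillation_loss`, `asymmetric_search`) are hypothetical, and the squared-L2 alignment term is only one simple choice of distillation objective, used here for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so that dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def embedding_distillation_loss(query_emb, gallery_emb):
    """Mean squared L2 distance between the two networks' embeddings of the SAME images.

    Assumption: row i of both arrays describes image i. A small value means the
    lightweight query network lands close to the gallery network's embedding space.
    """
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_emb)
    return float(np.mean(np.sum((q - g) ** 2, axis=1)))


def asymmetric_search(query_emb, gallery_index, top_k=5):
    """Rank gallery items (embedded by the large network) for each query
    (embedded by the lightweight network) by cosine similarity."""
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_index)
    sims = q @ g.T                                  # shape: (num_queries, num_gallery)
    order = np.argsort(-sims, axis=1)[:, :top_k]    # best match first
    return order, np.take_along_axis(sims, order, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(100, 64))            # stand-in for large-network embeddings
    # Stand-in for a well-aligned lightweight network: gallery embedding plus small noise.
    aligned_query = gallery + 0.01 * rng.normal(size=gallery.shape)
    # Stand-in for an unaligned network: an unrelated embedding space.
    shifted_query = rng.normal(size=gallery.shape)

    print("alignment loss (aligned):", embedding_distillation_loss(aligned_query, gallery))
    print("alignment loss (shifted):", embedding_distillation_loss(shifted_query, gallery))
    top_idx, _ = asymmetric_search(aligned_query, gallery, top_k=1)
    print("top-1 accuracy (aligned):", float(np.mean(top_idx[:, 0] == np.arange(100))))
```

In this toy setup the gallery index never has to be recomputed when the query network changes, which is the practical appeal of the asymmetric design; the alignment loss is what a distillation-based method would minimize during training of the query network.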

Asymmetric Image Retrieval: A Survey