Asymmetric Image Retrieval: A Survey
-
Abstract
As the representational power of deep neural networks continues to improve, the accuracy of content-based image retrieval (CBIR) has increased significantly. However, growing model size and computational complexity make traditional symmetric retrieval architectures, which embed queries and gallery images with the same network, difficult to deploy at scale and in resource-constrained settings. Asymmetric image retrieval, which employs models of different complexities and input resolutions on the query and gallery sides, reduces computational and communication overhead while preserving retrieval performance, and balancing its efficiency and accuracy has emerged as an important research topic. Nevertheless, mismatches in model capacity and input scale often induce shifts between the embedding spaces of the different networks, degrading matching accuracy and robustness. This paper presents a systematic review of representative studies in asymmetric image retrieval that address these challenges and categorizes existing methods into knowledge-distillation-based and non-knowledge-distillation-based approaches. For knowledge-distillation-based methods, we analyze previous studies from two perspectives: single-gallery embedding space distillation and fusion embedding space distillation. The former focuses on designing distillation strategies that improve embedding alignment between the query and gallery networks, while the latter focuses on constructing high-quality gallery embeddings by fusing multiple source embedding spaces. For non-knowledge-distillation-based methods, we focus on the design principles and engineering characteristics of backward-compatible networks, neural architecture search, and network pruning for modeling cross-network feature compatibility.
Finally, this paper discusses possible future research directions, including multi-scale embedding compatibility, structured pruning for asymmetric retrieval networks, embedding alignment between quantized and non-quantized models under edge-cloud collaboration, and adaptive retrieval strategies for dynamic scenarios, to provide guidance for subsequent work and practical system design.