Two-Stage Self-Supervised Algorithm for Generalized Few-Shot Object Detection
-
Abstract
Deep learning has advanced object detection considerably, largely because large-scale, accurately annotated datasets are available. In fields such as national defense, maritime security, and medicine, however, samples are scarce and annotated data is hard to obtain. Few-shot object detection, which aims to learn effective detectors from very few samples, has therefore attracted substantial attention in the academic community. One of its main challenges is the significant distribution discrepancy between novel and base classes, caused primarily by the limited number of novel-class samples; this discrepancy limits detection accuracy. In addition, because novel and base classes do not overlap, fine-tuning on novel classes often produces drastic gradient updates. As the model learns the characteristics of novel classes, it may forget the feature knowledge of the base classes, and overall performance declines. To address the scarcity of novel-class samples, this study adopts a self-supervised learning strategy: self-supervised learning does not depend on annotations and allows proxy tasks to be constructed for model training. To mitigate catastrophic forgetting of base-class knowledge after novel-class features are acquired, this paper integrates self-supervised learning with a two-stage object detector. By using latent features in the category domain to represent each class, and by dynamically updating these features as new classes are learned, the method improves the precision of the regressed bounding boxes; this improvement is achieved through well-designed proxy tasks in the bounding box domain. Extensive experiments on the PASCAL VOC and MS COCO datasets demonstrate that the proposed method outperforms various other few-shot object detection models in both novel-class performance and overall performance.
-
-