Research on Feature Association and Fusion Small Target Detection Algorithm Based on Transformer
Abstract
Digital technology is widely used in the military field as the information age advances. Target detection is a core function of weapon systems and an important factor affecting war outcomes, as it is crucial for reconnaissance, early warning, and surveillance. However, target detection faces four primary challenges: small target detection, which is the most important and most difficult; small sample detection; real-time detection; and occluded target detection. Because small targets generally occupy only a few dozen or even a few pixels, traditional detection algorithms struggle to construct appropriate feature extraction models and to obtain accurate detection results based on a priori knowledge. Deep learning detection algorithms are prone to losing feature information during feature extraction and easily confuse target features with background noise in complex and changing application scenarios. In addition, current small target detection algorithms make insufficient use of the semantic features of small targets and do not make their spatial features prominent. Consequently, the detection accuracy of these algorithms is low, and missed and false detections are frequent. This study addresses these issues by proposing a small target detection algorithm based on the multi-scale local convolutional feature association (MLCFA) mechanism. The MLCFA mechanism primarily comprises the local convolutional attention association (LCAA) and cross-attention feature reconstruction (CAFR) modules. The LCAA module extracts feature associations from the multi-scale feature maps obtained by the feature fusion network, strengthens the connections between the pixels inside a small target, and highlights the unity of the small target's spatial features while suppressing background noise, which improves detection robustness against complex backgrounds.
The CAFR module obtains 100 query vectors via the self-attention mechanism, combines them with the associated feature sequence produced by LCAA to carry out global feature reconstruction, and obtains target detection information through a fully connected network, which alleviates the problems of small target bounding box disturbance and missing features to some extent. Comparison experiments on the TinyPerson dataset show that, compared with RetinaNet and other algorithms, the network model equipped with MLCFA increases the F1 score for the two types of targets by 19.81% and 11.88%, respectively, which substantially improves small target detection performance and demonstrates the effectiveness of the MLCFA module. In addition, the convergence rate experiment shows that MLCFA needs only 50 epochs to reach good detection performance, indicating that the model converges quickly and has some model transfer ability.
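
To make the query-based reconstruction step more concrete, the following is a minimal NumPy sketch of the general pattern the abstract describes for CAFR: a fixed set of 100 query vectors is refined by self-attention, attends to a feature sequence through cross-attention, and is mapped to box and class outputs by linear heads. It is an illustration under stated assumptions, not the implementation used in this study. The function names (`attention`, `reconstruct`), the dimensions, the random weights, the single attention head without learned projections, and the extra "no object" class are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention; q is (n_q, d), k and v are (n_k, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def reconstruct(queries, features, w_box, w_cls):
    """Hypothetical CAFR-style step: queries attend to the feature sequence."""
    # Step 1: self-attention among the queries, with a residual connection.
    refined, _ = attention(queries, queries, queries)
    refined = refined + queries
    # Step 2: cross-attention from the queries to the associated features.
    gathered, weights = attention(refined, features, features)
    gathered = gathered + refined
    # Step 3: linear heads. Boxes are squashed to (0, 1) as normalized
    # (cx, cy, w, h); this output format is an assumption of the sketch.
    boxes = 1.0 / (1.0 + np.exp(-(gathered @ w_box)))
    logits = gathered @ w_cls
    return boxes, logits, weights


rng = np.random.default_rng(0)
d_model, n_queries, seq_len, n_classes = 32, 100, 400, 2  # assumed sizes

queries = rng.normal(size=(n_queries, d_model))   # stand-in for learned queries
features = rng.normal(size=(seq_len, d_model))    # stand-in for LCAA output
w_box = rng.normal(size=(d_model, 4)) * 0.1
w_cls = rng.normal(size=(d_model, n_classes + 1)) * 0.1  # +1: "no object"

boxes, logits, weights = reconstruct(queries, features, w_box, w_cls)
print(boxes.shape, logits.shape, weights.shape)
```

Each of the 100 queries yields one candidate box and one class score vector, and each row of `weights` shows how strongly that query draws on every position of the feature sequence. A trained model would use learned projections, multiple heads, and stacked layers in place of the random weights shown here.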