Multimodal Feature Fusion for Staring Radar Target Recognition
Abstract
Holographic staring radar enables effective detection and reliable recognition of low-slow-small (LSS) targets owing to its unique transmit-receive beam design and long-term coherent integration. Because staring radar provides both Doppler information and target trajectory information for recognition, efficiently extracting and fusing these two modalities is key to improving recognition performance. In this study, we propose a multimodal feature fusion method based on graph neural networks (GNNs). For Doppler feature extraction, we employ the LS-ResNet architecture, which combines large-kernel perception with dynamic small-kernel focusing, to obtain multi-scale Doppler representations at four levels. For trajectory feature extraction, a GRU-based network models the temporal features of the trajectory sequence, and an attention mechanism highlights the key frames that carry critical motion information. In the fusion stage, the Doppler and trajectory features produced by these modules are organized into a graph, and a GNN models the relationships among the feature nodes, exploiting the complementary information between the two modalities to achieve effective multimodal fusion. To evaluate the proposed approach, we designed multiple backbone networks for comparative experiments on real-world Doppler and trajectory data of LSS targets collected by a holographic staring radar. With Doppler data alone, the LS-ResNet model achieved the best accuracy, 96.16%. With trajectory data alone, the GRU-A network attained an accuracy of 93.97%, notably outperforming traditional sequence classification models.
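The abstract does not specify the internals of the trajectory branch. The sketch below is a minimal NumPy illustration of the general idea, assuming a single-layer GRU followed by additive (Bahdanau-style) attention pooling over the per-frame hidden states, so that frames with higher attention weights dominate the trajectory embedding. The function names `init_params` and `gru_attention_encode`, the layer sizes, the 4-feature frame layout, and the random weights are all hypothetical; this is not the authors' GRU-A implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_in, hidden, att, seed=0):
    """Random weights for a toy GRU + attention encoder (hypothetical sizes)."""
    rng = np.random.default_rng(seed)

    def m(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    def gate():
        # (input weights, recurrent weights, bias) for one GRU gate
        return m(hidden, d_in), m(hidden, hidden), np.zeros(hidden)

    return {"z": gate(), "r": gate(), "n": gate(), "Wa": m(att, hidden), "va": m(att)}


def gru_attention_encode(seq, p):
    """Encode a (T, d_in) trajectory into one vector plus per-frame weights."""
    (Wz, Uz, bz), (Wr, Ur, br), (Wn, Un, bn) = p["z"], p["r"], p["n"]
    h = np.zeros(bz.shape[0])
    states = []
    for x in seq:  # one GRU step per frame
        z = sigmoid(Wz @ x + Uz @ h + bz)        # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)        # reset gate
        n = np.tanh(Wn @ x + Un @ (r * h) + bn)  # candidate state
        h = (1.0 - z) * n + z * h
        states.append(h)
    H = np.stack(states)                         # (T, hidden)
    scores = np.tanh(H @ p["Wa"].T) @ p["va"]    # additive attention score per frame
    weights = np.exp(scores - scores.max())      # softmax, shifted for stability
    alpha = weights / weights.sum()
    return alpha @ H, alpha                      # attention-weighted context, (hidden,)


# Hypothetical trajectory: 20 frames, 4 features per frame (e.g. x, y, z, speed).
rng = np.random.default_rng(1)
trajectory = rng.normal(size=(20, 4))
params = init_params(d_in=4, hidden=16, att=8)
context, alpha = gru_attention_encode(trajectory, params)
print(context.shape, alpha.argmax())  # embedding size and highest-weighted frame
```

The index returned by `alpha.argmax()` is the frame this toy model treats as most informative; in a trained network, those weights are what would single out the key motion frames.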
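The abstract states only that the Doppler and trajectory features are organized into a graph and fused by a GNN. As a rough illustration, the sketch below assumes five nodes (four multi-scale Doppler features plus one trajectory feature), a linear projection of each node into a shared space, a fully connected graph, two symmetric-normalized graph-convolution (GCN) layers, and mean pooling as the readout. The graph topology, the layer type, the feature sizes, and the names `gcn_layer` and `gnn_fuse` are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


def gnn_fuse(doppler_feats, traj_feat, proj, W1, W2):
    """Fuse four Doppler-scale nodes and one trajectory node into one vector."""
    raw = list(doppler_feats) + [traj_feat]
    H = np.stack([P @ f for P, f in zip(proj, raw)])  # (N, d) shared node space
    n = H.shape[0]
    A = np.ones((n, n)) - np.eye(n)              # assumed: every node linked to every other
    H = gcn_layer(A, H, W1)                      # two rounds of message passing
    H = gcn_layer(A, H, W2)
    return H.mean(axis=0), H                     # mean-pooled graph embedding, node states


rng = np.random.default_rng(0)
d = 32
doppler_dims = [64, 128, 256, 512]               # hypothetical per-level feature sizes
traj_dim = 16
doppler_feats = [rng.normal(size=k) for k in doppler_dims]
traj_feat = rng.normal(size=traj_dim)
proj = [rng.normal(0.0, 0.1, size=(d, k)) for k in doppler_dims + [traj_dim]]
W1 = rng.normal(0.0, 0.1, size=(d, d))
W2 = rng.normal(0.0, 0.1, size=(d, d))
fused, node_states = gnn_fuse(doppler_feats, traj_feat, proj, W1, W2)
```

The symmetric normalization keeps each aggregated node feature on a comparable scale regardless of how many neighbours it has. A learned or sparser adjacency would be an equally plausible design; the paper's exact choice is not stated in the abstract.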
In the multimodal fusion experiments, the proposed GNN-based fusion strategy achieved an accuracy of 98.12%, a gain of 1.96 percentage points over the Doppler-only modality and 4.15 percentage points over the trajectory-only modality. Ablation experiments further confirm the contribution of each feature node. In addition, the confusion matrix and feature visualizations show that, compared with single-modality approaches, the fused features substantially reduce misclassification among easily confused target classes.
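The reported gains are absolute differences in accuracy, that is, percentage points. The short check below recomputes them from the three accuracies given in the abstract and also expresses each gain as a relative reduction in error rate, taking error = 100 minus accuracy. That second figure is derived arithmetic added for illustration, not a result reported by the authors; the helper name `gains` is hypothetical.

```python
# Accuracies (%) as reported in the abstract.
accuracy = {"doppler": 96.16, "trajectory": 93.97, "fused": 98.12}


def gains(acc, ref="fused"):
    """Return {modality: (percentage-point gain, relative error reduction in %)}."""
    out = {}
    for name, a in acc.items():
        if name == ref:
            continue
        pp = acc[ref] - a                  # absolute gain in percentage points
        err_before = 100.0 - a             # single-modality error rate
        err_after = 100.0 - acc[ref]       # fused error rate
        out[name] = (round(pp, 2), round(100.0 * (1.0 - err_after / err_before), 1))
    return out


result = gains(accuracy)
print(result)  # {'doppler': (1.96, 51.0), 'trajectory': (4.15, 68.8)}
```

Read this way, the fused model roughly halves the Doppler-only error rate (3.84% down to 1.88%), which says more than the 1.96-point difference alone.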