SAR Image Detection Based on the Bipartite Matching Transformer

Long Weijun (龙伟军); Guo Yuxuan (郭宇轩); Xu Yizhuo (徐艺卓); Du Chuan (杜川)

doi:10.12466/xhcl.2024.09.007

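Abstract: Synthetic aperture radar (SAR) offers all-weather, day-and-night imaging, and target detection in SAR images is of great military and civilian significance. In SAR image target detection, complex imaging backgrounds and interference from non-target objects lead to duplicate detections. Conventional deep learning networks for SAR image detection lower the probability of duplicate detections by adding feature extraction networks, non-maximum suppression, and similar processing, but false alarms and missed detections still occur when the threshold is set poorly or when targets overlap. To address this, this paper introduces a Transformer object detection model based on a bipartite matching loss. In contrast to conventional SAR image detection networks, bipartite matching uses the Hungarian algorithm to match predicted boxes to candidate boxes one-to-one, finding the best matching pairs and avoiding duplicate detections of the same target. During matching, surplus candidate boxes are ignored and automatically assigned to the background class, which both eliminates the false alarms caused by duplicate detections and removes the need for non-maximum suppression. The matching result also acts directly on the model output, enabling end-to-end optimization of detection: the detection task is recast as a set prediction problem, and a fixed set of learnable positional encodings establishes the association between targets and image features without relying on prior knowledge or preprocessing steps, which greatly simplifies training and deployment compared with conventional methods. To evaluate the model's effectiveness and reliability, it is compared with currently popular object detection models on the SAR-AIRcraft-1.0 dataset; it achieves good detection accuracy while maintaining a high recall rate, demonstrating the model's superior performance.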

Abstract: Synthetic aperture radar (SAR) provides all-weather, all-day imaging, and target detection in SAR images is highly important for military and civilian applications. In SAR image target detection, repeated detection occurs because of complex imaging backgrounds and interference from non-target objects. Traditional deep-learning networks for SAR image detection reduce the probability of repeated detection by adding feature extraction networks, non-maximum suppression, and other processing. However, improper threshold settings and overlapping targets can still lead to false alarms and missed detections. To address this, this paper introduces a Transformer-based target detection model with a bipartite matching loss. Unlike traditional SAR image detection networks, bipartite matching uses the Hungarian algorithm to perform one-to-one matching between predicted boxes and candidate boxes, thereby identifying the best matching pairs and avoiding repeated detection of the same target. During matching, redundant candidate boxes are ignored and automatically classified as background, which eliminates false alarms caused by repeated detection and removes the need for non-maximum suppression. Moreover, the matching results act directly on the model's output, enabling end-to-end detection optimization and transforming the target detection task into a set prediction problem. Through a fixed set of learnable position encodings, associations between targets and image features are established without relying on prior knowledge or preprocessing steps, greatly simplifying training and deployment compared with traditional methods. To evaluate the effectiveness and reliability of the model, it was compared with current state-of-the-art target detection models on the SAR-AIRcraft-1.0 dataset; it achieved good detection accuracy while maintaining a high recall rate, demonstrating the superior performance of the model.
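
The one-to-one matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `iou`, `hungarian`, `targets` and `preds` are hypothetical, the boxes are made up, and the matching cost of 1 - IoU is an assumption chosen for simplicity, since the abstract does not state the actual cost. The sketch only shows how a minimum-cost assignment solved with the Hungarian algorithm pairs each target with exactly one box and leaves the surplus boxes to be labeled as background, with no non-maximum suppression step.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list `match` where match[i] is the column assigned to row i.
    Standard O(n^2 * m) formulation with row/column potentials.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)        # row potentials (1-based)
    v = [0.0] * (m + 1)        # column potentials (1-based)
    p = [0] * (m + 1)          # p[j] = row currently matched to column j
    way = [0] * (m + 1)        # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:     # reached a free column: path is complete
                break
        while j0:              # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


# Made-up example: two targets, four predicted boxes.
targets = [(10, 10, 50, 50), (100, 100, 140, 140)]
preds = [
    (12, 12, 52, 52),      # close to target 0
    (20, 20, 60, 60),      # looser duplicate of target 0
    (98, 102, 138, 142),   # close to target 1
    (200, 200, 240, 240),  # overlaps nothing
]

# Assumed matching cost: 1 - IoU (lower is better).
cost = [[1.0 - iou(t, p) for p in preds] for t in targets]
match = hungarian(cost)
background = [j for j in range(len(preds)) if j not in match]
print(match, background)
```

With these boxes, target 0 is paired with box 0 and target 1 with box 2; boxes 1 and 3 are left unmatched and would be assigned to the background class, so the looser duplicate of target 0 never becomes a second detection.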

SAR Image Detection Based on the Bipartite Matching Transformer