ZHOU Quan, NI Yinghao, MO Yuwei, et al. FMA-DETR: a Transformer object detection method without encoder[J]. Journal of Signal Processing, 2024, 40(6): 1160-1170. DOI: 10.16798/j.issn.1003-0530.2024.06.015.

FMA-DETR: A Transformer Object Detection Method Without Encoder

DETR is the first visual model to apply a Transformer to object detection. In DETR, the Transformer encoder re-encodes image features that have already been highly encoded, which makes part of the network functionally redundant. Furthermore, the encoder's deep multi-layer stack and large parameter count complicate network optimization and slow model convergence. This paper designs a Transformer object detection network without an encoder. Because no Transformer encoder is needed, the proposed network has fewer parameters, lower computational complexity, and faster convergence than DETR. However, directly removing the encoder reduces the network's expressive power: the Transformer decoder can no longer pick out the image features that contain objects from among the many image features, and detection performance drops significantly. To alleviate this problem, this paper proposes a fusion-feature mixing attention (FMA) mechanism, which compensates for the reduced feature expression ability of the detection network through adaptive feature mixing and channel cross-attention. Applied to the Transformer decoder, it mitigates the performance degradation caused by removing the encoder. On the MS-COCO dataset, the proposed network, called FMA-DETR, achieves performance similar to DETR with faster convergence, fewer parameters, and lower computational complexity. Extensive ablation experiments verify the effectiveness of the proposed method.
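To give a rough sense of scale for the parameter savings claimed above, the sketch below counts the learnable parameters in a standard Transformer encoder stack. It is only an illustration, not the paper's accounting: the abstract gives no hyperparameters, so the values used (6 layers, model width 256, feed-forward width 2048) are assumed from the commonly used DETR configuration, and the helper names are hypothetical.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def encoder_layer_params(d_model: int, ffn_dim: int) -> int:
    """Learnable parameters in one standard Transformer encoder layer."""
    # Query, key, value, and output projections of multi-head self-attention.
    attention = 4 * linear_params(d_model, d_model)
    # Two-layer position-wise feed-forward network.
    feed_forward = linear_params(d_model, ffn_dim) + linear_params(ffn_dim, d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


def encoder_params(num_layers: int, d_model: int, ffn_dim: int) -> int:
    """Learnable parameters in a stack of identical encoder layers."""
    return num_layers * encoder_layer_params(d_model, ffn_dim)


# Assumed hyperparameters (not stated in the abstract).
removed = encoder_params(num_layers=6, d_model=256, ffn_dim=2048)
print(f"{removed:,} parameters in the encoder stack")
```

Under these assumptions the encoder stack accounts for roughly 7.9 million parameters. The number of attention heads does not appear in the count because the heads split the model width rather than add to it.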