Improved Object Detection with Content and Position Separation in Transformer

Yao Wang, Jong Eun Ha

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In object detection, Transformer-based models such as DETR have exhibited state-of-the-art performance, capitalizing on the attention mechanism to handle spatial relations and feature dependencies. One inherent challenge these models face is the intertwined handling of content and positional information within their attention layers, which can blur the specificity of the information retrieval process. We regard object detection as a composite task, and merging content and positional information simultaneously, as prior models do, can exacerbate its complexity. This paper presents the Multi-Task Fusion Detector (MTFD), a novel architecture that dissects the detection process into distinct tasks, addressing content and position through separate decoders. By utilizing assumed fake queries, the MTFD framework enables each decoder to operate under a presumption of known ancillary information, ensuring more specific and enriched interactions with the feature map. Experimental results affirm that this methodical separation followed by a deliberate fusion not only reduces the difficulty of the detection task but also improves accuracy and clarifies the role of each component, providing a fresh perspective on object detection in Transformer-based architectures.
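The separation described above can be illustrated with a minimal, purely conceptual sketch. This is not the authors' implementation: all names, shapes, and the fusion rule are illustrative assumptions, and the toy dot-product attention stands in for the Transformer decoders over CNN feature maps used in the actual MTFD model.

```python
# Conceptual sketch (assumptions, not the paper's code): two decoders,
# one for content and one for position, each given a "fake" query that
# stands in for the information the other decoder would provide.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """Single-query dot-product attention over a toy feature map."""
    scores = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(scores, values)) for d in range(dim)]

# Toy "feature map": 4 locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.9, 0.1]]

# Fake queries let each decoder presume the other's information is known:
# the content decoder receives an assumed positional query, and vice versa.
fake_position_query = [1.0, 0.0]   # assumed "where" for the content decoder
fake_content_query = [0.0, 1.0]    # assumed "what" for the position decoder

content_out = attend(fake_position_query, features, features)   # class-oriented
position_out = attend(fake_content_query, features, features)   # box-oriented

# Deliberate fusion: combine the two specialized outputs into one embedding
# (a simple sum here; the real fusion strategy is defined in the paper).
fused = [c + p for c, p in zip(content_out, position_out)]
print(fused)
```

The point of the sketch is only the information flow: each decoder interacts with the shared feature map under an assumption about the other task's output, and their specialized results are merged afterward rather than entangled within a single attention stream.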

Original language: English
Article number: 353
Journal: Remote Sensing
Volume: 16
Issue number: 2
DOIs
State: Published - Jan 2024

Keywords

  • decoder
  • DETR
  • object detection
  • Transformer
