Improved DETR With Class Tokens in an Encoder

Research output: Contribution to journal › Article › peer-review

Abstract

DETR was the first model to apply a transformer to object detection. By casting object detection as a set prediction problem, it eliminates anchor boxes and non-maximum suppression. DETR has shown competitive results on public datasets and introduced many new ideas to object detection. Most DETR-like methods focus on improving the decoder and its object queries. From prior research, we conclude that the backbone and the encoder of DETR and DETR-like models serve as feature extractors. An analysis of the outputs from the backbone and the encoder confirms that they extract image features for object detection. Based on this observation, we reinforce the feature extraction stage by introducing class tokens in the encoder. We add a class token module that represents prior category information in the encoder, enabling global attention among feature tokens and providing prior knowledge during feature extraction. We investigate two initialization methods for the proposed class token module: random initialization and pretrained class tokens. The proposed module can also be used as a plug-and-play component in DETR-like models. Experimental results show that the proposed module outperforms each baseline model.
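To make the idea concrete, here is a minimal sketch of how class tokens might participate in encoder self-attention. This is an illustrative reconstruction, not the paper's implementation: the function name, dimensions, single-head attention, and the random-initialization variant are all assumptions; the paper's module would sit inside a full DETR-like encoder layer and also supports pretrained class tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_attention_with_class_tokens(feat_tokens, class_tokens):
    """Hypothetical sketch: single-head self-attention over the
    concatenation of class tokens (one per category, carrying prior
    category information) and image feature tokens. Returning only the
    updated feature tokens keeps the module plug-and-play: the rest of
    the encoder/decoder sees the usual token shape."""
    num_cls = class_tokens.shape[0]
    x = np.concatenate([class_tokens, feat_tokens], axis=0)  # (C+N, d)
    d = x.shape[1]
    # Global attention: every feature token can attend to every class token.
    attn = softmax(x @ x.T / np.sqrt(d), axis=-1)
    out = attn @ x
    return out[num_cls:]  # drop class tokens before the decoder

rng = np.random.default_rng(0)
feat = rng.standard_normal((100, 256))  # N=100 flattened feature tokens, d=256
cls = rng.standard_normal((80, 256))    # e.g. 80 categories, random init (hypothetical)
out = encoder_attention_with_class_tokens(feat, cls)
print(out.shape)  # (100, 256) — same shape as the input feature tokens
```

Because the class tokens are discarded after attention, the output shape matches the input feature tokens, which is what allows the module to be dropped into existing DETR-like encoders without changing downstream components.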

Original language: English
Pages (from-to): 129498-129510
Number of pages: 13
Journal: IEEE Access
Volume: 12
DOIs
State: Published - 2024

Keywords

  • DETR
  • Object detection
  • class token
  • encoder
  • transformer

