Visual Prompt Selection Framework for Real-Time Object Detection and Interactive Segmentation in Augmented Reality Applications

Eungyeol Song, Doeun Oh, Beom Seok Oh

Research output: Contribution to journal › Article › peer-review

Abstract

This study presents a novel visual prompt selection framework for augmented reality (AR) applications that integrates advanced object detection and image segmentation techniques. The framework is designed to enhance user interactions and improve the accuracy of foreground–background separation in AR environments, making AR experiences more immersive and precise. We evaluated six state-of-the-art object detectors (DETR, DINO, CoDETR, YOLOv5, YOLOv8, and YOLO-NAS) in combination with a prompt segmentation model, the Segment Anything Model (SAM), using the DAVIS 2017 validation dataset. The results show that the combination of YOLO-NAS-L and SAM achieved the best performance with a J&F score of 70%, while DINO-scale4-swin had the lowest score of 57.5%. This gap of 12.5 percentage points highlights the significant contribution of user-provided regions of interest (ROIs) to segmentation outcomes, emphasizing the importance of interactive user input in enhancing accuracy. Our framework supports fast prompt processing and accurate mask generation, allowing users to refine digital overlays interactively, thereby improving both the quality of AR experiences and overall user satisfaction. Additionally, the framework enables the automatic detection of moving objects, providing a more efficient alternative to traditional manual selection interfaces in AR devices. This capability is particularly valuable in dynamic AR scenarios, where seamless user interaction is crucial.
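The pipeline described above pairs an off-the-shelf object detector with a promptable segmentation model: the detector proposes a bounding box, and that box is passed to SAM as a visual prompt to produce a foreground mask for AR compositing. The following is a minimal sketch of that idea, using YOLOv8 (one of the evaluated detectors) via the ultralytics package together with Meta's segment_anything; the model checkpoints, file names, and the choice of the highest-confidence box as the ROI are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: detector-proposed bounding box used as a visual (box) prompt for SAM.
# Checkpoint names and the ROI selection rule are illustrative assumptions.
import cv2
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)

# 1) Object detection: take the highest-confidence box as the region of interest.
detector = YOLO("yolov8n.pt")                       # illustrative checkpoint
det = detector(image, verbose=False)[0]
boxes = det.boxes.xyxy.cpu().numpy()                # (N, 4) boxes in XYXY pixels
confs = det.boxes.conf.cpu().numpy()
roi_box = boxes[confs.argmax()]                     # a user could instead pick or adjust this box

# 2) Promptable segmentation: feed the box to SAM as a visual prompt.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(box=roi_box, multimask_output=False)
foreground_mask = masks[0]                          # boolean (H, W) mask for foreground-background separation
```

In an interactive AR setting, the same `predict` call can also accept point prompts, so a user tap could refine the detector-proposed box before the mask is generated.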

Original language: English
Article number: 10502
Journal: Applied Sciences (Switzerland)
Volume: 14
Issue number: 22
DOIs
State: Published - Nov 2024

Keywords

  • augmented reality
  • image segmentation
  • object detection
  • user-interactive system
