TY - JOUR
T1 - Visual Prompt Selection Framework for Real-Time Object Detection and Interactive Segmentation in Augmented Reality Applications
AU - Song, Eungyeol
AU - Oh, Doeun
AU - Oh, Beom Seok
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/11
Y1 - 2024/11
N2 - This study presents a novel visual prompt selection framework for augmented reality (AR) applications that integrates advanced object detection and image segmentation techniques. The framework is designed to enhance user interactions and improve the accuracy of foreground–background separation in AR environments, making AR experiences more immersive and precise. We evaluated six state-of-the-art object detectors (DETR, DINO, CoDETR, YOLOv5, YOLOv8, and YOLO-NAS) in combination with a prompt segmentation model using the DAVIS 2017 validation dataset. The results show that the combination of YOLO-NAS-L and SAM achieved the best performance with a J&F score of 70%, while DINO-scale4-swin had the lowest score of 57.5%. This 12.5-percentage-point performance gap highlights the significant contribution of user-provided regions of interest (ROIs) to segmentation outcomes, emphasizing the importance of interactive user input in enhancing accuracy. Our framework supports fast prompt processing and accurate mask generation, allowing users to refine digital overlays interactively, thereby improving both the quality of AR experiences and overall user satisfaction. Additionally, the framework enables the automatic detection of moving objects, providing a more efficient alternative to traditional manual selection interfaces in AR devices. This capability is particularly valuable in dynamic AR scenarios, where seamless user interaction is crucial.
AB - This study presents a novel visual prompt selection framework for augmented reality (AR) applications that integrates advanced object detection and image segmentation techniques. The framework is designed to enhance user interactions and improve the accuracy of foreground–background separation in AR environments, making AR experiences more immersive and precise. We evaluated six state-of-the-art object detectors (DETR, DINO, CoDETR, YOLOv5, YOLOv8, and YOLO-NAS) in combination with a prompt segmentation model using the DAVIS 2017 validation dataset. The results show that the combination of YOLO-NAS-L and SAM achieved the best performance with a J&F score of 70%, while DINO-scale4-swin had the lowest score of 57.5%. This 12.5-percentage-point performance gap highlights the significant contribution of user-provided regions of interest (ROIs) to segmentation outcomes, emphasizing the importance of interactive user input in enhancing accuracy. Our framework supports fast prompt processing and accurate mask generation, allowing users to refine digital overlays interactively, thereby improving both the quality of AR experiences and overall user satisfaction. Additionally, the framework enables the automatic detection of moving objects, providing a more efficient alternative to traditional manual selection interfaces in AR devices. This capability is particularly valuable in dynamic AR scenarios, where seamless user interaction is crucial.
KW - augmented reality
KW - image segmentation
KW - object detection
KW - user-interactive system
UR - http://www.scopus.com/inward/record.url?scp=85210422124&partnerID=8YFLogxK
U2 - 10.3390/app142210502
DO - 10.3390/app142210502
M3 - Article
AN - SCOPUS:85210422124
SN - 2076-3417
VL - 14
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 22
M1 - 10502
ER -