Pilot Evaluation of a Deep Learning Model for Nasogastric Tube Verification on Chest Radiographs: A Single-Center Retrospective Study

  • Sang Won Park
  • Doohee Lee
  • Jae Eun Song
  • Yoon Kim
  • Hyun Soo Choi
  • Seung Joon Lee
  • Woo Jin Kim
  • Kyoung Min Moon
  • Oh Beom Kwon

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Accurate confirmation of nasogastric (NG) tubes is essential for patient safety, but delays and variability in interpretation remain common in clinical practice. Deep learning (DL) models have shown potential for assisting in this task, but real-world performance, particularly in detecting malpositioned tubes, remains insufficiently characterized. Methods: We conducted a pilot evaluation of a previously developed DL model using 135 chest radiographs from Kangwon National University Hospital. Expert physicians established the reference standard. Model performance was assessed, and receiver operating characteristic (ROC) curve and precision-recall curve (PRC) analyses were performed. Differences between correctly classified and misclassified cases were examined using Wilcoxon rank-sum and Fisher’s exact tests to explore potential clinical or radiographic contributors to model failure. Results: The model correctly classified 129 of 135 cases. The sensitivity was 96.1% (95% confidence interval (CI): 92.2–98.9%), specificity was 85.7% (95% CI: 42.2–97.7%), positive predictive value (PPV) was 99.2% (95% CI: 96.1–99.9%), negative predictive value (NPV) was 54.5% (95% CI: 25.4–80.8%), balanced accuracy was 90.8%, and F1-score was 0.976. The area under the ROC curve was 0.970 (95% CI: 0.929–1.000) and that under the PRC was 0.727 (95% CI: 0.289–1.000), reflecting substantial uncertainty related to the very small number of incomplete cases (n = 6). No statistically significant differences in clinical or radiographic characteristics were observed between correctly classified and misclassified cases. Conclusions: The DL model performed well in identifying correctly positioned NG tubes but demonstrated limited and unstable performance for detecting incomplete placements. Given the safety implications of misclassification, the model should be used only as an assistive tool with mandatory physician oversight. Larger, multi-center studies with greater representation of incomplete cases are required to obtain more reliable estimates and support safe clinical implementation.
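
As a reading aid, the sketch below shows how the summary metrics named in the abstract (sensitivity, specificity, PPV, NPV, balanced accuracy, F1-score) are conventionally derived from a 2×2 confusion matrix. The abstract does not report the confusion matrix itself, so the counts used here are an assumption, chosen only to be consistent with 129 of 135 correct classifications; treating a correctly positioned tube as the positive class is likewise an assumption. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical, and this is not the authors' analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (positive = correctly positioned tube, by assumption)."""
    tp: int  # positive cases the model called positive
    fn: int  # positive cases the model called negative
    tn: int  # negative cases the model called negative
    fp: int  # negative cases the model called positive


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute standard diagnostic accuracy metrics from confusion-matrix counts."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the positive class
    specificity = c.tn / (c.tn + c.fp)   # recall on the negative class
    ppv = c.tp / (c.tp + c.fp)           # precision
    npv = c.tn / (c.tn + c.fn)
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# ASSUMED counts: not reported in the abstract, chosen to total 135 cases
# with 129 correct (tp + tn) and 6 misclassified (fn + fp).
counts = ConfusionCounts(tp=123, fn=5, tn=6, fp=1)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With so few negative cases, a single reclassified radiograph shifts specificity and NPV by many percentage points, which is the instability the abstract's wide confidence intervals reflect. Values computed from assumed counts can differ in the last digit from the published figures.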

Original language: English
Article number: 140
Journal: Tomography
Volume: 11
Issue number: 12
State: Published - Dec 2025

Keywords

  • deep learning model
  • nasogastric tube
  • real-world validation
