TY - JOUR
T1 - Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs
AU - Nam, Ju Gang
AU - Park, Sunggyun
AU - Hwang, Eui Jin
AU - Lee, Jong Hyuk
AU - Jin, Kwang Nam
AU - Lim, Kun Young
AU - Vu, Thienkai Huy
AU - Sohn, Jae Ho
AU - Hwang, Sangheum
AU - Goo, Jin Mo
AU - Park, Chang Min
N1 - Publisher Copyright:
© RSNA, 2018.
PY - 2019/1
Y1 - 2019/1
N2 - Purpose: To develop and validate a deep learning-based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists. Materials and Methods: For this retrospective study, DLAD was developed by using 43 292 chest radiographs (normal radiograph- to-nodule radiograph ratio, 34 067:9225) in 34 676 patients (healthy-to-nodule ratio, 30 784:3892; 19 230 men [mean age, 52.8 years; age range, 18-99 years]; 15 446 women [mean age, 52.3 years; age range, 18-98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared. Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92-0.99 (AUROC) and 0.831-0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006-0.190; P < .05). Conclusion: This deep learning-based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performances when used as a second reader.
AB - Purpose: To develop and validate a deep learning-based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists. Materials and Methods: For this retrospective study, DLAD was developed by using 43 292 chest radiographs (normal radiograph- to-nodule radiograph ratio, 34 067:9225) in 34 676 patients (healthy-to-nodule ratio, 30 784:3892; 19 230 men [mean age, 52.8 years; age range, 18-99 years]; 15 446 women [mean age, 52.3 years; age range, 18-98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared. Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92-0.99 (AUROC) and 0.831-0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006-0.190; P < .05). Conclusion: This deep learning-based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performances when used as a second reader.
UR - http://www.scopus.com/inward/record.url?scp=85057579798&partnerID=8YFLogxK
U2 - 10.1148/radiol.2018180237
DO - 10.1148/radiol.2018180237
M3 - Article
C2 - 30251934
AN - SCOPUS:85057579798
SN - 0033-8419
VL - 290
SP - 218
EP - 228
JO - Radiology
JF - Radiology
IS - 1
ER -