TY - JOUR
T1 - Early Prediction of Mortality for Septic Patients Visiting Emergency Room Based on Explainable Machine Learning
T2 - A Real-World Multicenter Study
AU - Park, Sang Won
AU - Yeo, Na Young
AU - Kang, Seonguk
AU - Ha, Taejun
AU - Kim, Tae Hoon
AU - Lee, Doo Hee
AU - Kim, Dowon
AU - Choi, Seheon
AU - Kim, Minkyu
AU - Lee, Dong Hoon
AU - Kim, Do Hyeon
AU - Kim, Woo Jin
AU - Lee, Seung Joon
AU - Heo, Yeon Jeong
AU - Moon, Da Hye
AU - Han, Seon Sook
AU - Kim, Yoon
AU - Choi, Hyun Soo
AU - Oh, Dong Kyu
AU - Lee, Su Yeon
AU - Park, Mi Hyeon
AU - Lim, Chae Man
AU - Heo, Jeongwon
N1 - Publisher Copyright: © 2024 The Korean Academy of Medical Sciences. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. All Rights Reserved.
PY - 2024
Y1 - 2024
AB - Background: Worldwide, sepsis is the leading cause of death in hospitals. If mortality in patients with sepsis can be predicted early, medical resources can be allocated efficiently. We constructed machine learning (ML) models to predict the mortality of patients with sepsis in a hospital emergency department. Methods: This study prospectively collected nationwide data from an ongoing multicenter cohort of patients with sepsis identified in the emergency department. Patients were enrolled from 19 hospitals between September 2019 and December 2020. Using data from 3,657 survivors and 1,455 deaths, six ML models (logistic regression, support vector machine, random forest, extreme gradient boosting [XGBoost], light gradient boosting machine, and categorical boosting [CatBoost]) were constructed using fivefold cross-validation to predict mortality. With these models, 44 clinical variables measured on the day of admission were compared with six sequential organ failure assessment (SOFA) components (PaO2/FIO2 [PF], platelets [PLT], bilirubin, cardiovascular, Glasgow Coma Scale score, and creatinine). The confidence interval (CI) was obtained by performing 10,000 repeated measurements via random sampling of the test dataset. All results were explained and interpreted using Shapley additive explanations (SHAP). Results: Among the 5,112 participants, CatBoost exhibited the highest area under the curve (AUC) of 0.800 (95% CI, 0.756-0.840) using clinical variables. Using the SOFA components for the same patients, XGBoost exhibited the highest AUC of 0.678 (95% CI, 0.626-0.730). As interpreted by SHAP, albumin, lactate, blood urea nitrogen, and international normalized ratio were determined to significantly affect the results. Additionally, PF and PLT among the SOFA components significantly influenced the prediction results. Conclusion: Newly established ML-based models achieved good prediction of mortality in patients with sepsis. Using several clinical variables acquired at baseline can provide more accurate early predictions than using SOFA components. Additionally, the impact of each variable was identified.
KW - Clinical Decision Support System (CDSS)
KW - Explainable Artificial Intelligence (XAI)
KW - Machine Learning
KW - Mortality Prediction
KW - Sepsis
UR - https://www.scopus.com/pages/publications/85184421869
U2 - 10.3346/jkms.2024.39.e53
DO - 10.3346/jkms.2024.39.e53
M3 - Article
C2 - 38317451
AN - SCOPUS:85184421869
SN - 1011-8934
VL - 39
JO - Journal of Korean Medical Science
JF - Journal of Korean Medical Science
IS - 5
M1 - e53
ER -