TY - JOUR
T1 - Precision Forecasting in Colorectal Oncology
T2 - Predicting Six-Month Survival to Optimize Clinical Decisions
AU - Lee, Jaehyuk
AU - Cho, Youngchae
AU - Kyung, Yeunwoong
AU - Kim, Eunchan
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/3
Y1 - 2025/3
AB - Colorectal cancer (CRC) has a relatively high five-year survival rate compared to other cancers; however, this rate drops significantly in patients with malignant CRC. One critical factor in palliative care decision-making is the ability to accurately predict patient survival, with the six-month survival period commonly used as a threshold. In this study, we evaluated the performance of five machine learning models—logistic regression, decision tree, random forest, multilayer perceptron, and extreme gradient boosting (XGBoost)—in predicting six-month survival for patients with malignant CRC using a publicly available synthetic dataset containing 11,774 samples and 51 features. The models were trained and validated using five-fold cross-validation, and the synthetic minority oversampling technique (SMOTE) was applied to address class imbalance. Among the models, XGBoost demonstrated the highest performance, achieving 95% accuracy, precision, recall, and F1-score, along with 90% specificity. Feature importance analysis identified smoking status and surgical history as key factors influencing model predictions. These findings highlight the potential of tree-based machine learning models in supporting timely and informed palliative care decisions, while also providing insights into handling data imbalance and optimizing model parameters in survival prediction tasks.
KW - colorectal cancer survival prediction
KW - machine learning
KW - medical decision support
KW - palliative care
UR - http://www.scopus.com/inward/record.url?scp=86000558510&partnerID=8YFLogxK
U2 - 10.3390/electronics14050880
DO - 10.3390/electronics14050880
M3 - Article
AN - SCOPUS:86000558510
SN - 2079-9292
VL - 14
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 5
M1 - 880
ER -