TY - JOUR
T1 - Data-driven automatic classification model for construction accident cases using natural language processing with hyperparameter tuning
AU - Kumi, Louis
AU - Jeong, Jaewook
AU - Jeong, Jaemin
N1 - Publisher Copyright: © 2023
PY - 2024/8
Y1 - 2024/8
AB - The construction industry, while vital to societal progress, is marred by a high incidence of accidents and injuries. Manual classification of accident cases is labor-intensive and susceptible to human bias. This study addresses this challenge by developing an automated accident case classification system for the construction industry using Natural Language Processing and machine learning techniques. The study was conducted in four steps: (1) establishment of the dataset, (2) Korean Natural Language Processing, (3) selection of machine learning models, and (4) model evaluation. The models exhibited competitive performance, demonstrating high accuracy, precision, and recall rates across all classification tasks. XGBoost outperformed Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) for accident type, facility type, and work type, with accuracies of 0.80, 0.56, and 0.67, respectively. The results also provided insights into the factors influencing accident classification. This study contributes to construction safety by providing a data-driven foundation for safety decision-making, resource allocation, and benchmarking.
KW - Accident classification
KW - Accident type
KW - Facility type
KW - Korean NLP
KW - Machine learning
KW - Work type
UR - https://www.scopus.com/pages/publications/85192449005
U2 - 10.1016/j.autcon.2024.105458
DO - 10.1016/j.autcon.2024.105458
M3 - Article
AN - SCOPUS:85192449005
SN - 0926-5805
VL - 164
JO - Automation in Construction
JF - Automation in Construction
M1 - 105458
ER -