Application of Oversampling Techniques for Enhanced Transverse Dispersion Coefficient Estimation Performance Using Machine Learning Regression

Sunmi Lee, Inhwan Park

Research output: Contribution to journal › Article › peer-review


Abstract

The advection–dispersion equation has been widely used to analyze the intermediate-field mixing of pollutants in natural streams. The dispersion coefficient, which governs the dispersion term of the advection–dispersion equation, is a crucial parameter for predicting the transport distance and the contaminated area in a water body. In this study, the transverse dispersion coefficient was estimated using machine learning regression methods applied to oversampled datasets. The datasets from previous research used for this estimation were biased toward width-to-depth ratios ≤ 50, potentially leading to inaccurate estimates of the transverse dispersion coefficient for data with width-to-depth ratios > 50. To address this issue, four oversampling techniques were employed to augment the data with width-to-depth ratios > 50, thereby mitigating the dataset’s imbalance. The estimates obtained by combining data resampling with nonlinear regression methods were more accurate than the pre-oversampling results. Notably, the combination of adaptive synthetic sampling (ADASYN) and eXtreme Gradient Boosting (XGBoost) regression was more accurate than the other combinations of oversampling techniques and nonlinear regression methods. With the combined ADASYN–XGBoost approach, the transverse dispersion coefficient estimation performance can be enhanced using only two variables, the width-to-depth ratio and bed friction effects, without adding channel sinuosity, which represents the effects of secondary currents.
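
The oversampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. It adapts the ADASYN idea to regression by treating rows with a width-to-depth ratio > 50 as the sparse group and interpolating the target along with the features, which is an assumption made here. The function name, the column layout, and all numbers are hypothetical and chosen only for illustration.

```python
import math
import random

def adasyn_style_oversample(rows, threshold=50.0, k=3, seed=0):
    """Create synthetic rows for the sparse group (first column > threshold).

    Each row is (width_to_depth, friction_term, dispersion_coeff).
    Only the new synthetic rows are returned.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[0] > threshold]
    majority = [r for r in rows if r[0] <= threshold]
    n_new = len(majority) - len(minority)  # rows needed to balance the groups
    if n_new <= 0 or len(minority) < 2:
        return []

    def feat_dist(a, b):
        # Distance over the two input features only (target excluded).
        # Features are left unscaled here to keep the sketch short.
        return math.dist(a[:2], b[:2])

    # Step 1: weight each sparse row by the share of majority rows
    # among its k nearest neighbours (harder regions get more weight).
    weights = []
    for m in minority:
        others = [r for r in rows if r is not m]
        nearest = sorted(others, key=lambda r: feat_dist(m, r))[:k]
        weights.append(sum(r[0] <= threshold for r in nearest) / k)
    total = sum(weights)
    if total == 0:
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [w / total for w in weights]

    # Step 2: allocate synthetic rows in proportion to the weights and
    # interpolate between each sparse row and a nearby sparse row.
    synthetic = []
    for m, w in zip(minority, weights):
        mates = sorted((r for r in minority if r is not m),
                       key=lambda r: feat_dist(m, r))[:k]
        for _ in range(round(w * n_new)):
            mate = rng.choice(mates)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(m, mate)))
    return synthetic

# Hypothetical, made-up rows: (width-to-depth ratio, friction term, coefficient).
rows = [
    (12.0, 8.5, 0.15), (18.0, 10.2, 0.18), (25.0, 9.1, 0.22), (30.0, 12.0, 0.25),
    (38.0, 11.3, 0.30), (45.0, 13.5, 0.34), (48.0, 10.8, 0.36), (22.0, 14.0, 0.20),
    (55.0, 12.5, 0.45), (80.0, 15.0, 0.60), (120.0, 16.2, 0.85),
]
synthetic = adasyn_style_oversample(rows)
augmented = rows + synthetic
print(len(rows), "original rows,", len(synthetic), "synthetic rows")
```

The augmented rows could then be passed to any nonlinear regressor, such as a gradient-boosted tree model. In practice the features would normally be scaled before the neighbour search, and a maintained library implementation of ADASYN would be preferable to this hand-written version.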

Original language: English
Article number: 1359
Journal: Water (Switzerland)
Volume: 16
Issue number: 10
State: Published - May 2024

Keywords

  • data oversampling
  • imbalanced dataset
  • machine learning
  • nonlinear regression
  • transverse dispersion coefficient
