Abstract
The advection–dispersion equation has been widely used to analyze the intermediate-field mixing of pollutants in natural streams. The dispersion coefficient, which governs the dispersion term of the advection–dispersion equation, is a crucial parameter in predicting the transport distance and contaminated area in a water body. In this study, the transverse dispersion coefficient was estimated using machine learning regression methods applied to oversampled datasets. The datasets used in previous research for this estimation were biased toward width-to-depth ratios ≤ 50, potentially leading to inaccuracies in estimating the transverse dispersion coefficient for data with width-to-depth ratios > 50. To address this issue, four oversampling techniques were employed to augment the data with width-to-depth ratios > 50, thereby mitigating the dataset's imbalance. The estimates obtained by combining data resampling with nonlinear regression methods showed improved prediction accuracy compared with the pre-oversampling results. Notably, the combination of adaptive synthetic sampling (ADASYN) and eXtreme Gradient Boosting regression (XGBoost) was more accurate than the other combinations of oversampling techniques and nonlinear regression methods. With the combined ADASYN–XGBoost approach, the transverse dispersion coefficient estimation performance can be enhanced using only two variables, the width-to-depth ratio and bed friction effects, without adding channel sinuosity, which represents the effects of secondary currents.
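The oversampling step described in the abstract can be illustrated with a minimal, standard-library-only sketch of an ADASYN-style procedure. This is not the authors' implementation: the function name `adasyn_oversample`, the neighbour count `k`, and the toy rows are all hypothetical. Each row is assumed to hold (width-to-depth ratio, bed-friction term, target), distances are taken over the whole unscaled row, and synthetic rows are allocated by weighted random draws rather than the deterministic per-sample counts of the published ADASYN algorithm.

```python
import math
import random


def adasyn_oversample(rows, is_minority, k=3, seed=0):
    """Return rows plus synthetic minority rows (ADASYN-style sketch)."""
    rng = random.Random(seed)
    minority = [r for r in rows if is_minority(r)]
    majority = [r for r in rows if not is_minority(r)]
    n_new = len(majority) - len(minority)  # rows needed to balance the classes
    if n_new <= 0 or len(minority) < 2:
        return list(rows)

    def nearest(row, pool):
        # Up to k closest rows in pool (Euclidean), excluding the row itself.
        others = [p for p in pool if p is not row]
        return sorted(others, key=lambda p: math.dist(row, p))[:k]

    # Difficulty weight: share of majority rows among each minority row's
    # neighbours, so harder-to-learn regions receive more synthetic rows.
    weights = []
    for m in minority:
        neigh = nearest(m, rows)
        weights.append(sum(not is_minority(n) for n in neigh) / len(neigh))
    if sum(weights) == 0:
        weights = [1.0] * len(minority)  # fall back to uniform sampling

    synthetic = []
    for base in rng.choices(minority, weights=weights, k=n_new):
        partner = rng.choice(nearest(base, minority))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + t * (p - b) for b, p in zip(base, partner)))
    return list(rows) + synthetic


# Hypothetical toy rows, invented for illustration only (not from the study).
toy = [
    (12.0, 8.0, 0.15), (20.0, 10.0, 0.20), (25.0, 12.0, 0.22),
    (30.0, 9.0, 0.25), (35.0, 14.0, 0.30), (45.0, 11.0, 0.35),
    (60.0, 13.0, 0.50), (80.0, 15.0, 0.65), (110.0, 12.0, 0.80),
]
balanced = adasyn_oversample(toy, lambda r: r[0] > 50.0)
```

Because every synthetic row is interpolated between two rows whose width-to-depth ratio exceeds 50, the new rows stay in the under-represented range. In a fuller pipeline, the balanced rows would then be passed to a nonlinear regressor such as XGBoost; that step is omitted here.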
Original language | English |
---|---|
Article number | 1359 |
Journal | Water (Switzerland) |
Volume | 16 |
Issue number | 10 |
State | Published - May 2024 |
Keywords
- data oversampling
- imbalanced dataset
- machine learning
- nonlinear regression
- transverse dispersion coefficient