Digital repository

A novel explainable PSO-XGBoost model for regional flood frequency analysis at a national scale: Exploring spatial heterogeneity in flood drivers.

Kanani-Sadat, Yousef (ORCID: https://orcid.org/0000-0002-9215-5354); Safari, Abdolreza; Nasseri, Mohsen (ORCID: https://orcid.org/0000-0002-7584-7631) and Homayouni, Saeid (ORCID: https://orcid.org/0000-0002-0214-5356) (2024). A novel explainable PSO-XGBoost model for regional flood frequency analysis at a national scale: Exploring spatial heterogeneity in flood drivers. Journal of Hydrology, vol. 638, p. 131493. DOI: 10.1016/j.jhydrol.2024.131493.

This document is not hosted on EspaceINRS.


Identifying flood drivers and accurately estimating design floods are crucial for sustainable and effective planning and management strategies that mitigate flood risks. Regional Flood Frequency Analysis (RFFA) is one of the most commonly used approaches for estimating design floods in ungauged watersheds. This study used XGBoost coupled with Particle Swarm Optimization (PSO) to estimate design-flood quantiles for return periods from 2 to 1000 years. After a preliminary assessment, 373 hydrometric stations nationwide were selected for at-site flood frequency analysis by identifying the best-fitting distribution. Using GIS and Google Earth Engine (GEE), 83 independent features, including physiographical, geomorphological, land-use, soil-type, and long-term hydro-climatic and environmental variables, were extracted for the upstream watersheds. After the hyper-parameters of the XGBoost method were fine-tuned for each flood quantile, the feature importance values were used to eliminate insignificant features and refine the developed models. Additionally, classical methods such as Support Vector Regression (SVR) and Random Forest (RF) were implemented to evaluate the efficiency of the XGBoost models. Different statistics demonstrated that the models effectively estimated flood quantiles, with the Nash-Sutcliffe Efficiency (NSE) varying from 0.709 to 0.840 across all models. A comparison of model performance shows that the XGBoost method outperformed RF and SVR across all flood quantiles. Based on the developed models, design floods were estimated for 949 stations across Iran. Furthermore, SHapley Additive exPlanations (SHAP) values were used to identify the features contributing most to the model outputs and to investigate the spatial heterogeneity of the main flood drivers.
According to the results, the perimeter and length of the watershed and heavy rainfall are notably more important than the other features in all models. Based on the local SHAP values, features associated with watershed size, such as perimeter, area, and length, are the most important in the Northern, Northwestern, and Western basins, whereas the Southwest basins are more influenced by “heavy rainfall”. These findings demonstrate the promise of the developed models for estimating flood quantiles across diverse environmental, geomorphological, and hydro-climatic conditions. This capability is valuable for sustainable watershed management, especially in environments with limited maximum discharge data.
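
For readers unfamiliar with the Nash-Sutcliffe Efficiency reported above, the following is a minimal sketch of its standard definition, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². It is not the authors' code: the function name and the sample values are hypothetical and chosen only for illustration. An NSE of 1 indicates a perfect match, and 0 means the model does no better than predicting the mean of the observations.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, simulated):
    """Return the Nash-Sutcliffe Efficiency of simulated vs. observed values."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = fmean(observed)
    # Sum of squared errors between observations and model estimates.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their own mean.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined when all observed values are equal")
    return 1.0 - sse / sst


# Hypothetical flood quantiles in m^3/s (illustrative only, not data from the study).
observed = [120.0, 340.0, 95.0, 410.0, 230.0]
simulated = [130.0, 310.0, 110.0, 390.0, 250.0]
nse = nash_sutcliffe_efficiency(observed, simulated)
print(f"NSE = {nse:.3f}")
```

Under this definition, the reported range of 0.709 to 0.840 means the models' squared error is roughly 16% to 29% of the variance of the at-site quantiles about their mean.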

Document type: Article
Keywords: regional flood frequency analysis; google earth engine; XGBoost; SHAP; flood drivers; spatial heterogeneity
Centre: Centre Eau Terre Environnement
Date deposited: 09 Jul 2024 14:49
Last modified: 09 Jul 2024 14:49
URI: https://espace.inrs.ca/id/eprint/15721
