Global Flood Forecasting at a Fine Catchment Resolution using Machine Learning
Abstract
Machine learning has been shown to be a promising tool for hydrological modeling. We have used this technology to develop an operational, real-time global streamflow prediction model. The model architecture is based primarily on a long short-term memory (LSTM) network, a form of recurrent neural network (RNN) that carries a state vector from one time step to the next, much like the state of a dynamical systems model.
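To make the state-vector analogy concrete, the following is a minimal sketch of one time step of a standard LSTM cell in plain Python. It is illustrative only and is not the operational model's code: the function name `lstm_step`, the gate keys "i", "f", "g", "o", and the weight layout are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """Advance a standard LSTM cell by one time step.

    x: inputs at this step (e.g. meteorological forcings), length n_in.
    h, c: hidden state and cell state from the previous step, length n_hid.
    W[g], U[g], b[g]: input weights, recurrent weights and bias for
    gate g, where g is one of "i", "f", "g", "o" (hypothetical layout).
    """
    def affine(gate, j):
        return (sum(W[gate][j][k] * x[k] for k in range(len(x)))
                + sum(U[gate][j][k] * h[k] for k in range(len(h)))
                + b[gate][j])

    n = len(h)
    i = [sigmoid(affine("i", j)) for j in range(n)]    # input gate
    f = [sigmoid(affine("f", j)) for j in range(n)]    # forget gate
    g = [math.tanh(affine("g", j)) for j in range(n)]  # candidate update
    o = [sigmoid(affine("o", j)) for j in range(n)]    # output gate

    # The cell state is the persistent memory: part of the old state is
    # kept (f * c) and part of the new candidate is added (i * g).
    c_new = [f[j] * c[j] + i[j] * g[j] for j in range(n)]
    h_new = [o[j] * math.tanh(c_new[j]) for j in range(n)]
    return h_new, c_new
```

Calling `lstm_step` once per day and feeding the returned `h_new` and `c_new` back in as `h` and `c` is what lets the cell state accumulate information over time, loosely analogous to a storage term in a conceptual hydrologic model.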
Our model has been shown to outperform physical and conceptual hydrologic models across temporal and spatial scales. The main advantage of this ML approach is that a single model can be trained (calibrated) over many diverse catchments simultaneously, rather than being calibrated separately for each catchment. This advantage is especially important when modeling on a global scale, where the model is trained on a very large number of catchments with diverse climatological and geographical settings. Consequently, the model learns the different rainfall-runoff dynamics of rivers across these settings and predicts accordingly. Once the model is trained (a much shorter process than calibrating traditional global models), it can be applied almost anywhere that basin attributes are available, including at ungauged locations.
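One common way to train a single model over many catchments is to append each basin's static attributes to its meteorological inputs at every time step and pool all basins into one training set. The sketch below illustrates that idea under stated assumptions; the function names, dictionary keys, and data layout are hypothetical and do not describe the operational pipeline.

```python
def build_inputs(forcings, static_attrs):
    """Append a basin's static attributes to every time step of its forcings."""
    return [list(step) + list(static_attrs) for step in forcings]


def pool_basins(basins):
    """Pool samples from all basins into one training set.

    basins: dict mapping a basin id to a dict with the (hypothetical) keys
      "forcings":  list of per-day forcing vectors,
      "static":    list of static basin attributes,
      "discharge": list of per-day streamflow labels.
    Returns a list of (basin_id, inputs, labels) tuples.
    """
    samples = []
    for basin_id, data in basins.items():
        inputs = build_inputs(data["forcings"], data["static"])
        if len(inputs) != len(data["discharge"]):
            raise ValueError(f"length mismatch for basin {basin_id}")
        samples.append((basin_id, inputs, data["discharge"]))
    return samples
```

Because the static attributes are part of the input, the same trained weights could in principle be applied to a basin that has attributes and forcings but no discharge record, which is the ungauged case described above.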
We use globally available, near-real-time datasets for training and inference, which allows the model to run operationally.
Global datasets used:
HydroSHEDS database for global catchments delineation and static attributes.
Meteorological forcing data from:
ECMWF weather data, including the ERA5-Land reanalysis and the IFS HRES real-time forecasts and re-forecasts.
NASA’s IMERG (early) global precipitation estimates.
CPC Global Unified Gauge-Based Analysis of Daily Precipitation.
Global streamflow datasets such as GRDC and Caravan for discharge labels.