WeatherBench 2: A benchmark for the next generation of data-driven global weather models

Alex Merose
Peter Battaglia
Tyler Russell
Alvaro Sanchez
Vivian Yang
Matthew Chantry
Zied Ben Bouallegue
Peter Dueben
Carla Bromberg
Jared Sisk
Luke Barrington
Aaron Bell
arXiv (2023) (to appear)

Abstract

WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework; publicly available training, ground-truth, and baseline data; and a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. We also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.