Creating ML benchmarks for climate problems
Overview
Google is looking to fund and actively engage with the research community, exploring pathways to leverage Google's expertise in machine learning (ML) and artificial intelligence (AI) to accelerate data-driven solutions that inform climate action. ML and AI models depend on strong benchmarks to evaluate their performance and track their improvement over time. A recent example is Google's flood prediction tool for ungauged basins, which uses weather forecasts and basin attributes to predict streamflow, with observed streamflow from thousands of gauges serving as the benchmark.
We aim to support the development of ML benchmarks for the areas of interest listed below, along with modeling tools that facilitate the use of such benchmarks. In doing so, Google seeks to revolutionize our ability to track and mitigate climate change. We envision that funded projects will involve deep interactions between domain experts in climate and ML, both within Google and at funded institutions. To be considered responsive to this call, proposers must make a clear case for why ML and improved benchmarks are poised to be effective and impactful in addressing a given challenge.
Application status
Applications are currently closed.
Decisions for the June 2024 application will be announced via email by October 2024. Please check back in Summer 2025 for details on future application cycles.
Research topics
Proposals should specifically cover one or more of the following topics:
- Greenhouse gas (GHG) monitoring: Sustained, near-real-time tracking of GHG emissions and sequestration, leveraging both "top-down" (atmospheric observations) and "bottom-up" (land surface, activity-based) approaches.
- Biomass estimation: Accurate above- and below-ground biomass assessments to quantify carbon storage in forests and other ecosystems.
- Crop yield prediction and leakage estimation: ML-driven models for anticipating crop yields, identifying and quantifying potential leakage from agricultural practices (e.g., land-use change, soil carbon loss) and optimizing carbon sequestration and storage strategies.
- Climate model scoring: Enhanced benchmarks to evaluate and refine climate models, increasing their accuracy and their ability to inform decision-making.
Good benchmarks allow AI researchers to make significant contributions to domains in which they may have little expertise. Consider WeatherBench, which catalyzed recent breakthroughs in AI for global weather prediction. We believe a number of factors contributed to WeatherBench’s success and may be useful design principles for ML benchmarks for climate problems:
- Includes an associated, self-contained dataset (i.e., a coarsened version of the ERA5 reanalysis) for evaluating and building models.
- Includes a set of top-line metrics (i.e., root mean squared error for 850 hPa temperature and 500 hPa geopotential height) that are accepted as meaningful by the scientific community and interpretable by non-experts.
- Includes multiple baselines and points of comparison for measuring progress, including clear thresholds to indicate runaway success (i.e., improving upon the best physics-based weather forecasts).
- Includes example code for data preparation and evaluation.
- Addresses an important and unsolved problem.
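To make the "top-line metrics" principle concrete, here is a minimal sketch of the kind of metric WeatherBench standardizes: latitude-weighted root mean squared error over a regular lat/lon grid. The function name, toy grid, and values are illustrative, not taken from the WeatherBench codebase.

```python
import numpy as np

def lat_weighted_rmse(forecast, truth, lats):
    """Latitude-weighted RMSE over a (lat, lon) grid.

    Grid cells shrink toward the poles, so errors are weighted by
    cos(latitude), normalized so the weights average to 1.
    """
    w = np.cos(np.deg2rad(lats))
    w = w / w.mean()
    sq_err = (forecast - truth) ** 2       # shape (lat, lon)
    weighted = sq_err * w[:, None]         # broadcast weights across longitude
    return float(np.sqrt(weighted.mean()))

# Toy example: a 3x4 grid of 850 hPa temperatures (kelvin)
lats = np.array([-60.0, 0.0, 60.0])
truth = np.full((3, 4), 280.0)
forecast = truth + 1.0                     # uniform 1 K error everywhere
print(lat_weighted_rmse(forecast, truth, lats))
```

A uniform 1 K error yields an RMSE of 1.0 regardless of the weighting, which makes it a handy sanity check when implementing the metric. Publishing one such function alongside the dataset is what lets non-experts report comparable numbers.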
We aim to support projects that address these critical areas by:
- Developing benchmarks and datasets: Creating standardized ML benchmarks, curating data libraries, and potentially fueling Kaggle competitions to ignite innovation.
- Collecting transformative data: Facilitating the acquisition of key datasets linked to GHG flux and concentration tracking, above/below-ground carbon, wildfire emissions, and other factors observable from remote sensing.
- Advancing ML models for climate MRV: Developing models that leverage new or existing data sources (satellite, IoT, etc.) to improve carbon accounting accuracy and reduce uncertainty.
- Unlocking new insights from existing instruments: Extracting novel GHG-relevant information from both spaceborne and in-situ instruments.
We believe ML benchmarks hold the key to accelerating the development of nature-based solutions, supporting carbon markets, and informing climate adaptation strategies. Let's collaborate to build a more sustainable future, powered by data and innovation!
Award details
Award amounts vary by topic, up to $300,000 USD, and are intended to support the advancement of the professor's research during the academic year in which the award is provided. Additional cloud credits may be requested.
Funds will be disbursed to the institution under an agreement stating the funding cannot be used for overhead or indirect costs. The agreement will also require that research results, including any code, models, or other intellectual property, be made publicly available under open source licenses, unless otherwise agreed upon in a separate contract between Google and the recipient.
Requirements
Eligibility
- Open to professors (assistant, associate, etc.) and other PI-eligible faculty and scientists at universities or degree-granting research institutions.
- Institutions must be able to abide by the guidance that funds are not intended for overhead or indirect costs.
- Applicants may only serve as Principal Investigator (PI) or co-PI on one proposal per round. There can be a maximum of 2 PIs per proposal.
- Proposals must be related to computing or technology.
In addition to the guidance provided in our FAQ section, proposals should consider the following:
- To be considered responsive, the first paragraph of each proposal must address the following questions:
- Why is this problem a good candidate for machine learning?
- Why would ML scale impact more effectively than other approaches?
- For proposals whose benchmarks target remote sensing ML tasks, the dataset must be structured as geospatial labels (e.g., geospatial vector data with timestamps, or rasterized variables similar to WeatherBench).
- Data labels can either be novel or curated from existing datasets, but must be packaged in a form approachable by non-domain experts (e.g., the broader ML community) with self-contained and reproducible examples.
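As one way to read the packaging requirement above, here is a sketch of a self-contained geospatial label record and a CSV serializer using only the Python standard library. The schema (field names, units in the variable name) is an illustrative assumption, not a prescribed format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class GeoLabel:
    """One benchmark label: an observation tied to a place and time."""
    lat: float
    lon: float
    timestamp: str   # ISO 8601, e.g. "2024-06-01T00:00:00Z"
    variable: str    # hypothetical name; units encoded for clarity
    value: float

def to_csv(labels):
    """Serialize labels to CSV so the broader ML community can load
    them with any standard tooling, no domain-specific code needed."""
    buf = io.StringIO()
    cols = [f.name for f in fields(GeoLabel)]
    writer = csv.DictWriter(buf, fieldnames=cols)
    writer.writeheader()
    for lbl in labels:
        writer.writerow(asdict(lbl))
    return buf.getvalue()

labels = [
    GeoLabel(4.5, -73.1, "2024-06-01T00:00:00Z",
             "above_ground_biomass_t_ha", 182.4),
]
print(to_csv(labels))
```

The point is not the format itself but the property it demonstrates: a flat, typed, timestamped record that an ML practitioner can join against rasterized satellite inputs without any climate-domain tooling.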
Review criteria
- Faculty merit: The applicant is accomplished in research, community engagement, and open source contributions, with the potential to contribute to responsible innovation.
- Research merit: Faculty's proposed research is aligned with Google Research interests, innovative, and likely to have a significant impact on the field.
- Proposal quality: The research proposal is clear, focused, and well-organized, and it demonstrates the team's ability to successfully execute the research and achieve a significant impact.
- AI ethics principles: The research proposal strongly aligns with Google's AI Principles.
- For research topics that require the use of a specific product, methodology, or other constraint, projects will be evaluated on how well they adhere to and utilize those factors, as well as on the overall quality of the approach.
FAQs
FAQs are listed on the GARA landing page.