
Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning
October 23, 2025
Niv Efron, Senior Director of Engineering, and Luke Barrington, Director of Product Management, Google Research
Google Earth AI is our family of geospatial AI models and reasoning agents that provides users with actionable insights, grounded in real-world understanding. Today, we’re sharing our latest Earth AI innovations and expanding access to these new capabilities on Google Earth and Google Cloud.
For years, Google has developed AI models that enhance our understanding of the planet. These models help keep Google products fresh, for example, ensuring Maps is accurate by analyzing satellite images and giving Search users the most up-to-date alerts about weather and natural disasters.
As individual models grow more powerful, we’ve learned that many real-world questions require the combination of insights across domains. Answering complex queries like, "Where is a hurricane likely to make landfall? Which communities are most vulnerable and how should they prepare?" requires reasoning about imagery, population and the environment.
Earlier this year, we introduced Google Earth AI to solve this core challenge. By pairing our family of powerful foundation models with a geospatial reasoning agent, which uses our latest Gemini models, it’s becoming possible to perform complex, real-world reasoning at planetary scale. The models provide detailed understanding of our planet, grounded in real-world data. The agent, in turn, acts as an intelligent orchestrator. It deconstructs a complex question into a multi-step plan; executes the plan by calling on these foundation models, querying vast datastores, and using geospatial tools; and finally fuses the results at each step into a holistic answer.
Today, we're introducing new Earth AI innovations:
- New Imagery and Population foundation models, along with technical details and evaluations showing state-of-the-art performance.
- Demonstrations of our geospatial reasoning agent using these models to solve complex, multi-step geospatial queries.
To learn more, we invite you to read our full technical paper, "Google Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning". You can also get involved by expressing interest as we expand access to these new capabilities for developers and enterprises.

Earth AI unites state-of-the-art models with geospatial reasoning agents to address critical global challenges.
Building blocks of Earth AI: State-of-the-art foundation models
Imagery
Our new Remote Sensing Foundations models simplify and accelerate satellite imagery analysis using three core capabilities: vision-language models, open-vocabulary object detection, and adaptable vision backbones. Users can ask natural language queries, like "find all flooded roads" in an image captured after a storm, and get rapid, accurate answers. Our models are trained on a large corpus of high-resolution overhead imagery, paired with text descriptions. They achieve state-of-the-art results on multiple public Earth observation benchmarks. For instance, we achieve >16% average improvement on text-based image search tasks, while our zero-shot model for novel object detection more than doubles the baseline accuracy.
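To make the open-vocabulary detection pattern concrete, here is a minimal sketch that uses the publicly available OWL-ViT baseline from Hugging Face Transformers, not our remote-sensing-optimized RS-OWL-ViT-v2; the image path, text queries and score threshold are illustrative assumptions.

```python
# Minimal sketch of open-vocabulary detection on an overhead image, using the
# public OWL-ViT baseline (not the RS-optimized model described in the paper).
# The image file, query strings and threshold are assumptions for illustration.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("post_storm_tile.png").convert("RGB")   # hypothetical satellite tile
queries = [["a flooded road", "a damaged building"]]        # free-text classes, no retraining

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into thresholded detections in pixel coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.2, target_sizes=target_sizes
)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(f"{queries[0][int(label)]}: {float(score):.2f} at {[round(v, 1) for v in box.tolist()]}")
```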

Model evaluation shows a significant Average Precision (AP50) improvement of our remote-sensing-optimized RS-OWL-ViT-v2 model (“Ours”) over the OWL-ViT-v2 open-vocabulary detection model in a zero-shot setting, and illustrates the advantage of the combined FLAME + RS-OWL-ViT-v2 approach (“Ours”) over SIoU for few-shot detection on novel classes.
Population
This area of research, which includes Mobility AI and Population Dynamics Foundations, aims to understand the complex interplay between people and places. Our latest research in Population Dynamics Foundations introduces two key innovations: globally consistent embeddings across 17 countries, and monthly updated embeddings that capture the changing dynamics of human activity, which are critical for time-sensitive predictions. Population Dynamics Foundations has shown remarkable effectiveness in independent studies; for example, researchers at the University of Oxford found that incorporating these embeddings into a forecasting model for dengue fever in Brazil improved long-range R² (a metric that measures how well a model explains the actual disease rates) from 0.456 to 0.656 for 12-month predictions.
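The general pattern behind results like the dengue study is straightforward: join static, place-level embeddings to a regional time series and use them as additional features for a standard regressor. The sketch below only illustrates that pattern; the file paths, column names and model choice are assumptions and do not reproduce the Oxford pipeline.

```python
# Rough sketch: place-level embeddings as extra features for a regional
# forecasting model. File paths, column names and the regressor choice are
# assumptions for illustration, not the actual study setup.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

cases = pd.read_csv("monthly_cases.csv")        # columns: region_id, month ("YYYY-MM"), lag features, target
embeddings = pd.read_csv("pdfm_embeddings.csv") # columns: region_id, e_0 ... e_N

data = cases.merge(embeddings, on="region_id", how="inner")
train = data[data["month"] < "2023-01"]
test = data[data["month"] >= "2023-01"]

features = [c for c in data.columns if c not in ("region_id", "month", "target")]
model = GradientBoostingRegressor().fit(train[features], train["target"])

print("R² with embeddings:", r2_score(test["target"], model.predict(test[features])))
```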

Evaluation of our Population Dynamics Foundations across 17 countries: R² score by country (higher is better; 1 is a perfect fit) for predicting population density, tree cover, nighttime lights, and elevation. The global trend matches the strong performance we originally demonstrated in the US only.
Similarity per dimension of the Population Dynamics Foundations embeddings, visualized by US zip code. The patterns across dimensions capture the diverse characteristics of the US population.
Environment
Our previously published research demonstrates state-of-the-art forecasts for medium-range weather, monsoon onsets, air quality and riverine floods. We've recently expanded these Environment models to produce precipitation nowcasts for the entire planet, and our forecasts for the most significant riverine floods now cover 2 billion people.
Increased predictive power by combining models
While each foundation model provides powerful insights, our findings confirm that combining models yields even more predictive power. This synergistic approach produces a more comprehensive and accurate understanding of real-world phenomena and dramatically improves predictions across critical applications.
For example, FEMA’s National Risk Index shows which communities are most at risk from natural hazards like floods and storms, based on a variety of factors including economic and social vulnerability as well as physical and environmental risk. By fusing embeddings that capture socio-economic features from our Population Dynamics Foundations and landscape features from AlphaEarth Foundations, we improved prediction of FEMA’s National Risk Index by an average of 11% in R² across 20 different hazards, versus using either data source alone, with the most significant gains in predicting risk from tornadoes (+25% R²) and riverine flooding (+17% R²).
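The shape of this fusion experiment is simple to sketch: concatenate the two embedding sources per county, fit a lightweight probe, and compare cross-validated R² against either source alone. The snippet below uses random placeholder arrays in place of the actual embeddings and risk scores, so its printed numbers are meaningless; it only shows the pattern.

```python
# Sketch of embedding fusion by concatenation, with an R² comparison against
# each source alone. All arrays are random placeholders, not the actual PDFM /
# AlphaEarth embeddings or FEMA National Risk Index scores.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_counties = 3000
pop_emb = rng.normal(size=(n_counties, 330))    # socio-economic embeddings (placeholder)
land_emb = rng.normal(size=(n_counties, 64))    # landscape embeddings (placeholder)
risk = rng.normal(size=n_counties)              # hazard risk score (placeholder)

def mean_r2(features):
    # 5-fold cross-validated R² for a simple linear probe.
    return cross_val_score(Ridge(alpha=1.0), features, risk, cv=5, scoring="r2").mean()

print("population only:", mean_r2(pop_emb))
print("landscape only: ", mean_r2(land_emb))
print("fused:          ", mean_r2(np.concatenate([pop_emb, land_emb], axis=1)))
```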
Complex problem-solving via Geospatial Reasoning
The example above illustrates that tackling real-world problems requires insights from multiple models with diverse capabilities. Orchestrating these Earth AI insights is simplified by our new Gemini-powered Geospatial Reasoning agent. The agent deconstructs complex, natural language queries and plans a dynamic, multi-step path to an answer. To execute each step, the agent can call on “expert” sub-agents that are equipped with the Earth AI models described above, as well as geospatial-specific tools. This modular network of agents allows for extensibility and customization.
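Conceptually, the orchestration loop is a plan-and-dispatch pattern: a planner turns the query into an ordered list of steps, each step is routed to an expert sub-agent, and intermediate results accumulate in a shared context. The sketch below is a deliberately stripped-down, hypothetical illustration of that pattern; the real agent plans with Gemini and calls the Earth AI models and tools described in this post.

```python
# Stripped-down, hypothetical sketch of the plan-and-dispatch pattern. The
# sub-agent names and their one-line implementations are stand-ins; the real
# agent plans with Gemini and calls Earth AI models and geospatial tools.
from typing import Callable

SUB_AGENTS: dict[str, Callable[[dict], dict]] = {
    "environment": lambda ctx: {**ctx, "wind_zones": "forecast polygons"},
    "population":  lambda ctx: {**ctx, "vulnerability": "per-region scores"},
    "imagery":     lambda ctx: {**ctx, "infrastructure": "detected assets"},
}

def run_plan(query: str, plan: list[str]) -> dict:
    """Execute a multi-step plan, threading intermediate results through a shared context."""
    context: dict = {"query": query}
    for step in plan:
        context = SUB_AGENTS[step](context)   # each expert adds its findings to the context
    return context

# A planner model would normally derive this step list from the natural-language query.
answer = run_plan("Which communities are most exposed to the approaching storm?",
                  ["environment", "population", "imagery"])
print(answer)
```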
To see how it works, consider a user who wishes to identify specific populations that are vulnerable to the risk of an oncoming storm. The agent executes a transparent series of reasoning steps:
- Invoke the Environment model to identify the specific geographic areas that are forecast to be at risk of hurricane-force winds.
- Query Data Commons for demographic statistics to identify higher-population counties in the area of predicted landfall.
- Retrieve official boundaries for the counties of interest from BigQuery’s public datasets.
- Perform a spatial intersection between the wind zones and the official county boundaries (see the sketch below).
- Identify the most vulnerable postal codes by training a model on the fly using our Population Dynamics Foundations and county-level statistics.
- Use the Remote Sensing Foundations object detection model to identify critical infrastructure in satellite imagery taken over one of the most vulnerable postal codes.
To assist a user in understanding vulnerability to an oncoming storm, our Gemini-powered Geospatial Reasoning agent uses our Environment model to identify the likely path of hurricane-force winds, intersects this with county boundaries and population density from BigQuery and Data Commons, and reasons across all of this data to pick the most critical locations. It also trains a model on the fly with Population Dynamics Foundations to generate higher-resolution vulnerability data, and identifies critical infrastructure in satellite imagery using Remote Sensing Foundations.
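The spatial-intersection step in this workflow (step 4 above) is a standard geospatial operation that can be illustrated with GeoPandas. The file paths, column names and projection below are assumptions; the agent performs the equivalent operation with its own tools over BigQuery data.

```python
# Minimal sketch of step 4 above: intersect forecast wind zones with county
# boundaries and rank counties by exposed area. File paths and column names
# are assumptions for illustration.
import geopandas as gpd

wind_zones = gpd.read_file("hurricane_wind_zones.geojson")   # polygons from the Environment model
counties = gpd.read_file("county_boundaries.geojson")        # official boundaries (e.g., from BigQuery public data)

# Work in an equal-area projection so area comparisons are meaningful.
wind_zones = wind_zones.to_crs(epsg=5070)
counties = counties.to_crs(epsg=5070)

exposed = gpd.overlay(counties, wind_zones, how="intersection")
exposed["exposed_km2"] = exposed.geometry.area / 1e6

print(exposed.groupby("county_name")["exposed_km2"].sum().sort_values(ascending=False).head(10))
```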
To evaluate the agent, we developed two new methods: a Q&A benchmark for fact-finding and analysis, with verifiable ground-truth answers based on publicly available data, and Crisis Response case studies for complex, predictive scenarios (e.g., solving the entire challenge above).
On the Q&A benchmark, our Geospatial Reasoning Agent achieved an overall accuracy of 0.82, significantly outperforming the baseline Gemini 2.5 Pro (0.50) and Gemini 2.5 Flash (0.39) agents (scores derived from ROUGE-L F1 and percentage error, higher is better). This highlights the importance of giving agents access to specialized geospatial models and tools for these types of queries.
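As a small illustration of one scoring component, ROUGE-L F1 between a predicted and a reference answer can be computed with the open-source rouge_score package; the example strings below are invented, and the benchmark's full recipe (including percentage error for numeric answers) is described in the paper.

```python
# Small illustration of one scoring component: ROUGE-L F1 between a predicted
# and a reference answer. The strings are invented; the benchmark's complete
# scoring recipe is described in the technical paper.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "Harris County has the largest exposed population in the forecast wind zone."
prediction = "The largest exposed population in the wind zone is in Harris County."

score = scorer.score(reference, prediction)["rougeL"]
print(f"ROUGE-L F1: {score.fmeasure:.2f}")
```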

Visualizing the performance of agents on the Q&A benchmark. The Geospatial Reasoning agent outperformed the baseline Gemini 2.5 Pro agent by 37% in the Descriptive and Retrieval category, and 124% in the more complex Analytical and Relational category, for an overall 64% higher score (scores derived from ROUGE-L F1 and percentage error).
In the more complex Crisis Response scenarios, our paper demonstrates the benefit of orchestrating a diverse set of Environment, Remote Sensing and Population Dynamics insights via case studies. Leveraging specialized sub-agents for geospatial and demographic analysis, we’re able to solve real-world analysis tasks.
Unlocking our planet's potential, together
Earth AI represents a fundamental leap in planetary understanding. Our findings show that a multimodal, reasoning-based approach, built upon a foundation of state-of-the-art geospatial AI models, can unlock insights that are intractable with siloed analysis alone.
We are just beginning to explore the full potential of Earth AI and are committed to expanding access in order to help the global community address the planet’s most pressing challenges. For example:
- Bellwether, a Google X moonshot, is using our weather forecasts, Population Dynamics Foundations embeddings, satellite image analysis and property databases to predict building damage before a storm strikes, helping their insurance clients pay claims faster so homeowners can start rebuilding sooner — saving them time, money and stress.
- United Nations Global Pulse uses Earth AI Imagery models to assess damage after natural disasters, enabling governments and international organizations to rapidly respond to crises.
- GiveDirectly is using Geospatial Reasoning with our flood forecasts to identify at-risk communities and send cash aid to help households prepare for and mitigate disaster.
In addition to supporting UN Global Pulse, GiveDirectly, and other organizations using Earth AI, Google.org is providing funds to partners like Khushi Baby, Cooper/Smith, Direct Relief and Froncort.ai, who are using Population Dynamics Foundations to model infectious diseases and improve public health action globally. New enterprise users of Earth AI include Public Storage, CARTO and Visiona Space Technology (part of Embraer).
We want to hear how Earth AI might be helpful to you. We encourage organizations to express interest in getting early access to Remote Sensing Foundations (available as Imagery models in Vertex AI), Population Dynamics Foundations, and Geospatial Reasoning.