Catalyzing scientific impact through global partnerships and open resources
May 1, 2026
The Google Research Science team
Our approach to open science is built on principles of responsible, inclusive, and rigorous research, empowering a global community to drive high-impact discoveries across disciplines and accelerate progress for all.
A scientific breakthrough reaches its full potential only when it empowers others to replicate and expand upon findings, pushing the boundaries of science even further. At Google Research, we recognize that open-source software and open-access datasets are drivers of modern science. We believe that creating these resources responsibly and maintaining them through partnerships with the global scientific community embodies the spirit of collaboration. In this way, we uphold the principles of open science, ensuring that innovation is not a siloed event but a catalyst for worldwide progress.
Whether it’s the Transformer architecture that reshaped automated language processing, or our specialized models transforming medicine, genomics, neuroscience, climate, energy, and a host of other efforts across the physical, life, and social sciences, we are proud of the work we’ve shared and how it’s being used by researchers around the globe to unlock their own groundbreaking discoveries. This open approach complements our breadth of initiatives across Google to engage and strengthen the research and science ecosystem, including through APIs, publications, conferences, trusted tester programs and private partnerships.
Partnerships and ecosystem collaboration
We collaborate with numerous specialized organizations across scientific disciplines and global regions, such as the University of California Santa Cruz (UCSC) Genomics Institute, Janelia Research Campus, Institute of Science & Technology Austria (ISTA), the Centre for Population Genomics, CSIRO - Australia’s national science agency, and the All India Institute of Medical Sciences (AIIMS).
Beyond individual organizations, we actively support widespread scientific consortia undertaking monumental, global challenges, including the Human Pangenome Research Consortium, the Earth BioGenome Project and the NIH BRAIN Initiative.
Ultimately, our open-science philosophy extends to the broader ecosystem and we are investing in building communities of practice for individual scientific developers, starting in India, Korea, Japan and Australia.
Our open-source tools and data
Over the last decade, we have developed, released, maintained and evolved several key open-source technologies and open-access datasets. To date, these have empowered an active ecosystem of more than 250,000 researchers and developers worldwide.
- Genomics: Our suite of deep learning tools, including DeepVariant, DeepConsensus and DeepPolisher, improves DNA analysis from raw sequencing to final assemblies. These methods have collectively enabled the global community to process the exomes and whole genomes of 2.5 million individuals.
- Neuroscience: Our methods and tools for automated reconstruction, analysis, and visualization of connectomic data include flood-filling networks, Neuroglancer, and TensorStore. These technologies allow scientists to seamlessly segment, navigate, and analyze petascale, high-resolution brain tissue reconstructions. This includes two key publicly available datasets: H01, a 1.4 petabyte sample of human brain tissue accessed over 200k times, and MICrONS, the largest wiring diagram and functional map of the mouse visual cortex.
- Earth & Atmospheric Modeling: We have released Open Buildings, which contains 1.8 billion building detections across an inference area of 58M km² covering Africa, South Asia, South-East Asia, Latin America and the Caribbean; Caravan, a community-driven dataset for large-sample hydrology, released as part of our flood forecasting effort, which now provides predictions of the most significant floods in 150 countries covering 2B people; the Groundsource dataset for urban flash floods, comprising 2.6 million historical flood events derived using Gemini from 20 years of public data spanning more than 150 countries; and NeuralGCM, a fully differentiable hybrid atmospheric model. These are also part of our geospatial efforts within Google Earth AI. We have also released FireBench, a high-resolution synthetic dataset designed to advance wildfire research, and a dataset of ionosphere conditions measured using phones, along with a paired visualization of the dataset over time.
- Biodiversity: SpeciesNet is a global-scale model that classifies 2,498 animal categories, including mammals, birds, and reptiles, in wildlife camera images.
- Healthcare: Our Health AI Developer Foundations (HAI-DEF) provides a suite of open-weight foundation models — including MedGemma — specialized for multimodal medical text, clinical reasoning, and imaging comprehension. It has more than 4.8M downloads to date. Open Health Stack (OHS) is a suite of open-source tools that make it faster and easier for developers to build secure, offline-capable next-generation digital health solutions based on modern digital healthcare standards. Healthcare applications powered by OHS have been deployed in more than 10 countries with over 65 million beneficiaries.
An image from the human brain fragment reconstruction in which a single neuron (white) receives signals that determine whether or not the neuron fires. This image shows all of the axons that can tell it to fire (green) and all of those that can tell it not to (blue). Credit: Google Research & Lichtman Lab (Harvard University). Renderings by D. Berger (Harvard University)
Real-world impact powered by open science
The true measure of our open-science philosophy is the real-world impact achieved by our partners and end users. Below are some examples detailing how our open tools and datasets have enabled further breakthroughs and been used to help communities across the globe.
Enabling global science
- In partnership with the UCSC Genomics Institute, we have developed methods to improve pangenome references and reduce errors when identifying genetic variants by 50%. This work contributes to the Human Pangenome Research Consortium and its effort to better represent human diversity in genomics references and workflows.
- The Human-Centered Weather Forecasts Initiative at the University of Chicago used NeuralGCM and the European Centre for Medium-Range Weather Forecasts (ECMWF) systems to predict the onset of the Indian monsoon up to a month in advance, even capturing an unusual dry spell in the progression of the monsoon. In partnership with the Indian Ministry of Agriculture and Farmers' Welfare, these advance forecasts were successfully delivered via SMS to 38 million farmers in India, empowering them to optimize their crop planting decisions.
- Global organizations, including the UN Refugee Agency (UNHCR), have optimized disaster response survey sampling for displaced populations using the Open Buildings dataset. This dataset has also enabled additional scientific research, including assessing building risk from sea level rise in the Global South.
African nonprofit Sunbird AI uses Google’s Open Buildings dataset to better understand the energy needs of communities in urban and rural areas.
Enabling health advances
- Researchers at Johns Hopkins University leveraged the H01 human brain reconstruction dataset to identify a new form of neuronal communication, a discovery which suggests that the current understanding of the brain’s organization may be incomplete, overlooking a hidden layer of connectivity, with implications for conditions like Alzheimer's.
- We partnered with Stanford University School of Medicine and UCSC to adapt genome analyses to find the cause of genetic disease in the most time-critical cases. The program enabled life-saving interventions and set a new Guinness World Record for achieving genetic diagnosis by whole genome sequencing in less than 8 hours.
- In partnership with UCSC and the National Cancer Institute at the NIH, we co-created a publicly available set of cancer genome sequences for method development and evaluation. We also collaboratively developed DeepSomatic to more accurately find cancer variants, which Children’s Mercy Hospital deployed to discover previously missed variants in cancer cases.
- HAI-DEF has driven widespread global engagement and tangible clinical impact by providing open-weight models that democratize medical AI development, especially in low- and middle-income countries. For instance, Zambia-based Dawa Health used MedSigLIP to build an AI-powered multilingual cervical cancer education and screening tool that allows midwives to upload colposcopy images via WhatsApp to identify abnormalities in real time.
- Open Health Stack has enabled developers globally to address healthcare gaps, particularly in low resource settings. For example, Ona builds apps that allow health workers to switch from paper-based records to digital solutions. OHS accelerated Ona’s app development and allowed them to adopt interoperable data standards, which healthcare workers then used to deliver better care in underserved communities.
- In New Delhi, AIIMS is using MedGemma to develop applications for outpatient triage and dermatology screening. In Malaysia, MedGemma powers Ask CPG, a conversational interface to the country’s 150+ clinical practice guidelines, which the Ministry of Health in Malaysia said has made the guidelines easier to navigate for day-to-day decision support. MedGemma is also empowering individual developers worldwide to build applications for clinical triage, medical document understanding, and diagnostic decision support.
Enabling biodiversity and conservation
- Since 2010, the Snapshot Serengeti camera trapping program has captured over 11 million wildlife images from the African savanna. Using SpeciesNet, researchers at Wake Forest University can now analyze this large dataset in just days, and by running the model from a laptop, they can use the latest wildlife sightings to redeploy cameras in real time to collect targeted data.
- Researchers at the University of Otago are working to preserve the critically endangered kākāpō, a flightless bird of significant cultural importance. Working independently of Google, the researchers re-trained DeepVariant to optimize it for the kākāpō population. This model enabled them to create a genetic map of every living kākāpō to inform breeding strategies and care plans for sick birds, helping to expand the population from a low of 51 to 252 birds.
- Researchers at CSIRO are working with Google to support repopulation efforts for endangered Australian and Tasmanian giant kelp populations. By using Google Earth models and satellite imagery to identify surviving patches, and Google’s open genomics tools to create reference genomes, researchers are linking genetic variants to heat tolerance data. This allows researchers to selectively breed kelp strains that are resilient to rising ocean temperatures.
- The Vertebrate Genomes Project and the Earth BioGenome Project are using our open-source genomics tools to make progress toward their monumental goal of sequencing the genome of every non-bacterial species on Earth. Bolstered by funding awarded by Google.org to The Rockefeller University, researchers have made full genomes available for 13 iconic endangered species, with an additional 150 species underway.
Images of the elephant, zebra and secretary bird were captured by the Snapshot Serengeti program in Tanzania’s Serengeti National Park. Credit: Snapshot Serengeti / T.M. Anderson. The image of the ocelot was captured in Colombia by Project Lucitania at the Universidad de los Andes. Credit: Project Lucitania/Universidad de los Andes/Red Otus. The image of the mule deer was captured by the Idaho Department of Fish and Game (IDFG). Credit: IDFG. SpeciesNet can help identify these animals.
Looking ahead
Our partnership with the open science community is an accelerating mission. As we transition deeper into the era of AI-enabled science, we are inspired by the way generative AI is profoundly changing how researchers work and collaborate. We believe that agentic workflows will allow scientists to encode their knowledge into specialized skills and transform their methods into accessible, scalable tools. This shift will empower the global community to rapidly reproduce findings, extend complex methodologies, and share their work globally.
In this fast-paced new paradigm, communication and collaboration are more critical than ever. Open-source software and open datasets serve as the essential foundation for this ecosystem. The breakthroughs we celebrate today are merely the initial blueprints for a world with faster innovation and universal sharing of scientific knowledge.
At Google Research, we will continue to build the tools and infrastructure that support this new era of discovery. We look forward to seeing what the global scientific community achieves next.
Acknowledgments
We give special thanks to our many global research partners and to the wider scientific community of users that builds upon our open models, infrastructure, datasets, and other tools to make discoveries and to pioneer, pilot, and implement innovations that create positive global societal impact.