We make tools and datasets available to the broader research community with the goal of building a more collaborative ecosystem.
Some of our datasets
Attributed Question Answering (QA) is a key first step in the development of attributed LLMs. This release consists of human-rated system outputs for Attributed QA.
VRDU contains two datasets that represent several challenges: rich schemas with diverse data types, complex templates, and diverse layouts within a single document type.
MD3 features audio and transcripts of thousands of conversational dialogues in English from India, Nigeria, and the US. In each dialogue, speakers are prompted with an information-sharing intent (image or phrase).
The output of generative large language models, annotated to indicate whether it only conveys information that is verifiable in provided source documents.
DiffQG is a dataset about summarizing the difference between two passages using a question and answer pair.
A dataset with 199 failures that 107 users have encountered when interacting with commercial voice assistants.
The MusicCaps dataset contains 5.5k high-quality music captions written by musicians. Each describes a 10-second clip of music from YouTube.
C4RepSet, a representative subset of C4 (Colossal Clean Crawled Corpus), offers efficient training of large language models despite being significantly smaller than C4.
Some of our projects
The Flood Forecasting Initiative uses AI to make flood forecasting information universally accessible.
Med-PaLM is a large language model fine-tuned and designed to provide high-quality answers to medical questions.
USM, a family of state-of-the-art speech models with 2B parameters trained on 12M hours of speech and 28 billion sentences of text, performs automatic speech recognition for 300+ languages.
An Android beta app that uses machine learning to help people with non-standard speech make their voices heard.
SCOUTS is a research initiative with the mission to provide people and ML systems with the scalable, trustworthy societal context knowledge required to realize responsible and robust AI.
The Wordcraft Writers Workshop explores the limits of co-writing with LaMDA and fosters an honest and earnest conversation about the rapidly changing relationship between technology and creativity.
Realistic video generation of arbitrary length from open-domain textual descriptions.
Tools and services
Use TensorFlow tools to process and load your data.
Use pre-trained models or create custom ones.
The TPU Research Cloud (TRC) program enables researchers to apply for access to a cluster of more than 1,000 Cloud TPUs at no charge to help them accelerate the next wave of research breakthroughs.
Dataset Search enables users to find datasets stored in thousands of repositories across the web, making these datasets universally accessible and useful for everyone.
Train and run machine learning models faster than ever before.
Google Cloud's AI provides modern machine learning services, with pre-trained models and a service to generate your own tailored models.
Train high-quality custom machine learning models with minimal effort and machine learning expertise.
Colaboratory is a Google research project created to help disseminate machine learning education and research. It's a Jupyter notebook environment that requires no setup to use and runs entirely in the cloud.
A toolkit of activities, frameworks, and guidance for transparency in research dataset documentation. Customizable, participatory methods to create Data Cards templates.
Our open-source machine learning platform for everyone.
Google believes that open source is good for everyone.
Explore all the open source releases from Google Research.
Build your machine learning skills
Whether you’re an ML expert or you’re just getting started, you’ll find training and information in our resource center.