We make tools and datasets available to the broader research community with the goal of building a more collaborative ecosystem.
Some of our datasets
Attributed Question Answering (QA) is proposed as a key first step in the development of attributed LLMs. This release consists of human-rated system outputs for Attributed Question Answering.
VRDU contains two datasets that represent several challenges: rich schema including diverse data types, complex templates, and diversity of layouts within a single document type.
MD3 features audio and transcripts of thousands of conversational dialogues in English from India, Nigeria, and the US. In each dialogue, speakers are prompted with an information-sharing intent (image or phrase).
The output of generative large language models, annotated to indicate whether it only conveys information that is verifiable in provided source documents.
DiffQG is a dataset about summarizing the difference between two passages using a question and answer pair.
A dataset with 199 failures that 107 users have encountered when interacting with commercial voice assistants.
The MusicCaps dataset contains 5.5k high-quality music captions written by musicians. Each describes a 10-second clip of music from YouTube.
C4RepSet, a representative subset of C4 (Colossal Clean Crawled Corpus), enables efficient training of large language models despite being significantly smaller than C4.
Tools and services
Use TensorFlow tools to process and load your data.
Use pre-trained models or create custom ones.
The TPU Research Cloud (TRC) program enables researchers to apply for access to a cluster of more than 1,000 Cloud TPUs at no charge to help them accelerate the next wave of research breakthroughs.
Dataset Search enables users to find datasets stored in thousands of repositories across the web, making these datasets universally accessible and useful for everyone.
Train and run machine learning models faster than ever before.
Google Cloud's AI provides modern machine learning services, with pre-trained models and a service to generate your own tailored models.
Train high-quality custom machine learning models with minimal effort and machine learning expertise.
Colaboratory is a Google research project created to help disseminate machine learning education and research. It's a Jupyter notebook environment that requires no setup to use and runs entirely in the cloud.
Open source
A toolkit of activities, frameworks, and guidance for transparency in research dataset documentation. Customizable, participatory methods to create Data Cards templates.
Our open-source machine learning platform for everyone.
Google believes that open source is good for everyone.
Explore all the open source releases from Google Research.
Build your machine learning skills
Whether you’re an ML expert or you’re just getting started, you’ll find training and information in our resource center.