DICES Dataset: Diversity in Conversational AI Evaluation for Safety

Lora Aroyo; Alex Taylor; Mark Díaz; Chris Homan; Alicia Parrish; Greg Serapio-García; Vinodkumar Prabhakaran; Ding Wang

DICES Dataset: Diversity in Conversational AI Evaluation for Safety

Lora Aroyo

Alex Taylor

Mark Díaz

Chris Homan

Alicia Parrish

Greg Serapio-García

Vinodkumar Prabhakaran

Ding Wang

NeurIPS2023 (2023)

Download Google Scholar

Abstract

Machine learning approaches often require training and evaluation datasets with a
clear separation between positive and negative examples. This risks simplifying
and even obscuring the inherent subjectivity present in many tasks. Preserving such
variance in content and diversity in datasets is often expensive and laborious. This
is especially troubling when building safety datasets for conversational AI systems,
as safety is both socially and culturally situated. To demonstrate this crucial
aspect of conversational AI safety, and to facilitate in-depth model performance
analyses, we introduce the DICES (Diversity In Conversational AI Evaluation for
Safety) dataset that contains fine-grained demographic information about raters,
high replication of ratings per item to ensure statistical power for analyses, and
encodes rater votes as distributions across different demographics to allow for indepth explorations of different aggregation strategies. In short, the DICES dataset
enables the observation and measurement of variance, ambiguity, and diversity in
the context of conversational AI safety. We also illustrate how the dataset offers
a basis for establishing metrics to show how raters’ ratings can intersects with
demographic categories such as racial/ethnic groups, age groups, and genders. The
goal of DICES is to be used as a shared resource and benchmark that respects
diverse perspectives during safety evaluation of conversational AI systems.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

DICES Dataset: Diversity in Conversational AI Evaluation for Safety

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs