All that Agrees Is Not Gold: Evaluating Ground Truth Labels and Dialogue Content for Safety

Ding Wang; Mark Díaz; Alicia Parrish; Lora Aroyo; Chris Homan; Greg Serapio-García; Vinodkumar Prabhakaran; Alex Taylor

(2023)

Abstract

Dialogue safety as a task is complex, in part because ‘safety’ entails a broad range of topics and concerns, such as toxicity, harm, legal concerns, and health advice. Who we ask to judge safety and who we ask to define safety may lead to differing conclusions. This is because definitions and understandings of safety can vary according to one’s identity, public opinion, and the interpretation of existing laws and regulations. In this study, we compare annotations from a diverse set of over 100 crowd raters to gold labels derived from trust and safety (T&S) experts in a dialogue safety task consisting of 350 human-chatbot conversations. We find patterns of disagreement rooted in dialogue structure, dialogue content, and rating rationale. In contrast to typical approaches that treat gold labels as ground truth, we propose alternative ways of interpreting gold data and incorporating crowd disagreement rather than mitigating it. We discuss the complexity of safety annotation as a task, what crowd and T&S labels each uniquely capture, and how to make determinations about when and how to rely on crowd or T&S labels.

Research Areas

Human-computer interaction and visualization
