Interpreting Social Respect: A Normative Lens for ML Models
Abstract
Machine learning is often viewed as an inherently value-neutral process:
statistical tendencies in the training inputs are ``simply''
used to generalize to new examples. However, when models impact social
systems such as interactions between humans, the patterns they learn
have normative implications. It is important that we ask not only ``what
patterns exist in the data?'', but also ``how do we want our system
to impact people?'' In particular, because minority and marginalized
members of society are often statistically underrepresented in data sets, models
may have undesirable disparate impact on such groups. As such, objectives of
social equity and distributive justice require that we develop tools for both
identifying and interpreting harms introduced by models.
This paper directly addresses the challenge of interpreting how human
values are implicitly encoded by deep neural networks, a machine learning
paradigm often seen as inscrutable. Doing so requires understanding how the node
activations of neural networks relate to value-laden human concepts
such as {\sc respectful} and {\sc abusive}, as well as to concepts
about human social identities such as {\sc gay}, {\sc straight},
{\sc male}, and {\sc female}. To this end, we present the first application
of Testing with Concept Activation Vectors ({\sc tcav}; \cite{kim2018interpretability})
to models for analyzing human language.
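As a concrete illustration of the technique, the sketch below outlines the core
{\sc tcav} computation: a concept activation vector ({\sc cav}) is the unit
normal of a linear boundary separating layer activations of concept examples
from random counterexamples, and the {\sc tcav} score is the fraction of inputs
whose class logit has a positive directional derivative along that vector. This
is a minimal sketch for exposition, not the full experimental pipeline; the
helpers \texttt{layer\_activations} and \texttt{logit\_grad\_wrt\_layer} are
hypothetical stand-ins for model-specific code.

\begin{verbatim}
# A minimal sketch of the core TCAV computation, assuming two
# hypothetical model-specific helpers: layer_activations(texts),
# returning an (n, d) array of activations at a chosen layer, and
# logit_grad_wrt_layer(texts, class_idx), returning the (n, d)
# gradients of a class logit with respect to those activations.

import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear classifier separating concept examples from
    random counterexamples in activation space; the CAV is the
    unit normal of the resulting decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),
                        np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(grads, cav):
    """Fraction of inputs whose class logit increases when the
    activations move in the CAV direction, i.e. whose directional
    derivative along the CAV is positive."""
    return float(np.mean(grads @ cav > 0))
\end{verbatim}

For example, a {\sc cav} for {\sc respectful} would be fit on activations of
respectful versus random sentences; the resulting score then summarizes how
consistently moving along that concept direction raises a chosen output logit
across a set of test sentences.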