Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Aren Jansen

Dan Ellis

Shawn Hershey

R. Channing Moore

Manoj Plakal

Ashok Popat

Rif A. Saurous

Proceedings of ICASSP 2020 (2020) (to appear)

Abstract

Humans do not acquire perceptual abilities the way we train machines. While machine learning algorithms typically operate on large collections of randomly chosen, explicitly labeled examples, human acquisition relies far more heavily on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and recognition that combines (i) a self-supervised objective based on a general notion of unimodal and cross-modal coincidence, (ii) a novel clustering objective that reflects our need to impose categorical structure on our experiences, and (iii) a cluster-based active learning procedure that solicits targeted weak supervision to consolidate hypothesized categories into relevant semantic classes. By jointly training a single sound embedding/clustering/classification network according to these criteria, we achieve a new state-of-the-art unsupervised audio representation and demonstrate up to a 20-fold reduction in the labels required to reach a desired classification performance.
