Don’t Listen to What You Can’t See: The Importance of Negative Examples for Audio-Visual On-Screen Sound Separation

ECCV 2022 Workshop on AV4D: Visual Learning of Sounds in Spaces

Abstract

For the task of audio-visual on-screen sound separation, we illustrate the importance of using evaluation sets that include not only positive examples (videos with on-screen sounds), but also negative examples (videos that contain only off-screen sounds). Given an evaluation set that includes such examples, we provide metrics and a calibration procedure that allow fair comparison of different models with a single metric, analogous to calibrating binary classifiers to achieve a desired false alarm rate. In addition, we propose a method of probing on-screen sound separation models by masking objects in input video frames. Using this method, we probe the sensitivity of our recently proposed AudioScopeV2 model, and discover that its robustness to removing on-screen sound objects is improved by providing supervised examples in training.
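The calibration idea mentioned above can be sketched as follows: choose a decision threshold on the negative (off-screen-only) examples so that the model's false alarm rate matches a desired target, then evaluate all models at their respective calibrated thresholds. This is a minimal illustrative sketch, not the paper's actual procedure; the score distributions and function names are hypothetical.

```python
import numpy as np

def calibrate_threshold(neg_scores, target_far=0.05):
    """Pick a threshold on negative (off-screen-only) example scores so
    that at most `target_far` of negatives exceed it, analogous to
    calibrating a binary classifier to a desired false alarm rate."""
    # The (1 - target_far) quantile of negative scores leaves roughly
    # a target_far fraction of negatives above the threshold.
    return float(np.quantile(np.asarray(neg_scores), 1.0 - target_far))

# Toy example with synthetic "on-screen-ness" scores (hypothetical data).
rng = np.random.default_rng(0)
neg = rng.normal(0.2, 0.10, 1000)   # videos with only off-screen sounds
pos = rng.normal(0.7, 0.15, 1000)   # videos with on-screen sounds

thr = calibrate_threshold(neg, target_far=0.05)
far = float(np.mean(neg > thr))     # achieved false alarm rate
recall = float(np.mean(pos > thr))  # positives correctly detected
```

With models calibrated to the same false alarm rate, a single metric computed on the positive examples becomes directly comparable across models.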