Using audio-visual information to understand speaker activity: Tracking active speakers on and off screen

Ken Hoover
Ian Rutherford Sturdy
Malcolm Slaney
Proceedings of ICASSP, 2018

Abstract

We present a system that associates faces with voices in a video by fusing information from the audio and visual signals. The thesis underlying our work is that an extremely simple approach to generating (weak) speech clusters can be combined with strong visual signals to effectively associate faces and voices by aggregating statistics across a video. This approach requires no training data specific to this task and leverages the natural coherence of information in the audio and visual streams. It is particularly applicable to tracking speakers in videos on the web, where a priori information about the environment (e.g., the number of speakers, or spatial signals for beamforming) is not available.
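To make the aggregation idea concrete, here is a minimal sketch (not the authors' implementation) of the abstract's core step: given weak audio speaker clusters and per-segment visual speaking scores for face tracks, accumulate co-occurrence evidence across the whole video and assign each voice cluster to the face track it most consistently co-occurs with, or to no face if the speaker appears to be off screen. All input names and the 0.5 evidence threshold are assumptions made for illustration.

```python
from collections import defaultdict

def associate_voices_with_faces(segments):
    """Aggregate visual speaking evidence per (voice cluster, face track) pair.

    segments: iterable of dicts like
        {"cluster": int,                       # weak audio cluster id
         "faces": {track_id: speaking_score}}  # visual evidence per face track

    Returns {cluster_id: track_id or None}: each voice cluster is mapped to
    the face track with the most accumulated visual speaking evidence, or
    None (off-screen speaker) if no face track scores consistently enough.
    """
    evidence = defaultdict(float)       # (cluster, track) -> summed score
    cluster_total = defaultdict(float)  # cluster -> number of segments

    for seg in segments:
        c = seg["cluster"]
        cluster_total[c] += 1.0
        for track, score in seg["faces"].items():
            evidence[(c, track)] += score

    assignment = {}
    for c, total in cluster_total.items():
        candidates = [(s, t) for (cc, t), s in evidence.items() if cc == c]
        if not candidates:
            assignment[c] = None        # voice never co-occurs with any face
            continue
        best_score, best_track = max(candidates)
        # Require the winning face to carry speaking evidence in a majority
        # of the cluster's segments; otherwise treat the speaker as off screen.
        # (The 0.5 threshold is an illustrative assumption.)
        assignment[c] = best_track if best_score / total > 0.5 else None
    return assignment


if __name__ == "__main__":
    # Toy video: cluster 0 co-occurs with face track "A" speaking;
    # cluster 1 has no strongly speaking visible face (off-screen speaker).
    segs = [
        {"cluster": 0, "faces": {"A": 0.9, "B": 0.1}},
        {"cluster": 0, "faces": {"A": 0.8}},
        {"cluster": 1, "faces": {"B": 0.2}},
    ]
    print(associate_voices_with_faces(segs))  # {0: 'A', 1: None}
```

Because the evidence is summed over the entire video, individual segments with noisy cluster labels or ambiguous visual scores are averaged out, which is what lets weak audio clusters pair usefully with strong visual signals.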