Sound Retrieval and Ranking Using Sparse Auditory Representations

Martin Rehn
Samy Bengio
Thomas C. Walters
Gal Chechik
Neural Computation, 22 (2010), pp. 2390-2416

Abstract

To create systems that understand the sounds that humans are exposed
to in everyday life, we need to represent sounds with features that
can discriminate among many different sound classes. Here, we use a
sound-ranking framework to quantitatively evaluate such
representations in a large-scale task. We have adapted a
machine-vision method, the "passive-aggressive model for image
retrieval" (PAMIR), which efficiently learns a linear mapping from a
very large sparse feature space to a large query-term space. Using
this approach we compare different auditory front ends and different
ways of extracting sparse features from high-dimensional auditory
images. We tested auditory models that use an adaptive pole-zero filter
cascade (PZFC) auditory filterbank and sparse-code feature extraction
from stabilized auditory images via multiple vector quantizers. In
addition to auditory image models, we also compare a family of more
conventional Mel-Frequency Cepstral Coefficient (MFCC) front ends. The
experimental results show a significant advantage for the auditory
models over vector-quantized MFCCs. Ranking thousands of sound files
with a query vocabulary of thousands of words, the best precision at
top-1 was 73% and the average precision was 35%, reflecting an 18%
improvement over the best competing MFCC front end.
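At the core of this approach, PAMIR scores a sound for a text query as a
bilinear function q^T W a, where q is a bag-of-query-terms vector, a is a
sparse acoustic feature vector (e.g., vector-quantizer code counts), and W
is learned from triplets of a query, a relevant sound, and an irrelevant
sound using passive-aggressive updates. The following NumPy sketch
illustrates a passive-aggressive ranking update of this kind; the function
name, the aggressiveness parameter C, and the use of dense arrays are
illustrative assumptions rather than the paper's implementation, which
operates on very large sparse feature vectors.

    import numpy as np

    def pamir_train(triplets, n_query_terms, n_features, C=0.1, epochs=1):
        """Learn W so that score(q, a) = q @ W @ a ranks relevant sounds
        above irrelevant ones (hypothetical PAMIR-style sketch)."""
        W = np.zeros((n_query_terms, n_features))
        for _ in range(epochs):
            for q, a_pos, a_neg in triplets:
                diff = a_pos - a_neg
                # Hinge loss on the ranking margin for this triplet.
                loss = max(0.0, 1.0 - q @ W @ diff)
                # Squared norm of the rank-one update direction q (a+ - a-)^T.
                v_norm_sq = (q @ q) * (diff @ diff)
                if loss > 0.0 and v_norm_sq > 0.0:
                    # Smallest update that fixes the violation, capped by C.
                    tau = min(C, loss / v_norm_sq)
                    W += tau * np.outer(q, diff)
        return W

When the ranking margin is already satisfied the update is "passive" and W
is left unchanged; otherwise the correction is a minimal rank-one step,
which keeps training cheap even when both the feature space and the query
vocabulary are large.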
