
Scott Wisdom
I am a researcher in Google AI Perception in Cambridge, MA, working on speech, audio, and audio-visual machine perception, with a focus on audio source separation.
Authored Publications
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Xuankai Chang
Zalán Borsos
Marco Tagliasacchi
Neil Zeghidour
Interspeech 2023
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Tal Remez
European Conference on Computer Vision (ECCV) (2022)
Don’t Listen to What You Can’t See: The Importance of Negative Examples for Audio-Visual On-Screen Sound Separation
ECCV 2022 Workshop on AV4D: Visual Learning of Sounds in Spaces
Self-Supervised Learning from Automatically Separated Sound Scenes
Marco Tagliasacchi
Xavier Serra
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Llion Jones
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Tal Remez
International Conference on Learning Representations (ICLR) 2021
Sparse, Efficient, and Semantic MixIT: Taming In-the-Wild Unsupervised Sound Separation
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)