
John Hershey
I am a researcher in Google AI Perception in Cambridge, Massachusetts, where I lead a research team working on speech and audio machine perception. Prior to Google, I spent seven years leading the speech and audio research team at MERL (Mitsubishi Electric Research Labs), and five years at IBM's T. J. Watson Research Center in New York, where I led a team of researchers in noise-robust speech recognition. I also spent a year as a visiting researcher in the speech group at Microsoft Research in 2004, after obtaining my Ph.D. from UCSD. Over the years I have contributed to more than 100 publications and over 30 patents in the areas of machine perception, speech and audio processing, audio-visual machine perception, speech recognition, and natural language understanding.
Research Areas
Authored Publications
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Xuankai Chang
Zalán Borsos
Marco Tagliasacchi
Neil Zeghidour
Interspeech 2023
Don’t Listen to What You Can’t See: The Importance of Negative Examples for Audio-Visual On-Screen Sound Separation
ECCV 2022 Workshop on AV4D: Visual Learning of Sounds in Spaces
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Tal Remez
European Conference on Computer Vision (ECCV) (2022)
Sparse, Efficient, and Semantic MixIT: Taming In-the-Wild Unsupervised Sound Separation
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
Self-Supervised Learning from Automatically Separated Sound Scenes
Marco Tagliasacchi
Xavier Serra
WASPAA 2021
What's All the FUSS About Free Universal Sound Separation Data?
Romain Serizel
Nicolas Turpault
Eduardo Fonseca
Justin Salamon
Prem Seetharaman
ICASSP 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Llion Jones
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Tal Remez
International Conference on Learning Representations (ICLR) 2021