
John Hershey
I am a researcher in Google AI Perception in Cambridge, Massachusetts, where I lead a research team in speech and audio machine perception. Before joining Google, I spent seven years leading the speech and audio research team at MERL (Mitsubishi Electric Research Labs) and five years at IBM's T. J. Watson Research Center in New York, where I led a team of researchers in noise-robust speech recognition. I also spent a year as a visiting researcher in the speech group at Microsoft Research in 2004, after obtaining my PhD from UCSD. Over the years I have contributed to more than 100 publications and over 30 patents in the areas of machine perception, speech and audio processing, audio-visual machine perception, speech recognition, and natural language understanding.
Authored Publications
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Xuankai Chang
Zalán Borsos
Marco Tagliasacchi
Neil Zeghidour
Interspeech 2023
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Tal Remez
European Conference on Computer Vision (ECCV) (2022)
Don’t Listen to What You Can’t See: The Importance of Negative Examples for Audio-Visual On-Screen Sound Separation
ECCV 2022 Workshop on AV4D: Visual Learning of Sounds in Spaces
Sparse, Efficient, and Semantic MixIT: Taming In-the-Wild Unsupervised Sound Separation
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
What's All the FUSS About Free Universal Sound Separation Data?
Romain Serizel
Nicolas Turpault
Eduardo Fonseca
Justin Salamon
Prem Seetharaman
ICASSP 2021
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Tal Remez
International Conference on Learning Representations (ICLR) 2021
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
Zhong-Qiu Wang
Desh Raj
Shinji Watanabe
Zhuo Chen
IEEE SLT 2021
Self-Supervised Learning from Automatically Separated Sound Scenes
Marco Tagliasacchi
Xavier Serra
WASPAA 2021