
John Hershey
I am a researcher at Google AI Perception in Cambridge, Massachusetts, where I lead a research team in the area of speech and audio machine perception. Prior to Google, I spent seven years leading the speech and audio research team at MERL (Mitsubishi Electric Research Labs) and five years at IBM's T. J. Watson Research Center in New York, where I led a team of researchers in noise-robust speech recognition. I also spent a year as a visiting researcher in the speech group at Microsoft Research in 2004, after obtaining my Ph.D. from UCSD. Over the years I have contributed to more than 100 publications and over 30 patents in the areas of machine perception, speech and audio processing, audio-visual machine perception, speech recognition, and natural language understanding.
Authored Publications
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Xuankai Chang
Zalán Borsos
Marco Tagliasacchi
Neil Zeghidour
Interspeech 2023
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Tal Remez
European Conference on Computer Vision (ECCV) (2022)
Don’t Listen to What You Can’t See: The Importance of Negative Examples for Audio-Visual On-Screen Sound Separation
ECCV 2022 Workshop on AV4D: Visual Learning of Sounds in Spaces
Sparse, Efficient, and Semantic MixIT: Taming In-the-Wild Unsupervised Sound Separation
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
Self-Supervised Learning from Automatically Separated Sound Scenes
Marco Tagliasacchi
Xavier Serra
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Tal Remez
International Conference on Learning Representations (ICLR) 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Llion Jones
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2021)
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
Zhong-Qiu Wang
Desh Raj
Shinji Watanabe
Zhuo Chen
IEEE SLT 2021