
Ankur Bapna
I am a Staff Software Engineer on the Brain team. My current research interests include multimodal representation learning for speech and text, massively multilingual modeling and applications of these approaches to translation, ASR, TTS and tasks involving end-to-end speech understanding and generation.
Authored Publications
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj
Sriram (Sri) Ganapathy
Sid Dalmia
Wei Han
Yu Zhang
Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024) (2024)
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech Representation and Linguistic Features
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
WASPAA 2023 (2023) (to appear)
LibriTTS-R: Restoration of a Large-Scale Multi-Speaker TTS Corpus
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Interspeech 2023 (2023)
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech
Takaaki Saeki
Zhehuai Chen
Nobuyuki Morioka
Yu Zhang
ICASSP (2023)
Label Aware Speech Representation Learning For Language Identification
Shikhar Bharadwaj
Sriram Ganapathy
Wei Han
Proceedings of Interspeech 2023, pp. 5351-5355
XTREME-S: Evaluating Cross-lingual Speech Representations
Clara E. Rivera
Mihir Sanjay Kale
Sebastian Ruder
Simran Khanuja
Ye Jia
Yu Zhang
Proc. Interspeech 2022
Building Machine Translation Systems for the Next Thousand Languages
Julia Kreutzer
Mengmeng Niu
Pallavi Nikhil Baljekar
Xavier Garcia
Maxim Krikun
Pidong Wang
Apu Shah
Zhifeng Chen
Yonghui Wu
Macduff Richard Hughes
Google Research (2022)
Joint Unsupervised and Supervised Training for Multilingual ASR
Yu Zhang
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2022), pp. 6402-6406