
Ankur Bapna
I am a Staff Software Engineer on the Brain team. My current research interests include multimodal representation learning for speech and text, massively multilingual modeling, and applications of these approaches to translation, ASR, TTS, and end-to-end speech understanding and generation tasks.
Authored Publications
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj
Sriram (Sri) Ganapathy
Vera Axelrod
Sid Dalmia
Wei Han
Yu Zhang
Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech Representation and Linguistic Features
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
WASPAA 2023 (to appear)
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech
Takaaki Saeki
Zhehuai Chen
Nobuyuki Morioka
Yu Zhang
ICASSP (2023)
Label Aware Speech Representation Learning For Language Identification
Shikhar Bharadwaj
Sriram Ganapathy
Vera Axelrod
Wei Han
Proceedings of Interspeech 2023, pp. 5351–5355
LibriTTS-R: Restoration of a Large-Scale Multi-Speaker TTS Corpus
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Interspeech 2023
Building Machine Translation Systems for the Next Thousand Languages
Julia Kreutzer
Aditya Siddhant
Mengmeng Niu
Pallavi Nikhil Baljekar
Xavier Garcia
Vera Saldinger Axelrod
Yuan Cao
Maxim Krikun
Pidong Wang
Apu Shah
Zhifeng Chen
Yonghui Wu
Macduff Richard Hughes
Google Research (2022)
XTREME-S: Evaluating Cross-lingual Speech Representations
Clara E. Rivera
Mihir Sanjay Kale
Sebastian Ruder
Simran Khanuja
Ye Jia
Yu Zhang
Proceedings of Interspeech 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Pedro Jose Moreno Mengibar
Yu Zhang
Zhehuai Chen
Interspeech 2022 (to appear)