Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 553 publications
Speech Processing
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj
Sriram (Sri) Ganapathy
Sid Dalmia
Wei Han
Yu Zhang
Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024) (2024)
StreamVC: Real-Time Low-Latency Voice Conversion
Jiuqiang Tang
Xing Li
ICASSP 2024 (2024)
USM-SCD: USM-Based Multilingual Speaker Change Detection
Yongqiang Wang
Jason Pelecanos
Yu Zhang
Yiling Huang
Han Lu
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 11801-11805
Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models
Proceedings of the Second Workshop on Computation and Written Language (CAWL) 2024
Automatic Speech Recognition of Conversational Speech in Individuals with Disordered Speech
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Antoine Desjardins
Jordan Green
Journal of Speech, Language, and Hearing Research (2024) (to appear)
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Alon Levkovitch
Roy Hirsch
Chulayuth Asawaroengchai
Ehud Rivlin
ICLR (2024)
Translatotron 3: Speech to Speech Translation with Monolingual Data
Alon Levkovitch
Yifan Ding
Chulayuth Asawaroengchai
2024
Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation
Lion Jones
Richard William Sproat
Haruko Ishikawa
Transactions of the Association for Computational Linguistics, 11 (2023), 85–101
LibriTTS-R: Restoration of a Large-Scale Multi-Speaker TTS Corpus
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Interspeech 2023 (2023)
Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
Han Lu
Yiling Huang
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech Representation and Linguistic Features
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
WASPAA 2023 (2023) (to appear)