
Google at ICASSP 2025
Google is proud to be a Diamond Patron of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), a premier annual conference being held April 6 through April 11, 2025 in Hyderabad, India. Researchers from across Google are actively engaged at the conference, with over 20 accepted papers and involvement in several workshops and lectures. We look forward to expanding our partnership with the broader research community, sharing some of our extensive research in signal processing, speech, and audio, and exploring the intersection of these domains with language models, generative AI, and more.
We hope you’ll visit the Google booth (B1) to chat with researchers who are actively pursuing the latest innovations in signal processing, and check out some of the scheduled booth activities (including demos and Q&A sessions). Follow the @GoogleAI X (Twitter) and LinkedIn accounts to get the latest updates about Google booth activities at ICASSP 2025.
Take a look below to learn more about our research being presented at the conference (Google affiliations in bold). Note that all session times are listed in IST.
Demos and Q&A at the Google Booth
This schedule is subject to change. Please visit the Google booth (B1) for more information.
Tues April 8 | 11:30 AM
Considerations for the Implementation of Audio Algorithms on Edge Devices
Arpit Jain
Tues April 8 | 3:30 PM
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
Subhashini Venugopalan
Wed April 9 | 11:15 AM
Unlock Better Speech Recognition with Project Euphonia
Subhashini Venugopalan
Lectures
Workshops
Accepted papers
Mamba Fusion: Learning Actions Through Questioning
Apoorva Beedu, Zhikang Dong, Jason S Sheinkopf, Irfan Essa
Text Descriptions of Actions and Objects Improve Action Anticipation
Apoorva Beedu, Harish Haresamudram, Irfan Essa
Towards Sub-Millisecond Latency Real-Time Speech Enhancement Models on Hearables
Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, Richard Lyon, John R. Hershey
Bone Conducted Signal Guided Speech Enhancement for Voice Assistant on Earbuds
Jens Heitkaemper, Joe Caroselli, Max McKinnon, Arun Narayanan, Nathan Howard
Impairments Are Clustered in Latents of Deep Neural Network-Based Speech Quality Models
Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee
An Ensemble Approach to Short-Form Video Quality Assessment Using Multimodal LLM
Wen Wen*, Yilin Wang, Neil Birkbeck, Balu Adsumilli
Speech Few-Shot Learning for Language Learners’ Speech Recognition
Jian Cheng, Sam Nguyen
Speech Re-painting for Robust ASR
Kyle Kastner, Gary Wang, Isaac Elias, Takaaki Saeki, Pedro Moreno Mengibar, Françoise Beaufays, Andrew Rosenberg, Bhuvana Ramabhadran
Span Attention for Entity-Consistent Task-Oriented Dialogue Response Generation
Jiale Chen, Xuelian Dong, Wenxiu Xie, Tao Gong, Fu Lee Wang, Tianyong Hao
SimulTron: On-Device Simultaneous Speech to Speech Translation
Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich
Full-Reference Point Cloud Quality Assessment with Multimodal Large Language Models
Ryosuke Watanabe, Tomoaki Konno, Hiroshi Sankoh, Bryan Tanaka, Tatsuya Kobayashi
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
Xuchan Bao*, Judith Yue Li, Zhong Yi Wan, Kun Su, Timo Denk, Joonseok Lee, Dima Kuzmin, Fei Sha
Identifying and Mitigating Mismatched Language Code in Multilingual ASR
Jaeyoung Kim, Sepand Mavandadi, Kartik Audhkhasi, Shikhar Bharadwaj*, Brian Farris, Tongzhou Chen, Bhuvana Ramabhadran, Sriram Ganapathy
Audio Diffusion with Large Language Models
Yinghui Huang, Kyle Kastner, Kartik Audhkhasi, Bhuvana Ramabhadran, Andrew Rosenberg
Weak-to-Strong Generalization in Speech Recognition
Soheil Khorram, Qian Zhang, Rohit Prabhavalkar, Kartik Audhkhasi, Bhuvana Ramabhadran
Personalizing Keyword Spotting with Speaker Information
Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Ignacio Lopez-Moreno
Towards a Single ASR Model That Generalizes to Disordered Speech
Jimmy Tobin, Katrin Tomanek, Subhashini Venugopalan
Committees & Fellows
- Bhuvana Ramabhadran, Plenary Chair
- Arun Narayanan, Area Chair
- Rohit Prabhavalkar, Area Chair
- Heiga Zen, IEEE Fellow
- Liangliang Cao, IEEE Fellow