
Google at ICASSP 2025
Google is proud to be a Diamond Patron of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), a premier annual conference being held April 6 through April 11, 2025 in Hyderabad, India. Researchers from across Google are actively engaged at the conference, with over 20 accepted papers and involvement in several workshops and lectures. We look forward to expanding our partnership with the broader research community, sharing some of our extensive research in signal processing, speech, and audio, and exploring the intersection of these domains with language models, generative AI, and more.
We hope you’ll visit the Google booth (B1) to chat with researchers who are actively pursuing the latest innovations in signal processing, and check out some of the scheduled booth activities (including demos and Q&A sessions). Follow the @GoogleAI X (Twitter) and LinkedIn accounts to get the latest updates about Google booth activities at ICASSP 2025.
Take a look below to learn more about our research being presented at the conference (Google affiliations in bold). Note that all session times are listed in IST.
Demos and Q&A at the Google Booth
This schedule is subject to change. Please visit the Google booth (B1) for more information.
Tues April 8 | 11:30 AM
Considerations for the Implementation of Audio Algorithms on Edge Devices
Arpit Jain
Tues April 8 | 3:30 PM
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
Subhashini Venugopalan
Wed April 9 | 11:15 AM
Unlock Better Speech Recognition with Project Euphonia
Subhashini Venugopalan
Lectures
Workshops
Accepted papers
Mamba Fusion: Learning Actions Through Questioning
Apoorva Beedu, Zhikang Dong, Jason S Sheinkopf, Irfan Essa
Text Descriptions of Actions and Objects Improve Action Anticipation
Apoorva Beedu, Harish Haresamudram, Irfan Essa
Towards Sub-Millisecond Latency Real-Time Speech Enhancement Models on Hearables
Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, Richard Lyon, John R. Hershey
Bone Conducted Signal Guided Speech Enhancement for Voice Assistant on Earbuds
Jens Heitkaemper, Joe Caroselli, Max McKinnon, Arun Narayanan, Nathan Howard
Impairments Are Clustered in Latents of Deep Neural Network-Based Speech Quality Models
Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee
An Ensemble Approach to Short-Form Video Quality Assessment Using Multimodal LLM
Wen Wen*, Yilin Wang, Neil Birkbeck, Balu Adsumilli
Speech Few-Shot Learning for Language Learners’ Speech Recognition
Jian Cheng, Sam Nguyen
Speech Re-painting for Robust ASR
Kyle Kastner, Gary Wang, Isaac Elias, Takaaki Saeki, Pedro Moreno Mengibar, Françoise Beaufays, Andrew Rosenberg, Bhuvana Ramabhadran
Span Attention for Entity-Consistent Task-Oriented Dialogue Response Generation
Jiale Chen, Xuelian Dong, Wenxiu Xie, Tao Gong, Fu Lee Wang, Tianyong Hao
SimulTron: On-Device Simultaneous Speech to Speech Translation
Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich
Full-Reference Point Cloud Quality Assessment with Multimodal Large Language Models
Ryosuke Watanabe, Tomoaki Konno, Hiroshi Sankoh, Bryan Tanaka, Tatsuya Kobayashi
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
Xuchan Bao*, Judith Yue Li, Zhong Yi Wan, Kun Su, Timo Denk, Joonseok Lee, Dima Kuzmin, Fei Sha
Identifying and Mitigating Mismatched Language Code in Multilingual ASR
Jaeyoung Kim, Sepand Mavandadi, Kartik Audhkhasi, Shikhar Bharadwaj*, Brian Farris, Tongzhou Chen, Bhuvana Ramabhadran, Sriram Ganapathy
Audio Diffusion with Large Language Models
Yinghui Huang, Kyle Kastner, Kartik Audhkhasi, Bhuvana Ramabhadran, Andrew Rosenberg
Weak-to-Strong Generalization in Speech Recognition
Soheil Khorram, Qian Zhang, Rohit Prabhavalkar, Kartik Audhkhasi, Bhuvana Ramabhadran
Personalizing Keyword Spotting with Speaker Information
Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Ignacio Lopez-Moreno
Towards a Single ASR Model That Generalizes to Disordered Speech
Jimmy Tobin, Katrin Tomanek, Subhashini Venugopalan
Committees & Fellows
- Bhuvana Ramabhadran, Plenary Chair
- Arun Narayanan, Area Chair
- Rohit Prabhavalkar, Area Chair
- Heiga Zen, IEEE Fellow
- Liangliang Cao, IEEE Fellow