Google at ICASSP 2024
Google is proud to be a Diamond Patron of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), a premier annual conference, being held April 14–19, 2024, in Seoul, Korea. Google has a strong presence at this year’s conference with over 30 accepted papers and active involvement in three workshops and tutorials. We look forward to sharing some of our extensive signal processing research and expanding our partnership with the broader research community.
We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in signal processing, and check out some of the scheduled booth activities (e.g., demos and Q&A sessions). Visit the @GoogleAI X (Twitter) and LinkedIn accounts to learn more about Google booth activities at ICASSP 2024.
Take a look below to learn more about our research being presented at ICASSP 2024 (Google affiliations in bold). Note that all session times are listed in KST.
Board & Organizing Committee
- Bhuvana Ramabhadran: IEEE Committee & Session Co-Chair
- John Apostolopoulos: Industry Innovation Forum
- Heiga Zen: SP Grand Challenges & Session Co-Chair
- Dmitriy Serdyuk: Session Co-Chair
- Qiong Hu: Session Co-Chair
- Weiran Wang: Session Co-Chair
- Shrikanth Narayanan: Session Co-Chair
- Scott Wisdom: Session Co-Chair
- Rohit Prabhavalkar: Session Co-Chair
- Hui Wan: Session Co-Chair
- Jan Skoglund: Session Co-Chair
- Kartik Audhkhasi: Session Co-Chair
- Roshan Sharma: Session Co-Chair
Plenary & oral talks
- Thu, Apr 18 | 3:10PM - 4:10PM, Auditorium (3F)
  GenAI: Challenges and Opportunities for Signal Processing
  Speaker: Johan Schalkwyk
- Thu, Apr 18 | 3:10PM - 4:10PM, Auditorium (3F)
  Multi Modal Large Language Models as the Path Towards Language Inclusivity
  Plenary talk
  Speaker: Johan Schalkwyk
- Tue, Apr 16 | 5:10PM - 5:30PM, Room E2
  SOUNDLOCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation
  Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin
- Wed, Apr 17 | 9:20AM - 9:40AM, Room 103
  Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
  Takaaki Saeki*, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
- Thu, Apr 18 | 1:30PM - 1:50PM, Room 103
  USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models
  Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara Sainath, Shivani Agrawal, Zhonglin Han, Jian Li, Amir Yazdanbakhsh
- Thu, Apr 18 | 2:10PM - 2:30PM, Room 103
  Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
  Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno
- Fri, Apr 19 | 8:20AM - 8:40AM, Room 102
  Conformers are All You Need for Visual Speech Recognition
  Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah*, Olivier Siohan
- Fri, Apr 19 | 1:50PM - 2:10PM, Room 104
  Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
  W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath
Accepted papers
Unintended Memorization in Large ASR Models, and How to Mitigate It
Lun Wang, Om Thakkar, Rajiv Mathews
Retrieval Augmented End-to-End Spoken Dialog Models
Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey
Improving Acoustic Echo Cancellation for Voice Assistants Using Neural Echo Suppression and Multi-Microphone Noise Reduction
Jens Heitkaemper, Arun Narayanan, Turaj Zakizadeh Shabestary, Sankaran Panchapagesan, James Walker, Bhalchandra Gajare, Shlomi Regev, Ajay Dudani, Alexander Gruenstein
Large Scale Self-Supervised Pre-training for Active Speaker Detection
Otavio Braga, Wei Xia, Keith Johnson, Alice Chuang, Yunfan Ye, Olivier Siohan, Tuan Nguyen
Unsupervised Multi-Channel Separation and Adaptation
Cong Han*, Kevin Wilson, Scott Wisdom, John Hershey
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann
Quantifying the Effect of Simulator-Based Data Augmentation for Speech Recognition on Augmented Reality Glasses
Riku Arakawa, Mathieu Parvaix, Chiong Lai, Hakan Erdogan, Alex Olwal
Task Vector Algebra for ASR Models
Gowtham Ramesh*, Kartik Audhkhasi, Bhuvana Ramabhadran
NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-Matching Reference Audio Quality Assessment
Alessandro Ragano, Jan Skoglund, Andrew Hines
Augmenting Conformers with Structured State-Space Sequence Models for Online Speech Recognition
Haozhe Shan*, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath
Binaural Angular Separation Network
Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann
Improving Speech Recognition for African American English with Audio Classification
Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar
Monte Carlo Self-Training for Speech Recognition
Anshuman Tripathi, Soheil Khorram, Han Lu, Jaeyoung Kim, Qian Zhang, Hasim Sak
T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image
Shijie Zhang, Boyan Jiang, Keke He, Junwei Zhu, Ying Tai, Chengjie Wang, Yinda Zhang, Yanwei Fu
Large Language Models as a Proxy for Human Evaluation in Assessing the Comprehensibility of Disordered Speech Transcription
Katrin Tomanek, Jimmy Tobin, Subhashini Venugopalan, Richard Cave, Katie Seaver, Jordan R. Green, Rus Heywood
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Guanlong Zhao, Yongqiang Wang*, Jason Pelecanos, Yu Zhang*, Hank Liao, Yiling Huang, Han Lu, Quan Wang
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa
Efficient Adapter Fine-Tuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai, Bo Li, Qiujia Li, Tara Sainath, Trevor Strohman
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani, Alon Levkovitch*, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models
Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li, Tara Sainath
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
David Budaghyan, Charles Onu, Arsenii Gorin, Cem Subakan, Doina Precup
Noise Masking Attacks and Defenses for Pre-Trained Speech Models
Matthew Jagielski, Om Thakkar, Lun Wang
Efficient Learned Image Compression with Selective Kernel Residual Module and Channel-wise Causal Context Model
Haisheng Fu, Feng Liang, Jie Liang, Zhenman Fang, Guohe Zhang, Jingning Han
FedAQT: Accurate Quantized Training with Federated Learning
Renkun Ni, Yonghui Xiao, Phoenix Meadowlark, Oleg Rybakov, Tom Goldstein, Ananda Theertha Suresh, Ignacio Lopez Moreno, Mingqing Chen, Rajiv Mathews
Tutorials and workshops
- Mon, Apr 15 | 8:30AM - 12:30PM, Room 103
  Fundamentals of Transformers: A Signal-Processing View
  Christos Thrampoulidis, Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi
- Mon, Apr 15 | 2:00PM - 5:30PM, Room 103
  Foundational Problems in Neural Speech Recognition
  Ehsan Variani, Georg Heigold, Ke Wu, Michael Riley
- Mon, Apr 15 | 8:30AM - 5:30PM, Room 205
  Explainable AI for Speech and Audio (XAI-SA)
  Speaker: Ethan Manilow
Award recipients
- The IEEE Signal Processing Society's Best Paper Award
  Award recipients: Hossein Talebi & Peyman Milanfar
  For "NIMA: Neural Image Assessment"
- The IEEE Signal Processing Society's Claude Shannon-Harry Nyquist Technical Achievement Award
  Award recipient: Shrikanth Narayanan
  "For contributions to spoken language processing technologies and their societal applications"
Google Research Booth Demo/Q&A Schedule
- Tuesday, April 16 | 4:10PM - 4:30PM
  Google Project Relate, an Android communication tool for people with non-standard speech
  Demo presenter: Katrin Tomanek
- Wednesday, April 17 | 10:20AM - 10:40AM
  Real-time On-device Voice Conversion
  Demo presenters: Yang Yang, Yury Kartynnik, Shao-Fu Shih, George Sung
- Wednesday, April 17 | 4:10PM - 4:30PM
  Contrastive Neural Audio Separation
  Demo presenters: Yang Yang, Shao-Fu Shih, George Sung
- Thursday, April 18 | 10:20AM - 10:40AM
  Contrastive Neural Audio Separation
  Demo presenters: Yang Yang, Shao-Fu Shih, George Sung
- Friday, April 19 | 10:20AM - 10:40AM
  Q&A on the Future of Video Communication in the Age of GenAI
  Q&A presenter: John Apostolopoulos
* Indicates work done while at Google