Google at ICASSP 2024
Google is proud to be a Diamond Patron of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), a premier annual conference, being held April 14–19, 2024, in Seoul, Korea. Google has a strong presence at this year’s conference with over 30 accepted papers and active involvement in three workshops and tutorials. We look forward to sharing some of our extensive signal processing research and expanding our partnership with the broader research community.
We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in signal processing, and check out some of the scheduled booth activities (e.g., demos and Q&A sessions). Visit the @GoogleAI X (Twitter) and LinkedIn accounts to learn more about Google booth activities at ICASSP 2024.
Take a look below to learn more about our research being presented at ICASSP 2024 (Google affiliations in bold). Note that all session times are listed in KST.
Board & Organizing Committee
- Bhuvana Ramabhadran: IEEE Committee & Session Co-Chair
- John Apostolopoulos: Industry Innovation Forum
- Heiga Zen: SP Grand Challenges & Session Co-Chair
- Dmitriy Serdyuk: Session Co-Chair
- Qiong Hu: Session Co-Chair
- Weiran Wang: Session Co-Chair
- Shrikanth Narayanan: Session Co-Chair
- Scott Wisdom: Session Co-Chair
- Rohit Prabhavalkar: Session Co-Chair
- Hui Wan: Session Co-Chair
- Jan Skoglund: Session Co-Chair
- Kartik Audhkhasi: Session Co-Chair
- Roshan Sharma: Session Co-Chair
Plenary & oral talks
- Thu, Apr 18 | 3:10PM - 4:10PM, Auditorium (3F)
  GenAI: Challenges and Opportunities for Signal Processing
  Speaker: Johan Schalkwyk
- Thu, Apr 18 | 3:10PM - 4:10PM, Auditorium (3F)
  Multi Modal Large Language Models as the Path Towards Language Inclusivity
  Plenary talk
  Speaker: Johan Schalkwyk
- Tue, Apr 16 | 5:10PM - 5:30PM, Room E2
  SOUNDLOCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation
  Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin
- Wed, Apr 17 | 9:20AM - 9:40AM, Room 103
  Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
  Takaaki Saeki*, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov
- Thu, Apr 18 | 1:30PM - 1:50PM, Room 103
  USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models
  Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara Sainath, Shivani Agrawal, Zhonglin Han, Jian Li, Amir Yazdanbakhsh
- Thu, Apr 18 | 2:10PM - 2:30PM, Room 103
  Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
  Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno
- Fri, Apr 19 | 8:20AM - 8:40AM, Room 102
  Conformers are All You Need for Visual Speech Recognition
  Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah*, Olivier Siohan
- Fri, Apr 19 | 1:50PM - 2:10PM, Room 104
  Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
  W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath
Accepted papers
Unintended Memorization in Large ASR Models, and How to Mitigate It
Lun Wang, Om Thakkar, Rajiv Mathews
Retrieval Augmented End-to-End Spoken Dialog Models
Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey
Improving Acoustic Echo Cancellation for Voice Assistants Using Neural Echo Suppression and Multi-Microphone Noise Reduction
Jens Heitkaemper, Arun Narayanan, Turaj Zakizadeh Shabestary, Sankaran Panchapagesan, James Walker, Bhalchandra Gajare, Shlomi Regev, Ajay Dudani, Alexander Gruenstein
Large Scale Self-Supervised Pre-training for Active Speaker Detection
Otavio Braga, Wei Xia, Keith Johnson, Alice Chuang, Yunfan Ye, Olivier Siohan, Tuan Nguyen
Unsupervised Multi-Channel Separation and Adaptation
Cong Han*, Kevin Wilson, Scott Wisdom, John Hershey
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann
Quantifying the Effect of Simulator-Based Data Augmentation for Speech Recognition on Augmented Reality Glasses
Riku Arakawa, Mathieu Parvaix, Chiong Lai, Hakan Erdogan, Alex Olwal
Task Vector Algebra for ASR Models
Gowtham Ramesh*, Kartik Audhkhasi, Bhuvana Ramabhadran
NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-Matching Reference Audio Quality Assessment
Alessandro Ragano, Jan Skoglund, Andrew Hines
Augmenting Conformers with Structured State-Space Sequence Models for Online Speech Recognition
Haozhe Shan*, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath
Binaural Angular Separation Network
Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann
Improving Speech Recognition for African American English with Audio Classification
Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar
Monte Carlo Self-Training for Speech Recognition
Anshuman Tripathi, Soheil Khorram, Han Lu, Jaeyoung Kim, Qian Zhang, Hasim Sak
T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image
Shijie Zhang, Boyan Jiang, Keke He, Junwei Zhu, Ying Tai, Chengjie Wang, Yinda Zhang, Yanwei Fu
Large Language Models as a Proxy for Human Evaluation in Assessing the Comprehensibility of Disordered Speech Transcription
Katrin Tomanek, Jimmy Tobin, Subhashini Venugopalan, Richard Cave, Katie Seaver, Jordan R. Green, Rus Heywood
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Guanlong Zhao, Yongqiang Wang*, Jason Pelecanos, Yu Zhang*, Hank Liao, Yiling Huang, Han Lu, Quan Wang
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa
Efficient Adapter Fine-Tuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai, Bo Li, Qiujia Li, Tara Sainath, Trevor Strohman
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani, Alon Levkovitch*, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models
Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li, Tara Sainath
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
David Budaghyan, Charles Onu, Arsenii Gorin, Cem Subakan, Doina Precup
Noise Masking Attacks and Defenses for Pre-Trained Speech Models
Matthew Jagielski, Om Thakkar, Lun Wang
Efficient Learned Image Compression with Selective Kernel Residual Module and Channel-wise Causal Context Model
Haisheng Fu, Feng Liang, Jie Liang, Zhenman Fang, Guohe Zhang, Jingning Han
FedAQT: Accurate Quantized Training with Federated Learning
Renkun Ni, Yonghui Xiao, Phoenix Meadowlark, Oleg Rybakov, Tom Goldstein, Ananda Theertha Suresh, Ignacio Lopez Moreno, Mingqing Chen, Rajiv Mathews
Tutorials and workshops
- Mon, Apr 15 | 8:30AM - 12:30PM, Room 103
  Fundamentals of Transformers: A Signal-Processing View
  Christos Thrampoulidis, Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi
- Mon, Apr 15 | 2:00PM - 5:30PM, Room 103
  Foundational Problems in Neural Speech Recognition
  Ehsan Variani, Georg Heigold, Ke Wu, Michael Riley
- Mon, Apr 15 | 8:30AM - 5:30PM, Room 205
  Explainable AI for Speech and Audio (XAI-SA)
  Speaker: Ethan Manilow
Award recipients
- The IEEE Signal Processing Society's Best Paper Award
  Award recipients: Hossein Talebi & Peyman Milanfar
  For "NIMA: Neural Image Assessment"
- The IEEE Signal Processing Society's Claude Shannon-Harry Nyquist Technical Achievement Award
  Award recipient: Shrikanth Narayanan
  "For contributions to spoken language processing technologies and their societal applications"
Google Research Booth Demo/Q&A Schedule
- Tuesday, April 16 | 4:10PM - 4:30PM
  Google Project Relate, an Android communication tool for people with non-standard speech
  Demo presenter: Katrin Tomanek
- Wednesday, April 17 | 10:20AM - 10:40AM
  Real-time On-device Voice Conversion
  Demo presenters: Yang Yang, Yury Kartynnik, Shao-Fu Shih, George Sung
- Wednesday, April 17 | 4:10PM - 4:30PM
  Contrastive Neural Audio Separation
  Demo presenters: Yang Yang, Shao-Fu Shih, George Sung
- Thursday, April 18 | 10:20AM - 10:40AM
  Contrastive Neural Audio Separation
  Demo presenters: Yang Yang, Shao-Fu Shih, George Sung
- Friday, April 19 | 10:20AM - 10:40AM
  Q&A on the Future of Video Communication in the Age of GenAI
  Q&A presenter: John Apostolopoulos
* Indicates work done while at Google