Publications

Binamix -- A Python Library for Generating Binaural Audio Datasets

Dan Barry

Davoud Shariat Panah

Alessandro Ragano

Jan Skoglund

Andrew Hines

AES 158th Convention of the Audio Engineering Society (2025)

On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container

Tomasz Rudzki

Gavin Kearney

Jan Skoglund

AES 158th Convention of the Audio Engineering Society (2025)

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF

Carlos Tejeda-Ocampo

Toni Hirvonen

Ema Souza-Blanes

Mahmoud Namazi

Jan Skoglund

AES 158th Convention of the Audio Engineering Society (2025)

Project Euphonia: Advancing Inclusive Speech Recognition through Expanded Data Collection and Evaluation

Alicia Martín

Bob MacDonald

Julie Cattiau

Pan-Pan Jiang

Jimmy Tobin

Philip Q Nelson

Katrin Tomanek

Frontiers in Language Sciences (2025)

A Novel CI Coding Strategy Based on a Cochlear Model and Deep Neural Network

Maryam Hosseini

Tim Brochier

Zachary Smith

Brett Swanson

Andrew Vandali

Alan Kan

Fadwa Alnafjan

Kat Fernandez

Richard F. Lyon

Conference on Implantable Auditory Prostheses 2025

Multimodal Modeling for Spoken Language Identification

Shikhar Bharadwaj

Min Ma

Shikhar Vashishth

Ankur Bapna

Sriram (Sri) Ganapathy

Vera Axelrod

Sid Dalmia

Wei Han

Yu Zhang

Daan van Esch

Sandy Ritchie

Partha Talukdar

Jason Riesa

Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024) (2024)

StreamVC: Real-Time Low-Latency Voice Conversion

Yang Yang

Yury Kartynnik

Pen Li

Jiuqiang Tang

Xing Li

George Sung

Matthias Grundmann

ICASSP 2024 (2024)

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment

Alessandro Ragano

Andrew Hines

Jan Skoglund

ICASSP 2024 (to appear)

USM-SCD: USM-Based Multilingual Speaker Change Detection

Guanlong Zhao

Yongqiang Wang

Jason Pelecanos

Yu Zhang

Hank Liao

Yiling Huang

Han Lu

Quan Wang

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 11801-11805

Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models

Daan van Esch

Proceedings of the Second Workshop on Computation and Written Language (CAWL) 2024

Automatic Speech Recognition of Conversational Speech in Individuals with Disordered Speech

Jimmy Tobin

Philip Q Nelson

Bob MacDonald

Rus Heywood

Richard Cave

Katie Seaver

Antoine Desjardins

Pan-Pan Jiang

Jordan Green

Journal of Speech, Language, and Hearing Research (2024) (to appear)

Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM

Eliya Nachmani

Alon Levkovitch

Roy Hirsch

Julian Salazar

Chulayuth Asawaroengchai

Soroosh Mariooryad

Ehud Rivlin

RJ Skerry-Ryan

Michelle Tadmor Ramanovich

ICLR (2024)

A Study of Raters' Sensitivity to Inter-sentence Pause Durations in American English Speech

Paul Owoicho

Josh Camp

Tom Kenter

Speech Prosody 2024 (SP2024) (2024) (to appear)

Data processing for Japanese text-to-pronunciation models

Gleb Mazovetskiy

Taku Kudo

(2024)

Translatotron 3: Speech to Speech Translation with Monolingual Data

Eliya Nachmani

Alon Levkovitch

Yifan Ding

Chulayuth Asawaroengchai

Heiga Zen

Michelle Tadmor Ramanovich

(2024)
