Sagar Savla

Sagar leads a team applying machine learning to image understanding, computational photography, responsible AI, and sound understanding. His work includes Live Transcribe, Google's largest accessibility app; the Pixel phone's camera; the Nest Hub Max's smart camera features; and ML fairness with the MST Scale.

His previous research was in ML and HCI at the Georgia Institute of Technology, with industry research at FAIR, PayPal, and Lyft.

Authored Publications
Google Publications
Understanding speech in the presence of noise with hearing aids can be challenging. Here we describe our entry, submission E003, to the 2021 Clarity Enhancement Challenge Round 1 (CEC1), a machine learning challenge for improving hearing aid processing. We apply and evaluate a deep neural network speech enhancement model with a low-latency recursive least squares (RLS) adaptive beamformer and a linear equalizer to improve speech intelligibility in the presence of speech or noise interferers. The enhancement network is trained only on the CEC1 data, and all processing obeys the 5 ms latency requirement. We quantify the improvement using the CEC1-provided hearing loss model and the Modified Binaural Short-Time Objective Intelligibility (MBSTOI) score (ranging from 0 to 1, higher being better). On the CEC1 test set, we achieve a mean of 0.644 and a median of 0.652, compared to the 0.310 mean and 0.314 median for the baseline. In the CEC1 subjective listener intelligibility assessment, for scenes with noise interferers we achieve the second-highest improvement in intelligibility, from 33.2% to 85.5%; for speech interferers we see more mixed results, potentially from listener confusion.
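The low-latency beamformer above is built on the standard recursive least squares (RLS) update. As an illustration only, here is a minimal single-channel RLS adaptive filter in NumPy; the function name, parameters, and the reduction to a single channel are assumptions for the sketch, not the authors' implementation:

```python
import numpy as np

def rls_filter(x, d, order=4, lam=0.99, delta=1.0):
    """Minimal recursive least squares (RLS) adaptive filter sketch.

    x: input signal, d: desired signal, lam: forgetting factor,
    delta: regularizer for the initial inverse-correlation estimate.
    Returns the a priori filter output and the final weights.
    """
    n = len(x)
    w = np.zeros(order)            # filter weights
    P = np.eye(order) / delta      # inverse correlation matrix estimate
    y = np.zeros(n)
    for i in range(order, n):
        u = x[i - order:i][::-1]   # most recent samples, newest first
        y[i] = w @ u               # a priori output
        e = d[i] - y[i]            # a priori error
        Pu = P @ u
        k = Pu / (lam + u @ Pu)    # gain vector
        w = w + k * e              # weight update
        P = (P - np.outer(k, Pu)) / lam
    return y, w
```

Because each step touches only the newest input vector, the per-sample cost is fixed, which is what makes RLS-style updates compatible with a hard latency budget like CEC1's 5 ms.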
A range of new technologies have the potential to help people, whether traditionally considered hearing impaired or not. These technologies include more sophisticated personal sound amplification products, as well as real-time speech enhancement and speech recognition. They can improve users' communication abilities, but these new approaches require new ways to describe their success and allow engineers to optimize their properties. Speech recognition systems are often optimized using the word-error rate, but when the results are presented in real time, user interface issues become far more important than conventional measures of auditory performance. For example, there is a tradeoff between minimizing recognition time (latency) by quickly displaying results and disturbing the user's cognitive flow by rewriting the results on the screen when the recognizer later needs to change its decisions. This article describes current, new, and future directions for helping billions of people with their hearing. These new technologies bring auditory assistance to new users, especially those in areas of the world without access to professional medical expertise. In the short term, audio enhancement technologies in inexpensive mobile forms, on devices that are quickly becoming necessary to navigate all aspects of our lives, can bring better audio signals to many people. Alternatively, current speech recognition technology may obviate the need for audio amplification or enhancement at all and could be useful for listeners with normal hearing or with hearing loss. With new and dramatically better technology based on deep neural networks, speech enhancement improves the signal-to-noise ratio, and audio classifiers can recognize sounds in the user's environment. Both use deep neural networks to improve a user's experience.
Longer term, auditory attention decoding is expected to allow our devices to understand where a user is directing their attention and thus respond better to their needs. In all these cases, the technologies turn the hearing assistance problem on its head, and thus require new ways to measure their performance.
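The word-error rate discussed above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch (the function name is an assumption for illustration):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,            # substitution or match
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Note that this single number says nothing about when words appear or how often they are rewritten on screen, which is exactly the gap between word-error rate and the real-time user-interface measures the article argues for.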