Richard F. Lyon
Dick Lyon, author of the 2017 book Human and Machine Hearing: Extracting Meaning from Sound, has a long history of research and invention, including the optical mouse, speech and handwriting recognition, computational models of hearing, and color photographic imaging. At Google he worked on Street View camera systems and is now focused on machine hearing technology and applications.
Authored Publications
Tight bounds for the median of a gamma distribution
PLOS ONE (2023)
The median of a standard gamma distribution, as a function of its shape parameter $k$, has no known representation in terms of elementary functions. In this work we prove the tightest upper and lower bounds of the form $2^{-1/k} (A + k)$: an upper bound with $A = e^{-\gamma}$ that is tight for low $k$, and a lower bound with $A = \log(2) - \frac{1}{3}$ that is tight for high $k$. These bounds are valid over the entire domain $k > 0$, staying between the 48th and 55th percentiles. We derive and prove several other new tight bounds in support of the proofs.
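As a quick numerical illustration (not from the paper; it uses SciPy's gamma quantile as ground truth), the sketch below evaluates both bounds against the exact median for a few shape parameters:

```python
import numpy as np
from scipy.stats import gamma

def upper_bound(k):
    # 2^{-1/k} (e^{-gamma} + k): tight for low k
    return 2.0 ** (-1.0 / k) * (np.exp(-np.euler_gamma) + k)

def lower_bound(k):
    # 2^{-1/k} (log(2) - 1/3 + k): tight for high k
    return 2.0 ** (-1.0 / k) * (np.log(2.0) - 1.0 / 3.0 + k)

for k in [0.1, 0.5, 1.0, 2.0, 10.0, 100.0]:
    median = gamma.ppf(0.5, k)  # exact median of the standard gamma
    print(f"k={k:7.1f}  lower={lower_bound(k):12.6f}  "
          f"median={median:12.6f}  upper={upper_bound(k):12.6f}")
```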
Understanding speech in the presence of noise with hearing aids can be challenging. Here we describe our entry, submission E003, to the 2021 Clarity Enhancement Challenge Round 1 (CEC1), a machine learning challenge for improving hearing aid processing. We apply and evaluate a deep neural network speech enhancement model with a low-latency recursive least squares (RLS) adaptive beamformer and a linear equalizer, to improve speech intelligibility in the presence of speech or noise interferers. The enhancement network is trained only on the CEC1 data, and all processing obeys the 5 ms latency requirement. We quantify the improvement using the CEC1-provided hearing loss model and Modified Binaural Short-Time Objective Intelligibility (MBSTOI) score (ranging from 0 to 1, higher being better). On the CEC1 test set, we achieve a mean of 0.644 and a median of 0.652, compared to the 0.310 mean and 0.314 median for the baseline. In the CEC1 subjective listener intelligibility assessment, for scenes with noise interferers, we achieve the second-highest improvement in intelligibility, from 33.2% to 85.5%; but for speech interferers, we see more mixed results, potentially due to listener confusion.
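The submission's internals are not given here; purely as an illustration of the RLS technique it builds on, below is a minimal exponentially weighted RLS update for a complex-valued filter (the per-STFT-bin, per-microphone framing and all names are assumptions, not details from the entry):

```python
import numpy as np

def rls_step(w, P, x, d, lam=0.98):
    """One exponentially weighted RLS update for a complex filter.

    w:   current weights, shape (n,), e.g. one weight per microphone
    P:   running inverse-correlation estimate, shape (n, n)
    x:   input snapshot, shape (n,), e.g. one STFT bin across mics
    d:   desired (reference) sample for this step
    lam: forgetting factor, 0 < lam <= 1
    """
    Px = P @ x
    g = Px / (lam + np.vdot(x, Px))        # gain vector
    e = d - np.vdot(w, x)                  # a-priori error
    w = w + g * np.conj(e)                 # weight update
    P = (P - np.outer(g, np.conj(Px))) / lam
    return w, P, e
```

A typical initialization is `w = np.zeros(n, complex)` and `P = np.eye(n) / delta` for a small regularizer `delta`.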
VHP: Vibrotactile Haptics Platform for On-body Applications
Dimitri Kanevsky
Malcolm Slaney
UIST, ACM, https://dl.acm.org/doi/10.1145/3472749.3474772 (2021)
Wearable vibrotactile devices have many potential applications, including novel interfaces and sensory substitution for accessibility. Currently, vibrotactile experimentation is done using large lab setups. However, most practical applications require standalone on-body devices and integration into small form factors. Such integration is time-consuming and requires expertise.
To democratize wearable haptics, we introduce VHP, a vibrotactile haptics platform. It comprises a low-power, miniature electronics board that can drive up to 12 independent channels of haptic signals with arbitrary waveforms at 2 kHz. The platform can drive vibrotactile actuators including LRAs and voice coils. Each vibrotactile channel has current-based load sensing, allowing for self-testing and auto-adjustment. The hardware is battery powered and programmable; it has multiple input options, including serial and Bluetooth, as well as the ability to synthesize haptic signals internally. We conduct technical evaluations to determine the power consumption, the latency, and how many actuators can run simultaneously.
We demonstrate applications where we integrate the platform into a bracelet and a sleeve to provide an audio-to-tactile wearable interface. To facilitate more use of this platform, we open-source our design and partner with a distributor to make the hardware widely available. We hope this work will motivate the use and study of vibrotactile all-day wearable devices.
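As a sketch of the audio-to-tactile mapping idea (an illustration, not the actual VHP firmware or interface), one common approach is to band-split the audio and drive each actuator channel with the envelope of one band:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def audio_to_tactile(audio, fs, band_edges, env_fs=2000):
    """Map audio to per-channel vibration envelopes (illustrative sketch).

    band_edges: list of (low_hz, high_hz) pairs, one per actuator channel.
    Returns envelopes decimated to roughly env_fs (VHP channels run at 2 kHz).
    """
    hop = fs // env_fs
    sos_lp = butter(2, env_fs / 2, btype="low", fs=fs, output="sos")
    channels = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, audio)
        env = sosfilt(sos_lp, np.abs(band))   # rectify, then smooth
        channels.append(env[::hop])           # decimate to control rate
    return np.stack(channels)
```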
Haptics with Input: Back-EMF in Linear Resonant Actuators to Enable Touch, Pressure and Environmental Awareness
Proceedings of UIST 2020 (ACM Symposium on User Interface Software and Technology), ACM, New York, NY
Today’s wearable and mobile devices typically use separate hardware components for sensing and actuation. In this work, we introduce new opportunities for the Linear Resonant Actuator (LRA), which is ubiquitous in such devices due to its capability for providing rich haptic feedback. By leveraging strategies to enable active and passive sensing capabilities with LRAs, we demonstrate their benefits and potential as self-contained I/O devices. Specifically, we use the back-EMF voltage to classify whether the LRA is tapped or touched, as well as how much pressure is being applied. Back-EMF sensing is already integrated into many motor and LRA drivers. We developed a passive low-power tap sensing method that uses just 37.7 µA. Furthermore, we developed active touch and pressure sensing, which is low-power, quiet (2 dB), and minimizes vibration. The sensing method works with many types of LRAs. We show applications, such as pressure-sensing side buttons on a mobile phone. We have also implemented our technique directly on an existing mobile phone’s LRA to detect whether the phone is handheld or placed on a soft or hard surface. Finally, we show that this method can be used for haptic devices to determine if the LRA makes good contact with the skin. Our approach can add rich sensing capabilities to the ubiquitous LRA actuators without requiring additional sensors or hardware.
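As a physical intuition for the active sensing (an illustrative sketch, not the paper's method): pressing on an LRA adds mechanical damping, so the back-EMF ringdown after a short drive pulse decays faster, and a decay-rate estimate can serve as a crude pressure proxy:

```python
import numpy as np

def ringdown_decay_rate(backemf, fs):
    """Estimate the exponential decay rate (1/s) of a back-EMF ringdown.

    Fits a line to log(envelope) over time; a larger rate means more
    damping, taken here as a proxy for contact or pressure (assumption).
    """
    env = np.abs(backemf)                     # crude rectified envelope
    keep = env > 0.05 * env.max()             # stay above the noise floor
    t = np.arange(len(env))[keep] / fs
    slope, _ = np.polyfit(t, np.log(env[keep]), 1)
    return -slope
```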
Ecological Auditory Measures for the Next Billion Users
Brian Kemler
Chet Gnegy
Dimitri Kanevsky
Malcolm Slaney
Ear and Hearing (2020)
A range of new technologies have the potential to help people, whether traditionally considered hearing impaired or not. These technologies include more sophisticated personal sound amplification products, as well as real-time speech enhancement and speech recognition. They can improve users’ communication abilities, but these new approaches require new ways to describe their success and allow engineers to optimize their properties. Speech recognition systems are often optimized using the word-error rate, but when the results are presented in real time, user interface issues become far more important than conventional measures of auditory performance. For example, there is a tradeoff between minimizing recognition time (latency) by quickly displaying results versus disturbing the user’s cognitive flow by rewriting the results on the screen when the recognizer later needs to change its decisions. This article describes current, new, and future directions for helping billions of people with their hearing. These new technologies bring auditory assistance to new users, especially to those in areas of the world without access to professional medical expertise. In the short term, audio enhancement technologies in inexpensive mobile forms, devices that are quickly becoming necessary to navigate all aspects of our lives, can bring better audio signals to many people. Alternatively, current speech recognition technology may obviate the need for audio amplification or enhancement at all, and could be useful for listeners with normal hearing or with hearing loss. With new and dramatically better technology based on deep neural networks, speech enhancement improves the signal-to-noise ratio, and audio classifiers can recognize sounds in the user’s environment. Both use deep neural networks to improve a user’s experience. Longer term, auditory attention decoding is expected to allow our devices to understand where a user is directing their attention, and thus to respond better to their needs. In all these cases, the technologies turn the hearing assistance problem on its head, and thus require new ways to measure their performance.
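For concreteness, the word-error rate mentioned above is an edit distance between the recognized and reference word sequences, normalized by the reference length; a minimal implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] is the edit distance between ref[:i] and hyp[:j].
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("turn the volume up", "turn volume up"))  # 0.25
```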
Exploring Tradeoffs in Models for Low-Latency Speech Enhancement
Jeremy Thorpe
Michael Chinen
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement (2018)
We explore a variety of configurations of neural networks for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on state-of-the-art performance on the CHiME2 speech enhancement task. We examine trade-offs among non-causal lookahead, compute work, and parameter count versus enhancement performance, and find that zero-lookahead models perform, on average, only 0.5 dB worse than our best bidirectional model. Further, we find that 200 milliseconds of lookahead is sufficient to achieve performance within about 0.2 dB of our best bidirectional model.
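The models themselves are not described in this listing; below is a generic sketch of the spectrogram-masking step that such systems share (the mask predictor is a placeholder standing in for the trained network):

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs, predict_mask, nperseg=512):
    """Apply a predicted mask in the STFT domain (generic sketch).

    predict_mask: callable mapping the magnitude spectrogram,
    shape (freqs, frames), to a mask in [0, 1] of the same shape;
    it stands in for the trained enhancement network.
    """
    _, _, spec = stft(noisy, fs=fs, nperseg=nperseg)
    mask = predict_mask(np.abs(spec))
    _, enhanced = istft(spec * mask, fs=fs, nperseg=nperseg)
    return enhanced
```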
The cascade of asymmetric resonators with fast-acting compression (CARFAC) is a cascade filterbank model that performed well in a comparative study of cochlear models, but exhibited two anomalies in its frequency response and excitation pattern. It is shown here that the underlying reason is CARFAC's inclusion of quadratic distortion, which generates DC and low-frequency components that in a real cochlea would be canceled by reflections at the helicotrema, but since cascade filterbanks lack the reflection mechanism, these low-frequency components cause the observed anomalies. The simulations demonstrate that the anomalies disappear when the model's quadratic distortion parameter is zeroed, while other successful features of the model remain intact.
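The mechanism is easy to verify in isolation: a quadratic term maps cos(ωt) to (1 + cos(2ωt))/2, creating a DC offset, and maps a two-tone input to sum and difference components. A standalone check (not CARFAC code; the distortion coefficient is arbitrary):

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1200 * t)
y = x + 0.1 * x ** 2              # small quadratic distortion

spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1 / fs)
for f0 in (0, 200, 2000, 2200, 2400):  # DC, difference, harmonic, and sum tones
    k = np.argmin(np.abs(freqs - f0))
    print(f"{freqs[k]:6.0f} Hz: {spectrum[k]:.4f}")
```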
Trainable Frontend For Robust and Far-Field Keyword Spotting
Yuxuan Wang
Thad Hughes
Proc. IEEE ICASSP 2017, New Orleans, LA
Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is a dynamic compression based on automatic gain control, replacing the widely used static compression (such as log or root). We evaluate PCEN on the keyword spotting task. On our large re-recorded noisy and far-field evaluation sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize the high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
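In essence, PCEN smooths each channel's energy with a first-order IIR filter to obtain an AGC gain, then applies an offset root compression. A minimal NumPy version (the parameter values shown are commonly used defaults, not necessarily those of the paper):

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of an energy array E, shape (time, channels).

    M[t] = (1 - s) * M[t-1] + s * E[t]                  (AGC smoother)
    out  = (E / (eps + M)**alpha + delta)**r - delta**r
    """
    M = np.empty_like(E)
    M[0] = E[0]
    for t in range(1, len(E)):
        M[t] = (1 - s) * M[t - 1] + s * E[t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```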
Human and Machine Hearing: Extracting Meaning from Sound
Cambridge University Press (2017)
Human and Machine Hearing is the first book to comprehensively describe how human hearing works and how to build machines to analyze sounds in the same way that people do. Drawing on over thirty-five years of experience in analyzing hearing and building systems, Richard F. Lyon explains how we can now build machines with close-to-human abilities in speech, music, and other sound-understanding domains. He explains human hearing in terms of engineering concepts, and describes how to incorporate those concepts into machines for a wide range of modern applications. The details of this approach are presented at an accessible level, to bring a diverse range of readers, from neuroscience to engineering, to a common technical understanding. The description of hearing as signal-processing algorithms is supported by corresponding open-source code, for which the book serves as motivating documentation.
A 6 µW per Channel Analog Biomimetic Cochlear Implant Processor Filterbank Architecture With Across Channels AGC
Guang Wang
Emmanuel M. Drakakis
IEEE Transactions on Biomedical Circuits and Systems, 9 (2015), pp. 72-86
A new analog cochlear implant processor filterbank architecture of increased biofidelity, enhanced across-channel contrast, and very low power consumption has been designed and prototyped. Each channel implements a biomimetic, asymmetric, bandpass-like One-Zero Gammatone Filter (OZGF) transfer function, using class-AB log-domain techniques. Each channel's quality factor and suppression are controlled by means of a new low-power Automatic Gain Control (AGC) scheme which is coupled across the neighboring channels and emulates lateral inhibition (LI) phenomena in the auditory system. Detailed measurements from a five-channel silicon IC prototype fabricated in a 0.35 µm AMS technology confirm the operation of the coupled AGC scheme and its ability to enhance contrast among channel outputs. The prototype is characterized by an input dynamic range of 92 dB while consuming only 28 µW of power in total (~6 µW per channel) under a 1.8 V power supply. The architecture is well-suited for fully implantable cochlear implants.
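For intuition about the OZGF's asymmetric bandpass shape, here is a sketch of its magnitude response under a common formulation: a repeated two-pole resonator with a single zero at DC (an assumption; the chip's exact transfer function may differ):

```python
import numpy as np
from scipy.signal import freqs

def ozgf_response(f0=1000.0, Q=5.0, order=4):
    """Magnitude response (Hz, dB) of a One-Zero Gammatone Filter sketch."""
    w0 = 2 * np.pi * f0
    den = np.poly1d([1.0, w0 / Q, w0 ** 2]) ** order  # (s^2 + (w0/Q)s + w0^2)^N
    num = np.poly1d([1.0, 0.0])                       # single zero at DC (assumed)
    w = 2 * np.pi * np.logspace(1, 4, 500)            # 10 Hz to 10 kHz
    _, h = freqs(num.coeffs, den.coeffs, worN=w)
    mag_db = 20 * np.log10(np.abs(h) / np.abs(h).max())
    return w / (2 * np.pi), mag_db
```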