SpeakFaster: Revolutionizing communication for people with severe motor impairments

November 19, 2024

Subhashini Venugopalan, Research Scientist, Google Research, and Shanqing Cai, Software Engineer, Google

We introduce SpeakFaster, a research prototype interface that uses large language models to accelerate eye-gaze typing for users with ALS, and report results from user studies.

For individuals who are unable to speak or type due to conditions like ALS, augmentative and alternative communication (AAC) devices and eye-gaze typing can provide essential support for communication. However, these resources suffer from slow text-entry speeds, hindering users’ ability to engage in spontaneous conversation and express themselves fully. Closing this speed gap for AAC devices can play a big role in improving the quality of life for many individuals.

Yet a major bottleneck to faster gaze typing for users with disabilities is the eye fatigue and temporal cost associated with performing many keystrokes. One way to address this bottleneck is to develop techniques to significantly reduce the number of keystrokes needed to enter text by predicting upcoming text from the preceding text and non-linguistic contextual signals.

In “Using Large Language Models to Accelerate Communication for Eye Gaze Typing Users with ALS,” published in Nature Communications, we capitalize on the capabilities of LLMs, rethinking strategies and user interfaces for enhanced text entry for AAC users. The paper introduces SpeakFaster, a system that leverages fine-tuned LLMs and conversational context to expand highly abbreviated English text (just word initials, supplemented by additional letters and words when necessary) into the desired full phrases at very high accuracy. The SpeakFaster system was developed through a collaboration between Google Research and Team Gleason. Our initial user studies suggest that this co-designed user interface yields impressive motor savings, requiring 57% fewer motor actions than traditional predictive keyboards in offline simulation, and produces text-entry rates 29–60% faster than traditional baselines.

SpeakFaster: An AI-powered solution to bridging the gap between thought and text

The eye-gaze tracking technology often used to operate AAC devices works by identifying where a user is looking and converting these movements into analogous computer mouse movements for typing and clicking. However, the precision required for each keystroke leads to a frustratingly slow text-entry speed, averaging only 8–10 words per minute. If you think about your last few digital conversations, it’s easy to understand how this slow speed is a significant barrier to natural and engaging conversation, and it can be particularly frustrating for users, limiting their ability to participate fully. Both brain–computer interfaces (BCIs) and AI offer immense potential to help users in this situation. But while BCIs involve invasive procedures that still need more extensive evaluation, AI offers a more immediate, non-invasive way to tackle this challenge.

SpeakFaster offers an AI-based approach, integrating large language models (LLMs) with a novel user interface designed specifically for abbreviated text entry. Language models have long been used to power word completion and next-word prediction features in smart keyboard applications. Recent LLMs are capable of doing much more. For example, we previously demonstrated that a fine-tuned 64 billion–parameter Google LaMDA model can expand abbreviations of the word-initial form (e.g., “ishpitb”) into full phrases (e.g., “I saw him play in the bedroom”) with accuracies as high as 77% when provided with conversational context, i.e., the other speaker’s turn(s). SpeakFaster builds on this approach. Users start by typing the initials of the words in their intended phrase. The fine-tuned LLMs powering SpeakFaster (based on PaLM) then predict the full phrase and display the most likely expansions given those initial letters and the conversational context. If the desired phrase isn’t among the options, users can refine the prediction by either spelling out keywords or selecting alternative words. This approach significantly reduces the number of keystrokes needed, leading to faster communication.
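To make the input scheme concrete, here is a minimal Python sketch of the initials-only abbreviation format and a consistency check over candidate expansions. The helper names and the prompt string are our own illustration, not the paper’s fine-tuning format; the production system uses fine-tuned PaLM models rather than a generic prompted LLM.

```python
# Minimal sketch of the initials-only abbreviation scheme. Helper names
# and the prompt below are illustrative assumptions, not the paper's
# actual fine-tuning format.

def abbreviate(phrase: str) -> str:
    """Collapse a phrase to its word-initial abbreviation,
    e.g., "I saw him play in the bedroom" -> "ishpitb"."""
    return "".join(word[0].lower() for word in phrase.split())

def is_consistent(candidate: str, abbreviation: str) -> bool:
    """Check that a model-proposed expansion matches the typed initials."""
    return abbreviate(candidate) == abbreviation.lower()

context = "Where did you last see Tom?"  # the other speaker's turn
abbreviation = "ishpitb"                 # the user's gaze-typed initials

# A prompt a generic instruction-tuned LLM might receive (illustrative):
prompt = (
    f"Conversation so far: {context}\n"
    f"Expand '{abbreviation}' (one letter per word) into a full reply:"
)

# Several phrases can share the same initials, so the UI surfaces
# multiple consistent candidates for the user to choose from.
candidates = ["I saw him play in the bedroom",
              "I saw her paint in the basement"]
print([c for c in candidates if is_consistent(c, abbreviation)])
```

Because many phrases map to the same initials, a consistency filter alone is not enough; this is why the interface pairs the model’s ranked suggestions with refinement tools for spelling out keywords or swapping individual words.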


Screencast of the SpeakFaster user interface in action. The interface allows users to enter abbreviated text as input, and uses the context of the conversation to suggest responses in full sentence form.

Specifically, we designed SpeakFaster’s user interface to allow for easy abbreviation input and refinement, ensuring that users can always communicate their intended message even if the initial prediction isn’t exactly what they wanted. To go hand-in-hand with the UI, we developed two fine-tuned LLMs as a complete, practical solution to power SpeakFaster. The first, “KeywordAE”, is capable of expanding abbreviations that mix initials with words that are fully or partially spelled out. The KeywordAE model can also expand initials-only abbreviations, and hence provides a superset of the capabilities of our previous work. Second, the “FillMask” model provides alternative words that begin with a given initial letter in the context of the surrounding words. The two models were each fine-tuned with approximately 1.8 million unique triplets of {context, abbreviation, full phrase} synthesized from four public datasets of English dialogues.
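As a rough illustration of how such training triplets might be synthesized from dialogue data, the sketch below pairs consecutive turns and generates both initials-only and keyword-augmented abbreviations. The abbreviation format and the sampling of spelled-out keywords are plausible reconstructions under our own assumptions, not the paper’s exact recipe.

```python
import random

def keyword_abbreviate(phrase: str, keyword_indices: set) -> str:
    """Abbreviate a phrase to word initials, keeping the words at
    keyword_indices fully spelled out (one plausible KeywordAE format)."""
    tokens, run = [], ""
    for i, word in enumerate(phrase.split()):
        if i in keyword_indices:
            if run:                      # flush accumulated initials
                tokens.append(run)
                run = ""
            tokens.append(word.lower())  # keep this word spelled out
        else:
            run += word[0].lower()
    if run:
        tokens.append(run)
    return " ".join(tokens)

def synthesize_triplets(turns, n_keyword_variants=2, seed=0):
    """Build {context, abbreviation, full_phrase} training triplets
    from consecutive dialogue turns."""
    rng = random.Random(seed)
    triplets = []
    for context, phrase in zip(turns, turns[1:]):
        n_words = len(phrase.split())
        variants = [set()]  # the initials-only abbreviation
        variants += [{rng.randrange(n_words)}
                     for _ in range(n_keyword_variants)]
        for keywords in variants:
            triplets.append({
                "context": context,
                "abbreviation": keyword_abbreviate(phrase, keywords),
                "full_phrase": phrase,
            })
    return triplets

turns = ["Where did you last see Tom?", "I saw him play in the bedroom"]
for t in synthesize_triplets(turns):
    print(t["abbreviation"], "->", t["full_phrase"])
# Prints the initials-only variant plus two keyword-augmented variants.
```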

Key findings from user studies

In addition to simulation experiments, we conducted user studies to test the effectiveness of SpeakFaster. The studies involved both non-AAC participants and eye-gaze users with ALS, since participating in such studies can tax the already-limited time and energy of individuals with ALS who communicate with eye gaze alone. The 19 non-AAC participants, typing by hand on a mobile device, gave us helpful information about the system’s ease of use and allowed us to quantitatively validate the keystroke savings, complementing our results from two individuals with ALS who exclusively use eye-gaze typing to communicate.

The study itself had two phases: a scripted phase and an unscripted phase. In the scripted phase, participants played the role of one person in a two-person conversation, with the content they needed to type shown on screen as text. In the unscripted phase, participants engaged in short 5- or 6-turn dialogues with the experimenter in which only the conversation opener was predetermined, e.g., “What kind of music do you listen to?”, and the rest was spontaneous. Prior to the study, participants watched a video demo and completed a short practice session of five conversations to familiarize themselves with the interface.

To assess the SpeakFaster interface, we measured motor action savings (keystrokes saved compared to the full set of characters to be typed), practicality (typing speed in words per minute), and learnability of the SpeakFaster UI (how much practice it takes for people to get comfortable using the system).
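For reference, the two quantitative measures can be computed as below. These are standard formulations in text-entry research; the details (e.g., exactly which UI selections count as motor actions) are our assumptions rather than the paper’s precise definitions.

```python
def keystroke_savings_rate(actions_performed: int, final_text: str) -> float:
    """KSR = 1 - (motor actions performed / characters in the final text).
    0 means no savings over typing every character; values near 1 mean
    large savings. (Standard formulation; the paper's exact accounting
    of motor actions may differ in detail.)"""
    return 1.0 - actions_performed / len(final_text)

def words_per_minute(final_text: str, elapsed_seconds: float) -> float:
    """Standard text-entry rate, with one "word" defined as 5 characters."""
    return (len(final_text) / 5.0) / (elapsed_seconds / 60.0)

phrase = "I saw him play in the bedroom"  # 29 characters
# Hypothetically: 7 gaze-typed initials + 1 selection = 8 motor actions.
print(f"KSR: {keystroke_savings_rate(8, phrase):.2f}")  # KSR: 0.72
print(f"WPM: {words_per_minute(phrase, 20.0):.1f}")     # WPM: 17.4
```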

Across all studies, SpeakFaster demonstrated substantial keystroke savings compared to traditional baselines for both eye-gaze users and non-AAC participants, in both scripted and unscripted dialogues. For non-AAC users, SpeakFaster delivered 56% keystroke savings (p = 8.0×10−11) in the scripted scenario and 45% (p = 5.5×10−7) in the unscripted scenario. SpeakFaster also enabled significant keystroke savings in the scripted phase for our ALS eye-gaze participant.


Left: Keystroke savings rate (KSR) for non-AAC users. Right: KSR for ALS eye-gaze users. The orange and purple bars show the KSR when using the SpeakFaster system; the blue and green bars show the KSR when using the baseline smart keyboard.

While SpeakFaster offered substantial keystroke savings to non-AAC users, their overall text-entry speed remained comparable to conventional typing. However, in our lab study with a single ALS eye-gaze user, SpeakFaster led to a 61.3% increase (p = 0.011) in typing speed in the scripted phase and a 46.4% increase (p = 0.43) in the unscripted phase. While we can’t generalize from a single user to a larger population, this speaks to the promise of such systems to significantly improve communication for users of eye-gaze keyboards.


Left: For non-AAC participants, overall text-entry speeds across the scripted and unscripted phases did not change significantly. Right: For our ALS eye-gaze participant, SpeakFaster led to a substantial speed-up in both the scripted and unscripted phases.

Aside from motor-action savings and typing speed, adoption of a typing system and user interface also depends on the learning curve and cognitive overhead it introduces. The initial learning curve for SpeakFaster was slightly slower for eye-gaze users than for non-AAC users; participants with ALS had the additional tasks of getting used to the eye-gaze calibration and to a setup that differed from the customizations of their regular eye-gaze keyboards. Even so, the system proved manageable with practice: just fifteen practice dialogues were sufficient for the eye-gaze participants to reach a comfortable typing speed.


With as few as six practice dialogues for non-AAC users and fifteen for our ALS eye-gaze user, participants were able to learn the SpeakFaster system and reach a comfortable typing speed of 20–30 words per minute (shown on the y-axis).

LLMs can unlock a brighter future for AAC communication

The SpeakFaster research study reveals the potential for significant improvements to eye-gaze typing with a UI that incorporates LLMs. By dramatically increasing text entry speed and reducing physical strain, systems like SpeakFaster can empower individuals with severe motor impairments to communicate more effectively and efficiently, enabling them to participate more fully in conversations, which can lead to increased independence, social participation, self-expression, and improved quality of life.

With this research we hope to spur the community into exploring further advancements in LLM technology, UI design, and personalization to develop and enhance the capabilities of systems like SpeakFaster and make this technology accessible to more people. As language models continue to improve, we are excited to see them drive progress in AAC communication, with the goal of enabling faster communication for those who need it most.

Acknowledgements

We would like to thank the dedicated members of the Team Gleason Foundation and Project Euphonia who have made this research possible. Specifically, we would like to acknowledge the valuable contributions of Steve Gleason, Blair Casey, Jay Beavers, John Costello, Julie Cattiau, Katie Seaver, Richard Cave, Anton Kast, Pan-Pan Jiang, Rus Heywood, Michael Terry, James Stout, Mahima Pushkarna, Jon Campbell, William Ito, and Shumin Zhai. We are grateful to Tobii® for granting us permission to use the Tobii Stream Engine for eye-gaze prototype development.

We also wish to express our appreciation to the Leonard Florence Center for Living for their unwavering commitment to improving independence and support for individuals with ALS. Their support has been instrumental in advancing this important work.