How AI agents can redefine universal design to increase accessibility
February 5, 2026
Marian Croak, VP Engineering, and Sam Sepah, Lead AI Accessibility PgM, Google Research
Co-developed with the accessibility community, Google Research's Natively Adaptive Interfaces (NAI) redefine universal design by embedding multimodal AI tools that adapt to each user's unique needs.
At Google, we believe in building for everyone, and accessibility (A11y) is a key part of that. Our teams work with communities to build products with and for people with disabilities, incorporating accessibility from the beginning of the development process. Today, generative AI gives us the opportunity to make our tools even more personal and adaptive.
People with disabilities make up 16% of the world’s population. With the adaptive capabilities of generative AI, we have an opportunity to better serve 1.3 billion people globally by adopting a "Nothing About Us Without Us" approach to our tech development. We believe technology should be as unique as the person using it. We’re creating a world where every interface shapes itself to your preferences, working in harmony with you, exactly as you are.
In this blog, we are proud to introduce Natively Adaptive Interfaces (NAI), a framework for creating more accessible applications through multimodal AI tools. With NAI, UI design can move beyond one-size-fits-all towards context-informed decisions. NAI replaces static navigation with dynamic, agent-driven modules, transforming digital architecture from a passive tool into an active collaborator.
Following rigorous prototyping to validate this framework, we have an emerging path toward universal design. Our goal is to create environments that are more inherently accessible to people with disabilities.
Community investments: Nothing About Us, Without Us
Building on the long-standing advocacy principle of "Nothing About Us, Without Us," we continue to integrate community-led co-design into our own development lifecycles.
By working with individuals from disability communities and engaging them as co-designers from the start, we can ensure their lived experiences and expertise are at the heart of the solutions being built. With support from Google.org, organizations like the Rochester Institute of Technology’s National Technical Institute for the Deaf (RIT/NTID), The Arc of the United States, RNID, and Team Gleason are building adaptive AI tools that solve real-world friction points for their communities. These organizations recognize the transformative potential of AI tools that are natively fluent in the diverse ways humanity communicates.
Furthermore, this co-design approach drives economic empowerment and fosters employment opportunities within the disability community, ensuring that the people informing the technology are also rewarded for its success.
Our research direction: Designing for accessibility
In our early research, we found that a significant barrier to digital equity is the "accessibility gap", i.e., the delay between the release of a new feature and the creation of an assistive layer for it. To close this gap, we are shifting from reactive tools to agentic systems that are native to the interface.
Research pillar: Using multi-agent systems to improve accessibility
Multimodal AI tools provide one of the most promising paths to building accessible interfaces. In specific prototypes, such as our work with web readability, we’ve tested a model where a central Orchestrator acts as a strategic reading manager.
Instead of requiring the user to navigate a complex maze of menus, the Orchestrator maintains shared context, understanding the document and making it more accessible by delegating tasks to expert sub-agents.
- The Summarization Agent: Distills complex documents by breaking information down into key points, making even the deepest insights clear and accessible.
- The Settings Agent: Dynamically handles UI adjustments, such as scaling text.
In testing this modular approach, our research shows that users can interact with systems more intuitively: specialized tasks are always handled by the right expert, without the user needing to hunt for the "correct" button.
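To make this division of labor concrete, here is a minimal, self-contained Python sketch of the orchestration pattern described above. The class and method names (Orchestrator, SummarizationAgent, SettingsAgent, route, handle) are illustrative placeholders rather than the actual NAI implementation; the point is only that a central coordinator keeps shared context and hands each request to the right specialist.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """State the orchestrator keeps across turns (document text, UI preferences)."""
    document: str = ""
    text_scale: float = 1.0
    history: list = field(default_factory=list)


class SummarizationAgent:
    """Expert sub-agent: condenses the current document into key points."""
    def handle(self, request: str, ctx: SharedContext) -> str:
        # Placeholder logic; a real agent would call a language model here.
        first_sentences = ". ".join(ctx.document.split(". ")[:2])
        return f"Summary: {first_sentences}."


class SettingsAgent:
    """Expert sub-agent: applies UI adjustments such as text scaling."""
    def handle(self, request: str, ctx: SharedContext) -> str:
        if "larger" in request:
            ctx.text_scale *= 1.25
        return f"Text scale is now {ctx.text_scale:.2f}x."


class Orchestrator:
    """Maintains shared context and delegates each request to the right expert."""
    def __init__(self):
        self.ctx = SharedContext()
        self.agents = {"summarize": SummarizationAgent(), "settings": SettingsAgent()}

    def route(self, request: str) -> str:
        # Trivial keyword routing; a real system would use an intent model.
        key = "settings" if any(w in request for w in ("larger", "smaller", "contrast")) else "summarize"
        reply = self.agents[key].handle(request, self.ctx)
        self.ctx.history.append((request, reply))
        return reply


if __name__ == "__main__":
    orch = Orchestrator()
    orch.ctx.document = "The report covers quarterly results. Revenue grew steadily. Costs fell."
    print(orch.route("Please summarize this document"))
    print(orch.route("Make the text larger"))
```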
Toward multimodal fluency
Our research also focuses on moving beyond basic text-to-speech toward multimodal fluency. By leveraging Gemini’s ability to process voice, vision, and text simultaneously, we’ve built prototypes that can turn live video into immediate, interactive audio descriptions.
This isn't just about describing a scene; it’s about situational awareness. In our co-design sessions, we’ve observed how allowing users to interactively query their environment — asking for specific visual details as they happen — can reduce cognitive load and transform a passive experience into an active, conversational exploration.
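As one illustration of this kind of interactive visual querying, the sketch below sends a single captured frame and a user question to a multimodal model, assuming the google-genai Python SDK. The model name, prompt wording, helper function describe_frame, and the idea of reading the frame from a local file are placeholder assumptions, not the prototype's actual code.

```python
# Minimal sketch: ask a multimodal model a question about one video frame.
# Assumes the google-genai Python SDK (`pip install google-genai`) and a
# GEMINI_API_KEY in the environment; model name and prompt are placeholders.
from google import genai
from google.genai import types


def describe_frame(frame_jpeg: bytes, question: str) -> str:
    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model choice
        contents=[
            types.Part.from_bytes(data=frame_jpeg, mime_type="image/jpeg"),
            f"You are assisting a blind user. Answer briefly and concretely: {question}",
        ],
    )
    return response.text


if __name__ == "__main__":
    with open("frame.jpg", "rb") as f:
        print(describe_frame(f.read(), "Is the crosswalk signal on?"))
```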
Proven prototypes: The "vertex" of human interaction
We validated this architecture through rigorous prototyping, targeting complex interaction challenges where there is clear room for improvement. In these "vertex" moments, our research showed that multimodal AI tools could accurately interpret and respond to the nuanced, specific needs of users.
- StreetReaderAI: For blind and low-vision (BLV) users, navigating physical spaces can be a significant barrier to social participation. StreetReaderAI addresses this with a virtual guide built on two interactive AI subsystems: an AI Describer that constantly analyzes visual and geographic data, and an AI Chat that answers specific questions. Because the system maintains context, a user can walk past a landmark and later ask, "Wait, where was that bus stop?" The agent recalls the previous visual frame and provides precise guidance: "The bus stop is behind you, approximately 12 meters away." (A minimal sketch of this context-recall idea appears after this list.)
- Multimodal Agent Video Player (MAVP): Standard Audio Descriptions (AD) provide a narrated track of visual elements for passive listening, but they are often static. The MAVP prototype transforms video into an interactive, user-led dialogue. Built with Gemini models, MAVP allows users to verbally adjust descriptive detail in real time or pause to ask questions like, "What is the character wearing?" The system uses a two-stage pipeline: it first generates a "dense index" of visual descriptions offline, then uses retrieval-augmented generation (RAG) to provide fast, high-accuracy responses during playback. (A sketch of this two-stage pattern also appears after this list.)
- Grammar Laboratory: RIT/NTID, with support from Google.org, is building Grammar Laboratory, a bilingual (American Sign Language and English) AI-powered learning platform that provides tutoring and feedback on students’ English writing. It offers grammar instruction through multiple accessible formats, including: video explanations of English grammar rules delivered in ASL, captions in written English, spoken English narration, and written transcripts. Students interact with an adaptive AI tool that creates bespoke content and customizes their learning experience based on their interactions, ensuring that users can engage with the content in the format that best suits their language preferences and strengths. Grammar Laboratory was recently featured in a film produced for us by BBC StoryWorks Commercial Productions.
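To illustrate the context-recall behavior described for StreetReaderAI (not its actual implementation), here is a small, self-contained Python sketch: the assistant keeps a rolling buffer of geotagged frame descriptions, and when the user later asks about a landmark, it looks the landmark up in that buffer and converts the stored coordinates into an approximate distance. All names and the keyword matching are illustrative assumptions; a real system would also use the user's heading to report direction.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Observation:
    """One geotagged frame description kept in the rolling context buffer."""
    label: str        # e.g. "bus stop"
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class ContextBuffer:
    """Rolling memory of recent observations, queried by landmark keyword."""
    def __init__(self, maxlen=200):
        self.observations = deque(maxlen=maxlen)

    def add(self, obs: Observation):
        self.observations.append(obs)

    def recall(self, keyword: str, user_lat: float, user_lon: float) -> str:
        for obs in reversed(self.observations):  # most recent match first
            if keyword in obs.label:
                d = haversine_m(user_lat, user_lon, obs.lat, obs.lon)
                return f"The {obs.label} is approximately {d:.0f} meters away."
        return f"I have not seen a {keyword} recently."


if __name__ == "__main__":
    buffer = ContextBuffer()
    buffer.add(Observation("bus stop", 37.78550, -122.40600))
    # The user has walked on; they now ask about the landmark they passed.
    print(buffer.recall("bus stop", user_lat=37.78560, user_lon=-122.40610))
```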
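Similarly, the two-stage MAVP pipeline can be sketched as follows, purely as an illustration of the pattern rather than the prototype itself: an offline pass stores timestamped descriptions as a "dense index", and at playback time the user's question retrieves the closest entries to ground the model's answer. The bag-of-words similarity below stands in for a real embedding model, and all names are placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class IndexedSegment:
    """One entry of the offline 'dense index': a timestamped scene description."""
    start_s: float
    description: str


def tokenize(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Bag-of-words cosine similarity (stand-in for an embedding model)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(index: list[IndexedSegment], question: str, now_s: float, k: int = 2):
    """Stage 2: pull the k most relevant descriptions at or before the current time."""
    candidates = [seg for seg in index if seg.start_s <= now_s]
    q = tokenize(question)
    candidates.sort(key=lambda seg: cosine(q, tokenize(seg.description)), reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    # Stage 1 (offline): a dense index built ahead of time by a vision model.
    index = [
        IndexedSegment(0.0, "A woman in a red coat walks into a dim cafe"),
        IndexedSegment(12.0, "She sits by the window and opens a worn notebook"),
        IndexedSegment(30.0, "A waiter in a black apron brings a cup of coffee"),
    ]
    # Playback: the user pauses at 20 seconds and asks a question.
    hits = retrieve(index, "What is the woman wearing?", now_s=20.0)
    # These retrieved snippets would be passed to the model as grounding context.
    for seg in hits:
        print(f"[{seg.start_s:>4.0f}s] {seg.description}")
```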
The curb-cut effect
Applications utilizing the NAI framework often experience a strong "curb-cut effect" — the phenomenon wherein features designed for extreme constraints benefit a much broader group. Just as sidewalk ramps were originally designed for wheelchair users but improved life for parents with strollers and travelers with luggage, AI tools built with the NAI framework create superior experiences for many. For example:
- Universal utility: Voice interfaces built for blind users can be incredibly useful for sighted users who are multitasking.
- Synthesis tools: Tools designed to support those with learning disabilities can help busy professionals parse information more quickly.
- Personalized learning: AI-powered tutors built for deaf and hard of hearing users can create custom learning journeys for all students.
Conclusion: The golden age of access
We are entering a "golden age" of what is possible with AI for accessibility. With the adaptive power of multimodal AI, we have the opportunity to build user interfaces that adjust in real-time to the vast variety of human ability.
This era is about more than just using a device; it is about working directly with the communities who use these technologies. By building technology with and for the disability community, we can ignite a cycle of helpfulness that expands the horizon of what is possible.
Acknowledgements
Our work is made possible through the generous support of Google.org, whose commitment to our vision has been transformative. We are honored to work alongside dedicated teams from Google Research AI, Product For All (P4A), BBC StoryWorks, Rochester Institute of Technology’s National Technical Institute for the Deaf (RIT/NTID), The Arc of the United States, RNID, and Team Gleason.