XR Blocks: Accelerating AI + XR innovation

October 9, 2025

Ruofei Du, Interactive Perception & Graphics Lead, and Benjamin Hersh, Product Manager, Google XR

XR Blocks is an open-source framework that helps you develop immersive experiences for the web, featuring XR realism, XR interaction, and AI + XR applications, with live demos at xrblocks.github.io.

The combination of artificial intelligence (AI) and extended reality (XR) has the potential to unlock a new paradigm of immersive, intelligent computing. However, a significant gap exists today between the ecosystems of these two fields. AI research and development is accelerated by mature frameworks like JAX, PyTorch, and TensorFlow, and by benchmarks like ImageNet and LMArena. Meanwhile, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction.

To bridge this gap, we introduce XR Blocks (presented at ACM UIST 2025), a cross-platform framework designed to accelerate human-centered AI + XR innovation. This is a significant step beyond our prior research on Visual Blocks for ML, which targets non-XR use cases and streamlines the prototyping of machine learning pipelines with visual programming. XR Blocks provides a modular architecture with plug-and-play components for the core abstractions in AI + XR: user, world, interface, AI, and agents. Crucially, it is designed with the mission of enabling rapid prototyping of perceptive AI + XR apps. Built upon accessible technologies (WebXR, three.js, LiteRT, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, live demos, and source code on GitHub, with the goal of empowering the community to move quickly from concept to interactive prototype. You can find an overview of these capabilities in our directional paper and teaser video.

Introductory video of XR Blocks.

Design principles

Our architectural and API design choices are guided by three principles:

  • Embrace simplicity and readability: Inspired by Python's Zen, we prioritize clean, human-readable abstractions. A developer's script should read like a high-level description of the desired experience. Simple tasks should be simple to implement, and complex logic should remain explicit and understandable.
  • Prioritize the creator experience: Our primary goal is to make authoring intelligent and perceptive XR applications as seamless as possible. We believe that creators should focus on the user experience, not on the low-level “plumbing” of sensor fusion, AI model integration, or cross-platform interaction logic.
  • Pragmatism over completeness: We follow a design philosophy of pragmatism, since the fields of AI and XR are evolving quickly. A comprehensive, complex framework that attempts to be perfect will be obsolete upon release. We favor a simple, modular, and adaptable architecture that runs on both desktop and Android XR devices for a wide range of applications.

XR Blocks framework

Drawing inspiration from Visual Blocks for ML and InstructPipe, we designed the XR Blocks framework to provide a high-level, human-centered abstraction layer that separates the what of an interaction (denoted as Script, described more below) from the how of its low-level implementation.

XRBlocks1_Framework

XR Blocks accelerates the prototyping of real-time AI + XR applications across desktop simulators and Android XR devices. Examples: (a) XR Realism: Prototype depth-aware, physics-based interactions in simulation and deploy the same code to real-world XR devices. (b) XR Interactions: Seamlessly integrate custom gesture models into both the desktop simulator and on-device XR deployments. (c) AI + XR Integration: Build intelligent, context-aware assistants, like the Sensible Agent prototype, which provides proactive suggestions through unobtrusive interactions.

Abstractions

We propose a new Reality Model composed of high-level abstractions to guide the implementation of the XR Blocks framework. Unlike a World Model designed for end-to-end, unsupervised training, our Reality Model consists of replaceable modules for XR interaction. At the heart of our design is Script, the narrative and logical center of an application. Script operates on six first-class primitives (described and visualized below, with a code sketch after the figure):

  • User & the physical world: Our model is centered on the User, comprising hands, gaze, and avatar. The physical world allows Script to query perceived reality, such as depth (demo), estimated lighting conditions (demo), and objects (demo).
  • Virtual interfaces & context: The model augments blended reality with virtual UI elements, from 2D panels (demo) to fully 3D assets (demo). The perception pipeline analyzes context: the environment, user activities, and interaction history. An example application can be found in Sensible Agent (discussed more below).
  • Intelligent & social entities: We treat AI-driven agents and remote human peers as primary entities within the model. This enables dynamic, hybrid human-AI group conversations, as in DialogLab.
XRBlocks2_RealityModel

The conceptual Reality Model of the XR Blocks framework. At the center, Script contains the application’s logic and operates on a unified model of first-class primitives including the user, the physical world, AI agents, and the application context.
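
To make these abstractions more concrete, below is a minimal, illustrative sketch of a Script that reads the Reality Model each frame. The primitives (user, world, ui) are described above, but the import path, base class, and every method signature in this sketch are assumptions rather than the framework's exact API.

```javascript
// Minimal sketch of a Script over the Reality Model primitives.
// NOTE: the import path, base class, and method signatures are illustrative
// assumptions; only the high-level primitives come from the text above.
import * as xb from 'xrblocks';

class LabelGazedObjects extends xb.Script {
  onFrame() {
    // User primitive: hands, gaze, and avatar describe the current user state.
    const gaze = this.user.gaze;

    // Physical-world primitive: query perceived reality such as depth,
    // estimated lighting, and detected objects.
    const lighting = this.world.lighting.estimate();
    const objects = this.world.objects.near(this.user.position, /* radius */ 1.5);

    // Virtual-interface primitive: attach a small label to the gazed object.
    for (const object of objects) {
      if (gaze.isLookingAt(object)) {
        this.ui.label(object, {text: object.label, brightness: lighting.intensity});
      }
    }
  }
}
```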

Implementation

This Reality Model is realized by the modular XR Blocks Core engine, which provides high-level APIs that enable developers to harness the following subsystems without needing to master their implementation (a code sketch follows the figure below):

  • Perception & input pipeline: The camera, depth, and sound modules continuously feed and update the Reality Model’s representation of physical reality. The input module normalizes user actions from various devices, providing the raw data for XR Blocks to interpret.
  • AI as a core utility: The ai module acts as a central nervous system, providing simple yet powerful functions (.query, .runModel) that make large models an accessible utility.
  • Experience & visualization toolkit: To enable rapid creation, the toolkit provides a library of common affordances. The ux module offers reusable interaction behaviors like .selectable and .draggable (demo), while the ui and effect modules handle the rendering of interfaces and complex visual effects like occlusion (demo).
XRBlocks3_Architecture

The modular architecture of the XR Blocks Core engine, which consists of the essential subsystems that realize the framework’s high-level abstractions, spanning perception (depth, input), AI integration (ai, agent), and user experience (ui, ux).
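
As a rough illustration of how these subsystems compose, the sketch below wires together the depth, input, ai, ui, and ux modules. The .query, .runModel, .selectable, and .draggable calls are named above; the initialization options, helper names, and exact signatures are assumptions rather than the framework's exact API.

```javascript
// Hedged sketch of composing Core engine subsystems; initialization options,
// helper names, and exact signatures are assumptions.
import * as xb from 'xrblocks';

const core = await xb.init({
  depth: true,            // perception: feed real-world depth into the Reality Model
  input: true,            // normalize hand, gaze, and controller input
  ai: {model: 'gemini'},  // AI as a core utility
});

// Reusable ux behaviors make a virtual panel selectable and draggable.
const panel = core.ui.panel({text: 'Drop a note here'});
core.ux.selectable(panel);
core.ux.draggable(panel);

// Query a large model as a simple utility call.
const summary = await core.ai.query('Summarize the objects currently in view.');
panel.setText(summary);

// Or run a custom on-device model, e.g., a LiteRT gesture classifier
// (the model name and input helper below are hypothetical).
const gesture = await core.ai.runModel('gesture_classifier', core.input.latestHandPose());
```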

By separating the abstract Reality Model from the concrete Core engine, XR Blocks enables a powerful new creative workflow. The goal is to allow creators to move from high-level, human-centric ideas to interactive prototypes much more quickly. We envision a future where any declarative prompt, such as “When the user pinches at an object, an agent should generate a poem about it”, could be directly translated into high-level instructions in XR Blocks:

XRBlocks4_Instructions
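
The figure above shows this translation in the framework’s own form; as a rough approximation only, the sketch below composes the input, world, agent, ui, and sound modules to express the same intent. The event name, method names, and signatures are assumptions, not the actual generated instructions.

```javascript
// Rough, hypothetical approximation of the pinch-to-poem instructions;
// event and method names are illustrative assumptions.
import * as xb from 'xrblocks';

xb.input.on('pinch', async (event) => {
  // world: resolve which physical object the user pinched at.
  const target = xb.world.objects.at(event.ray);
  if (!target) return;

  // agent: ask an AI agent to compose a short poem about the object.
  const poem = await xb.agent.ask(`Write a short poem about ${target.label}.`);

  // ui + sound: show the poem anchored to the object and read it aloud.
  xb.ui.panel({text: poem, anchor: target});
  xb.sound.speak(poem);
});
```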

Hence, the creator’s prompt is no longer pseudocode but a direct summary of the implementation logic. We envision this framework seamlessly translating such user intent into a system-level execution flow, composing capabilities from the input, sound, ai, world, ui, and agent modules to produce emergent, intelligent behavior in response to user interaction.

XRBlocks5_Interaction

The Interaction Grammar of XR Blocks, which abstracts user input by distinguishing between two types of interaction. Explicit events are direct, low-level inputs (e.g., a touch or click), while implicit intents are higher-level interpretations (e.g., a gesture or voice command), allowing creators to build interactions against user intent.
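
As a brief, illustrative example of this grammar (with assumed registration calls, event shapes, and intent names, not the framework's exact API), a creator might handle the two layers like this:

```javascript
// Hedged sketch of the two interaction layers; the xb.input registration API,
// event shape, and intent names are illustrative assumptions.
import * as THREE from 'three';
import * as xb from 'xrblocks';

const cube = new THREE.Mesh(
    new THREE.BoxGeometry(0.2, 0.2, 0.2),
    new THREE.MeshStandardMaterial({color: 'skyblue'}));

// Explicit event: a direct, low-level input such as a touch or click.
xb.input.on('select', (event) => {
  if (event.target === cube) cube.material.color.set('hotpink');
});

// Implicit intent: a higher-level interpretation, e.g., a recognized gesture
// or a voice command, surfaced after perception and AI processing.
xb.input.onIntent('spin the cube', (intent) => {
  cube.rotation.y += Math.PI / 4;  // react to the interpreted user intent
});
```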

Application scenarios

We provide a suite of interactive applications to demonstrate the expressive power and flexibility of the XR Blocks framework. These examples showcase how our framework enables the rapid prototyping of sophisticated experiences that were previously too complex and costly to build, facilitating the creation of realistic, interactive, and intelligent mixed-reality worlds:

XRBlocks_Applications

Applications of XR Blocks. (1) XR Realism: Depth-aware, physics-based ball pit (demo) and splash games (demo); geometry-aware shadows (demo); 3D Gaussian splatting with occlusion; and lighting estimation. (2) XR Interaction: Immersive emoji (demo) and rock-paper-scissors (demo) games powered by custom ML models; dynamic swipe recognition; and touch-and-grab interactions with the physical world. (3) AI + XR: Integration with conversational AI (demo), XR objects (demo), glasses simulation in XR, and poem generation with a real-world camera.

The true power of the framework is realized when this Reality Model is deeply integrated with generative AI to create dynamic, personalized environments. We demonstrate this by building systems like Augmented Object Intelligence (XR-Objects), which imbues everyday physical objects with interactive digital affordances, such as dynamic virtual buttons. XR Blocks also serves as the foundation for Sensible Agent (presented at ACM UIST 2025), a system for proactive and unobtrusive AR assistance. Our architecture supplies the agent's core perception and interaction logic, exemplifying our primary goal: by providing robust, high-level tools, XR Blocks empowers human-computer interaction researchers to bypass low-level implementation and focus directly on higher-order challenges, such as the cognitive principles of human-agent collaboration.

Demonstrations of the XR Blocks SDK. (1) Using XR Blocks with conversational AI to automatically generate and test user prompts. (2) Running physics-based collisions with depth sensing on Android XR. (3) Running LiteRT on-device with a custom gesture model to trigger XR animations.

Conclusion and future directions

The process of creating intelligent XR experiences is currently too fragmented, placing a major barrier between a creator's vision and its realization. We presented XR Blocks, an architecture and toolkit that dissolves this complexity by providing a high-level abstraction layer separating the what (the intent) from the how (the low-level implementation), dramatically accelerating the prototyping of context-aware applications. This is a foundational step toward a future where the boundaries between programming, design, and conversation disappear, enabling us to script realities as fluidly as we script stories. XR Blocks is far from perfect; this work serves as an initial, visionary document that invites more creators to join our journey, grounded in our belief that, with the right set of tools, everyone can unleash their inner creativity with AI.

Acknowledgements

This work is a joint collaboration across multiple teams at Google. The following researchers and engineers contributed to this work: David Li and Ruofei Du (equal primary contributions); Nels Numan, Xun Qian, Yanhe Chen, and Zhongyi Zhou (equal secondary contributions, sorted alphabetically); as well as Evgenii Alekseev, Geonsun Lee, Alex Cooper, Min Xia, Scott Chung, Jeremy Nelson, Xiuxiu Yuan, Jolica Dias, Tim Bettridge, Benjamin Hersh, Michelle Huynh, Konrad Piascik, Ricardo Cabello, and David Kim. We would like to thank Mahdi Tayarani, Max Dzitsiuk, Patrick Hackett, Seeyam Qiu, Brian Collins, Steve Toh, Eric Gonzalez, Nicolás Peña Moreno, Yi-Fei Li, Ziyi Liu, and Jing Jin for their feedback and discussions on our early-stage proposal and WebXR experiments. We thank Max Spear, Adarsh Kowdle, and Guru Somadder for their directional contributions and thoughtful reviews.