GUIDE: A Benchmark for User Context Understanding and Assistance in GUI Workflow Videos

Saelyne Yang
Jaesang Yu
Yi-Hao Peng
Kevin Qinghong Lin
Jae Won Cho
Juho Kim
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2026)

Abstract

Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software. While prior research has primarily focused on automating user actions through clicks and keystrokes, this paradigm overlooks human intention: users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why. We introduce GUIDE (GUI Understanding, Intent, and Help Decision Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations across 10 complex software applications (e.g., PowerPoint, Photoshop), with think-aloud narrations that surface user intent. GUIDE defines three tasks, (i) Behavior State Detection, (ii) Intent Prediction, and (iii) Help Prediction, which test a model's ability to recognize behavior states, reason about goals, and decide when and how to help. Evaluations of eight state-of-the-art multimodal models reveal that all models struggle with these tasks, achieving only 44.6% and 55.0% accuracy on behavior state detection and help prediction, respectively. However, providing user context such as behavioral state and intent significantly improves performance, raising help prediction accuracy by up to 50.2%. These results highlight the critical role of structured user understanding in effective assistance. Our benchmark provides a path toward GUI agents that go beyond automation to become truly user-aware collaborators.