Gaze Target Estimation Anywhere with Concepts

Xu Cao
Houze Yang
Vipin Gunda
Inki Kim
Jim Rehg
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)

Abstract

Estimating human gaze targets in the wild is a formidable challenge. Existing computer vision algorithms rely on brittle, multi-stage pipelines that require explicit inputs such as head bounding boxes and human pose, allowing initial detection errors to cascade into system failure. To overcome this, we introduce the Promptable Gaze Target Estimation (PGE) task, a new end-to-end, concept-driven paradigm. PGE conditions gaze prediction on flexible user text or visual prompts (e.g., "the boy in the red shirt" or "the person at point [0.52, 0.48]") to identify a specific subject's gaze target, eliminating the rigid dependency on intermediate localization cues. We develop a scalable data engine to generate Gaze-Co, a dataset and benchmark of 120K high-quality, prompt-annotated image pairs. We also propose AnyGaze, the first model designed for PGE. AnyGaze uses a Transformer-based detector to fuse features from frozen encoders and jointly predicts subject localization, in/out-of-frame presence, and a gaze target heatmap. AnyGaze achieves state-of-the-art performance on standard gaze target estimation benchmarks and sets a strong baseline for the new PGE task, even on a difficult out-of-domain, real-world clinical dataset. We will open-source the AnyGaze model and the Gaze-Co benchmark.
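To make the PGE interface concrete, the following is a minimal PyTorch sketch of a prompt-conditioned detector with the three outputs the abstract names: a subject box, an in/out-of-frame logit, and a gaze-target heatmap. Everything here is an illustrative assumption rather than the released AnyGaze architecture: the class name PGEHead, the token dimensions, the two-layer decoder, and the random tensors standing in for frozen image- and prompt-encoder features.

# Minimal sketch of a promptable gaze-target head in PyTorch.
# Module names, shapes, and depths are illustrative assumptions,
# not the authors' released API; frozen image and prompt encoders
# are stood in for by fixed random tensors below.
import torch
import torch.nn as nn

class PGEHead(nn.Module):
    """Toy prompt-conditioned detector with the three PGE outputs:
    subject box, in/out-of-frame logit, and gaze-target heatmap."""

    def __init__(self, dim: int = 256, heatmap_size: int = 64):
        super().__init__()
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)
        self.box_head = nn.Linear(dim, 4)       # subject bbox (cx, cy, w, h)
        self.inout_head = nn.Linear(dim, 1)     # in-frame vs. out-of-frame
        self.heatmap_head = nn.Linear(dim, heatmap_size * heatmap_size)
        self.heatmap_size = heatmap_size

    def forward(self, image_tokens, prompt_tokens):
        # The prompt query cross-attends over frozen image features.
        q = self.decoder(prompt_tokens, image_tokens)  # (B, 1, dim)
        q = q.squeeze(1)
        box = self.box_head(q).sigmoid()
        inout = self.inout_head(q)
        hm = self.heatmap_head(q).view(
            -1, self.heatmap_size, self.heatmap_size)
        return box, inout, hm

# Example: one image's patch tokens plus one embedded text/point prompt.
image_tokens = torch.randn(1, 196, 256)   # e.g. 14x14 frozen ViT features
prompt_tokens = torch.randn(1, 1, 256)    # e.g. "the boy in the red shirt"
box, inout, heatmap = PGEHead()(image_tokens, prompt_tokens)
print(box.shape, inout.shape, heatmap.shape)  # (1, 4) (1, 1) (1, 64, 64)

The point of the sketch is the single-query design: because the prompt embedding itself acts as the detection query, the same forward pass serves text prompts and point prompts, with no separate head-detection stage whose errors could cascade.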