Adrien Baranes
Adrien is a User Experience (UX) and Research Engineer at Google Research.
After a master's degree in Computer Science in Paris and a PhD in Artificial Intelligence focused on cognitive science, intrinsic motivations, and robotics, Adrien moved to New York City in 2012 to conduct academic research in behavioral psychology and neuroscience at Columbia University.
In 2015, he joined Google New York City (moving to London in 2019) to drive strategic equity-centered design research, experiential prototyping, and data-driven processes across Google's Corporate Engineering and Physical Architecture teams. His projects range from quantifying and improving employee motivation and productivity to ensuring that Google's physical and digital, AI-assisted collaborative spaces are equitable. With a scientific and design-thinking mindset, he led an R&D lab to prototype and validate product ideas through clear protocols and stakeholder and community input.
At Google Research, Adrien is focusing on Complex System Dynamics Simulations, User-Centered Research, Large Language Models and Data Privacy.
Authored Publications
LLMs achieve adult human performance on higher-order theory of mind tasks
Oliver Siy
Geoff Keeling
Benjamin Barnett
Michael McKibben
Tatenda Kanyere
Robin I.M. Dunbar
Frontiers in Human Neuroscience (2025)
This paper examines the extent to which large language models (LLMs) are able to perform tasks which require higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner (e.g., I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite, Multi-Order Theory of Mind Q&A, and using it to compare the performance of five LLMs of varying sizes and training paradigms to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on our ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for higher-order ToM performance, and that the linguistic abilities of large models may support more complex ToM inferences. Given the important role that higher-order ToM plays in group social interaction and relationships, these findings have significant implications for the development of a broad range of social, educational and assistive LLM applications.