Alison Lentz
Authored Publications
LLMs achieve adult human performance on higher-order theory of mind tasks
Oliver Siy
Geoff Keeling
Benjamin Barnett
Michael McKibben
Tatenda Kanyere
Robin I.M. Dunbar
Frontiers in Human Neuroscience (2025)
Preview abstract
This paper examines the extent to which large language models (LLMs) are able to perform tasks which require higher-order theory of mind (ToM) – the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite – Multi-Order Theory of Mind Q&A – and using it to compare the performance of five LLMs of varying sizes and training paradigms to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on our ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for higher-order ToM performance, and that the linguistic abilities of large models may support more complex ToM inferences. Given the important role that higher-order ToM plays in group social interaction and relationships, these findings have significant implications for the development of a broad range of social, educational and assistive LLM applications.
Preview abstract
Two decades ago, the advent of competency-based medical education (CBME) marked a paradigm shift in assessment. Now, medical education is on the cusp of another transformation driven by advances in the field of artificial intelligence (AI). In this article, the authors explore the potential value of AI in advancing CBME and entrustable professional activities by shifting the focus of education from assessment of learning to assessment for learning. The thoughtful integration of AI technologies in observation is proposed to aid in restructuring our current system around the goal of assessment for learning by creating continuous, tight feedback loops that were not previously possible. The authors argue that this personalized and less judgmental relationship between learner and machine could shift today’s dominating mindset on grades and performance to one of growth and mastery learning that leads to expertise. However, because AI is neither objective nor value free, the authors stress the need for continuous co-production and evaluation of the technology with geographically and culturally diverse stakeholders to define desired behavior of the machine and assess its performance.
Adapting User Experience Research Methods for AI-Driven Experiences
Oliver Siy
Kira Awadalla
Gregorio Convertino
Elizabeth Churchill
ACM CHI 2020 (2020)
Preview abstract
This short paper describes how to adapt user experience research methods for artificial intelligence (AI)-driven applications. Presently, there is a dearth of guidance for conducting UX research on AI-driven experiences. We describe what makes this class of experiences unique, propose a preliminary foundational framework to categorize AI-driven experiences, and within the framework we show an example of methodological adaptations via a case study.