A Benchmark for Reasoning with Spatial Prepositions

EMNLP 2023

Abstract

Spatial reasoning is a fundamental building block of human cognition, used in representing, grounding, and reasoning about physical and abstract concepts. We propose a novel benchmark focused on assessing inferential properties of statements with spatial prepositions. The benchmark includes original datasets in English and Romanian. Our aim is to probe the limits of foundational reasoning in large language models. We use prompt engineering to study the performance of two families of large language models, PaLM and GPT-3, on our benchmark. Our results show considerable variability in performance between smaller and larger models, as well as across prompts and languages. We also examine the performance of the largest model, PaLM-540b, in a generative setting and find that it can approach human-level performance with few-shot prompting.