By nature of the cost and time required to train Large Language Models (LLMs), the embedded knowledge within is usually frozen at the moment their training data is collected. As a result, LLMs have been shown to suffer from diachronic degradation. The in-context learning paradigm can provide a workaround for this limitation by supplying relevant information at inference time. We introduce a new benchmark to evaluate LLMs for one particular but critical aspect of diachronic change: language acquisition. To that end, we rewrite Winograd-style co-reference resolution problems by replacing a word for a new synthetic but plausible English word. The meaning of the word is given to the model in the prompt via a dictionary definition. We show that the accuracy of LLMs compared to the original Winograd tasks decreases radically in our benchmark and we believe this serves as a measure of progress for future models.
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work