Text Genre and Training Data Size in Human-Like Parsing
Abstract
Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. These measures were recorded as participants listened to a spoken recitation of the same literary text that was supplied as input to the neural parser. Given more training data, the system derives a better cognitive model, but only when the training examples come from the same textual genre. This finding is consistent with the idea that humans adapt syntactic expectations to particular genres during language comprehension (Kaan and Chun, 2018; Branigan and Pickering, 2017).