What do you learn from context? Probing for sentence structure in contextualized word representations

Patrick Xia
Berlin Chen
Alex Wang
Adam Poliak
R. Thomas McCoy
Najoung Kim
Benjamin Van Durme
Samuel R. Bowman
International Conference on Learning Representations(2019)


Contextualized representation models such as CoVe (McCann et al., 2017) and ELMo (Peters et al., 2018a) have recently achieved state-of-the-art results on a broad suite of downstream NLP tasks. Building on recent token-level probing work (Peters et al., 2018a; Blevins et al., 2018; Belinkov et al., 2017b; Shi et al., 2016), we introduce a broad suite of sub-sentence probing tasks derived from the traditional structured-prediction pipeline, including parsing, semantic role labeling, and coreference, and covering a range of syntactic, semantic, local, and long-range phenomena. We use these tasks to examine the word-level contextual representations and investigate how they encode information about the structure of the sentence in which they appear. We probe three recently-released contextual encoder models, and find that ELMo better encodes linguistic structure at the word level than do other comparable models. We find that the existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer small improvements on semantic tasks over a non-contextual baseline.