Google Research

Which sentence embeddings and which layers encode syntactic structure?



Recent models of language have eliminated syntactic-semantic dividing lines. We explore the psycholinguistic implications of this development by comparing different types of sentence embeddings in their ability to encode syntactic constructions. Our study uses contrasting sentence structures known to cause syntactic priming effects, that is, the tendency in humans to repeat sentence structures after recent exposure. We compare how syntactic alternatives are captured by sentence embeddings produced by a neural language model (BERT) or by the composition of word embeddings (BEAGLE, HHM, GloVe). Dative double object vs. prepositional object and active vs. passive sentences are separable in the high-dimensional space of the sentence embeddings and can be classified with a high degree of accuracy. The results lend empirical support to the modern, computational, integrated accounts of semantics and syntax, and they shed light on the information stored at different layers in deep language models such as BERT.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work