Cross-Level Typing the Logical Form for Open-Domain Semantic Parsing
Abstract
This thesis presents a novel approach to assigning types to expressive Discourse Representation Structure (DRS) meaning representations. In terms of linguistic analysis, our typing methodology brings together, at a single level of analysis, the representation of phenomena that were traditionally considered to belong to distinct layers. We claim that the surface realisation of sub-lexical, lexical, sentence-level and discourse-level phenomena (such as tense, word sense, named entity class, thematic role, and rhetorical structure) can be represented as variations of values belonging to the same typed category within our cross-level typing technique.
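To make the cross-level typing claim concrete, the following is a minimal, purely illustrative sketch in Python: the type names (ENTITY, EVENT, ROLE, TIME, DISCOURSE) and the example clauses are hypothetical placeholders, not the inventory defined in the thesis. The point it illustrates is that phenomena from different traditional layers all become values of the same typed vocabulary.

```python
# Illustrative sketch only: a toy encoding of the cross-level typing idea.
# The type names and example clauses are hypothetical placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class TypedTerm:
    value: str   # surface value, e.g. a word sense, an entity class, a tense operator
    type: str    # the typed category the value belongs to

# One DRS-style clause list for "Tom visited Paris.", where sub-lexical,
# lexical, sentence- and discourse-level phenomena all appear as values
# of the same typed categories rather than as separate annotation layers.
clauses = [
    ("x1", TypedTerm("male.n.02", "ENTITY")),        # word sense (lexical)
    ("x1", TypedTerm("Name:Tom", "ENTITY")),          # named-entity class (lexical)
    ("e1", TypedTerm("visit.v.01", "EVENT")),         # word sense (lexical)
    ("e1", TypedTerm("Agent:x1", "ROLE")),            # thematic role (sentence level)
    ("e1", TypedTerm("Theme:x2", "ROLE")),            # thematic role (sentence level)
    ("x2", TypedTerm("city.n.01", "ENTITY")),
    ("t1", TypedTerm("past", "TIME")),                # tense (sub-lexical)
    ("b1", TypedTerm("NARRATION:b2", "DISCOURSE")),   # rhetorical relation (discourse level)
]

# All phenomena are now variations of values inside one typed vocabulary,
# so a single model can predict them uniformly.
for ref, term in clauses:
    print(f"{ref}\t{term.type}\t{term.value}")
```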
We show the implications of our approach for the computational modelling of natural language understanding (NLU) using Combinatory Categorial Grammar, specifically in the context of one of the core NLU tasks, semantic parsing. We show that cross-level type-assigned logical forms yield compact lexicon representations and allow search-space-constraining tasks such as supertagging, previously used only in syntactic parsing, to be reformulated as part of the semantic analysis. We empirically demonstrate the effectiveness of a pre-training objective based on masking typed logical forms for obtaining reusable lexical representations. Our results indicate that performance on parsing open-domain text to DRS improves when the embedding layer of an encoder-decoder model such as the Transformer is initialised with weights distilled from a model pre-trained with our objective.
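The sketch below illustrates, at pseudocode level, the two training ideas mentioned above: a masked-prediction objective over typed logical-form tokens, and initialising a parser's embedding layer from the resulting pre-trained weights. None of the identifiers or the token format come from the thesis; they are assumed placeholders for illustration only.

```python
# Minimal sketch of (1) masking typed logical-form tokens for pre-training and
# (2) transferring the learned embeddings to an encoder-decoder parser.
# All names below are hypothetical; this is not the thesis's implementation.
import random

MASK = "[MASK]"

def mask_typed_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of typed logical-form tokens with [MASK];
    the model is trained to recover the original tokens at masked positions."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)      # predict the original typed token
        else:
            inputs.append(tok)
            targets.append(None)     # no loss on unmasked positions
    return inputs, targets

# A toy typed logical-form sequence (types fused onto values, purely illustrative).
sequence = ["x1:ENTITY", "male.n.02", "e1:EVENT", "visit.v.01",
            "Agent:x1", "Theme:x2", "t1:TIME", "past"]
masked, gold = mask_typed_tokens(sequence, mask_prob=0.3)
print(masked)
print(gold)

# (2) Transfer step, expressed only as a comment: copy the pre-trained embedding
# matrix into the encoder-decoder's embedding layer before fine-tuning on
# text-to-DRS parsing, e.g. (PyTorch-style pseudocode):
#   parser.embeddings.weight.data.copy_(pretrained_model.embeddings.weight.data)
```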