Adnan Ozturel
Authored Publications
Cross-Level Typing the Logical Form for Open-Domain Semantic Parsing
Ph.D. Thesis (2022)
This thesis presents a novel approach to assigning types to expressive Discourse Representation Structure (DRS) meaning representations. In terms of linguistic analysis, our typing methodology couples together, at the same level of analysis, the representation of phenomena that were traditionally considered to belong to distinct layers. We claim that the surface realisation of sub-lexical, lexical, sentence and discourse-level phenomena (such as tense, word sense, named entity class, thematic role, and rhetorical structure) can be represented as variations of values that belong to the same typed category within our cross-level typing technique.
We show the implications of our approach for the computational modelling of natural language understanding (NLU) using Combinatory Categorial Grammar, specifically in the context of one of the core NLU tasks, semantic parsing. We further show that cross-level type-assigned logical forms deliver compact lexicon representations and help re-formalise search-space-constraining tasks, such as supertagging, as part of the semantic analysis, whereas such approaches were previously used only in syntactic parsing. We empirically demonstrate the effectiveness of a training objective based on masking the typed logical forms when pre-training models to obtain re-usable lexical representations. Our results indicate that improved model performance on parsing open-domain text to DRS is possible when the embedding layer of an encoder-decoder model such as a Transformer is initialised with weights distilled from a model pre-trained using our objective.
A Gold Standard Dependency Treebank for Turkish
Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association (2020), pp. 5156-5163
We introduce TWT, a new treebank for Turkish that consists of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech and dependency relations. To date, it is the largest publicly available human-annotated morpho-syntactic Turkish treebank in terms of annotated word count. It is also the first large Turkish dependency treebank that has a dedicated Wikipedia section. We present the tagsets and the methodology used in annotating the treebank, as well as the results of baseline experiments on Turkish dependency parsing with this treebank.
A Syntactically Expressive Morphological Analyzer for Turkish
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, Association for Computational Linguistics, Dresden, Germany (2019), pp. 65-75
We present a broad-coverage model of Turkish morphology and an open-source morphological analyzer that implements it. The model captures intricacies of the Turkish morphology-syntax interface and could thus be used as a baseline that guides language model development. It introduces a novel fine part-of-speech tagset and a fine-grained affix inventory, and represents morphotactics without zero-derivations. The morphological analyzer is freely available. It consists of modular reusable components of human-annotated gold standard lexicons, and implements Turkish morphotactics as finite-state transducers using OpenFst and morphophonemic processes as Thrax grammars.
Annotating Topic Development in Information Seeking Queries
Marta Andersson
Silvia Pareti
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), Portorož, Slovenia
This paper contributes to the limited body of empirical research on the discourse structure of information seeking queries. We describe the development of an annotation schema for coding topic development in information seeking queries and the initial observations from a pilot sample of query sessions. The main idea explored is the relationship between constant and variable discourse entities and their role in tracking changes in the topic progression. We argue that the topicalized entities remain stable across discourse moves and can be identified by a simple mechanism where anaphora resolution is a precursor. We also claim that a corpus annotated in this framework can be used as training data for dialogue management and computational semantics systems.