Bernd Bohnet

Authored Publications
    A growing area of research investigates augmenting language models with tools (e.g., search engines, calculators) to overcome their shortcomings (e.g., missing or incorrect knowledge, incorrect logical inferences). Various few-shot tool-usage strategies have been proposed. However, there is no systematic and fair comparison across different strategies, or between these strategies and strong baselines that do not leverage tools. We conduct an extensive empirical analysis, finding that (1) across various datasets, example difficulty levels, and models, strong no-tool baselines are competitive with tool-assisted strategies, implying that effectively using tools with in-context demonstrations is a difficult unsolved problem; (2) for knowledge-retrieval tasks, strategies that *refine* incorrect outputs with tools outperform strategies that retrieve relevant information *ahead of* or *during generation*; (3) tool-assisted strategies are expensive in the number of tokens they require to work, incurring additional costs by orders of magnitude, which does not translate into significant improvement in performance. Overall, our findings suggest that few-shot tool integration is still an open challenge, emphasizing the need for comprehensive evaluations of future strategies to accurately assess their *benefits* and *costs*.
    Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly, which simplifies coreference resolution by eliminating the search for both mentions and coreference links. We implemented the coreference system as a transition system and used multilingual T5 as the language model. We obtained state-of-the-art accuracy with an F1-score of 83.3 on the CoNLL-2012 data set. We use the SemEval-2010 data sets to evaluate on languages other than English and get substantially higher zero-shot F1-scores for 3 out of 4 languages than previous approaches and significantly exceed previous supervised state-of-the-art results for all five tested languages.
    Decoding Part-of-Speech from human EEG signals
    Alex Murphy
    Ryan Thomas Mcdonald
    Uta Noppeney
    (2022)
    This work explores techniques to predict Part-of-Speech (PoS) tags from neural signals measured at millisecond resolution with electroencephalography (EEG) during text reading. We show that information about word length, frequency and word class is encoded by the brain at different post-stimulus latencies. We then demonstrate that pretraining on averaged EEG data and data augmentation techniques boost PoS single-trial EEG decoding accuracy for Transformers (but not linear SVMs). Applying optimised temporally-resolved decoding techniques, we show that Transformers outperform linear SVMs on PoS tagging of unigram and bigram data more strongly when information requires integration across longer time windows.
    Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
    A Gold Standard Dependency Treebank for Turkish
    Proceedings of The 12th Language Resources and Evaluation Conference, European Language Resources Association (2020), pp. 5156-5163
    We introduce TWT, a new treebank for Turkish which consists of web and Wikipedia sentences that are annotated for segmentation, morphology, part-of-speech and dependency relations. To date, it is the largest publicly available human-annotated morpho-syntactic Turkish treebank in terms of the annotated word count. It is also the first large Turkish dependency treebank that has a dedicated Wikipedia section. We present the tagsets and the methodology that are used in annotating the treebank and also the results of the baseline experiments on Turkish dependency parsing with this treebank.
    On Faithfulness and Factuality in Abstractive Summarization
    Ryan Thomas Mcdonald
    Proceedings of The 58th Annual Meeting of the Association for Computational Linguistics (ACL) (2020)
    It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models are fundamentally flawed and lead to dull and repetitive responses. We found that these models, when tested on abstractive summarization, are highly prone to hallucinate content that is either unfaithful to the input document, completely irrelevant or gibberish. We conduct a large-scale human evaluation of several state-of-the-art neural abstractive summarization systems, including pretrained language models, to better understand the types of hallucinations. Furthermore, we study the extent to which the hallucinated content (i) co-occurs with common linguistic irregularities such as repetition and incoherence, and (ii) can be measured by NLU measures such as textual entailment, question answering and OpenIE-based fact checking.
    Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities. NER research is often focused on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]] (Finkel and Manning, 2009). In this paper, we use ideas from graph-based dependency parsing to provide our model with a global view on the input via a biaffine model (Dozat and Manning, 2017). The biaffine model scores pairs of start and end tokens in a sentence, which we use to explore all spans, so that the model is able to predict named entities accurately. We show that the model works well for both nested and flat NER through evaluation on 8 corpora, achieving SoTA performance on all of them, with accuracy gains of up to 2.2 percentage points.
    Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing
    Mohab El-karef
    Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019) (2019)
    We propose a method to represent dependency trees as dense vectors through the recursive application of Long Short-Term Memory networks to build Recursive LSTM Trees (RLTs). We show that the dense vectors produced by Recursive LSTM Trees replace the need for structural features by using them as feature vectors for a greedy Arc-Standard transition-based dependency parser. We also show that RLTs have the ability to incorporate useful information from the bi-LSTM positional representation used by Cross and Huang (2016) and Kiperwasser and Goldberg (2016). The resulting dense vectors are able to express both structural information relating to the dependency tree and sequential information relating to the position in the sentence. The resulting parser only requires the vector representations of the top two items on the parser stack, which is, to the best of our knowledge, the smallest feature set ever published for Arc-Standard parsers to date, while still managing to achieve competitive results.
    Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings
    Ryan Mcdonald
    Emily Pitler
    Association for Computational Linguistics (ACL), Melbourne, Australia (2018)
    The rise of neural networks, and particularly recurrent neural networks, has produced significant advances in part-of-speech tagging accuracy. One characteristic common among these models is the presence of rich initial word encodings. These encodings typically are composed of a recurrent character-based representation with learned and pre-trained word embeddings. However, these encodings do not consider a context wider than a single word and it is only through subsequent recurrent layers that word or sub-word information interacts. In this paper, we investigate models that use recurrent neural networks with sentence-level context for initial character and word-based representations. In particular we show that optimal results are obtained by integrating these context sensitive representations through synchronized training with a meta-model that learns to combine their states. We present results on part-of-speech and morphological tagging with state-of-the-art performance on a number of languages.
    The First Multilingual Surface Realisation Shared Task (SR'18): Overview and Evaluation Results
    Anya Belz
    Emily Pitler
    Leo Wanner
    Simone Mille
    Association for Computational Linguistics 2018, Melbourne (2018)
    We report results from the SR’18 Shared Task, a new multilingual surface realisation task organised as part of the ACL’18 Workshop on Multilingual Surface Realisation. As in its English-only predecessor task SR’11, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where additionally, functional words and morphological information were removed. The shallow track was offered in ten, and the deep track in three languages. Systems were evaluated (a) automatically, using a range of intrinsic metrics, and (b) by human judges in terms of readability and meaning similarity. This report presents the evaluation results, along with descriptions of the SR’18 tracks, data and evaluation methods. For full descriptions of the participating systems, please see the separate system reports elsewhere in this volume.
    Underspecified Universal Dependency Structures as Inputs for Multilingual Surface Realisation
    Anya Belz
    Leo Wanner
    Simone Mille
    International Conference on Natural Language Generation (2018) (to appear)
    This paper presents the datasets used in the First Multilingual Surface Realisation Shared Task (SR’18), describes in detail how they were created, and evaluates their quality. In addition, we examine (a) the NLG subtask of surface realisation itself, (b) the motivation for, and likely usefulness of, deriving NLG inputs from annotations in resources originally developed for Natural Language Understanding (NLU), (c) whether the resulting inputs supply enough information of the right kind for the final stage in the NLG process, and more tentatively, (d) what role surface realisation is likely to play in the future in the NLG context.
    82 Treebanks, 34 Models: Universal Dependency Parsing with Cross-Treebank Models
    Aaron Smith
    Joakim Nivre
    Miryam de Lhoneux
    Sara Stymne
    Yan Shao
    Conference on Computational Natural Language Learning (2018)
    We present the Uppsala system for the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. Our system is a pipeline consisting of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags. Instead of training a single parsing model for each treebank, we trained models with multiple treebanks for the same language or closely related languages, greatly reducing the number of models. On the official test run, we achieved a macro-averaged LAS F1 of 72.37 and a macro-averaged MLAS F1 of 59.20, ranking 7th of 27 teams for both of these metrics.
    Shared Task Proposal: Multilingual Surface Realization Using Universal Dependency Trees
    Anya Belz
    Emily Pitler
    Leo Wanner
    Simone Mille
    Proceedings of the 10th International Conference on Natural Language Generation, Association for Computational Linguistics (ACL), Santiago de Compostela, Spain (2017), pp. 120-123 (to appear)
    We propose a shared task on multilingual Surface Realization, i.e., on mapping unordered and uninflected universal dependency trees to correctly ordered and inflected sentences in a number of languages. A second deeper input will be available in which, in addition, functional words, fine-grained PoS and morphological information will be removed from the input trees. The first shared task on Surface Realization was carried out in 2011 with a similar setup, with a focus on English. We think that it is time for relaunching such a shared task effort in view of the arrival of Universal Dependencies annotated treebanks for a large number of languages on the one hand, and the increasing dominance of Deep Learning, which proved to be a game changer for NLP, on the other hand.
    Shared Task Proposal on Multilingual Surface Realization using Universal Dependency Trees
    Anya Belz
    Leo Wanner
    Simone Mille
    International Conference on Natural Language Generation, Santiago de Compostela, Spain (2017), pp. 4
    We propose a Shared Task on multilingual Surface Realization, i.e. the mapping from unordered and uninflected universal dependency trees to correctly ordered and inflected sentences in a number of languages. A second deeper input will be available, in which, in addition, functional words, fine-grained PoS and morphological information will be removed. The first Shared Task on Surface Realization was carried out in 2011 with a similar setup, with a focus on English. We think that it is time for relaunching such a Shared Task effort in view of the arrival of Universal Dependencies annotated treebanks for a large number of languages on the one hand, and the increasing dominance of Deep Learning, which proved to be a game changer for NLP, on the other hand.
    Generalized Transition-based Dependency Parsing
    Emily Pitler
    Ryan Mcdonald
    Association for Computational Linguistics (ACL) (2016) (to appear)
    In this paper, we present a transition-based parsing framework where a specific parser type is instantiated in terms of a set of abstract control parameters that constrain transitions between parser states. These parameters enable a generalization across a range of transition-based parsing algorithms, including Arc-eager, Arc-standard, and Easy-first. This generalization provides a unified framework that allows us to describe and compare various transition-based parsing approaches from a theoretical and empirical perspective. This includes not only previously studied transition systems but potentially new systems as well.