Aleksandr Chuklin
As of 2019, Aleksandr works on enabling access to information via dialog and on the ML/NLP challenges related to it. He obtained his PhD from the University of Amsterdam, where his research focused on user modeling and quality evaluation for complex search interfaces. He has published in leading information retrieval conferences and journals, co-authored a book on click models for web search, and runs a yearly workshop on search-oriented conversational AI (scai.info). Prior to Google, Aleksandr worked on search-related problems in the Applied Research group at Yandex.
Authored Publications
Text Generation with Text-Editing Models
Daniil Mirylenka
Jakub Adamek
Yue Dong
Proceedings of NAACL 2022, ACL
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual natural language generation (NLG) tasks such as grammatical error correction, text simplification, and style transfer. These tasks exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this trait and learn to generate the output by predicting edit operations applied to the source sequence, in contrast to seq2seq models that generate the output from scratch. Text-editing models provide several benefits over seq2seq models, including faster inference, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of text-editing approaches and current state-of-the-art models, analyzing the pros and cons of different methods. We discuss challenges related to productionization and how these models can help mitigate hallucination and bias, both pressing challenges in the field of text generation.
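As a rough illustration of the edit-operation idea (a sketch, not code from the tutorial): assuming a toy tag set of KEEP/DELETE with optional insertions, the target text is realized by applying one predicted operation per source token. Real text-editing models such as LaserTagger or FELIX use richer tag vocabularies.

```python
# Toy KEEP/DELETE-plus-insertion scheme in the spirit of text-editing models.
# The tag vocabulary here is illustrative, not that of any specific model.

from typing import List, Tuple

# Each source token gets a tag: ("KEEP" | "DELETE", phrase_to_insert_after).
EditTag = Tuple[str, str]

def apply_edits(source_tokens: List[str], tags: List[EditTag]) -> List[str]:
    """Realize the target text by applying per-token edit operations."""
    output = []
    for token, (op, insertion) in zip(source_tokens, tags):
        if op == "KEEP":
            output.append(token)
        # op == "DELETE": drop the source token.
        if insertion:
            output.extend(insertion.split())
    return output

# Grammatical error correction example: "He go to school" -> "He goes to school".
source = ["He", "go", "to", "school"]
tags = [("KEEP", ""), ("DELETE", "goes"), ("KEEP", ""), ("KEEP", "")]
print(" ".join(apply_edits(source, tags)))  # He goes to school
```

Because most tokens are simply kept, the model only has to predict a short sequence of edits rather than regenerate the whole sentence, which is where the speed and sample-efficiency gains come from.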
CLSE: Corpus of Linguistically Significant Entities
Justin Xu Zhao
Mihir Sanjay Kale
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2022) at EMNLP 2022 (to appear)
One of the biggest challenges of natural language generation (NLG) is the proper handling of named entities. Named entities are a common source of grammar mistakes such as wrong prepositions, wrong article handling, or incorrect entity inflection. Without factoring in linguistic representation, such errors are often underrepresented when evaluating on a small set of arbitrarily picked argument values, or when translating a dataset from a linguistically simpler language, like English, to a linguistically complex language, like Russian. However, for some applications, precise grammatical correctness across a broad range of entities is critical: native speakers may find entity-related grammar errors silly, jarring, or even offensive.
To enable the creation of more linguistically diverse NLG datasets, we release a Corpus of Linguistically Significant Entities (CLSE) annotated by linguist experts. The corpus includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. To demonstrate one possible use of CLSE, we produce an augmented version of the Schema-Guided Dialog Dataset, SGD-CLSE. Using the CLSE's entities and a small number of human translations, we create a linguistically representative NLG evaluation benchmark in three languages: French (high-resource), Marathi (low-resource), and Russian (highly inflected language). We establish quality baselines for neural, template-based, and hybrid NLG systems and discuss the strengths and weaknesses of each approach.
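As a hypothetical sketch of why such annotations matter for template-based NLG in a highly inflected language (the field names below are illustrative and do not reflect the actual CLSE schema): an entity that carries its inflected forms lets a template pick the grammatically correct surface form instead of naively inserting the canonical one.

```python
# Hypothetical entity annotation for template-based NLG; not the CLSE format.

from dataclasses import dataclass
from typing import Dict

@dataclass
class AnnotatedEntity:
    canonical_form: str          # e.g. the nominative case form
    language: str                # e.g. "ru"
    semantic_type: str           # e.g. "city", "airline", "video_game"
    inflections: Dict[str, str]  # grammatical case -> surface form

def render_flight_confirmation(entity: AnnotatedEntity) -> str:
    # Russian requires the accusative case after the preposition "в" (direction),
    # so inserting the canonical (nominative) form would be ungrammatical.
    destination = entity.inflections.get("accusative", entity.canonical_form)
    return f"Ваш рейс в {destination} подтверждён."

moscow = AnnotatedEntity(
    canonical_form="Москва",
    language="ru",
    semantic_type="city",
    inflections={"accusative": "Москву", "genitive": "Москвы"},
)
print(render_flight_confirmation(moscow))  # Ваш рейс в Москву подтверждён.
```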
Using Audio Transformations to Improve Comprehension in Voice Question Answering
Johanne R. Trippas
Hanna Silen
Damiano Spina
Crestani, F. et al. (eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2019), Springer, Cham, pp. 164–170
Many popular form factors of digital assistants, such as Amazon Echo, Apple HomePod, and Google Home, let the user hold a conversation based only on the speech modality. The lack of a screen presents unique challenges. To satisfy the information need of a user, the presentation of the answer needs to be optimized for such voice-only interactions. In this paper, we propose a task of evaluating the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup where we evaluate the quality of our proposed modifications along multiple dimensions corresponding to the informativeness, naturalness, and ability of the user to identify key parts of the answer. We offer a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that some of these modifications lead to better comprehension at the expense of only slightly degraded naturalness of the audio.
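As a rough illustration (an assumption, not the paper's method: the paper defines its own acoustic manipulations, and SSML is used here only as a convenient notation), prosodic highlighting of the key part of a spoken answer could be expressed for an SSML-capable TTS engine like so:

```python
# Sketch: mark the key span of an answer with standard SSML prosodic cues
# (a pause before it, slower rate, and emphasis on it).

def emphasize_answer(prefix: str, key_part: str, suffix: str) -> str:
    """Wrap the key part of a spoken answer in SSML prosodic cues."""
    return (
        "<speak>"
        f"{prefix} "
        '<break time="300ms"/>'
        '<prosody rate="90%"><emphasis level="moderate">'
        f"{key_part}"
        "</emphasis></prosody> "
        f"{suffix}"
        "</speak>"
    )

print(emphasize_answer(
    "The Eiffel Tower is",
    "330 metres",
    "tall, roughly the height of an 81-storey building.",
))
```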
Modern search engine result pages often provide immediate value to users and organize information in such a way that it is easy to navigate. The core ranking contributes to this, and so do result snippets, smart organization of result blocks, and extensive use of one-box answers or side panels. While they are useful to the user and help search engines to stand out, such features present two big challenges for evaluation. First, the presence of such elements on a search engine result page (SERP) may lead to the absence of clicks which, however, does not indicate dissatisfaction: so-called “good abandonments.” Second, the non-linear layout and visual difference of SERP items may lead to a non-trivial sequence of the user’s attention, which is not captured by existing evaluation metrics.
In this paper we propose the CAS model, a model of user behavior on a SERP that jointly captures click behavior, the user’s attention, and satisfaction, and demonstrate that it gives more accurate predictions of user actions and self-reported satisfaction than existing models based on clicks. We use the CAS model to build a novel evaluation metric that can be applied to non-linear SERP layouts and that accounts for the utility users obtain directly on a SERP. We demonstrate that this metric shows better agreement with user-reported satisfaction than conventional evaluation metrics.
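A toy sketch of the kind of expected-utility computation such a SERP metric performs, assuming per-item attention and click probabilities; this is not the actual CAS formulation, whose click, attention, and satisfaction components are estimated from behavioral data.

```python
# Toy utility-based SERP metric in the spirit of the abstract; NOT the CAS model.

from dataclasses import dataclass
from typing import List

@dataclass
class SerpItem:
    attention_prob: float   # probability the user examines this item
    click_prob: float       # probability of a click given examination
    direct_utility: float   # utility gained on the SERP itself (e.g. an answer box)
    landing_utility: float  # utility gained after clicking through

def expected_utility(serp: List[SerpItem]) -> float:
    """Expected utility accumulated over a SERP, including 'good abandonment'
    cases where an item is useful without any click."""
    total = 0.0
    for item in serp:
        on_serp = item.attention_prob * item.direct_utility
        after_click = item.attention_prob * item.click_prob * item.landing_utility
        total += on_serp + after_click
    return total

# Example: an answer box (high direct utility, rarely clicked) plus two web results.
serp = [
    SerpItem(attention_prob=0.9, click_prob=0.1, direct_utility=0.8, landing_utility=0.2),
    SerpItem(attention_prob=0.7, click_prob=0.5, direct_utility=0.1, landing_utility=0.6),
    SerpItem(attention_prob=0.4, click_prob=0.3, direct_utility=0.0, landing_utility=0.5),
]
print(f"Expected SERP utility: {expected_utility(serp):.2f}")
```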