Massimiliano Ciaramita

Authored Publications
    This paper presents first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results. We develop a novel way of generating synthetic search sessions, which leverages the power of transformer-based language models through (self-)supervised learning. We also present a reinforcement learning agent with dynamically constrained actions that learns interactive search strategies from scratch. Our search agents obtain retrieval and answer quality performance comparable to recent neural methods, using only a traditional term-based BM25 ranking function and interpretable discrete reranking and filtering actions.
    Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand "what should have been asked" to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.
    Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution? and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
    CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims
    Jordan Boyd-Graber
    Markus Leippold
    Thomas Diggelmann
    NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning (to appear)
    Our goal is to introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims. By providing a dataset for the research community, we aim to help and encourage work on improving algorithms for retrieving climate-specific information and detecting fake news in social and mass media to reduce the impact of misinformation on the formation of public opinion on climate change. We adapt the methodology of FEVER, the largest dataset of artificially designed claims, to real-life claims collected from the Internet. Although we could count on the support of renowned climate scientists during this process, it turned out to be no easy task. We discuss the surprising, subtle complexity of modeling real-world climate-related claims within the FEVER framework, which provides a valuable challenge for general natural language understanding. We hope that our work will mark the beginning of an exciting long-term joint effort by the climate science and AI community to develop robust algorithms to verify the facts for climate-related claims.
    On Identifiability in Transformers
    Gino Brunner
    Yang Liu
    Damian Pascual Ortiz
    Oliver Richter
    Roger Wattenhofer
    ICLR (2020)
    In this paper we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain their identity to a large degree across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.
    CLIMATEXT: A Dataset for Climate Change Topic Detection
    Francesco Saverio Varini
    Jordan Boyd-Graber
    Markus Leippold
    NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning (to appear)
    Climate change communication in the mass media and other textual sources may affect and shape public perception. Extracting climate change information from these sources is an important task, e.g., for filtering content and e-discovery, sentiment analysis, automatic summarization, question-answering, and fact-checking. However, automating this process is a challenge, as climate change is a complex, fast-moving, and often ambiguous topic with scarce resources for popular text-based AI tasks. In this paper, we introduce ClimaText, a dataset for sentence-based climate change topic detection, which we make publicly available. We explore different approaches to identify the climate change topic in various text sources. We find that popular keyword-based models are not adequate for such a complex and evolving task. Context-based algorithms like BERT can detect, in addition to many trivial cases, a variety of complex and implicit topic patterns. Nevertheless, our analysis reveals a great potential for improvement in several directions, such as capturing the discussion on indirect effects of climate change. Hence, we hope this work can serve as a good starting point for further research on this topic.
    Multi-agent query reformulation: Challenges and the role of diversity
    Rodrigo Frassetto Nogueira
    Deep Reinforcement Learning Meets Structured Prediction, ICLR, New Orleans, Louisiana, United States (2019)
    We investigate methods to efficiently learn diverse strategies in reinforcement learning for a generative structured prediction problem: query reformulation. In the proposed framework, an agent consists of multiple specialized sub-agents and a meta-agent that learns to aggregate the answers from sub-agents to produce a final answer. Sub-agents are trained on disjoint partitions of the training data, while the meta-agent is trained on the full training set. Our method makes learning faster, because it is highly parallelizable, and has better generalization performance than strong baselines, such as an ensemble of agents trained on the full data. We evaluate on the tasks of document retrieval and question answering. The improved performance seems to be due to the increased diversity of reformulation strategies. This suggests that multi-agent, hierarchical approaches might play an important role in structured prediction tasks of this kind. However, we also find that it is not obvious how to characterize diversity in this context, and a first attempt based on clustering did not produce good results. Furthermore, reinforcement learning for the reformulation task is hard in high-performance regimes. At best, it only marginally improves over the state of the art, which highlights the complexity of training models in this framework for end-to-end language understanding problems.
    We frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with potentially many natural language reformulations of an initial question and aggregates the returned evidence to yield the best answer. The reformulation system is trained end-to-end to maximize answer quality using policy gradient. We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. The agent outperforms a state-of-the-art base model, playing the role of the environment, and other benchmarks. We also analyze the language that the agent has learned while interacting with the question answering system. We find that successful question reformulations look quite different from natural language paraphrases. The agent is able to discover non-trivial reformulation strategies that resemble classic information retrieval techniques such as term re-weighting (tf-idf) and stemming.
    We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black box question-answering system. The agent learns to reformulate the user's questions to elicit the optimal answers. It probes the system with many versions of a question that are generated via a sequence-to-sequence question reformulation model, then aggregates the returned evidence to find the best answer. This process is an instance of machine-machine communication. The question reformulation model must adapt its language to increase the quality of the answers returned, matching the language of the question answering system. We find that the agent does not learn transformations that align with semantic intuitions but instead discovers, through learning, classical information retrieval techniques such as tf-idf re-weighting and stemming.
    In this paper we study the problem of linking open-domain web-search queries towards entities drawn from the full entity inventory of Wikipedia articles. We introduce SMAPH-2 to attack this problem, a second-order approach that, by piggybacking on a web search engine, alleviates the noise and irregularities that characterize the language of queries and puts queries in a larger context in which it is easier to make sense of them. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link-back those entities to their mentions occurring in the input query. This allows us to confine the possible concepts pertinent to the query to only the ones really mentioned in it. The link-back is implemented via a collective disambiguation step based upon a supervised ranking model that makes one joint prediction for the annotation of the complete query optimizing directly the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features such as an approximate distance between mentions and entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves state-of-the-art on the ERD@SIGIR2014 benchmark. We also publish GERDAQ, a novel dataset we built specifically for web-query entity linking via a crowdsourcing effort, and show that SMAPH-2 outperforms the benchmarks by comparable margins on GERDAQ.
    Using Entity Information from a Knowledge Base to Improve Relation Extraction
    Lan Du
    Anish Kumar
    M. Johnson
    Proceedings of the 13th annual workshop of The Australasian Language Technology Association, Association for Computational Linguistics (2015)
    Relation extraction is the task of extracting predicate-argument relationships between entities from natural language text. This paper investigates whether background information about entities available in knowledge bases such as FreeBase can be used to improve the accuracy of a state-of-the-art relation extraction system. We describe a simple and effective way of incorporating FreeBase’s notable types into a state-of-the-art relation extraction system (Riedel et al., 2013). Experimental results show that our notable type-based system achieves an average 7.5% weighted MAP score improvement. To understand where the notable type information contributes the most, we perform a series of ablation experiments. Results show that the notable type information improves relation extraction more than NER labels alone across a wide range of entity types and relations.
    A Computationally Efficient Algorithm for Learning Topical Collocation Models
    Zhendong Zhao
    Lan Du
    John K Pate
    Mark Steedman
    Mark Johnson
    Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Association for Computational Linguistics, Beijing, China (2015), pp. 1460-1469
    Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling algorithm by exploiting sparsity and parallelising inference. Experimental results derived in text classification, information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.
    A Scalable Gibbs Sampler for Probabilistic Entity Linking
    Advances in Information Retrieval (ECIR 2014), Springer International Publishing, pp. 335-346
    Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is challenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilistic inference within a topic model, where each topic is associated with a Wikipedia article. To deal with the large number of topics we propose a novel efficient Gibbs sampling scheme which can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity-linking on the Aida-CoNLL dataset.
    The SMAPH System for Query Entity Recognition and Disambiguation
    Marco Cornolti
    Paolo Ferragina
    Stefan Rued
    Hinrich Schuetze
    ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum., ACM
    The SMAPH system implements a pipeline of four main steps: (1) Fetching – it fetches the search results returned by a search engine given the query to be annotated; (2) Spotting – search result snippets are parsed to identify candidate mentions for the entities to be annotated. This is done in a novel way by detecting the keywords-in-context by looking at the bold parts of the search snippets; (3) Candidate generation – candidate entities are generated in two ways: from the Wikipedia pages occurring in the search results, and from an existing annotator, using the mentions identified in the spotting step as input; (4) Pruning – a binary SVM classifier is used to decide which entities to keep/discard in order to generate the final annotation set for the query. The SMAPH system ranked third on the development set and first on the final blind test of the 2014 ERD Challenge short text track.
    A Framework for Benchmarking Entity-Annotation Systems
    Marco Cornolti
    Paolo Ferragina
    Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013)
    In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw many interesting conclusions about their efficiency and effectiveness. We also draw conclusions comparing academic and commercial annotators.
    Topical clustering of search results
    Ugo Scaiella
    Paolo Ferragina
    Andrea Marino
    Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 223-232
    Search results clustering (SRC) is a challenging algorithmic problem that requires grouping together the results returned by one or more search engines in topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia-pages identified by means of recently proposed topic annotators [9, 11, 16, 20] applied to the search results, and the edges denote the relatedness among these topics computed by taking into account the linkage of the Wikipedia-graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.
    Learning to Rank Answers to Non-Factoid Questions from Web Collections
    Mihai Surdeanu
    Hugo Zaragoza
    Computational Linguistics, vol. 37 (2011), pp. 351-383
    This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
    Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
    Stefan Rued
    Jens Mueller
    Hinrich Schuetze
    49th Annual Meeting of the Association for Computational Linguistics (ACL-HLT), Association for Computational Linguistics (2011), pp. 965-975
    We use search engine results to address a particularly difficult cross-domain language processing task, the adaptation of named entity recognition (NER) from news text to web queries. The key novelty of the method is that we submit a token with context to a search engine and use similar contexts in the search results as additional information for correctly classifying the token. We achieve strong gains in NER performance on news, in-domain and out-of-domain, and on web queries.
    Learning Dense Models of Query Similarity from User Click Logs
    Fabio De Bona
    Stefan Riezler
    Keith Hall
    Amac Herdagdelen
    Maria Holmqvist
    Proceedings of NAACL-HLT 2010
    Generalized Syntactic and Semantic Models of Query Reformulation
    Amac Herdagdelen
    Daniel Mahler
    Maria Holmqvist
    Keith Hall
    Stefan Riezler
    Proceedings of SIGIR-2010
    Gazpacho and summer rash: lexical relationships from temporal patterns of web search queries
    Keith Hall
    Proceedings of the conference on Empirical Methods in Natural Language Processing (EMNLP) (2009)
    The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages
    Jan Hajič
    Richard Johansson
    Daisuke Kawahara
    Maria Antònia Martí
    Lluís Màrquez
    Adam Meyers
    Joakim Nivre
    Sebastian Padó
    Jan Štepánek
    Pavel Straňák
    Mihai Surdeanu
    Nianwen Xue
    Yi Zhang
    Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, Association for Computational Linguistics, 209 N. Eighth Street, Stroudsburg, PA 18360, pp. 1-18
    To swing or not to swing: learning when (not) to advertise
    Marcus Fontoura
    Evgeniy Gabrilovich
    Vanja Josifovski
    Vanessa Murdock
    Vassilis Plachouras
    CIKM (2008), pp. 1003-1012
    Hierarchical Semantic Classification: Word Sense Disambiguation with World Knowledge
    Thomas Hofmann
    Mark Johnson
    IJCAI (2003), pp. 817-822