Amarnag Subramanya

Amarnag Subramanya

Amarnag Subramanya is a senior research scientist in the machine learning & natural language processing group at Google Research. He received his PhD from University of Washington, Seattle in 2009. His interests include, semi-supervised learning, graphical models and their applications to various problems in natural language and speech.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks. View details
    Preview abstract We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads text as bytes and outputs span annotations of the form [start, length, label] where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, but produce results similar to or better than the state-of-the-art in Part-of-Speech tagging and Named Entity Recognition that use only the provided training datasets (no external data sources). Our models are learning “from scratch” in that they do not rely on any elements of the standard pipeline in Natural Language Processing (including tokenization), and thus can run in standalone fashion on raw text. View details
    Plato: A Selective Context Model for Entity Resolution
    Michael Ringgaard
    Transactions of the Association for Computational Linguistics, 3(2015), pp. 503-515
    Preview abstract We present Plato, a probabilistic model for entity resolution that includes a novel approach for handling noisy or uninformative features,and supplements labeled training data derived from Wikipedia with a very large unlabeled text corpus. Training and inference in the proposed model can easily be distributed across many servers, allowing it to scale to over 10^7 entities. We evaluate Plato on three standard datasets for entity resolution. Our approach achieves the best results to-date on TAC KBP 2011 and is highly competitive on both the CoNLL 2003 and TAC KBP 2012 datasets. View details
    Preview abstract Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1:5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach. View details
    Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
    Slav Petrov
    Proceedings of the 2010 Conference on Empirical Methods on Natural Language Processing (EMNLP '10)
    Preview
    Distributed MAP Inference for Undirected Graphical Models
    Sameer Singh
    Andrew McCallum
    Workshop on Learning on Cores, Clusters and Clouds (LCCC), Neural Information Processing Society (NIPS)(2010)
    Preview
    Large Scale Graph Transduction
    Jeff Bilmes
    NIPS 2009 Workshop on Large-Scale Machine Learning: Parallelism and Massive Datasets, NIPS
    Preview abstract We consider the issue of scalability of graph-based semi-supervised learning (SSL) algorithms. In this context, we propose a fast graph node ordering algorithm that improves parallel spatial locality by being cache cognizant. This approach allows for a linear speedup on a shared-memory parallel machine to be achievable, and thus means that graph-based SSL can scale to very large data sets. We use the above algorithm an a multi-threaded implementation to solve a SSL problem on a 120 million node graph in a reasonable amount of time. View details
    Semi-Supervised Learning with Measure Propagation
    Jeff Bilmes
    Journal of Machine Learning Research (JMLR), 12(2011), pp. 3311-3370
    Preview abstract We describe a new objective for graph-based semi-supervised learning based on minimizing the Kullback-Leibler divergence between discrete probability measures that encode class membership probabilities. We show how the proposed objective can be efficiently optimized using alternating minimization. We prove that the alternating minimization procedure converges to the correct optimum and derive a simple test for convergence. In addition, we show how this approach can be scaled to solve the semi-supervised learning problem on very large data sets, for example, in one instance we use a data set with over 108 samples. In this context, we propose a graph node ordering algorithm that is also applicable to other graph-based semi-supervised learning approaches. We compare the proposed approach against other standard semi-supervised learning algorithms on the semi-supervised learning benchmark data sets (Chapelle et al., 2007), and other real-world tasks such as text classification on Reuters and WebKB, speech phone classification on TIMIT and Switchboard, and linguistic dialog-act tagging on Dihana and Switchboard. In each case, the proposed approach outperforms the state-of-the-art. Lastly, we show that our objective can be generalized into a form that includes the standard squared-error loss, and we prove a geometric rate of convergence in that case. View details