Kuzman Ganchev

I was born in Sofia, Bulgaria, where I lived until February 1989. My family then moved to Zimbabwe and, in 1995, to New Zealand, where I went to high school. I came to the US in 1999 to study at Swarthmore College and spent the 2001-2002 academic year studying abroad in Paris. After graduating with a Bachelor of Arts in Computer Science in 2003, I worked at StreamSage Inc. in Washington, DC until starting at the University of Pennsylvania in Fall 2004. During the summer of 2007 I was an intern at TrialPay in Mountain View, CA, and during the summer of 2008 I was an intern at Bank of America in New York. I graduated from UPenn in 2010 and have since been working at Google Inc. in New York.
Authored Publications
    Conditional Generation with a Question-Answering Blueprint
    Reinald Kim Amplayo
    Fantine Huot
    Mirella Lapata
    Transactions of the Association for Computational Linguistics (2023), to appear
    Abstract: The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.
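    As a purely illustrative sketch of the data construction described above, the snippet below turns an (input, output) pair into an (input, blueprint, output) tuple, serializing the blueprint of question-answer pairs as a prefix of the target (the "global plan" variant mentioned in the abstract). The question-generation step is stubbed out and all names, texts, and the separator format are hypothetical; the paper relies on trained question generation models.

        def generate_qa_pairs(output_text):
            # Stand-in for a question-generation model; purely illustrative.
            return [("Who announced the merger?", "Acme Corp"),
                    ("When does it take effect?", "Next quarter")]

        def to_blueprint_example(input_text, output_text):
            # Convert an input-output pair into an input-blueprint-output tuple.
            blueprint = generate_qa_pairs(output_text)
            plan = " ".join(f"Q: {q} A: {a}" for q, a in blueprint)
            # A seq-to-seq model is then trained to emit the plan before the output.
            return {"source": input_text, "target": f"{plan} ;; {output_text}"}

        example = to_blueprint_example(
            "Acme Corp said on Tuesday that ...",
            "Acme announced a merger effective next quarter.")
        print(example["target"])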
    Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
    Fantine Huot
    Reinald Kim Amplayo
    Mirella Lapata
    Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (2023)
    Abstract: While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output.
    Abstract: Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
    Abstract: The availability of large, high-quality datasets has been one of the main drivers of recent progress in question answering (QA). Such annotated datasets, however, are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are trained, thus avoiding costly annotation. Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines, bridges nearly 60% of the gap between an English-only baseline and a fully supervised upper bound trained on almost 50,000 hand labeled examples, and always leads to substantial improvements compared to fine-tuning a QA model directly on labeled examples in low resource settings. Experiments on the TyDiQA-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.
    Abstract: The paper presents an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message. It embeds the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message. Compared to a baseline that generates text using greedy search, we demonstrate two techniques that improve the fluency and semantic accuracy of the generated text: The first technique samples multiple candidate text sequences from which the semantic parser chooses. The second trains the language model while keeping the semantic parser frozen to improve the semantic accuracy of the auto-encoder. We carry out experiments on the English WebNLG 3.0 data set, using BLEU to measure the fluency of generated text and standard parsing metrics to measure semantic accuracy. We show that our proposed approaches significantly improve on the greedy search baseline. Human evaluation corroborates the results of the automatic evaluation experiments.
    Abstract: A wide variety of neural-network architectures have been proposed for the task of Chinese word segmentation. Surprisingly, we find that a bidirectional LSTM model, when combined with standard deep learning techniques and best practices, can achieve better accuracy on many of the popular datasets than models based on more complex neural-network architectures. Furthermore, our error analysis shows that out-of-vocabulary words remain challenging for neural-network models, and many of the remaining errors are unlikely to be fixed through architecture changes. Instead, more effort should be made on exploring resources for further improvement.
    Abstract: We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.
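    The contrast between local and global normalization mentioned above can be written compactly. For a transition sequence d_1..d_n over input x with a scoring function rho, a locally normalized model applies a softmax at every step, while a globally normalized (CRF-style) model normalizes once over complete sequences (the notation is a generic sketch of the standard formulation, not necessarily the paper's exact symbols):

        p_local(d_{1:n} | x)  = \prod_{j=1}^{n} \frac{\exp \rho(x, d_{1:j-1}, d_j)}{Z_L(x, d_{1:j-1})}
        p_global(d_{1:n} | x) = \frac{\exp \sum_{j=1}^{n} \rho(x, d_{1:j-1}, d_j)}{Z_G(x)}

    where Z_L sums over the decisions available at step j and Z_G sums over all complete decision sequences. Because each local softmax must sum to one no matter how implausible the history d_{1:j-1} is, a locally normalized model cannot re-weight whole sequences after the fact; this is the label bias problem the abstract refers to, and it is why the globally normalized family is strictly more expressive.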
    Semantic Role Labeling with Neural Network Factors
    Oscar Täckström
    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15), Association for Computational Linguistics
    Abstract: We present a new method for semantic role labeling in which arguments and semantic roles are jointly embedded in a shared vector space for a given predicate. These embeddings belong to a neural network, whose output represents the potential functions of a graphical model designed for the SRL task. We consider both local and structured learning methods and obtain strong results on standard PropBank and FrameNet corpora with a straightforward product-of-experts model. We further show how the model can learn jointly from PropBank and FrameNet annotations to obtain additional improvements on the smaller FrameNet dataset.
    Efficient Inference and Structured Learning for Semantic Role Labeling
    Oscar Täckström
    Transactions of the Association for Computational Linguistics, 3(2015), pp. 29-41
    Abstract: We present a dynamic programming algorithm for efficient constrained inference in semantic role labeling. The algorithm tractably captures a majority of the structural constraints examined by prior work in this area, which has resorted to either approximate methods or off-the-shelf integer linear programming solvers. In addition, it allows training a globally-normalized log-linear model with respect to constrained conditional likelihood. We show that the dynamic program is several times faster than an off-the-shelf integer linear programming solver, while reaching the same solution. Furthermore, we show that our structured model results in significant improvements over its local counterpart, achieving state-of-the-art results on both PropBank- and FrameNet-annotated corpora.
    Semantic Frame Identification with Distributed Word Representations
    Karl Moritz Hermann
    Jason Weston
    Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (2014)
    Abstract: We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings. Given labeled data annotated with frame-semantic parses, we learn a model that projects the set of word representations for the syntactic context around a predicate to a low dimensional representation. The latter is used for semantic frame identification; with a standard argument identification method inspired by prior work, we achieve state-of-the-art results on FrameNet-style frame-semantic analysis. Additionally, we report strong results on PropBank-style semantic role labeling in comparison to prior work.
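    A minimal sketch of the projection idea described above, using numpy: context word embeddings are pooled, projected into a low-dimensional space by a learned matrix, and the frame whose embedding scores highest is selected. Everything here (dimensions, averaging as the pooling step, random parameters) is illustrative; the actual model learns the projection and frame embeddings jointly from frame-annotated data and builds the context representation from the syntactic parse rather than a flat average.

        import numpy as np

        emb_dim, proj_dim, n_frames = 100, 50, 3
        rng = np.random.default_rng(0)

        word_emb = {w: rng.normal(size=emb_dim)
                    for w in ["he", "bought", "a", "car", "yesterday"]}
        frame_emb = rng.normal(size=(n_frames, proj_dim))    # one row per candidate frame
        M = rng.normal(size=(proj_dim, emb_dim))              # learned projection matrix

        def frame_scores(context_words):
            # Pool the context word embeddings, project them into the shared
            # low-dimensional space, and score each frame by dot product.
            g = np.mean([word_emb[w] for w in context_words], axis=0)
            return frame_emb @ (M @ g)

        print(frame_scores(["he", "a", "car"]).argmax())      # index of the best-scoring frame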
    Abstract: Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automatically, using resolved entities and their types extracted from a knowledge base. However, since the appropriate type often depends on context (e.g. Washington could be tagged either as city or government), this procedure can result in spurious labels, leading to poorer generalization. We propose the task of context-dependent fine type tagging, where the set of acceptable labels for a mention is restricted to only those deducible from the local context (e.g. sentence or document). We introduce new resources for this task: 11,304 mentions annotated with their context-dependent fine types, and we provide baseline experimental results on this data.
    Universal Dependency Annotation for Multilingual Parsing
    Ryan McDonald
    Joakim Nivre
    Yoav Goldberg
    Yvonne Quirmbach-Brundage
    Keith Hall
    Slav Petrov
    Hao Zhang
    Oscar Täckström
    Claudia Bedini
    Nuria Bertomeu Castello
    Jungmee Lee
    Association for Computational Linguistics (2013)
    Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
    Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
    Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.
    Using Search-Logs to Improve Query Tagging
    Keith B. Hall
    Ryan McDonald
    Slav Petrov
    Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers (ACL '12) (2012)
    Posterior Sparsity in Dependency Grammar Induction
    Jennifer Gillenwater
    Joao Graca
    Ben Taskar
    Journal of Machine Learning Research, 12(2011), pp. 455-490
    Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
    Controlling Complexity in Part-of-Speech Induction
    Joao Graca
    Luisa Coheur
    Ben Taskar
    Journal of Artificial Intelligence Research (JAIR), 41(2011), pp. 527-551
    Abstract: We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
    Sparsity in Dependency Grammar Induction
    Jennifer Gillenwater
    João Graça
    Ben Taskar
    48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
    Posterior vs. Parameter Sparsity in Latent Variable Models
    Joao Graca
    Ben Taskar
    Advances in Neural Information Processing Systems 22(2009), pp. 664-672
    Abstract: In this paper we explore the problem of biasing unsupervised models to favor sparsity. We extend the posterior regularization framework [8] to encourage the model to achieve posterior sparsity on the unlabeled training data. We apply this new method to learn first-order HMMs for unsupervised part-of-speech (POS) tagging, and show that HMMs learned this way consistently and significantly outperform both EM-trained HMMs, and HMMs with a sparsity-inducing Dirichlet prior trained by variational EM. We evaluate these HMMs on three languages — English, Bulgarian and Portuguese — under four conditions. We find that our method always improves performance with respect to both baselines, while variational Bayes actually degrades performance in most cases. We increase accuracy with respect to EM by 2.5%-8.7% absolute and we see improvements even in a semi-supervised condition where a limited dictionary is provided.
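    Concretely, the sparsity bias is placed on posteriors rather than parameters: for each word type w and tag t, take the maximum posterior probability of t over all occurrences of w in the corpus and penalize the sum of these maxima, an l1/l-infinity penalty that pushes each word type toward a small set of plausible tags. This is a rough statement of the construction; see the paper for the exact objective:

        R(q) = \sum_{w, t} \max_{i : x_i = w} q(z_i = t \mid \mathbf{x})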
    Frustratingly Hard Domain Adaptation for Dependency Parsing
    Mark Dredze
    Partha Pratim Talukdar
    João V. Graça
    Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 1051-1055
    Learning Tractable Word Alignment Models with Complex Constraints
    Joao Graca
    Ben Taskar
    Computational Linguistics, 36(2010), pp. 481-504
    Abstract: Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.
    Posterior Regularization for Structured Latent Variable Models
    Joao Graca
    Jennifer Gillenwater
    Ben Taskar
    Journal of Machine Learning Research, 11(2010), pp. 2001-2049
    Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of the structural constraints the posteriors are required to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.
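    In outline, the framework maximizes the marginal log-likelihood penalized by the divergence between the model posterior and a constraint set Q defined through expectations of constraint features; roughly:

        \max_{\theta} \; \mathcal{L}(\theta) \;-\; \min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p_{\theta}(\mathbf{z} \mid \mathbf{x})\big),
        \qquad \mathcal{Q} = \{\, q : \mathbb{E}_{q}[\boldsymbol{\phi}(\mathbf{x}, \mathbf{z})] \le \mathbf{b} \,\}

    Optimization alternates an E-step that projects the current posterior onto Q (a convex problem solved in the dual, so the projected posterior retains the factored form of the model) with an M-step that re-estimates theta against the projected posterior. This is a condensed restatement of the framework, not a substitute for the paper's full derivation.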
    Dependency Grammar Induction via Bitext Projection Constraints
    Jennifer Gillenwater
    Ben Taskar
    47th Annual Meeting of the Association for Computational Linguistics (ACL), Association for Computational Linguistics (2009), pp. 369-377
    Abstract: Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages. The wide availability of parallel text and accurate parsers in English has opened up the possibility of grammar induction through partial transfer across bitext. We consider generative and discriminative models for dependency grammar induction that use word-level alignments and a source language parser (English) to constrain the space of possible target trees. Unlike previous approaches, our framework does not require full projected parses, allowing partial, approximate transfer through linear expectation constraints on the space of distributions over trees. We consider several types of constraints that range from generic dependency conservation to language-specific annotation rules for auxiliary verb analysis. We evaluate our approach on Bulgarian and Spanish CoNLL shared task data and show that we consistently outperform unsupervised methods and can outperform supervised learning for limited training data.
    Expectation Maximization and Posterior Constraints
    Joao Graca
    Ben Taskar
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008), pp. 569-576
    Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.
    Multi-View Learning over Structured and Non-Identical Outputs
    Joao Graca
    Ben Taskar
    Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI), AUAI Press (2008), pp. 204-211
    Abstract: In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.
    Better Alignments = Better Translations?
    João Graça
    Ben Taskar
    Proceedings of ACL-08: HLT, Association for Computational Linguistics, Columbus, Ohio (2008), pp. 986-993
    Abstract: Automatic word alignment is a key step in training statistical machine translation systems. Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality. In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models. We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall trade-offs, and how rare and common words are affected across several language pairs. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
    Small Statistical Models by Random Feature Mixing
    Mark Dredze
    Proceedings of the ACL-2008 Workshop on Mobile Language Processing, Association for Computational Linguistics, pp. 19-20
    Abstract: The application of statistical NLP systems to resource constrained devices is limited by the need to maintain parameters for a large number of features and an alphabet mapping features to parameters. We introduce random feature mixing to eliminate alphabet storage and reduce the number of parameters without severely impacting model performance.
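    The idea is essentially what is now commonly called feature hashing: rather than storing an alphabet that maps every feature string to a parameter index, hash the string directly into a fixed number of buckets and let colliding features share a parameter. A minimal sketch follows; the bucket count, hash function, and feature templates are illustrative, not the paper's configuration.

        import hashlib
        import numpy as np

        N_BUCKETS = 2 ** 18          # parameter vector size, fixed up front

        def bucket(feature: str) -> int:
            # Hash the feature string straight to an index; no alphabet is stored.
            h = hashlib.md5(feature.encode("utf-8")).digest()
            return int.from_bytes(h[:8], "little") % N_BUCKETS

        def featurize(tokens):
            x = np.zeros(N_BUCKETS)
            for tok in tokens:
                for f in (f"word={tok}", f"suffix3={tok[-3:]}"):
                    x[bucket(f)] += 1.0   # collisions simply share a parameter
            return x

        w = np.zeros(N_BUCKETS)          # model weights, independent of vocabulary size
        score = w @ featurize("the quick brown fox".split())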
    Empirical Price Modeling for Sponsored Search.
    Ryan Gabbard
    Alex Kulesza
    Qian Liu
    Jinsong Tan
    Michael Kearns
    Third International Workshop on Internet and Network Economics (WINE), Springer (2007), pp. 541-548
    Abstract: We present a characterization of empirical price data from sponsored search auctions. We show that simple models drawing bid values independently from a fixed distribution can be tuned to match empirical data on average, but still fail to account for deviations observed in individual auctions. Hypothesizing that these deviations are due to strategic bidding, we define measures of "jamming" behavior and show that actual auctions exhibit significantly more jamming than predicted by such models. Correspondingly, removing the jamming bids from observed auction data yields a much closer fit. We demonstrate that this characterization is a revealing tool for analysis, using model parameter values and measures of jamming to summarize the effects of query modifiers on a set of keyword auctions.
    Semi-Automated Named Entity Annotation
    Mark Mandel
    Steven Carroll
    Peter White
    Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics (2007), pp. 53-56
    Abstract: We investigate a way to partially automate corpus annotation for named entity recognition, by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained using a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
    Automatic Code Assignment to Medical Text
    Koby Crammer
    Mark Dredze
    Partha Pratim Talukdar
    Steven Carroll
    Biological, translational, and clinical language processing, Association for Computational Linguistics (2007), pp. 129-136
    Abstract: Code assignment is important for handling large amounts of electronic medical data in the modern hospital. However, only expert annotators with extensive training can assign codes. We present a system for the assignment of ICD-9-CM clinical codes to free text radiology reports. Our system assigns a code configuration, predicting one or more codes for each document. We combine three coding systems into a single learning system for higher accuracy. We compare our system on a real world medical dataset with both human annotators and other automated systems, achieving nearly the maximum score on the Computational Medicine Center’s challenge.
    Transductive structured classification through constrained min-cuts
    Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing, Association for Computational Linguistics (2007), pp. 37-44
    Abstract: We extend the Blum and Chawla (2001) graph min-cut algorithm to structured problems. This extension can alternatively be viewed as a joint inference method over a set of training and test instances where parts of the instances interact through a pre-specified associative network. The method has an efficient approximation through a linear-programming relaxation. On small training data sets, the method achieves up to 34.8% relative error reduction.
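    For orientation, the underlying Blum and Chawla construction (before the structured extension the paper describes) reduces binary transductive classification to an s-t minimum cut: labeled positives are tied to a source, labeled negatives to a sink, similar instances are joined by finite-capacity edges, and the cut assigns every unlabeled node to one side. A minimal sketch with networkx; the nodes, similarity edges, and capacities are illustrative.

        import networkx as nx

        INF = float("inf")
        G = nx.DiGraph()

        labeled = {"d1": 1, "d2": 0}      # seed labels: 1 = positive, 0 = negative
        similar = [("d1", "u1"), ("u1", "u2"), ("u2", "d2"), ("d1", "u3")]

        for node, y in labeled.items():
            if y == 1:
                G.add_edge("SRC", node, capacity=INF)   # positives tied to the source
            else:
                G.add_edge(node, "SNK", capacity=INF)   # negatives tied to the sink

        for a, b in similar:                            # association edges, both directions
            G.add_edge(a, b, capacity=1.0)
            G.add_edge(b, a, capacity=1.0)

        cut_value, (pos_side, neg_side) = nx.minimum_cut(G, "SRC", "SNK")
        labels = {n: int(n in pos_side)
                  for n in G.nodes if n not in ("SRC", "SNK")}
        print(labels)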
    Penn/UMass/CHOP Biocreative II systems
    Koby Crammer
    Gideon Mann
    Kedar Bellare
    Andrew McCallum
    Steven Carroll
    Yang Jin
    Peter White
    Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007), pp. 119-124
    Abstract: Our team participated in the entity tagging and normalization tasks of Biocreative II. For the entity tagging task, we used a k-best MIRA learning algorithm with lexicons and automatically derived word clusters. MIRA accommodates different training loss functions, which allowed us to exploit gene alternatives in training. We also performed a greedy search over feature templates and the development data, achieving a final F-measure of 86.28%. For the normalization task, we proposed a new specialized on-line learning algorithm and applied it for filtering out false positives from a high recall list of candidates. For normalization we received an F-measure of 69.8%.
    Nswap: a network swapping module for Linux clusters.
    Tia Newhall
    Sean Finney
    Michael Spiegel
    Proceedings of the 13th International Conference on Parallel and Distributed Computing (Euro-Par'03), Springer (2003)