John Blitzer

Authored Publications
    Latent Structured Ranking
    Jason Weston
    Uncertainty in Artificial Intelligence (UAI) (2012)
    Many latent (factorized) models have been proposed for recommendation tasks like collaborative filtering and for ranking tasks like document or image retrieval and annotation. Common to all these methods is that during inference the items are scored independently by their similarity to the query in the latent embedding space. The structure of the ranked list (i.e., considering the set of returned items as a whole) is not taken into account. This can be a problem because the set of top predictions can be either too diverse (containing results that contradict each other) or not diverse enough. In this paper we introduce a method for learning latent structured rankings that improves over existing methods by providing the right blend of predictions at the top of the ranked list. Particular emphasis is put on making this method scalable. Empirical results on large-scale image annotation and music recommendation tasks show improvements over existing approaches.
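    The baseline this paper improves on scores each item independently by its similarity to the query in the latent space. Below is a minimal sketch of that baseline, not the paper's structured method; all names, dimensions, and data are illustrative stand-ins for learned embeddings. The paper's contribution is to replace this per-item scoring with an objective over the returned set as a whole.

    import numpy as np

    # Independent latent-factor scoring: each item is ranked by its
    # dot-product similarity to the query embedding, with no regard
    # for the structure (e.g., diversity) of the returned set.
    rng = np.random.default_rng(0)
    d, n_items = 32, 10_000            # latent dimension, catalogue size (illustrative)

    V = rng.normal(size=(n_items, d))  # stand-in for learned item embeddings
    q = rng.normal(size=d)             # stand-in for a learned query/user embedding

    scores = V @ q                     # every item scored independently
    ranked = np.argsort(-scores)[:10]  # top-10 ranked list
    print(ranked)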
    Domain Adaptation with Coupled Subspaces
    Sham Kakade
    Dean Foster
    Artificial Intelligence and Statistics (2011)
    Domain adaptation algorithms address a key issue in applied machine learning: How can we train a system under a source distribution but achieve high performance under a different target distribution? We tackle this question for divergent distributions where crucial predictive target features may not even have support under the source distribution. In this setting, the key intuition is that if we can link target-specific features to source features, we can learn effectively using only source labeled data. We formalize this intuition, as well as the assumptions under which such coupled learning is possible. This allows us to give finite-sample target error bounds (using only source training data) and an algorithm which performs at the state of the art on two natural language processing adaptation tasks characterized by novel target features.
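    To make the "link target-specific features to source features" intuition concrete, here is an illustrative sketch, not the paper's coupled-subspace algorithm: a linear link from shared features to target-only features is fit on unlabeled target data, and source examples are then augmented with predicted target-specific coordinates. The feature split, variable names, and the ridge-regression choice are all assumptions made for this example.

    import numpy as np

    rng = np.random.default_rng(0)
    n_unlabeled, d_shared, d_target = 5_000, 100, 40

    # Unlabeled target-domain data: shared features plus features
    # that only appear in the target domain (synthetic here).
    X_shared = rng.normal(size=(n_unlabeled, d_shared))
    W_true = rng.normal(size=(d_shared, d_target))
    X_target_only = X_shared @ W_true + 0.1 * rng.normal(size=(n_unlabeled, d_target))

    # Fit a linear link from shared to target-specific features via
    # ridge regression; no labels are needed for this step.
    lam = 1.0
    A = X_shared.T @ X_shared + lam * np.eye(d_shared)
    W_hat = np.linalg.solve(A, X_shared.T @ X_target_only)

    # A source example (which lacks target-only features) can now be
    # augmented with predicted target-specific coordinates, so a
    # classifier trained on source labels can use both blocks.
    x_source = rng.normal(size=d_shared)
    x_augmented = np.concatenate([x_source, x_source @ W_hat])
    print(x_augmented.shape)  # (d_shared + d_target,)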
    Learning Better Monolingual Models with Unannotated Bilingual Text
    David Burkett
    Dan Klein
    Fourteenth Conference on Computational Natural Language Learning (CoNLL '10) (2010)
    A theory of learning from different domains
    Shai Ben-David
    Koby Crammer
    Alex Kulesza
    Jennifer Vaughan
    Machine Learning, 79 (2010), pp. 151-175
    Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
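    For readers who want the statements behind this abstract, the two results it summarizes can be written compactly as follows (a restatement in the paper's standard notation, possibly differing in minor details: here epsilon_S and epsilon_T denote source and target error, and the divergence term is the classifier-induced measure mentioned above).

    % Answer to the first question: the target error of any hypothesis h
    % is controlled by its source error plus a divergence between the
    % domains, up to the best joint error lambda.
    \[
      \epsilon_T(h) \;\le\; \epsilon_S(h)
        + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
        + \lambda,
      \qquad
      \lambda \;=\; \min_{h' \in \mathcal{H}} \bigl[\, \epsilon_S(h') + \epsilon_T(h') \,\bigr].
    \]
    % Answer to the second question: train on a convex combination of the
    % empirical target and source errors,
    \[
      \hat{\epsilon}_\alpha(h)
        \;=\; \alpha\, \hat{\epsilon}_T(h) + (1 - \alpha)\, \hat{\epsilon}_S(h),
      \qquad \alpha \in [0, 1],
    \]
    % with the optimal \alpha a function of the divergence, the source and
    % target sample sizes, and the complexity of the hypothesis class.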
    Learning Bounds for Domain Adaptation
    Koby Crammer
    Alex Kulesza
    Jennifer Wortman
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008)
    Intelligent Email: Reply and Attachment Prediction
    Mark Dredze
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    Proceedings of the 2008 International Conference on Intelligent User Interfaces
    Frustratingly Hard Domain Adaptation for Dependency Parsing
    Mark Dredze
    João V. Graça
    Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 1051-1055