John Blitzer

Publications
    Latent Structured Ranking
    Jason Weston
    Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI) (2012)
    Abstract: Many latent (factorized) models have been proposed for recommendation tasks like collaborative filtering and for ranking tasks like document or image retrieval and annotation. Common to all these methods is that during inference the items are scored independently by their similarity to the query in the latent embedding space. The structure of the ranked list (i.e., the set of items returned as a whole) is not taken into account. This can be a problem because the set of top predictions can be either too diverse (containing results that contradict each other) or not diverse enough. In this paper we introduce a method for learning latent structured rankings that improves over existing methods by providing the right blend of predictions at the top of the ranked list. Particular emphasis is put on making this method scalable. Empirical results on large-scale image annotation and music recommendation tasks show improvements over existing approaches.
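    The independent scoring this abstract contrasts against is just a per-item similarity to the query in the shared latent space. A minimal NumPy sketch of that baseline (all names and dimensions here are illustrative, not from the paper):

        import numpy as np

        rng = np.random.default_rng(0)
        n_items, dim = 1000, 64
        item_emb = rng.normal(size=(n_items, dim))  # latent item factors
        query_emb = rng.normal(size=(dim,))         # latent query factors

        # Independent scoring: each item is ranked by its own similarity
        # to the query; the makeup of the top-k list as a set is ignored,
        # which is the gap latent structured ranking aims to close.
        scores = item_emb @ query_emb
        top_k = np.argsort(-scores)[:10]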
    Domain Adaptation with Coupled Subspaces
    Sham Kakade
    Dean Foster
    Artificial Intelligence and Statistics (2011)
    Abstract: Domain adaptation algorithms address a key issue in applied machine learning: how can we train a system on a source distribution but achieve high performance on a different target distribution? We tackle this question for divergent distributions where crucial predictive target features may not even have support under the source distribution. In this setting, the key intuition is that if we can link target-specific features to source features, we can learn effectively using only source labeled data. We formalize this intuition, as well as the assumptions under which such coupled learning is possible. This allows us to give finite-sample target error bounds (using only source training data) and an algorithm which performs at the state of the art on two natural language processing adaptation tasks characterized by novel target features.
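    The coupling intuition can be made concrete with a toy construction: use unlabeled target data to fit a linear map from shared features to target-specific features, so that a classifier trained only on labeled source data has a handle on features it never saw labeled. This is a hedged sketch of the idea, not the paper's algorithm; every name below is illustrative:

        import numpy as np

        rng = np.random.default_rng(1)
        n_unlabeled, d_shared, d_target_only = 5000, 100, 20

        # Unlabeled target-domain data, split into features shared with
        # the source domain and features specific to the target domain.
        X_shared = rng.normal(size=(n_unlabeled, d_shared))
        X_target_only = X_shared @ rng.normal(size=(d_shared, d_target_only))

        # Least-squares coupling: predict target-specific features from
        # shared ones. A source-trained weight vector over shared features
        # can then be extended through W to score target-only features.
        W, *_ = np.linalg.lstsq(X_shared, X_target_only, rcond=None)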
    Learning Better Monolingual Models with Unannotated Bilingual Text
    David Burkett
    Slav Petrov
    Dan Klein
    Fourteenth Conference on Computational Natural Language Learning (CoNLL '10) (2010)
    A theory of learning from different domains
    Shai Ben-David
    Koby Crammer
    Alex Kulesza
    Jennifer Wortman Vaughan
    Machine Learning, 79 (2010), pp. 151-175
    Abstract: Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
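    The two results the abstract summarizes have a compact standard form. With \epsilon_S and \epsilon_T the source and target errors, d_{\mathcal{H}\Delta\mathcal{H}} the classifier-induced divergence, and \lambda the combined error of the best single hypothesis on both domains:

        \epsilon_T(h) \le \epsilon_S(h) + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \lambda

        \hat{\epsilon}_\alpha(h) = \alpha\, \hat{\epsilon}_T(h) + (1 - \alpha)\, \hat{\epsilon}_S(h), \quad \alpha \in [0, 1]

    The first line bounds a source-trained classifier's target error; minimizing the second, with \alpha chosen as a function of the divergence, the two sample sizes, and the hypothesis-class complexity, gives the combined-error result.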
    Learning Bounds for Domain Adaptation
    Koby Crammer
    Alex Kulesza
    Jennifer Wortman
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008)
    Intelligent Email: Reply and Attachment Prediction
    Mark Dredze
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    Proceedings of the 2008 International Conference on Intelligent User Interfaces
    Frustratingly Hard Domain Adaptation for Dependency Parsing
    Mark Dredze
    Partha Pratim Talukdar
    João V. Graça
    Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 1051-1055
    Intelligent Email: Aiding Users with AI
    Mark Dredze
    Hanna Wallach
    Danny Puller
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    American National Conference on Artificial Intelligence (AAAI) (2008)
    Multi-View Learning over Structured and Non-Identical Outputs
    João V. Graça
    Ben Taskar
    Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI), AUAI Press (2008), pp. 204-211
    Abstract: In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.
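    For reference, the Bhattacharyya distance between two discrete predictive distributions has a one-line form, computed here in a small Python sketch (the training loop that uses it as a regularizer is omitted):

        import numpy as np

        def bhattacharyya_distance(p, q):
            """Bhattacharyya distance between discrete distributions p and q."""
            return -np.log(np.sum(np.sqrt(p * q)))

        # Each view's predictive distribution over the same label set; the
        # agreement regularizer drives this distance toward zero.
        p_view1 = np.array([0.7, 0.2, 0.1])
        p_view2 = np.array([0.6, 0.3, 0.1])
        print(bhattacharyya_distance(p_view1, p_view2))  # ~0.007: near agreement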
    Analysis of Representations for Domain Adaptation
    Shai Ben-David
    Koby Crammer
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2007)
    Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
    Mark Dredze
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic (2007), pp. 440-447
    Domain Adaptation with Structural Correspondence Learning
    Ryan McDonald
    Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 120-128
    "Sorry I forgot the attachment": Email Attachment Prediction
    Mark Dredze
    3rd Conference on Email and Anti-Spam, Stanford, CA (2006)
    Reply Expectation Prediction for Email Management
    Mark Dredze
    2nd Conference on Email and Anti-Spam, Stanford, CA (2005)
    Distributed Latent Variable Models of Lexical Co-occurrences
    Amir Globerson
    Tenth International Workshop on Artificial Intelligence and Statistics (2005)
    Hierarchical Distributed Representations for Statistical Language Modeling
    Kilian Weinberger
    Lawrence Saul
    Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA (2004)