Fernando Pereira

Fernando Pereira is VP and Engineering Fellow at Google, where he leads research and development in natural language understanding and machine learning. His previous positions include chair of the Computer and Information Science department of the University of Pennsylvania, head of the Machine Learning and Information Retrieval department at AT&T Labs, and research and management positions at SRI International. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982, and has over 120 research publications on computational linguistics, machine learning, bioinformatics, speech recognition, and logic programming, as well as several patents. He was elected AAAI Fellow in 1991 for contributions to computational linguistics and logic programming, ACM Fellow in 2010 for contributions to machine learning models of natural language and biological sequences, and ACL Fellow for contributions to sequence modeling, finite-state methods, and dependency and deductive parsing. He was president of the Association for Computational Linguistics in 1993.
Authored Publications
    Conversational Music Retrieval with Synthetic Data
    Megan Eileen Leszczynski
    Ravi Ganti
    Shu Zhang
    Arun Tejasvi Chaganty
    Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022
    Abstract: Users looking for recommendations often wish to improve suggestions through broad natural language feedback (e.g., “How about something more upbeat?”). However, building such conversational retrieval systems requires conversational data with rich user utterances paired with slates of items that cover a diverse range of preferences. This is challenging to collect scalably using conventional methods like crowd-sourcing. We address this problem with a new technique to synthesize high-quality dialog data by transforming the domain expertise encoded in curated item collections into corresponding item-seeking conversations. The method first generates a sequence of hypothetical slates returned by a system, and then uses a language model to introduce corresponding user utterances. We apply the approach on a dataset of curated music playlists to generate 10k diverse music-seeking conversations. A qualitative human evaluation shows that a majority of these conversations express believable sequences of slates and include user utterances that faithfully express preferences for them. When used to train a conversational retrieval model, the synthetic data yields up to a 23% relative gain on standard retrieval metrics compared to baselines trained on non-conversational and conversational datasets.
    Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World
    Daphne Luong
    Proceedings of the First International Workshop on Spatial Language Understanding, Association for Computational Linguistics, New Orleans, Louisiana, USA (2018), pp. 46-52
    Abstract: Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through work on understanding spatial relations and values in images and texts as well as on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding can be best supported by creating large-scale datasets that focus on points and paths based in the real world, and then extending these to create online, persistent playscapes that mix human and bot players. The bot players can begin play having undergone a prior training regime, but then must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.
    Abstract: We describe SLING, a framework for parsing natural language into semantic frames. SLING supports general transition-based, neural-network parsing with bidirectional LSTM input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. The parsing model is trained end-to-end using only the text tokens as input. The transition system has been designed to output frame graphs directly without any intervening symbolic representation. The SLING framework includes an efficient and scalable frame store implementation as well as a neural network JIT compiler for fast inference during parsing. SLING is implemented in C++ and it is available for download on GitHub.
    Abstract: Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks.
    Abstract: We describe Sparse Non-negative Matrix (SNM) language model estimation using multinomial loss on held-out data. Being able to train on held-out data is important in practical situations where the training data is usually mismatched from the held-out/test data. It is also less constrained than the previous training algorithm using leave-one-out on training data: it allows the use of richer meta-features in the adjustment model, e.g. the diversity counts used by Kneser-Ney smoothing which would be difficult to deal with correctly in leave-one-out training. In experiments on the one billion words language modeling benchmark, we are able to slightly improve on our previous results which use a different loss function, and employ leave-one-out training on a subset of the main training set. Surprisingly, an adjustment model with meta-features that discard all lexical information can perform as well as lexicalized meta-features. We find that fairly small amounts of held-out data (on the order of 30-70 thousand words) are sufficient for training the adjustment model. In a real-life scenario where the training data is a mix of data sources that are imbalanced in size, and of different degrees of relevance to the held-out and test data, taking into account the data source for a given skip-/n-gram feature and combining them for best performance on held-out/test data improves over skip-/n-gram SNM models trained on pooled data by about 8% in the SMT setup, or as much as 15% in the ASR/IME setup. The ability to mix various data sources based on how relevant they are to a mismatched held-out set is probably the most attractive feature of the new estimation method for SNM LM.
    Plato: A Selective Context Model for Entity Resolution
    Michael Ringgaard
    Transactions of the Association for Computational Linguistics, vol. 3 (2015), pp. 503-515
    Abstract: We present Plato, a probabilistic model for entity resolution that includes a novel approach for handling noisy or uninformative features, and supplements labeled training data derived from Wikipedia with a very large unlabeled text corpus. Training and inference in the proposed model can easily be distributed across many servers, allowing it to scale to over 10^7 entities. We evaluate Plato on three standard datasets for entity resolution. Our approach achieves the best results to date on TAC KBP 2011 and is highly competitive on both the CoNLL 2003 and TAC KBP 2012 datasets.
    Yedalog: Exploring Knowledge at Scale
    Brian Chin
    Vuk Ercegovac
    Peter Hawkins
    Mark S. Miller
    Franz Och
    Chris Olston
    1st Summit on Advances in Programming Languages (SNAPL 2015), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, pp. 63-78
    Abstract: With huge progress on data processing frameworks, human programmers are frequently the bottleneck when analyzing large repositories of data. We introduce Yedalog, a declarative programming language that allows programmers to mix data-parallel pipelines and computation seamlessly in a single language. By contrast, most existing tools for data-parallel computation embed a sublanguage of data-parallel pipelines in a general-purpose language, or vice versa. Yedalog extends Datalog, incorporating not only computational features from logic programming, but also features for working with data structured as nested records. Yedalog programs can run both on a single machine, and distributed across a cluster in batch and interactive modes, allowing programmers to mix different modes of execution easily.
    Large Scale Distributed Acoustic Modeling With Back-off N-grams
    Peng Xu
    Thomas Richardson
    IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169
    Abstract: The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly beyond triphones, as well as increase the number of Gaussian mixture components for the context-dependent states that allow it. We have experimented with contexts that span seven or more context-independent phones, and up to 620 mixture components per state. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling due to implementation simplicity. The back-off acoustic model is estimated, stored and served using MapReduce distributed computing infrastructure. Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. Training big models on large amounts of data proves to be an effective way to increase the accuracy of a state-of-the-art automatic speech recognition system. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence. Models ranging in size between 20-40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood, and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
    Abstract: Google Voice Search is an application that provides a data-rich setup for both language and acoustic modeling research. The approach we take revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data, and the model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence. Models ranging in size between 20-40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood, and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
    Distributed Acoustic Modeling with Back-off N-grams
    Peng Xu
    Thomas Richardson
    Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
    Abstract: The paper proposes an approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model) to approximately 100 times larger than current sizes used in ASR. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling due to implementation simplicity. The new acoustic model is estimated and stored using the MapReduce distributed computing infrastructure. Speech recognition experiments are carried out in an N-best rescoring framework for Google Voice Search. 87,000 hours of training data is obtained in an unsupervised fashion by filtering utterances in Voice Search logs on ASR confidence. The resulting models are trained using maximum likelihood and contain 20-40 million Gaussians. They achieve relative reductions in WER of 11% and 6% over first-pass models trained using maximum likelihood, and boosted MMI, respectively.
    Controlling Complexity in Part-of-Speech Induction
    Joao Graca
    Luisa Coheur
    Ben Taskar
    Journal of Artificial Intelligence Research (JAIR), vol. 41 (2011), pp. 527-551
    Abstract: We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
    Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
    Posterior Sparsity in Dependency Grammar Induction
    Jennifer Gillenwater
    Joao Graca
    Ben Taskar
    Journal of Machine Learning Research, vol. 12 (2011), pp. 455-490
    Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
    Exploiting Feature Covariance in High-Dimensional Online Learning
    Justin Ma
    Alex Kulesza
    Mark Dredze
    Koby Crammer
    Lawrence Saul
    Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR (2010), pp. 493-500
    Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
    Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
    Sparsity in Dependency Grammar Induction
    Jennifer Gillenwater
    João Graça
    Ben Taskar
    48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
    A theory of learning from different domains
    Shai Ben-David
    Koby Crammer
    Alex Kulesza
    Jennifer Vaughan
    Machine Learning, vol. 79 (2010), pp. 151-175
    Abstract: Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
    Automatically incorporating new sources in keyword search-based data integration
    Partha Pratim Talukdar
    Zachary G. Ives
    SIGMOD Conference, ACM Press (2010), pp. 387-398
    Distributed MAP Inference for Undirected Graphical Models
    Sameer Singh
    Andrew McCallum
    Workshop on Learning on Cores, Clusters and Clouds (LCCC), Neural Information Processing Society (NIPS) (2010)
    Gaussian Margin Machines
    Koby Crammer
    Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), Clearwater Beach, Florida, pp. 105-112
    The Unreasonable Effectiveness of Data
    Alon Halevy
    IEEE Intelligent Systems, vol. 24 (2009), pp. 8-12
    Posterior vs. Parameter Sparsity in Latent Variable Models
    Joao Graca
    Ben Taskar
    Advances in Neural Information Processing Systems 22 (2009), pp. 664-672
    Abstract: In this paper we explore the problem of biasing unsupervised models to favor sparsity. We extend the posterior regularization framework [8] to encourage the model to achieve posterior sparsity on the unlabeled training data. We apply this new method to learn first-order HMMs for unsupervised part-of-speech (POS) tagging, and show that HMMs learned this way consistently and significantly outperform both EM-trained HMMs, and HMMs with a sparsity-inducing Dirichlet prior trained by variational EM. We evaluate these HMMs on three languages (English, Bulgarian and Portuguese) under four conditions. We find that our method always improves performance with respect to both baselines, while variational Bayes actually degrades performance in most cases. We increase accuracy with respect to EM by 2.5%-8.7% absolute and we see improvements even in a semisupervised condition where a limited dictionary is provided.
    A transcription factor affinity-based code for mammalian transcription initiation
    M Megraw
    ST Jensen
    U Ohler
    AG Hatzigeorgiou
    Genome Research, vol. 19 (2009), pp. 644-56
    Abstract: The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (approximately 45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5'-ends. Using this high-resolution single-peak model, we predict TSS for approximately 70% of mammalian microRNAs based on currently available data.
    Group Sparse Coding
    Samy Bengio
    Yoram Singer
    Dennis Strelow
    Advances in Neural Information Processing Systems (2009)
    Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks
    Joseph Reisinger
    Rahul Bhagat
    Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), Association for Computational Linguistics, Honolulu, Hawaii, pp. 582-590
    Confidence-Weighted Linear Classification
    Mark Dredze
    Koby Crammer
    International Conference on Machine Learning (ICML) (2008)
    Abstract: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.
    Generating Summary Keywords for Emails Using Topics
    Mark Dredze
    Hanna Wallach
    Danny Puller
    Proceedings of the 2008 International Conference on Intelligent User Interfaces
    Learning Bounds for Domain Adaptation
    Koby Crammer
    Alex Kulesza
    Jennifer Wortman
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008)
    Structured Learning with Approximate Inference
    Alex Kulesza
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008)
    Speech Recognition with Weighted Finite-State Transducers
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
    Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis
    Kevin Lerman
    Ari Gilder
    Mark Dredze
    Conference on Computational Linguistics (Coling) (2008)
    Abstract: Media reporting shapes public opinion which can in turn influence events, particularly in political elections, in which candidates both respond to and shape public perception of their campaigns. We use computational linguistics to automatically predict the impact of news on public perception of political candidates. Our system uses daily newspaper articles to predict shifts in public opinion as reflected in prediction markets. We discuss various types of features designed for this problem. The news system improves market prediction over baseline market systems.
    Intelligent Email: Reply and Attachment Prediction
    Mark Dredze
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    Proceedings of the 2008 International Conference on Intelligent User Interfaces
    Euclidean Embedding of Co-occurrence Data
    Gal Chechik
    Naftali Tishby
    Journal of Machine Learning Research, vol. 8 (2007), pp. 2265-2295
    Speech Recognition with Weighted Finite-State Transducers
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)
    Frustratingly Hard Domain Adaptation for Dependency Parsing
    Mark Dredze
    João V. Graça
    Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 1051-1055
    The Need for Open Source Software in Machine Learning
    Soren Sonnenburg
    Mikio L. Braun
    Cheng Soon Ong
    Samy Bengio
    Leon Bottou
    Geoff Holmes
    Yann LeCun
    Klaus-Robert Mueller
    Carl-Edward Rasmussen
    Gunnar Raetsch
    Bernhard Schoelkopf
    Alexander Smola
    Pascal Vincent
    Jason Weston
    Robert C. Williamson
    Journal of Machine Learning Research, vol. 8 (2007), pp. 2443-2466
    Abstract: Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not utilized, since existing implementations are not openly shared, resulting in software with low usability, and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.
    Reranking candidate gene models with cross-species comparison for improved gene prediction
    Qian Liu
    Koby Crammer
    David S. Roos
    BMC Bioinformatics, vol. 9 (2008), pp. 433
    Intelligent Email: Aiding Users with AI
    Mark Dredze
    Hanna Wallach
    Danny Puller
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    American National Conference on Artificial Intelligence (AAAI) (2008)
    Confidence-weighted linear classification
    M. Dredze
    K. Crammer
    Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), Omnipress, pp. 264-271
    Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction
    Qian Liu
    Aaron J Mackey
    David S Roos
    Bioinformatics, vol. 24 (2008), pp. 597-605
    Learning to Create Data-Integrating Queries
    Marie Jacob
    M. Salman Mehmood
    Koby Crammer
    Zachary Ives
    Sudipto Guha
    VLDB (2008)
    A rate-distortion one-class model and its applications to clustering
    K. Crammer
    P. Talukdar
    Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), Omnipress, pp. 184-191
    Learning to join everything
    CIKM (2007), pp. 9-10
    Semi-Automated Named Entity Annotation
    Mark Mandel
    Steven Carroll
    Peter White
    Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics (2007), pp. 53-56
    Transductive structured classification through constrained min-cuts
    Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing, Association for Computational Linguistics (2007), pp. 37-44
    Penn/UMass/CHOP Biocreative II systems
    Koby Crammer
    Gideon Mann
    Kedar Bellare
    Andrew McCallum
    Steven Carroll
    Yang Jin
    Peter White
    Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007), pp. 119-124
    Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction
    Axel Bernal
    Koby Crammer
    Artemis Hatzigeorgiou
    PLoS Computational Biology, vol. 3 (2007)
    Analysis of Representations for Domain Adaptation
    Shai Ben-David
    Koby Crammer
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2007)
    Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
    Mark Dredze
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic (2007), pp. 440-447
    Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
    Ryan McDonald
    Kevin Lerman
    Tenth Conference on Computational Natural Language Learning (CoNLL-X) (2006)
    Embedding Heterogeneous Data Using Statistical Models
    Amir Globerson
    Gal Chechik
    Naftali Tishby
    AAAI (2006)
    Automated recognition of malignancy mentions in biomedical literature
    Yang Jin
    Ryan T. McDonald
    Kevin Lerman
    Mark A. Mandel
    Steven Carroll
    Mark Y. Liberman
    Raymond S. Winters
    Peter S. White
    BMC Bioinformatics, vol. 7 (2006), pp. 492
    Online Learning of Approximate Dependency Parsing Algorithms
    Ryan McDonald
    11th Conference of the European Chapter of the Association for Computational Linguistics: EACL 2006, pp. 81-88
    "Sorry I forgot the attachment": Email Attachment Prediction
    Mark Dredze
    3rd Conference on Email and Anti-Spam, Stanford, CA (2006)
    An automated procedure to identify biomedical articles that contain cancer-associated gene variants
    Ryan McDonald
    R Scott Winters
    Claire K Ankuda
    Joan A Murphy
    Amy E Rogers
    Marc S Greenblatt
    Peter S White
    Human Mutation, vol. 27 (2006), pp. 957-964
    Domain Adaptation with Structural Correspondence Learning
    Ryan McDonald
    EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120-128
    Online Learning of Approximate Dependency Parsing Algorithms
    Ryan McDonald
    Proceedings of EACL (2006)
    Identifying gene and protein mentions in text using conditional random fields
    Ryan McDonald
    BMC Bioinformatics (2005)
    Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
    Ryan McDonald
    Seth Kulick
    Scott Winters
    Yang Jin
    Pete White
    Proceedings of ACL (2005)
    A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance
    Andrew McCallum
    Kedar Bellare
    Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005)
    Reply Expectation Prediction for Email Management
    Mark Dredze
    2nd Conference on Email and Anti-Spam, Stanford, CA (2005)
    Flexible Text Segmentation with Structured Multilabel Classification
    Ryan McDonald
    Koby Crammer
    Proceedings of HLT-EMNLP (2005)
    Online Large-Margin Training of Dependency Parsers
    Ryan McDonald
    Koby Crammer
    Proceedings of ACL (2005)
    Online Large-Margin Training of Dependency Parsers
    Ryan McDonald
    Koby Crammer
    43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
    Non-Projective Dependency Parsing using Spanning Tree Algorithms
    Ryan T. McDonald
    Kiril Ribarov
    Jan Hajic
    HLT/EMNLP (2005)
    Weighted Automata in Text and Speech Processing
    arXiv, vol. abs/cs/0503077 (2005)
    Non-Projective Dependency Parsing using Spanning Tree Algorithms
    Ryan McDonald
    Kiril Ribarov
    Jan Hajic
    Proceedings of HLT-EMNLP (2005)
    Distributed Latent Variable Models of Lexical Co-occurrences
    Tenth International Workshop on Artificial Intelligence and Statistics (2005)
    Automatically annotating documents with normalized gene lists
    Jeremiah Crim
    Ryan McDonald
    BMC Bioinformatics (2005)
    Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
    Ryan McDonald
    Seth Kulick
    Scott Winters
    Yang Jin
    Pete White
    43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
    Reply Expectation Prediction for Email Management
    Mark Dredze
    CEAS (2005)
    Hierarchical Distributed Representations for Statistical Language Modeling
    Kilian Q. Weinberger
    Lawrence K. Saul
    NIPS (2004)
    Case-Factor Diagrams for Structured Probabilistic Modeling
    David McAllester
    Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (2004)
    ATDD: An Algorithmic Tool for Domain Discovery in Protein Sequences
    Sanjeev Khanna
    Li Li
    Algorithms in Bioinformatics, 4th International Workshop (WABI 2004), Springer, pp. 206-217
    An entity tagger for recognizing acquired genomic variations in cancer literature
    Ryan McDonald
    Scott Winters
    Mark Mandel
    Yang Jin
    Pete White
    Bioinformatics (2004)
    Hierarchical Distributed Representations for Statistical Language Modeling
    Kilian Weinberger
    Lawrence Saul
    Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA (2004)
    Case-Factor Diagrams for Structured Probabilistic Modeling
    David A. McAllester
    UAI (2004), pp. 382-391
    Euclidean Embedding of Co-Occurrence Data
    Amir Globerson
    Gal Chechik
    Naftali Tishby
    Advances in Neural Information Processing Systems (NIPS), MIT Press, Cambridge, MA (2004), pp. 497-504
    Shallow Parsing with Conditional Random Fields
    Fei Sha
    HLT-NAACL (2003)
    Weighted Finite-State Transducers in Speech Recognition
    Computer Speech and Language, vol. 16 (2002), pp. 69-88
    Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
    John Lafferty
    Andrew McCallum
    Proceedings of ICML-01 (2001), pp. 282-289
    Maximum Entropy Markov Models for Information Extraction and Segmentation
    Andrew McCallum
    Dayne Freitag
    Machine Learning: Proceedings of the Seventeenth International Conference (ICML 2000), Stanford, California, pp. 591-598
    The Design Principles of a Weighted Finite-State Transducer Library
    Theoretical Computer Science, vol. 231 (2000), pp. 17-32
    Machine Learning for Efficient Natural-Language Processing
    CPM (2000), pp. 11
    The information bottleneck method
    Naftali Tishby
    William Bialek
    arXiv, vol. physics/0004057 (2000)
    Weighted Finite-State Transducers in Speech Recognition
    Proceedings of the ISCA Tutorial and Research Workshop, Automatic Speech Recognition: Challenges for the new Millenium (ASR2000), Paris, France
    Formal Grammar and Information Theory: Together Again?
    Philosophical Transactions of the Royal Society, vol. 358 (2000), pp. 1239-1253
    AT&T at TREC-8
    Amit Singhal
    Steven P. Abney
    Donald Hindle
    TREC (1999)
    SCAN: Designing and Evaluating User Interfaces to Support Retrieval From Speech Archives
    Steve Whittaker
    Julia Hirschberg
    John Choi
    Donald Hindle
    Amit Singhal
    SIGIR (1999), pp. 26-33
    Similarity-Based Models of Word Cooccurrence Probabilities
    Ido Dagan
    Lillian Lee
    Machine Learning, vol. 34 (1999), pp. 43-69
    Relating Probabilistic Grammars and Automata
    Steven Abney
    David McAllester
    37th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1999), pp. 542-549
    Distributional Similarity Models: Clustering vs. Nearest Neighbors
    Lillian Lee
    37th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1999), pp. 33-40
    Quantifiers, Anaphora, and Intensionality
    Mary Dalrymple
    John Lamping
    Vijay Saraswat
    Semantics and Syntax in Lexical Functional Grammar, MIT Press, Cambridge, Massachusetts (1999), pp. 39-89
    Declarative Programming for a Messy World
    ICLP (1999), pp. 3-5
    Document Expansion for Speech Retrieval
    Amit Singhal
    SIGIR (1999), pp. 34-41
    Multimedia Standards: Present and Future
    ICMCS, vol. 1 (1999), pp. 145-146
    The Information Bottleneck Method
    Naftali Z. Tishby
    William Bialek
    Proceedings of the 37th Allerton Conference on Communication, Control and Computing, Urbana, Illinois (1999)
    An Efficient Extension to Mixture Techniques for Prediction and Decision Trees
    Yoram Singer
    Machine Learning, vol. 36 (1999), pp. 183-199
    Finding Information in Audio: A New Paradigm for Audio Browsing and Retrieval
    Julia Hirschberg
    Steve Whittaker
    Don Hindle
    Amit Singhal
    Accessing Information in Spoken Audio: Proceedings of the ESCA ETRW Workshop, Cambridge, England (1999), pp. 117-122
    Relating Probabilistic Grammars and Automata
    Steven P. Abney
    David A. McAllester
    ACL (1999)
    Efficient General Lattice Generation and Rescoring
    Andrej Ljolje
    EUROSPEECH 99 (1999), pp. 1251-1254
    AT&T at TREC-7
    Amit Singhal
    John Choi
    Donald Hindle
    David D. Lewis
    TREC (1998), pp. 186-198
    Modelling Divergent Production: A multi-domain approach
    ECAI (1998), pp. 131-132
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Second International Workshop on Implementing Automata (WIA '97), Springer-Verlag, Berlin-NY (1998), pp. 144-158
    Dynamic Compilation of Weighted Context-Free Grammars
    Proceedings of COLING-ACL '98, Montreal, Canada (1998), pp. 891-897
    SCAN - Speech Content Based Audio Navigator: A Systems Overview
    John Choi
    Don Hindle
    Julia Hirschberg
    Ivan Magrin-Chagnolleau
    Christine Nakatani
    Amit Singhal
    Steve Whittaker
    Proceedings of the Fifth International Conference on Spoken Language Processing, Sydney (1998)
    Dynamic Compilation of Weighted Context-Free Grammars
    36th Meeting of the Association for Computational Linguistics (ACL '98), Proceedings of the Conference, Montréal, Québec, Canada (1998), pp. 891-897
    Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition
    Don Hindle
    Andrej Ljolje
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, Washington (1998)
    Similarity-Based Methods For Word Sense Disambiguation
    Ido Dagan
    Lillian Lee
    arXiv (1997)
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Workshop on Implementing Automata (WIA '97), University of Western Ontario, London, Ontario, Canada (1997)
    Finite-State Approximation of Phrase-Structure Grammars
    Rebecca N. Wright
    Finite-State Language Processing, MIT Press, Cambridge, Massachusetts (1997), pp. 149-173
    Speech Recognition by Composition of Weighted Finite Automata
    Finite-State Language Processing, MIT Press, Cambridge, Massachusetts (1997), pp. 431-453
    A Rational Design for a Weighted Finite-State Transducer Library
    WIA'97: Proceedings of the Workshop on Implementing Automata, Springer-Verlag (1997)
    Aggregate and Mixed-Order Markov Models for Statistical Language Processing
    Lawrence Saul
    Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Somerset, NJ. Distributed by Morgan Kaufmann, San Francisco, CA (1997), pp. 81-89
    AT&T at TREC-6: SDR Track
    Amit Singhal
    John Choi
    Donald Hindle
    TREC (1997), pp. 227-232
    Transducer Composition for Context-Dependent Network Expansion
    Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece (1997)
    Transducer Composition for Context-Dependent Network Expansion
    EuroSpeech'97, European Speech Communication Association, Genova, Italy (1997), pp. 1427-1430
    Quantifiers, Anaphora, and Intensionality
    Mary Dalrymple
    John Lamping
    Vijay A. Saraswat
    Journal of Logic, Language, and Information, vol. 6, no. 3 (1997), pp. 219-273
    Similarity-Based Methods For Word Sense Disambiguation
    Ido Dagan
    Lillian Lee
    35th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1997), pp. 56-63
    Intensional Verbs Without Type-Raising or Lexical Ambiguity
    Mary Dalrymple
    John Lamping
    Vijay Saraswat
    Logic, Language and Computation (Volume 1), CSLI Publications, Stanford, California (1996), pp. 167-182
    Interactions of Scope and Ellipsis
    Stuart M. Shieber
    Mary Dalrymple
    Linguistics and Philosophy, vol. 19 (1996), pp. 527-552
    Language, Computation and Artificial Intelligence
    ACM Computing Surveys, vol. 28 (1996), pp. 9
    A Deductive Account of Quantification in LFG
    Mary Dalrymple
    John Lamping
    Vijay Saraswat
    Quantifiers, Deduction, and Context, CSLI Publications, Stanford, California (1996), pp. 33-57
    Rational Power Series in Text and Speech Processing
    Graduate course, University of Pennsylvania, Department of Computer Science, Philadelphia, PA (1996)
    Speech Recognition by Composition of Weighted Finite Automata
    Weighted Automata in Text and Speech Processing
    Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language, John Wiley and Sons, Chichester, Budapest, Hungary (1996)
    Beyond Word N-Grams
    Yoram Singer
    Naftali Z. Tishby
    Proceedings of the Third Workshop on Very Large Corpora, Association for Computational Linguistics, Columbus, Ohio (1995), pp. 95-106
    Principles and Implementation of Deductive Parsing
    Stuart M. Shieber
    Yves Schabes
    Journal of Logic Programming, vol. 24 (1995), pp. 3-36
    Linear Logic for Meaning Assembly
    Mary Dalrymple
    John Lamping
    Vijay A. Saraswat
    arXiv (1995)
    Ellipsis and Higher-Order Unification
    Mary Dalrymple
    Stuart M. Shieber
    arXiv (1995)
    The AT&T 60,000 Word Speech-to-Text System
    Andrej Ljolje
    Don Hindle
    Eurospeech'95: ESCA 4th European Conference on Speech Communication and Technology, Madrid, Spain (1995), pp. 207-210
    Design of a Linguistic Postprocessor using Variable Memory Length Markov Models
    Isabelle Guyon
    Proceedings of the Third International Conference on Document Analysis and Recognition, IEEE Computer Society Press, Los Alamitos, California (1995), pp. 454-457
    Similarity-Based Estimation of Word Cooccurrence Probabilities
    Ido Dagan
    Lillian Lee
    32nd Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1994), pp. 272-278
    Frequencies vs. Biases: Machine Learning Problems in Natural Language Processing - Abstract
    ICML (1994), pp. 380
    Weighted Rational Transductions and their Application to Human Language Processing
    Human Language Technology Workshop, Morgan Kaufmann, San Francisco, California (1994), pp. 262-267
    Frequencies vs Biases: Machine Learning Problems in Natural Language Processing (Extended Abstract)
    COLT (1994), pp. 12
    Introduction to Special Issue on Natural Language Processing
    Barbara J. Grosz
    Artificial Intelligence, vol. 63 (1993), pp. 1-15
    Distributional Clustering of English Words
    Naftali Z. Tishby
    Lillian Lee
    31st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Columbus, Ohio (1993), pp. 183-190
    Empirical Properties of Finite State Approximations for Phrase Structure Grammars
    David B. Roe
    Proceedings of the International Conference on Spoken Language Processing, Banff, Alberta (1992), pp. 261-264
    Inside-Outside Reestimation from Partially Bracketed Corpora
    Yves Schabes
    30th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Newark, Delaware (1992), pp. 128-135
    Quantifier Scoping
    Douglas B. Moran
    The Core Language Engine, MIT Press, Cambridge, Massachusetts (1992), pp. 149-172
    A spoken language translator for restricted-domain context-free languages
    David B. Roe
    Alejandro Macarrón
    Speech Communication, vol. 11 (1992), pp. 311-319
    Efficient Grammar Processing for a Spoken Language Translation System
    David B. Roe
    Alejandro Macarrón
    Proceedings of ICASSP, IEEE, San Francisco, California (1992), pp. 213-216
    Finite-State Approximation of Phrase-Structure Grammars
    Rebecca N. Wright
    29th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Berkeley, California (1991), pp. 246-255
    Toward a Spoken Language Translator for Restricted-Domain Context-Free Languages
    David B. Roe
    Alejandro Macarrón
    EUROSPEECH 91 - 2nd European Conference on Speech Communication and Technology, Genova, Italy (1991), pp. 1063-1066
    Semantic Interpretation as Higher-Order Deduction
    Logics in AI: European Workshop JELIA'90, Springer-Verlag, Berlin, Germany, Amsterdam, Holland (1991), pp. 78-96
    Deductive Interpretation
    Natural Language and Speech, Springer-Verlag (1991), pp. 116-133
    Incremental Interpretation
    Martha E. Pollack
    Artificial Intelligence, vol. 50 (1991), pp. 37-82
    Ellipsis and Higher-Order Unification
    Mary Dalrymple
    Stuart M. Shieber
    Linguistics and Philosophy, vol. 14 (1991), pp. 399-452
    Categorial Semantics and Scoping
    Computational Linguistics, vol. 16 (1990), pp. 1-10
    Semantic-Head-Driven Generation
    Stuart M. Shieber
    Gertjan van Noord
    Robert C. Moore
    Computational Linguistics, vol. 16 (1990), pp. 30-42
    Finite-State Approximations of Grammars
    Proceedings of the Second Speech and Natural Language Workshop (1990), pp. 20-25
    Prolog and Natural-Language Analysis: into the Third Decade
    Logic Programming: Proceedings of the 1990 North American Conference, MIT Press, Cambridge, Massachusetts, Austin, Texas, pp. 813-832
    Integrating Speech and Natural Language Processing
    Robert C. Moore
    Hy Murveit
    First Speech and Natural Language Workshop (1989), pp. 243-247
    Synergistic Use of Direct Manipulation and Natural Language
    Phil R. Cohen
    Mary Dalrymple
    Douglas B. Moran
    J. W. Sullivan
    R. A. Gargan, Jr.
    J. L. Schlossberg
    S. W. Tyler
    Proceedings of CHI'89, Austin, Texas (1989)
    A Calculus for Semantic Composition and Scoping
    27th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of British Columbia, Vancouver, Canada (1989), pp. 152-160
    A Semantic-Head-Driven Generation Algorithm for Unification-Based Formalisms
    Stuart M. Shieber
    Gertjan van Noord
    Robert C. Moore
    ACL (1989), pp. 7-17
    A Semantic-Head-Driven Generation Algorithm for Unification-Based Formalisms
    Stuart M. Shieber
    Gertjan van Noord
    Robert C. Moore
    27th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of British Columbia, Vancouver, Canada (1989), pp. 7-17
    An Integrated Framework for Semantic and Pragmatic Interpretation
    Martha E. Pollack
    26th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Buffalo, New York (1988), pp. 75-86
    TEAM: An Experiment in the Design of Transportable Natural Language Interfaces
    Barbara J. Grosz
    Douglas E. Appelt
    Paul A. Martin
    Artificial Intelligence, vol. 32 (1987), pp. 173-243
    Grammars and Logics of Partial Information
    Logic Programming: Proceedings of the Fourth International Conference, MIT Press, Cambridge, Massachusetts, Melbourne, Australia (1987), pp. 989-1013
    Prolog and Natural-Language Analysis
    Stuart M. Shieber
    Center for the Study of Language and Information, Stanford, California (1987)
    Can Drawing Be Liberated from the von Neumann Style?
    Logic Programming and Its Applications, Ablex, Norwood, New Jersey (1986), pp. 175-187
    TEAM: An Experimental Transportable Natural-Language Interface
    Paul A. Martin
    Douglas E. Appelt
    Barbara J. Grosz
    FJCC (1986), pp. 260-267
    A Sheaf-Theoretic Model of Concurrency
    Luis F. Monteiro
    Symposium on Logic and Computer Science, IEEE Computer Society Press, Cambridge, Massachusetts (1986), pp. 66-76
    A Structure-Sharing Representation for Unification-Based Grammar Formalisms
    23rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Chicago, Illinois (1985), pp. 137-144
    A New Characterization of Attachment Preferences
    Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, Cambridge University Press, Cambridge, England (1985), pp. 307-319
    An Overview of Automated Reasoning and Related Fields
    L. Wos
    Robert Hong
    Robert S. Boyer
    J Strother Moore
    W. W. Bledsoe
    L. J. Henschen
    Bruce G. Buchanan
    Graham Wrightson
    Cordell Green
    Journal of Automated Reasoning, vol. 1 (1985), pp. 5-48
    The Semantics of Grammar Formalisms Seen as Computer Languages
    Stuart M. Shieber
    Proceedings of COLING 84, Association for Computational Linguistics, Stanford, California (1984), pp. 123-129
    A Fact Dependency System for the Logic Programmer
    Peter S. G. Swinson
    Aart Bijl
    Computer-Aided Design, vol. 14 (1983), pp. 235-243
    Parsing as Deduction
    David H. D. Warren
    21st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Cambridge, Massachusetts (1983), pp. 137-144
    Transportability and Generality in a Natural-Language Interface System
    Paul A. Martin
    Douglas E. Appelt
    Proceedings of the Eighth International Joint Conference on Artificial Intelligence (1983), pp. 573-581
    Can Drawing Be Liberated From the Von Neumann Style?
    Databases for Business and Office Applications (1983), pp. 184-190
    An Efficient Easily Adaptable System for Interpreting Natural Language Queries
    David H. D. Warren
    Computational Linguistics, vol. 8 (1982), pp. 110-122
    Extraposition Grammars
    Computational Linguistics, vol. 7 (1981), pp. 243-256
    Definite Clause Grammars for Language Analysis - a Survey of the Formalism and a Comparison with Augmented Transition Networks
    David H. D. Warren
    Artificial Intelligence, vol. 13 (1980), pp. 231-278
    Prolog - The Language and its Implementation Compared with Lisp
    David H. D. Warren
    Luis M. Pereira
    Proceedings of the Symposium on Artificial Intelligence and Programming Languages, Rochester, New York (1977), pp. 109-115