Fernando Pereira
Fernando Pereira is VP and Engineering Fellow at Google, where he leads research and development in natural language understanding and machine learning. His previous positions include chair of the Computer and Information Science department of the University of Pennsylvania, head of the Machine Learning and Information Retrieval department at AT&T Labs, and research and management positions at SRI International. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982, and has over 120 research publications on computational linguistics, machine learning, bioinformatics, speech recognition, and logic programming, as well as several patents. He was elected AAAI Fellow in 1991 for contributions to computational linguistics and logic programming, ACM Fellow in 2010 for contributions to machine learning models of natural language and biological sequences, and ACL Fellow for contributions to sequence modeling, finite-state methods, and dependency and deductive parsing. He was president of the Association for Computational Linguistics in 1993.
Authored Publications
Conversational Music Retrieval with Synthetic Data
Megan Eileen Leszczynski
Ravi Ganti
Shu Zhang
Arun Tejasvi Chaganty
Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022
Abstract
Users looking for recommendations often wish to improve suggestions through broad natural language feedback (e.g., “How about something more upbeat?”). However, building such conversational retrieval systems requires conversational data with rich user utterances paired with slates of items that cover a diverse range of preferences. This is challenging to collect scalably using conventional methods like crowd-sourcing. We address this problem with a new technique to synthesize high-quality dialog data by transforming the domain expertise encoded in curated item collections into corresponding item-seeking conversations. The method first generates a sequence of hypothetical slates returned by a system, and then uses a language model to introduce corresponding user utterances. We apply the approach to a dataset of curated music playlists to generate 10k diverse music-seeking conversations. A qualitative human evaluation shows that a majority of these conversations express believable sequences of slates and include user utterances that faithfully express preferences for them. When used to train a conversational retrieval model, the synthetic data yields up to a 23% relative gain on standard retrieval metrics compared to baselines trained on non-conversational and conversational datasets.
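A minimal sketch of the two-stage synthesis described in the abstract, assuming a curated playlist as the item collection: sample a sequence of hypothetical slates, then prompt a language model to write the user utterance that would bridge each pair of consecutive slates. All names here (sample_slates, generate_utterance, the lm callable) are illustrative stand-ins, not the paper's actual interfaces.

import random

def sample_slates(playlist, num_turns, slate_size=5):
    # Derive a plausible sequence of result slates from one curated playlist.
    return [random.sample(playlist, min(slate_size, len(playlist)))
            for _ in range(num_turns)]

def generate_utterance(lm, prev_slate, next_slate):
    # Ask a language model to invent the user feedback that would move
    # the system from prev_slate to next_slate.
    prompt = ("The system showed: " + ", ".join(prev_slate) + "\n"
              "The user then wanted: " + ", ".join(next_slate) + "\n"
              "Write the user's request:")
    return lm(prompt)

def synthesize_dialog(lm, playlist, num_turns=3):
    slates = sample_slates(playlist, num_turns)
    # The first utterance has no prior slate to react to.
    dialog = [("user", generate_utterance(lm, [], slates[0]))]
    for prev, nxt in zip(slates, slates[1:]):
        dialog.append(("system", prev))
        dialog.append(("user", generate_utterance(lm, prev, nxt)))
    dialog.append(("system", slates[-1]))
    return dialog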
Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World
Daphne Luong
Proceedings of the First International Workshop on Spatial Language Understanding, Association for Computational Linguistics, New Orleans, Louisiana, USA (2018), pp. 46-52
Abstract
Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through work on understanding spatial relations and values in images and texts as well as on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding can be best supported by creating large-scale datasets that focus on points and paths based in the real world, and then extending these to create online, persistent playscapes that mix human and bot players. The bot players can begin play having undergone a prior training regime, but then must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.
Abstract
We describe SLING, a framework for parsing natural language into semantic frames. SLING supports general transition-based, neural-network parsing with bidirectional LSTM input encoding and a Transition-Based Recurrent Unit (TBRU) for output decoding. The parsing model is trained end-to-end using only the text tokens as input. The transition system has been designed to output frame graphs directly without any intervening symbolic representation. The SLING framework includes an efficient and scalable frame store implementation as well as a neural network JIT compiler for fast inference during parsing. SLING is implemented in C++ and is available for download on GitHub.
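As a toy illustration of the transition-based decode loop sketched in the abstract (not SLING's actual C++ API), the following runnable stub replaces the biLSTM encoder and TBRU decoder with a trivial stand-in scorer:

def score_actions(frames, token):
    # Stand-in for the learned biLSTM + TBRU scorer: evoke a frame for
    # capitalized tokens, otherwise just shift past them.
    return {"EVOKE": 1.0 if token.istitle() else 0.0, "SHIFT": 0.5}

def parse(tokens):
    frames = []
    for pos, token in enumerate(tokens):
        scores = score_actions(frames, token)
        if max(scores, key=scores.get) == "EVOKE":
            # The real transition system emits frame graphs directly;
            # here a frame is just a mention record.
            frames.append({"mention": token, "start": pos})
    return frames

print(parse("bought a ticket to Paris".split()))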
Collective Entity Resolution with Multi-Focal Attention
Soumen Chakrabarti
Michael Ringaard
ACL (2016)
Abstract
Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011, and 2012 tasks.
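A minimal sketch of the multi-focal idea under stated assumptions: each candidate is supported by its few strongest KB relations rather than by relations to all other entities in the document. Here relatedness (KB relation strength) and local_score (per-mention evidence) are hypothetical callables, and the greedy re-picking loop is a simplification of the paper's inference.

import heapq

def coherence(candidate, others, relatedness, k=3):
    # Evidence from a small set of strong relations, not document-wide support.
    return sum(heapq.nlargest(k, (relatedness(candidate, e) for e in others)))

def resolve(mention_candidates, local_score, relatedness, k=3):
    # Initialize each mention by local evidence, then re-pick its entity
    # using multi-focal coherence with the current picks for other mentions.
    picks = {m: max(cands, key=local_score)
             for m, cands in mention_candidates.items()}
    for m, cands in mention_candidates.items():
        others = [e for m2, e in picks.items() if m2 != m]
        picks[m] = max(cands, key=lambda c: local_score(c)
                       + coherence(c, others, relatedness, k))
    return picks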
Abstract
We describe Sparse Non-negative Matrix (SNM) language model estimation using multinomial loss on held-out data.
Being able to train on held-out data is important in practical situations where the training data is usually mismatched from the held-out/test data. It is also less constrained than the previous training algorithm using leave-one-out on training data: it allows the use of richer meta-features in the adjustment model, e.g. the diversity counts used by Kneser-Ney smoothing which would be difficult to deal with correctly in leave-one-out training.
In experiments on the one billion words language modeling benchmark, we are able to slightly improve on our previous results which use a different loss function, and employ leave-one-out training on a subset of the main training set. Surprisingly, an adjustment model with meta-features that discard all lexical information can perform as well as lexicalized meta-features. We find that fairly small amounts of held-out data (on the order of 30-70 thousand words) are sufficient for training the adjustment model.
In a real-life scenario where the training data is a mix of data sources that are imbalanced in size, and of different degrees of relevance to the held-out and test data, taking into account the data source for a given skip-/n-gram feature and combining them for best performance on held-out/test data improves over skip-/n-gram SNM models trained on pooled data by about 8% in the SMT setup, or as much as 15% in the ASR/IME setup.
The ability to mix various data sources based on how relevant they are to a mismatched held-out set is probably the most attractive feature of the new estimation method for SNM LM.
Plato: A Selective Context Model for Entity Resolution
Michael Ringgaard
Transactions of the Association for Computational Linguistics, vol. 3 (2015), pp. 503-515
Abstract
We present Plato, a probabilistic model for entity resolution that includes a novel approach for handling noisy or uninformative features, and supplements labeled training data derived from Wikipedia with a very large unlabeled text corpus. Training and inference in the proposed model can easily be distributed across many servers, allowing it to scale to over 10^7 entities. We evaluate Plato on three standard datasets for entity resolution. Our approach achieves the best results to date on TAC KBP 2011 and is highly competitive on both the CoNLL 2003 and TAC KBP 2012 datasets.
Yedalog: Exploring Knowledge at Scale
Brian Chin
Vuk Ercegovac
Peter Hawkins
Mark S. Miller
Franz Och
Chris Olston
1st Summit on Advances in Programming Languages (SNAPL 2015), Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, Germany, pp. 63-78
Abstract
With huge progress on data processing frameworks, human programmers are frequently the bottleneck when analyzing large repositories of data. We introduce Yedalog, a declarative programming language that allows programmers to mix data-parallel pipelines and computation seamlessly in a single language. By contrast, most existing tools for data-parallel computation embed a sublanguage of data-parallel pipelines in a general-purpose language, or vice versa. Yedalog extends Datalog, incorporating not only computational features from logic programming, but also features for working with data structured as nested records. Yedalog programs can run both on a single machine, and distributed across a cluster in batch and interactive modes, allowing programmers to mix different modes of execution easily.
Large Scale Distributed Acoustic Modeling With Back-off N-grams
Peng Xu
Thomas Richardson
IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169
Abstract
The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition.
In such a data-rich setting, we can expand the phonetic context significantly beyond triphones, as well as increase the number of Gaussian mixture components for the context-dependent states that allow it. We have experimented with contexts that span seven or more context-independent phones, and up to 620 mixture components per state. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling due to implementation simplicity. The back-off acoustic model is estimated, stored and served using MapReduce distributed computing infrastructure.
Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. Training big models on large amounts of data proves to be an effective way to increase the accuracy of a state-of-the-art automatic speech recognition system. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence. Models ranging in size between 20 and 40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood, and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
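A small sketch of the back-off step that the abstract borrows from language modeling: look up the widest trained phonetic context around a phone, backing off to narrower contexts down to the context-independent phone. The models dictionary keyed by context tuples is an assumption for illustration, and the per-level back-off weights of a real estimator are omitted.

def backoff_lookup(models, phones, center, max_width=3):
    # Try the widest context window first (e.g., 7 phones for max_width=3),
    # then back off to shorter ones, ending at the phone itself (width 0).
    for width in range(max_width, -1, -1):
        ctx = tuple(phones[max(0, center - width):center + width + 1])
        if ctx in models:
            return models[ctx]
    raise KeyError("no model, even context-independent, for %r" % (phones[center],))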
Abstract
Google Voice Search is an application that provides a data-rich setup for both language and acoustic modeling research.
The approach we take revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data, and the model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition.
Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence.
Models ranging in size between 20 and 40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood, and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
Distributed Acoustic Modeling with Back-off N-grams
Peng Xu
Thomas Richardson
Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
Abstract
The paper proposes an approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model) to approximately 100 times larger than current sizes used in ASR. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling, chosen for its implementation simplicity. The new acoustic model is estimated and stored using the MapReduce distributed computing infrastructure. Speech recognition experiments are carried out in an N-best rescoring framework for Google Voice Search. 87,000 hours of training data are obtained in an unsupervised fashion by filtering utterances in Voice Search logs on ASR confidence. The resulting models are trained using maximum likelihood and contain 20-40 million Gaussians. They achieve relative reductions in WER of 11% and 6% over first-pass models trained using maximum likelihood and boosted MMI, respectively.
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Sameer Singh
Andrew McCallum
Association for Computational Linguistics (ACL) (2011)
Abstract
Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large-scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
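As a toy illustration of the kind of inference the paper distributes, here is a greedy, single-machine variant of mention-moving proposals: move a mention between entities and keep the move only if it raises a pairwise compatibility score. The affinity callable is a hypothetical mention-pair score, and the paper's hierarchical sub-entities and proper MCMC acceptance rule are omitted.

import random

def entity_score(entity, affinity):
    return sum(affinity(a, b) for i, a in enumerate(entity) for b in entity[i + 1:])

def greedy_coref(mentions, affinity, steps=1000):
    entities = [[m] for m in mentions]   # start with singleton entities
    for _ in range(steps):
        src, dst = random.sample(range(len(entities)), 2)
        if not entities[src]:
            continue
        m = random.choice(entities[src])
        before = entity_score(entities[src], affinity) + entity_score(entities[dst], affinity)
        entities[src].remove(m)
        entities[dst].append(m)
        after = entity_score(entities[src], affinity) + entity_score(entities[dst], affinity)
        if after < before:               # revert moves that hurt the score
            entities[dst].remove(m)
            entities[src].append(m)
    return [e for e in entities if e]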
Controlling Complexity in Part-of-Speech Induction
Joao Graca
Luisa Coheur
Ben Taskar
Journal of Artificial Intelligence Research (JAIR), vol. 41 (2011), pp. 527-551
Abstract
We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
Posterior Sparsity in Dependency Grammar Induction
Jennifer Gillenwater
Joao Graca
Ben Taskar
Journal of Machine Learning Research, vol. 12 (2011), pp. 455-490
Abstract
A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
Distributed MAP Inference for Undirected Graphical Models
Sameer Singh
Andrew McCallum
Workshop on Learning on Cores, Clusters and Clouds (LCCC), Neural Information Processing Systems (NIPS) (2010)
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
Sparsity in Dependency Grammar Induction
Jennifer Gillenwater
João Graça
Ben Taskar
48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
A theory of learning from different domains
Shai Ben-David
Koby Crammer
Alex Kulesza
Jennifer Vaughan
Machine Learning, vol. 79 (2010), pp. 151-175
Abstract
Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterizes the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
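For readers who want the shape of the answer to the first question: writing ε_S(h) and ε_T(h) for a hypothesis's source and target error, the paper's bound (paraphrased in LaTeX, with d_HΔH the classifier-induced divergence mentioned in the abstract) is

\epsilon_T(h) \le \epsilon_S(h) + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[ \epsilon_S(h') + \epsilon_T(h') \big],

and the second question concerns the minimizer of the convex combination \alpha\,\hat{\epsilon}_T(h) + (1 - \alpha)\,\hat{\epsilon}_S(h) of empirical target and source errors.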
Automatically incorporating new sources in keyword search-based data integration
Partha Pratim Talukdar
Zachary G. Ives
SIGMOD Conference, ACM Press (2010), pp. 387-398
Exploiting Feature Covariance in High-Dimensional Online Learning
Justin Ma
Alex Kulesza
Mark Dredze
Koby Crammer
Lawrence Saul
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR (2010), pp. 493-500
Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition
48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
Gaussian Margin Machines
Koby Crammer
Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), Clearwater Beach, Florida, pp. 105-112
Posterior vs. Parameter Sparsity in Latent Variable Models
Joao Graca
Ben Taskar
Advances in Neural Information Processing Systems 22 (2009), pp. 664-672
Abstract
In this paper we explore the problem of biasing unsupervised models to favor sparsity. We extend the posterior regularization framework [8] to encourage the model to achieve posterior sparsity on the unlabeled training data. We apply this new method to learn first-order HMMs for unsupervised part-of-speech (POS) tagging, and show that HMMs learned this way consistently and significantly outperform both EM-trained HMMs and HMMs with a sparsity-inducing Dirichlet prior trained by variational EM. We evaluate these HMMs on three languages — English, Bulgarian and Portuguese — under four conditions. We find that our method always improves performance with respect to both baselines, while variational Bayes actually degrades performance in most cases. We increase accuracy with respect to EM by 2.5%-8.7% absolute, and we see improvements even in a semi-supervised condition where a limited dictionary is provided.
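Schematically, the posterior-regularized objective behind the method can be written as below (a paraphrase of the usual ℓ1/ℓ∞ sparsity penalty, not the paper's exact notation), where q is the constrained posterior over tag sequences z, i ranges over occurrences of word w, t over tags, and σ trades sparsity against fit:

\min_{q}\; \mathrm{KL}\big(q \,\|\, p_\theta(\mathbf{z} \mid \mathbf{x})\big)
\;+\; \sigma \sum_{w,t} \max_{i}\; \mathbb{E}_q\big[\mathbf{1}(z_{w,i} = t)\big]

Driving each max_i term toward zero for most tags t encourages each word type to concentrate its posterior mass on a few tags.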
Abstract
The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (approximately 45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5'-ends. Using this high-resolution single-peak model, we predict TSS for approximately 70% of mammalian microRNAs based on currently available data.
Group Sparse Coding
Samy Bengio
Yoram Singer
Dennis Strelow
Advances in Neural Information Processing Systems (2009)
Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks
Joseph Reisinger
Rahul Bhagat
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), Association for Computational Linguistics, Honolulu, Hawaii, pp. 582-590
Confidence-Weighted Linear Classification
Mark Dredze
Koby Crammer
International Conference on Machine Learning (ICML) (2008)
Abstract
We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.
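A runnable sketch of the Gaussian-over-weights idea with a diagonal covariance. For simplicity it uses the later AROW-style update of Crammer et al., a close relative of the exact confidence-weighted update in this paper, so the constants below are assumptions rather than the paper's rule:

import numpy as np

def train_gaussian_weights(X, y, r=1.0):
    # Maintain a mean and a per-feature variance for the weight vector;
    # confident (low-variance) features move less on each update.
    d = X.shape[1]
    mu = np.zeros(d)
    sigma = np.ones(d)                     # diagonal covariance
    for x, label in zip(X, y):
        margin = label * mu.dot(x)
        beta = 1.0 / ((sigma * x).dot(x) + r)
        alpha = max(0.0, 1.0 - margin) * beta
        mu += alpha * label * sigma * x
        sigma -= beta * (sigma * x) ** 2   # shrink variance on observed features
    return mu, sigma

# Example: mu, _ = train_gaussian_weights(np.array([[1., 0.], [0., 1.]]), [1, -1])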
Generating Summary Keywords for Emails Using Topics
Mark Dredze
Hanna Wallach
Danny Puller
Proceedings of the 2008 International Conference on Intelligent User Interfaces
Learning Bounds for Domain Adaptation
Koby Crammer
Alex Kulesza
Jennifer Wortman
Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008)
Structured Learning with Approximate Inference
Alex Kulesza
Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2008)
Speech Recognition with Weighted Finite-State Transducers
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis
Kevin Lerman
Ari Gilder
Mark Dredze
Conference on Computational Linguistics (Coling) (2008)
Abstract
Media reporting shapes public opinion, which can in turn influence events, particularly in political elections, in which candidates both respond to and shape public perception of their campaigns. We use computational linguistics to automatically predict the impact of news on public perception of political candidates. Our system uses daily newspaper articles to predict shifts in public opinion as reflected in prediction markets. We discuss various types of features designed for this problem. The news system improves market prediction over baseline market systems.
Intelligent Email: Reply and Attachment Prediction
Mark Dredze
Tova Brooks
Josh Carroll
Joshua Magarick
Proceedings of the 2008 International Conference on Intelligent User Interfaces
Euclidean Embedding of Co-occurrence Data
Gal Chechik
Naftali Tishby
Journal of Machine Learning Research, vol. 8 (2007), pp. 2265-2295
Frustratingly Hard Domain Adaptation for Dependency Parsing
Mark Dredze
João V. Graça
Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 1051-1055
The Need for Open Source Software in Machine Learning
Soren Sonnenburg
Mikio L. Braun
Cheng Soon Ong
Samy Bengio
Leon Bottou
Geoff Holmes
Yann LeCun
Klaus-Robert Mueller
Carl-Edward Rasmussen
Gunnar Raetsch
Bernhard Schoelkopf
Alexander Smola
Pascal Vincent
Jason Weston
Robert C. Williamson
Journal of Machine Learning Research, vol. 8 (2007), pp. 2443-2466
Abstract
Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not utilized, since existing implementations are not openly shared, resulting in software with low usability and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.
A Context Pattern Induction Method for Named Entity Extraction
Thorsten Brants
Mark Liberman
Proceedings of CoNLL-X (2006), pp. 141-148
Learning to Create Data-Integrating Queries
Marie Jacob
M. Salman Mehmood
Koby Crammer
Zachary Ives
Sudipto Guha
VLDB (2008)
A rate-distortion one-class model and its applications to clustering
Koby Crammer
Partha Pratim Talukdar
Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), Omnipress, pp. 184-191
Reranking candidate gene models with cross-species comparison for improved gene prediction
Intelligent Email: Aiding Users with AI
Mark Dredze
Hanna Wallach
Danny Puller
Tova Brooks
Josh Carroll
Joshua Magarick
American National Conference on Artificial Intelligence (AAAI) (2008)
Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction
Qian Liu
Aaron J Mackey
David S Roos
Bioinformatics, vol. 24 (2008), pp. 597-605
Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
Mark Dredze
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic (2007), pp. 440-447
Transductive structured classification through constrained min-cuts
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing, Association for Computational Linguistics (2007), pp. 37-44
Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction
Axel Bernal
Koby Crammer
Artemis Hatzigeorgiou
PLoS Computational Biology, vol. 3 (2007)
Semi-Automated Named Entity Annotation
Mark Mandel
Steven Carroll
Peter White
Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics (2007), pp. 53-56
Learning to join everything
CIKM (2007), pp. 9-10
Penn/UMass/CHOP Biocreative II systems
Koby Crammer
Gideon Mann
Kedar Bellare
Andrew McCallum
Steven Carroll
Yang Jin
Peter White
Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007), pp. 119-124
Analysis of Representations for Domain Adaptation
Shai Ben-David
Koby Crammer
Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA (2007)
Automated recognition of malignancy mentions in biomedical literature
Yang Jin
Ryan T. McDonald
Kevin Lerman
Mark A. Mandel
Steven Carroll
Mark Y. Liberman
Raymond S. Winters
Peter S. White
BMC Bioinformatics, vol. 7 (2006), pp. 492
Online Learning of Approximate Dependency Parsing Algorithms
Ryan McDonald
11th Conference of the European Chapter of the Association for Computational Linguistics: EACL 2006, pp. 81-88
Domain Adaptation with Structural Correspondence Learning
Ryan McDonald
EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120-128
Embedding Heterogeneous Data Using Statistical Models
Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
Ryan McDonald
Kevin Lerman
Tenth Conference on Computational Natural Language Learning (CoNLL-X) (2006)
An automated procedure to identify biomedical articles that contain cancer-associated gene variants
Ryan McDonald
R Scott Winters
Claire K Ankuda
Joan A Murphy
Amy E Rogers
Marc S Greenblatt
Peter S White
Human Mutation, vol. 27 (2006), pp. 957-964
"Sorry I forgot the attachment": Email Attachment Prediction
Mark Dredze
3rd Conference on Email and Anti-Spam, Stanford, CA (2006)
Online Learning of Approximate Dependency Parsing Algorithms
A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance
Andrew McCallum
Kedar Bellare
Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005)
Identifying gene and protein mentions in text using conditional random fields
Distributed Latent Variable Models of Lexical Co-occurrences
Tenth International Workshop on Artificial Intelligence and Statistics (2005)
Online Large-Margin Training of Dependency Parsers
Ryan McDonald
Koby Crammer
43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
Ryan McDonald
Seth Kulick
Scott Winters
Yang Jin
Pete White
43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
Non-Projective Dependency Parsing using Spanning Tree Algorithms
Flexible Text Segmentation with Structured Multilabel Classification
Reply Expectation Prediction for Email Management
Mark Dredze
2nd Conference on Email and Anti-Spam, Stanford, CA (2005)
Automatically annotating documents with normalized gene lists
Case-Factor Diagrams for Structured Probabilistic Modeling
David McAllester
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (2004)
An entity tagger for recognizing acquired genomic variations in cancer literature
Hierarchical Distributed Representations for Statistical Language Modeling
Kilian Weinberger
Lawrence Saul
Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA (2004)
Euclidean Embedding of Co-Occurrence Data
Amir Globerson
Gal Chechik
Naftali Tishby
Advances in Neural Information Processing Systems (NIPS), MIT Press, Cambridge, MA (2004), pp. 497-504
ATDD: An Algorithmic Tool for Domain Discovery in Protein Sequences
Sanjeev Khanna
Li Li
Algorithms in Bioinformatics, 4th International Workshop (WABI 2004), Springer, pp. 206-217
Shallow Parsing with Conditional Random Fields
Weighted Finite-State Transducers in Speech Recognition
Computer Speech and Language, vol. 16 (2002), pp. 69-88
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
Machine Learning for Efficient Natural-Language Processing
CPM (2000), pp. 11
Maximum Entropy Markov Models for Information Extraction and Segmentation
Andrew McCallum
Dayne Freitag
Machine Learning: Proceedings of the Seventeenth International Conference (ICML 2000), Stanford, California, pp. 591-598
Weighted Finite-State Transducers in Speech Recognition
Proceedings of the ISCA Tutorial and Research Workshop, Automatic Speech Recognition: Challenges for the new Millennium (ASR2000), Paris, France
The Design Principles of a Weighted Finite-State Transducer Library
Theoretical Computer Science, vol. 231 (2000), pp. 17-32
Formal Grammar and Information Theory: Together Again?
Philosophical Transactions of the Royal Society, vol. 358 (2000), pp. 1239-1253
The Information Bottleneck Method
Naftali Z. Tishby
William Bialek
Proceedings of the 37th Allerton Conference on Communication, Control and Computing, Urbana, Illinois (1999)
Similarity-Based Models of Word Cooccurrence Probabilities
SCAN: Designing and Evaluating User Interfaces to Support Retrieval From Speech Archives
Steve Whittaker
Julia Hirschberg
John Choi
Donald Hindle
Amit Singhal
SIGIR (1999), pp. 26-33
Document Expansion for Speech Retrieval
Declarative Programming for a Messy World
ICLP (1999), pp. 3-5
Quantifiers, Anaphora, and Intensionality
Mary Dalrymple
John Lamping
Vijay Saraswat
Semantics and Syntax in Lexical Functional Grammar, MIT Press, Cambridge, Massachusetts (1999), pp. 39-89
Distributional Similarity Models: Clustering vs. Nearest Neighbors
Lillian Lee
37th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1999), pp. 33-40
Finding Information in Audio: A New Paradigm for Audio Browsing and Retrieval
Julia Hirschberg
Steve Whittaker
Don Hindle
Amit Singhal
Accessing Information in Spoken Audio: Proceedings of the ESCA ETRW Workshop, Cambridge, England (1999), pp. 117-122
Multimedia Standards: Present and Future
ICMCS, Vol. 1 (1999), pp. 145-146
Efficient General Lattice Generation and Rescoring
AT&T at TREC-8
Amit Singhal
Steven P. Abney
Donald Hindle
TREC (1999)
An Efficient Extension to Mixture Techniques for Prediction and Decision Trees
Relating Probabilistic Grammars and Automata
Steven Abney
David McAllester
37th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1999), pp. 542-549
SCAN - Speech Content Based Audio Navigator: A Systems Overview
John Choi
Don Hindle
Julia Hirschberg
Ivan Magrin-Chagnolleau
Christine Nakatani
Amit Singhal
Steve Whittaker
Proceedings of the Fifth International Conference on Spoken Language Processing, Sydney (1998)
A Rational Design for a Weighted Finite-State Transducer Library
Proceedings of the Second International Workshop on Implementing Automata (WIA '97), Springer-Verlag, Berlin-NY (1998), pp. 144-158
Modelling Divergent Production: A multi-domain approach
ECAI (1998), pp. 131-132
Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition
Don Hindle
Andrej Ljolje
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, Washington (1998)
AT&T at TREC-7
Dynamic Compilation of Weighted Context-Free Grammars
36th Meeting of the Association for Computational Linguistics (ACL '98), Proceedings of the Conference, Montréal, Québec, Canada (1998), pp. 891-897
Aggregate and Mixed-Order Markov Models for Statistical Language Processing
Lawrence Saul
Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Somerset, NJ. Distributed by Morgan Kaufmann, San Francisco, CA (1997), pp. 81-89
Finite-State Approximation of Phrase-Structure Grammars
Rebecca N. Wright
Finite-State Language Processing, MIT Press, Cambridge, Massachusetts (1997), pp. 149-173
A Rational Design for a Weighted Finite-State Transducer Library
Proceedings of the Workshop on Implementing Automata (WIA '97), London, Ontario, Canada, University of Western Ontario, London, Ontario, Canada (1997)
Quantifiers, Anaphora, and Intensionality
Mary Dalrymple
John Lamping
Vijay A. Saraswat
Journal of Logic, Language, and Information, vol. 6, no. 3 (1997), pp. 219-273
Transducer Composition for Context-Dependent Network Expansion
EuroSpeech'97, European Speech Communication Association, Genova, Italy (1997), pp. 1427-1430
Speech Recognition by Composition of Weighted Finite Automata
Finite-State Language Processing, MIT Press, Cambridge, Massachusetts (1997), pp. 431-453
Similarity-Based Methods For Word Sense Disambiguation
Ido Dagan
Lillian Lee
35th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1997), pp. 56-63
AT&T at TREC-6: SDR Track
Rational Power Series in Text and Speech Processing
Graduate course, University of Pennsylvania, Department of Computer Science, Philadelphia, PA (1996)
Intensional Verbs Without Type-Raising or Lexical Ambiguity
Mary Dalrymple
John Lamping
Vijay Saraswat
Logic, Language and Computation (Volume 1), CSLI Publications, Stanford, California (1996), pp. 167-182
A Deductive Account of Quantification in LFG
Mary Dalrymple
John Lamping
Vijay Saraswat
Quantifiers, Deduction, and Context, CSLI Publications, Stanford, California (1996), pp. 33-57
Weighted Automata in Text and Speech Processing
Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language, John Wiley and Sons, Chichester, Budapest, Hungary (1996)
Language, Computation and Artificial Intelligence
ACM Computing Surveys, vol. 28 (1996), pp. 9
Interactions of Scope and Ellipsis
Stuart M. Shieber
Mary Dalrymple
Linguistics and Philosophy, vol. 19 (1996), pp. 527-552
Design of a Linguistic Postprocessor using Variable Memory Length Markov Models
Isabelle Guyon
Proceedings of the Third International Conference on Document Analysis and Recognition, IEEE Computer Society Press, Los Alamitos, California (1995), pp. 454-457
The AT&T 60,000 Word Speech-to-Text System
Andrej Ljolje
Don Hindle
Eurospeech'95: ESCA 4th European Conference on Speech Communication and Technology, Madrid, Spain (1995), pp. 207-210
Principles and Implementation of Deductive Parsing
Stuart M. Shieber
Yves Schabes
Journal of Logic Programming, vol. 24 (1995), pp. 3-36
Beyond Word N-Grams
Yoram Singer
Naftali Z. Tishby
Proceedings of the Third Workshop on Very Large Corpora, Association for Computational Linguistics, Columbus, Ohio (1995), pp. 95-106
Linear Logic for Meaning Assembly
Frequencies vs Biases: Machine Learning Problems in Natural Language Processing (Extended Abstract)
COLT (1994), pp. 12
Similarity-Based Estimation of Word Cooccurrence Probabilities
Ido Dagan
Lillian Lee
32nd Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California (1994), pp. 272-278
Frequencies vs. Biases: Machine Learning Problems in Natural Language Processing - Abstract
ICML (1994), pp. 380
Weighted Rational Transductions and their Application to Human Language Processing
Human Language Technology Workshop, Morgan Kaufmann, San Francisco, California (1994), pp. 262-267
Distributional Clustering of English Words
Naftali Z. Tishby
Lillian Lee
31st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Columbus, Ohio (1993), pp. 183-190
Introduction to Special Issue on Natural Language Processing
Empirical Properties of Finite State Approximations for Phrase Structure Grammars
David B. Roe
Proceedings of the International Conference on Spoken Language Processing, Banff, Alberta (1992), pp. 261-264
A spoken language translator for restricted-domain context-free languages
David B. Roe
Alejandro Macarrón
Speech Communication, vol. 11 (1992), pp. 311-319
Inside-Outside Reestimation from Partially Bracketed Corpora
Yves Schabes
30th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Newark, Delaware (1992), pp. 128-135
Quantifier Scoping
Douglas B. Moran
The Core Language Engine, MIT Press, Cambridge, Massachusetts (1992), pp. 149-172
Efficient Grammar Processing for a Spoken Language Translation System
David B. Roe
Alejandro Macarrón
Proceedings of ICASSP, IEEE, San Francisco, California (1992), pp. 213-216
Ellipsis and Higher-Order Unification
Mary Dalrymple
Stuart M. Shieber
Linguistics and Philosophy, vol. 14 (1991), pp. 399-452
Toward a Spoken Language Translator for Restricted-Domain Context-Free Languages
David B. Roe
Alejandro Macarrón
EUROSPEECH 91 -- 2nd European Conference on Speech Communication and Technology, Genova, Italy (1991), pp. 1063-1066
Deductive Interpretation
Natural Language and Speech, Springer-Verlag (1991), pp. 116-133
Semantic Interpretation as Higher-Order Deduction
Logics in AI: European Workshop JELIA'90, Springer-Verlag, Berlin, Germany, Amsterdam, Holland (1991), pp. 78-96
Finite-State Approximation of Phrase-Structure Grammars
Rebecca N. Wright
29th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Berkeley, California (1991), pp. 246-255
Incremental Interpretation
Finite-State Approximations of Grammars
Proceedings of the Second Speech and Natural Language Workshop (1990), pp. 20-25
Semantic-Head-Driven Generation
Stuart M. Shieber
Gertjan van Noord
Robert C. Moore
Computational Linguistics, vol. 16 (1990), pp. 30-42
Prolog and Natural-Language Analysis: into the Third Decade
Logic Programming: Proceedings of the 1990 North American Conference, MIT Press, Cambridge, Massachusetts, Austin, Texas, pp. 813-832
Categorial Semantics and Scoping
Computational Linguistics, vol. 16 (1990), pp. 1-10
Synergistic Use of Direct Manipulation and Natural Language
Phil R. Cohen
Mary Dalrymple
Douglas B. Moran
J. W. Sullivan
R. A. Gargan, Jr.
J. L. Schlossberg
S. W. Tyler
Proceedings of CHI'89, Austin, Texas (1989)
Integrating Speech and Natural Language Processing
Robert C. Moore
Hy Murveit
First Speech and Natural Language Workshop (1989), pp. 243-247
A Calculus for Semantic Composition and Scoping
27th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of British Columbia, Vancouver, Canada (1989), pp. 152-160
A Semantic-Head-Driven Generation Algorithm for Unification-Based Formalisms
Stuart M. Shieber
Gertjan van Noord
Robert C. Moore
27th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of British Columbia, Vancouver, Canada (1989), pp. 7-17
An Integrated Framework for Semantic and Pragmatic Interpretation
Martha E. Pollack
26th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Buffalo, New York (1988), pp. 75-86
Prolog and Natural-Language Analysis
Stuart M. Shieber
Center for the Study of Language and Information, Stanford, California (1987)
Grammars and Logics of Partial Information
Logic Programming: Proceedings of the Fourth International Conference, MIT Press, Cambridge Massachusetts, Melbourne, Australia (1987), pp. 989-1013
TEAM: An Experiment in the Design of Transportable Natural Language Interfaces
Barbara J. Grosz
Douglas E. Appelt
Paul A. Martin
Artificial Intelligence, vol. 32 (1987), pp. 173-243
Can Drawing Be Liberated from the von Neumann Style
Logic Programming and Its Applications, Ablex, Norwood, New Jersey (1986), pp. 175-187
A Sheaf-Theoretic Model of Concurrency
Luis F. Monteiro
Symposium on Logic and Computer Science, IEEE Computer Society Press, Cambridge, Massachusetts (1986), pp. 66-76
TEAM: An Experimental Transportable Natural-Language Interface
A Structure-Sharing Representation for Unification-Based Grammar Formalisms
23rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Chicago, Illinois (1985), pp. 137-144
A New Characterization of Attachment Preferences
Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, Cambridge University Press, Cambridge, England (1985), pp. 307-319
An Overview of Automated Reasoning and Related Fields
L. Wos
Robert Hong
Robert S. Boyer
J Strother Moore
W. W. Bledsoe
L. J. Henschen
Bruce G. Buchanan
Graham Wrightson
Cordell Green
Journal of Automated Reasoning, vol. 1 (1985), pp. 5-48
The Semantics of Grammar Formalisms Seen as Computer Languages
Stuart M. Shieber
Proceedings of COLING 84, Association for Computational Linguistics, Stanford, California (1984), pp. 123-129
Transportability and Generality in a Natural-Language Interface System
Paul A. Martin
Douglas E. Appelt
Proceedings of the Eighth International Joint Conference on Artificial Intelligence (1983), pp. 573-581
A Fact Dependency System for the Logic Programmer
Can Drawing Be Liberated From the Von Neumann Style?
Databases for Business and Office Applications (1983), pp. 184-190
Parsing as Deduction
David H. D. Warren
21st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Cambridge, Massachusetts (1983), pp. 137-144
An Efficient Easily Adaptable System for Interpreting Natural Language Queries
Extraposition Grammars
Computational Linguistics, vol. 7 (1981), pp. 243-256
Definite Clause Grammars for Language Analysis---a Survey of the Formalism and a Comparison with Augmented Transition Networks
Prolog -- The Language and its Implementation Compared with Lisp
David H. D. Warren
Luis M. Pereira
Proceedings of the Symposium on Artificial Intelligence and Programming Languages, Rochester, New York (1977), pp. 109-115