Fernando Pereira

Fernando Pereira is VP and Engineering Fellow at Google, where he leads research and development in natural language understanding and machine learning. His previous positions include chair of the Computer and Information Science department of the University of Pennsylvania, head of the Machine Learning and Information Retrieval department at AT&T Labs, and research and management positions at SRI International. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982, and has over 120 research publications on computational linguistics, machine learning, bioinformatics, speech recognition, and logic programming, as well as several patents. He was elected AAAI Fellow in 1991 for contributions to computational linguistics and logic programming, ACM Fellow in 2010 for contributions to machine learning models of natural language and biological sequences, and ACL Fellow for contributions to sequence modeling, finite-state methods, and dependency and deductive parsing. He was president of the Association for Computational Linguistics in 1993.
Authored Publications
    Conversational Music Retrieval with Synthetic Data
    Megan Eileen Leszczynski
    Ravi Ganti
    Shu Zhang
    Krisztian Balog
    Filip Radlinski
    Arun Tejasvi Chaganty
    Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022
    Abstract: Users looking for recommendations often wish to improve suggestions through broad natural language feedback (e.g., “How about something more upbeat?”). However, building such conversational retrieval systems requires conversational data with rich user utterances paired with slates of items that cover a diverse range of preferences. This is challenging to collect scalably using conventional methods like crowd-sourcing. We address this problem with a new technique to synthesize high-quality dialog data by transforming the domain expertise encoded in curated item collections into corresponding item-seeking conversations. The method first generates a sequence of hypothetical slates returned by a system, and then uses a language model to introduce corresponding user utterances. We apply the approach to a dataset of curated music playlists to generate 10k diverse music-seeking conversations. A qualitative human evaluation shows that a majority of these conversations express believable sequences of slates and include user utterances that faithfully express preferences for them. When used to train a conversational retrieval model, the synthetic data yields up to a 23% relative gain on standard retrieval metrics compared to baselines trained on non-conversational and conversational datasets.
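    A minimal sketch of the slate-then-utterance recipe described above, in Python, assuming a hypothetical language-model callable complete(prompt); the function name, prompt wording, and fixed-size slate slicing are illustrative, not the paper's implementation.

        def playlist_to_conversation(playlist, slate_size=3, complete=None):
            # Slice a curated playlist into a sequence of hypothetical result
            # slates, then ask a language model for the user utterance that
            # would plausibly cause each slate-to-slate transition.
            slates = [playlist[i:i + slate_size]
                      for i in range(0, len(playlist), slate_size)]
            turns = []
            for prev, cur in zip(slates, slates[1:]):
                prompt = (f"A music recommender showed: {prev}. "
                          f"The user's next request led to: {cur}. "
                          f"Write that user request:")
                turns.append({"user": complete(prompt) if complete else "<LM output>",
                              "slate": cur})
            return turns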
    Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World
    Daphne Luong
    Bo Pang
    Yuan Zhang
    Proceedings of the First International Workshop on Spatial Language Understanding, Association for Computational Linguistics, New Orleans, Louisiana, USA(2018), pp. 46-52
    Abstract: Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through work on understanding spatial relations and values in images and texts as well as on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding can be best supported by creating large-scale datasets that focus on points and paths based in the real world, and then extending these to create online, persistent playscapes that mix human and bot players. The bot players can begin play having undergone a prior training regime, but then must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.
    Abstract: We describe SLING, a framework for parsing natural language into semantic frames. SLING supports general transition-based, neural-network parsing with bidirectional LSTM input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. The parsing model is trained end-to-end using only the text tokens as input. The transition system has been designed to output frame graphs directly without any intervening symbolic representation. The SLING framework includes an efficient and scalable frame store implementation as well as a neural network JIT compiler for fast inference during parsing. SLING is implemented in C++ and it is available for download on GitHub.
    Abstract: Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks.
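    A toy rendering of the selective, attention-like coherence idea, assuming callables local_score(entity) and relatedness(e1, e2) over the KB; all names here are illustrative rather than the paper's system.

        import heapq

        def coherence(candidate, others, relatedness, k=3):
            # Support a candidate with only its k strongest KB relations to
            # the document's other candidate entities, not all of them.
            return sum(heapq.nlargest(k, (relatedness(candidate, e) for e in others)))

        def resolve(mention_candidates, local_score, relatedness, k=3):
            # mention_candidates: one candidate-entity list per mention.
            resolved = []
            for i, cands in enumerate(mention_candidates):
                others = [c for j, cs in enumerate(mention_candidates)
                          if j != i for c in cs]
                resolved.append(max(cands, key=lambda c: local_score(c)
                                    + coherence(c, others, relatedness, k)))
            return resolved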
    Abstract: We describe Sparse Non-negative Matrix (SNM) language model estimation using multinomial loss on held-out data. Being able to train on held-out data is important in practical situations where the training data is usually mismatched from the held-out/test data. It is also less constrained than the previous training algorithm using leave-one-out on training data: it allows the use of richer meta-features in the adjustment model, e.g. the diversity counts used by Kneser-Ney smoothing which would be difficult to deal with correctly in leave-one-out training. In experiments on the one billion words language modeling benchmark, we are able to slightly improve on our previous results which use a different loss function, and employ leave-one-out training on a subset of the main training set. Surprisingly, an adjustment model with meta-features that discard all lexical information can perform as well as lexicalized meta-features. We find that fairly small amounts of held-out data (on the order of 30-70 thousand words) are sufficient for training the adjustment model. In a real-life scenario where the training data is a mix of data sources that are imbalanced in size, and of different degrees of relevance to the held-out and test data, taking into account the data source for a given skip-/n-gram feature and combining them for best performance on held-out/test data improves over skip-/n-gram SNM models trained on pooled data by about 8% in the SMT setup, or as much as 15% in the ASR/IME setup. The ability to mix various data sources based on how relevant they are to a mismatched held-out set is probably the most attractive feature of the new estimation method for SNM LM.
    Yedalog: Exploring Knowledge at Scale
    Brian Chin
    Vuk Ercegovac
    Peter Hawkins
    Mark S. Miller
    Franz Och
    Chris Olston
    1st Summit on Advances in Programming Languages (SNAPL 2015), Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, pp. 63-78
    Abstract: With huge progress on data processing frameworks, human programmers are frequently the bottleneck when analyzing large repositories of data. We introduce Yedalog, a declarative programming language that allows programmers to mix data-parallel pipelines and computation seamlessly in a single language. By contrast, most existing tools for data-parallel computation embed a sublanguage of data-parallel pipelines in a general-purpose language, or vice versa. Yedalog extends Datalog, incorporating not only computational features from logic programming, but also features for working with data structured as nested records. Yedalog programs can run both on a single machine, and distributed across a cluster in batch and interactive modes, allowing programmers to mix different modes of execution easily.
    Plato: A Selective Context Model for Entity Resolution
    Michael Ringgaard
    Transactions of the Association for Computational Linguistics, 3(2015), pp. 503-515
    Abstract: We present Plato, a probabilistic model for entity resolution that includes a novel approach for handling noisy or uninformative features, and supplements labeled training data derived from Wikipedia with a very large unlabeled text corpus. Training and inference in the proposed model can easily be distributed across many servers, allowing it to scale to over 10^7 entities. We evaluate Plato on three standard datasets for entity resolution. Our approach achieves the best results to date on TAC KBP 2011 and is highly competitive on both the CoNLL 2003 and TAC KBP 2012 datasets.
    Abstract: Google Voice Search is an application that provides a data-rich setup for both language and acoustic modeling research. The approach we take revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data, and the model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence. Models ranging in size between 20-40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood, and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
    Large Scale Distributed Acoustic Modeling With Back-off N-grams
    Peng Xu
    Thomas Richardson
    IEEE Transactions on Audio, Speech and Language Processing, 21(2013), pp. 1158-1169
    Abstract: The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly beyond triphones, as well as increase the number of Gaussian mixture components for the context-dependent states that allow it. We have experimented with contexts that span seven or more context-independent phones, and up to 620 mixture components per state. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling due to implementation simplicity. The back-off acoustic model is estimated, stored and served using MapReduce distributed computing infrastructure. Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. Training big models on large amounts of data proves to be an effective way to increase the accuracy of a state-of-the-art automatic speech recognition system. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence. Models ranging in size between 20-40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood, and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
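    The back-off mechanics are easy to illustrate. Below is a toy, in-memory Python stand-in (the paper estimates, stores, and serves the model with MapReduce); the policy of symmetrically trimming the outermost context phones is a simplification.

        def lookup(models, context):
            # models: dict from phone-context tuples (center phone plus
            # surrounding phones) to acoustic models, e.g. Gaussian mixtures.
            # Back off to narrower contexts until a trained model is found,
            # down to the context-independent center phone.
            while context:
                if context in models:
                    return models[context]
                if len(context) <= 2:
                    break
                context = context[1:-1]
            raise KeyError("no model found, even context-independent")

        models = {("k", "ae", "t"): "GMM(k-ae+t)", ("ae",): "GMM(ae)"}
        lookup(models, ("s", "k", "ae", "t", "s"))  # backs off to ("k", "ae", "t")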
    Distributed Acoustic Modeling with Back-off N-grams
    Peng Xu
    Thomas Richardson
    Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
    Abstract: The paper proposes an approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model) to approximately 100 times larger than current sizes used in ASR. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique used in language modeling due to implementation simplicity. The new acoustic model is estimated and stored using the MapReduce distributed computing infrastructure. Speech recognition experiments are carried out in an N-best rescoring framework for Google Voice Search. 87,000 hours of training data is obtained in an unsupervised fashion by filtering utterances in Voice Search logs on ASR confidence. The resulting models are trained using maximum likelihood and contain 20-40 million Gaussians. They achieve relative reductions in WER of 11% and 6% over first-pass models trained using maximum likelihood, and boosted MMI, respectively.
    Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
    Controlling Complexity in Part-of-Speech Induction
    Joao Graca
    Luisa Coheur
    Ben Taskar
    Journal of Artificial Intelligence Research (JAIR), 41(2011), pp. 527-551
    Abstract: We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
    Posterior Sparsity in Dependency Grammar Induction
    Jennifer Gillenwater
    Joao Graca
    Ben Taskar
    Journal of Machine Learning Research, 12(2011), pp. 455-490
    Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
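    In rough strokes, and with notation condensed from the paper, the posterior-regularized objective trades off fit to the model posterior against an \ell_1/\ell_\infty penalty on expected parent-child tag-pair usage:

        \min_{q}\; \mathrm{KL}\big(q \,\|\, p_\theta(\mathbf{y}\mid\mathbf{x})\big) \;+\; \sigma \sum_{c,p} \max_{i}\; \mathbb{E}_q\big[\phi_{cpi}(\mathbf{x},\mathbf{y})\big]

    where \phi_{cpi} fires on the i-th candidate edge with child tag c and parent tag p. Because each parent-child type pays only for its maximum expected occurrence, posterior mass concentrates on few distinct dependency types.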
    Exploiting Feature Covariance in High-Dimensional Online Learning
    Justin Ma
    Alex Kulesza
    Mark Dredze
    Koby Crammer
    Lawrence Saul
    Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR(2010), pp. 493-500
    Distributed MAP Inference for Undirected Graphical Models
    Sameer Singh
    Andrew McCallum
    Workshop on Learning on Cores, Clusters and Clouds (LCCC), Neural Information Processing Systems (NIPS)(2010)
    Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition
    Partha Pratim Talukdar
    48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
    Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
    Slav Petrov
    Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
    A theory of learning from different domains
    Shai Ben-David
    Koby Crammer
    Alex Kulesza
    Jennifer Vaughan
    Machine Learning, 79(2010), pp. 151-175
    Abstract: Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
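    The centerpiece bound is worth writing out. With source and target errors \epsilon_S and \epsilon_T, and notation condensed from the paper (finite-sample constants omitted), a hypothesis h \in \mathcal{H} satisfies

        \epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda, \qquad \lambda \;=\; \min_{h' \in \mathcal{H}} \big[\epsilon_S(h') + \epsilon_T(h')\big]

    where d_{\mathcal{H}\Delta\mathcal{H}} is the classifier-induced divergence estimable from finite unlabeled samples, and \lambda is the error of the best jointly good hypothesis.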
    Sparsity in Dependency Grammar Induction
    Jennifer Gillenwater
    João Graça
    Ben Taskar
    48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
    Automatically incorporating new sources in keyword search-based data integration
    Partha Pratim Talukdar
    Zachary G. Ives
    SIGMOD Conference, ACM Press(2010), pp. 387-398
    Gaussian Margin Machines
    Koby Crammer
    Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), Clearwater Beach, Florida, pp. 105-112
    The Unreasonable Effectiveness of Data
    Alon Halevy
    IEEE Intelligent Systems, 24(2009), pp. 8-12
    Posterior vs. Parameter Sparsity in Latent Variable Models
    Joao Graca
    Ben Taskar
    Advances in Neural Information Processing Systems 22(2009), pp. 664-672
    Abstract: In this paper we explore the problem of biasing unsupervised models to favor sparsity. We extend the posterior regularization framework [8] to encourage the model to achieve posterior sparsity on the unlabeled training data. We apply this new method to learn first-order HMMs for unsupervised part-of-speech (POS) tagging, and show that HMMs learned this way consistently and significantly outperform both EM-trained HMMs, and HMMs with a sparsity-inducing Dirichlet prior trained by variational EM. We evaluate these HMMs on three languages — English, Bulgarian and Portuguese — under four conditions. We find that our method always improves performance with respect to both baselines, while variational Bayes actually degrades performance in most cases. We increase accuracy with respect to EM by 2.5%-8.7% absolute and we see improvements even in a semi-supervised condition where a limited dictionary is provided.
    Group Sparse Coding
    Samy Bengio
    Yoram Singer
    Dennis Strelow
    Advances in Neural Information Processing Systems(2009)
    A transcription factor affinity-based code for mammalian transcription initiation
    M Megraw
    ST Jensen
    U Ohler
    AG Hatzigeorgiou
    Genome Research, 19(2009), pp. 644-56
    Abstract: The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (approximately 45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5'-ends. Using this high-resolution single-peak model, we predict TSS for approximately 70% of mammalian microRNAs based on currently available data.
    Intelligent Email: Reply and Attachment Prediction
    Mark Dredze
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    Proceedings of the 2008 International Conference on Intelligent User Interfaces
    Confidence-Weighted Linear Classification
    Mark Dredze
    Koby Crammer
    International Conference on Machine Learning (ICML)(2008)
    Abstract: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.
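    The flavor of the update is easy to convey in Python. The sketch below uses a diagonal covariance and an AROW-style squared-hinge step from the same family of algorithms; it is a simplification, not the paper's exact CW update, which enforces a probabilistic margin constraint through the Gaussian CDF.

        import numpy as np

        def cw_style_update(mu, sigma, x, y, r=1.0):
            # mu: mean weights; sigma: per-feature variances (low variance =
            # high confidence); y in {-1, +1}. Confident features move less,
            # and features seen in x become more confident afterwards.
            beta = 1.0 / ((sigma * x).dot(x) + r)   # x^T Sigma x, diagonal Sigma
            alpha = max(0.0, 1.0 - y * mu.dot(x)) * beta
            mu = mu + alpha * y * sigma * x
            sigma = sigma - beta * (sigma * x) ** 2
            return mu, sigma

        mu, sigma = np.zeros(4), np.ones(4)
        mu, sigma = cw_style_update(mu, sigma, np.array([1.0, 0.0, 2.0, 0.0]), +1)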
    Generating Summary Keywords for Emails Using Topics
    Mark Dredze
    Hanna Wallach
    Danny Puller
    Proceedings of the 2008 International Conference on Intelligent User Interfaces
    Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks
    Partha Pratim Talukdar
    Joseph Reisinger
    Marius Pasca
    Rahul Bhagat
    Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), Association for Computational Linguistics, Honolulu, Hawaii, pp. 582-590
    Learning Bounds for Domain Adaptation
    Koby Crammer
    Alex Kulesza
    Jennifer Wortman
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA(2008)
    Structured Learning with Approximate Inference
    Alex Kulesza
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA(2008)
    Speech Recognition with Weighted Finite-State Transducers
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany(2008)
    Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis
    Kevin Lerman
    Ari Gilder
    Mark Dredze
    Conference on Computational Linguistics (Coling)(2008)
    Abstract: Media reporting shapes public opinion which can in turn influence events, particularly in political elections, in which candidates both respond to and shape public perception of their campaigns. We use computational linguistics to automatically predict the impact of news on public perception of political candidates. Our system uses daily newspaper articles to predict shifts in public opinion as reflected in prediction markets. We discuss various types of features designed for this problem. The news system improves market prediction over baseline market systems.
    The Need for Open Source Software in Machine Learning
    Soren Sonnenburg
    Mikio L. Braun
    Cheng Soon Ong
    Samy Bengio
    Leon Bottou
    Geoff Holmes
    Yann LeCun
    Klaus-Robert Mueller
    Carl-Edward Rasmussen
    Gunnar Raetsch
    Bernhard Schoelkopf
    Alexander Smola
    Pascal Vincent
    Jason Weston
    Robert C. Williamson
    Journal of Machine Learning Research, 8(2007), pp. 2443-2466
    Abstract: Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not utilized, since existing implementations are not openly shared, resulting in software with low usability and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.
    Euclidean Embedding of Co-occurrence Data
    Amir Globerson
    Gal Chechik
    Naftali Tishby
    Journal of Machine Learning Research, 8(2007), pp. 2265-2295
    Frustratingly Hard Domain Adaptation for Dependency Parsing
    Mark Dredze
    Partha Pratim Talukdar
    João V. Graça
    Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 1051-1055
    A Context Pattern Induction Method for Named Entity Extraction
    Partha Pratim Talukdar
    Thorsten Brants
    Mark Liberman
    Proceedings of CoNLL-X(2006), pp. 141-148
    Learning to Create Data-Integrating Queries
    Partha Pratim Talukdar
    Marie Jacob
    M. Salman Mehmood
    Koby Crammer
    Zachary Ives
    Sudipto Guha
    VLDB(2008)
    Reranking candidate gene models with cross-species comparison for improved gene prediction
    Qian Liu
    Koby Crammer
    David S. Roos
    BMC Bioinformatics, 9(2008), pp. 433
    Abstract: Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusions: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
    Intelligent Email: Aiding Users with AI
    Mark Dredze
    Hanna Wallach
    Danny Puller
    Tova Brooks
    Josh Carroll
    Joshua Magarick
    American National Conference on Artificial Intelligence (AAAI)(2008)
    A rate-distortion one-class model and its applications to clustering
    K. Crammer
    P. Talukdar
    Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), Omnipress, pp. 184-191
    Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction
    Qian Liu
    Aaron J Mackey
    David S Roos
    Bioinformatics, 24(2008), pp. 597-605
    Abstract: Motivation: The increasing diversity and variable quality of evidence relevant to gene annotation argues for a probabilistic framework that automatically integrates such evidence to yield candidate gene models. Results: Evigan is an automated gene annotation program for eukaryotic genomes, employing probabilistic inference to integrate multiple sources of gene evidence. The probabilistic model is a dynamic Bayes network whose parameters are adjusted to maximize the probability of observed evidence. Consensus gene predictions are then derived by maximum likelihood decoding, yielding n-best models (with probabilities for each). Evigan is capable of accommodating a variety of evidence types, including (but not limited to) gene models computed by diverse gene finders, BLAST hits, EST matches, and splice site predictions; learned parameters encode the relative quality of evidence sources. Since separate training data are not required (apart from the training sets used by individual gene finders), Evigan is particularly attractive for newly sequenced genomes where little or no reliable manually curated annotation is available. The ability to produce a ranked list of alternative gene models may facilitate identification of alternatively spliced transcripts. Experimental application to ENCODE regions of the human genome, and the genomes of Plasmodium vivax and Arabidopsis thaliana show that Evigan achieves better performance than any of the individual data sources used as evidence. Availability: The source code is available at http://www.seas.upenn.edu/~strctlrn/evigan/evigan.html.
    Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
    Mark Dredze
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic(2007), pp. 440-447
    Semi-Automated Named Entity Annotation
    Mark Mandel
    Steven Carroll
    Peter White
    Proceedings of the Linguistic Annotation Workshop, Association for Computational Linguistics(2007), pp. 53-56
    Abstract: We investigate a way to partially automate corpus annotation for named entity recognition, by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained using a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
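    For concreteness, a 1-best Python sketch of the MIRA update family mentioned above (the paper uses k-best MIRA over tag sequences; feature extraction and decoding are assumed, and the names are illustrative):

        import numpy as np

        def mira_update(w, feats_gold, feats_pred, loss, C=1.0):
            # Smallest change to w (step clipped at aggressiveness C) that
            # makes the gold analysis outscore the prediction by `loss`.
            delta = feats_gold - feats_pred
            denom = delta.dot(delta)
            if denom == 0.0:
                return w
            tau = min(C, max(0.0, (loss - w.dot(delta)) / denom))
            return w + tau * delta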
    Penn/UMass/CHOP Biocreative II systems
    Koby Crammer
    Gideon Mann
    Kedar Bellare
    Andrew McCallum
    Steven Carroll
    Yang Jin
    Peter White
    Proceedings of the Second BioCreative Challenge Evaluation Workshop(2007), pp. 119-124
    Abstract: Our team participated in the entity tagging and normalization tasks of Biocreative II. For the entity tagging task, we used a k-best MIRA learning algorithm with lexicons and automatically derived word clusters. MIRA accommodates different training loss functions, which allowed us to exploit gene alternatives in training. We also performed a greedy search over feature templates and the development data, achieving a final F-measure of 86.28%. For the normalization task, we proposed a new specialized on-line learning algorithm and applied it for filtering out false positives from a high recall list of candidates. For normalization we received an F-measure of 69.8%.
    Learning to join everything
    CIKM(2007), pp. 9-10
    Analysis of Representations for Domain Adaptation
    Shai Ben-David
    Koby Crammer
    Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA(2007)
    Transductive structured classification through constrained min-cuts
    Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing, Association for Computational Linguistics(2007), pp. 37-44
    Abstract: We extend the Blum and Chawla (2001) graph min-cut algorithm to structured problems. This extension can alternatively be viewed as a joint inference method over a set of training and test instances where parts of the instances interact through a pre-specified associative network. The method has an efficient approximation through a linear-programming relaxation. On small training data sets, the method achieves up to 34.8% relative error reduction.
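    A toy version of the underlying min-cut construction, using the networkx max-flow/min-cut routines on binary labels; the structured extension in the paper adds interactions beyond this Blum-Chawla-style core, and the graph here is illustrative.

        import networkx as nx

        def mincut_labels(seeds, edges):
            # seeds: {node: 0 or 1} for labeled instances; edges: (u, v, w)
            # associative links among training and test instances. Labeled
            # nodes are tied to a source/sink with infinite capacity, so the
            # min cut assigns every unlabeled node to one side.
            g = nx.DiGraph()
            for u, v, w in edges:
                g.add_edge(u, v, capacity=w)
                g.add_edge(v, u, capacity=w)
            for node, label in seeds.items():
                anchor = "SRC" if label == 1 else "SNK"
                g.add_edge(anchor, node, capacity=float("inf"))
                g.add_edge(node, anchor, capacity=float("inf"))
            _, (src_side, _) = nx.minimum_cut(g, "SRC", "SNK")
            return {n: int(n in src_side) for n in g if n not in ("SRC", "SNK")}

        mincut_labels({"a": 1, "d": 0},
                      [("a", "b", 2.0), ("b", "c", 2.0), ("c", "d", 1.0)])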
    Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction
    Axel Bernal
    Koby Crammer
    Artemis Hatzigeorgiou
    PLoS Computational Biology, 3(2007)
    Abstract: Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
    Automated recognition of malignancy mentions in biomedical literature
    Yang Jin
    Ryan T. McDonald
    Kevin Lerman
    Mark A. Mandel
    Steven Carroll
    Mark Y. Liberman
    Raymond S. Winters
    Peter S. White
    BMC Bioinformatics, 7(2006), pp. 492
    Abstract: Background: The rapid proliferation of biomedical text makes it increasingly difficult for researchers to identify, synthesize, and utilize developed knowledge in their fields of interest. Automated information extraction procedures can assist in the acquisition and management of this knowledge. Previous efforts in biomedical text mining have focused primarily upon named entity recognition of well-defined molecular objects such as genes, but less work has been performed to identify disease-related objects and concepts. Furthermore, promise has been tempered by an inability to efficiently scale approaches in ways that minimize manual efforts and still perform with high accuracy. Here, we have applied a machine-learning approach previously successful for identifying molecular entities to a disease concept to determine if the underlying probabilistic model effectively generalizes to unrelated concepts with minimal manual intervention for model retraining. Results: We developed a named entity recognizer (MTag), an entity tagger for recognizing clinical descriptions of malignancy presented in text. The application uses the machine-learning technique Conditional Random Fields with additional domain-specific features. MTag was tested with 1,010 training and 432 evaluation documents pertaining to cancer genomics. Overall, our experiments resulted in 0.85 precision, 0.83 recall, and 0.84 F-measure on the evaluation set. Compared with a baseline system using string matching of text with a neoplasm term list, MTag performed with a much higher recall rate (92.1% vs. 42.1% recall) and demonstrated the ability to learn new patterns. Application of MTag to all MEDLINE abstracts yielded the identification of 580,002 unique and 9,153,340 overall mentions of malignancy. Significantly, addition of an extensive lexicon of malignancy mentions as a feature set for extraction had minimal impact in performance. Conclusions: Together, these results suggest that the identification of disparate biomedical entity classes in free text may be extractable with high accuracy and only moderate additional effort for each new application domain.
    An automated procedure to identify biomedical articles that contain cancer-associated gene variants
    Ryan McDonald
    R Scott Winters
    Claire K Ankuda
    Joan A Murphy
    Amy E Rogers
    Marc S Greenblatt
    Peter S White
    Human Mutation, 27(2006), pp. 957-64
    Abstract: The proliferation of biomedical literature makes it increasingly difficult for researchers to find and manage relevant information. However, identifying research articles containing mutation data, a requisite first step in integrating large and complex mutation data sets, is currently tedious, time-consuming and imprecise. More effective mechanisms for identifying articles containing mutation information would be beneficial both for the curation of mutation databases and for individual researchers. We developed an automated method that uses information extraction, classifier, and relevance ranking techniques to determine the likelihood of MEDLINE abstracts containing information regarding genomic variation data suitable for inclusion in mutation databases. We targeted the CDKN2A (p16) gene and the procedure for document identification currently used by CDKN2A Database curators as a measure of feasibility. A set of abstracts was manually identified from a MEDLINE search as potentially containing specific CDKN2A mutation events. A subset of these abstracts was used as a training set for a maximum entropy classifier to identify text features distinguishing "relevant" from "not relevant" abstracts. Each document was represented as a set of indicative word, word pair, and entity tagger-derived genomic variation features. When applied to a test set of 200 candidate abstracts, the classifier predicted 88 articles as being relevant; of these, 29 of 32 manuscripts in which manual curation found CDKN2A sequence variants were positively predicted. Thus, the set of potentially useful articles that a manual curator would have to review was reduced by 56%, maintaining 91% recall (sensitivity) and more than doubling precision (positive predictive value). Subsequent expansion of the training set to 494 articles yielded similar precision and recall rates, and comparison of the original and expanded trials demonstrated that the average precision improved with the larger data set. Our results show that automated systems can effectively identify article subsets relevant to a given task and may prove to be powerful tools for the broader research community. This procedure can be readily adapted to any or all genes, organisms, or sets of documents.
    Domain Adaptation with Structural Correspondence Learning
    Ryan McDonald
    EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120-128
    Embedding Heterogeneous Data Using Statistical Models
    Amir Globerson
    Gal Chechik
    Naftali Tishby
    AAAI(2006)
    Online Learning of Approximate Dependency Parsing Algorithms
    Ryan McDonald
    11th Conference of the European Chapter of the Association for Computational Linguistics: EACL 2006, pp. 81-88
    Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
    Ryan McDonald
    Kevin Lerman
    Tenth Conference on Computational Natural Language Learning (CoNLL-X)(2006)
    "Sorry I forgot the attachment": Email Attachment Prediction
    Mark Dredze
    3rd Conference on Email and Anti-Spam, Stanford, CA(2006)
    Distributed Latent Variable Models of Lexical Co-occurrences
    Amir Globerson
    Tenth International Workshop on Artificial Intelligence and Statistics(2005)
    Automatically annotating documents with normalized gene lists
    Jeremiah Crim
    Ryan McDonald
    BMC Bioinformatics(2005)
    Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
    Ryan McDonald
    Seth Kulick
    Scott Winters
    Yang Jin
    Pete White
    43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
    Non-Projective Dependency Parsing using Spanning Tree Algorithms
    Ryan T. McDonald
    Kiril Ribarov
    Jan Hajic
    HLT/EMNLP(2005)
    Identifying gene and protein mentions in text using conditional random fields
    Ryan McDonald
    BMC Bioinformatics(2005)
    Online Large-Margin Training of Dependency Parsers
    Ryan McDonald
    Koby Crammer
    43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
    A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance
    Andrew McCallum
    Kedar Bellare
    Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005)
    Flexible Text Segmentation with Structured Multilabel Classification
    Ryan McDonald
    Koby Crammer
    Proceedings of HLT-EMNLP(2005)
    Reply Expectation Prediction for Email Management
    Mark Dredze
    2nd Conference on Email and Anti-Spam, Stanford, CA(2005)
    Hierarchical Distributed Representations for Statistical Language Modeling
    Kilian Q. Weinberger
    Lawrence K. Saul
    NIPS(2004)
    An entity tagger for recognizing acquired genomic variations in cancer literature
    Ryan McDonald
    Scott Winters
    Mark Mandel
    Yang Jin
    Pete White
    Bioinformatics(2004)
    ATDD: An Algorithmic Tool for Domain Discovery in Protein Sequences
    Sanjeev Khanna
    Li Li
    Algorithms in Bioinformatics, 4th International Workshop (WABI 2004), Springer, pp. 206-217
    Case-Factor Diagrams for Structured Probabilistic Modeling
    David A. McAllester
    UAI(2004), pp. 382-391
    Euclidean Embedding of Co-Occurrence Data
    Amir Globerson
    Gal Chechik
    Naftali Tishby
    Advances in Neural Information Processing Systems (NIPS), MIT press, Cambridge, MA(2004), pp. 497-504
    Shallow Parsing with Conditional Random Fields
    Fei Sha
    HLT-NAACL(2003)
    Weighted finite-state transducers in speech recognition
    Computer Speech & Language, 16(2002), pp. 69-88
    Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
    John Lafferty
    Andrew McCallum
    Proceedings of ICML-01(2001), pp. 282-289
    Maximum Entropy Markov Models for Information Extraction and Segmentation
    Andrew McCallum
    Dayne Freitag
    Machine Learning: Proceedings of the Seventeenth International Conference (ICML 2000), Stanford, California, pp. 591-598
    Formal Grammar and Information Theory: Together Again?
    Philosophical Transactions of the Royal Society, 358(2000), pp. 1239-1253
    Machine Learning for Efficient Natural-Language Processing
    CPM(2000), pp. 11
    Weighted Finite-State Transducers in Speech Recognition
    Proceedings of the ISCA Tutorial and Research Workshop, Automatic Speech Recognition: Challenges for the new Millenium (ASR2000), Paris, France
    The Design Principles of a Weighted Finite-State Transducer Library
    Theoretical Computer Science, 231(2000), pp. 17-32
    The information bottleneck method
    Naftali Tishby
    William Bialek
    arXiv, physics/0004057(2000)
    Efficient General Lattice Generation and Rescoring
    Andrej Ljolje
    EUROSPEECH 99(1999), pp. 1251-1254
    Document Expansion for Speech Retrieval
    Amit Singhal
    SIGIR(1999), pp. 34-41
    SCAN: Designing and Evaluating User Interfaces to Support Retrieval From Speech Archives
    Steve Whittaker
    Julia Hirschberg
    John Choi
    Donald Hindle
    Amit Singhal
    SIGIR(1999), pp. 26-33
    An Efficient Extension to Mixture Techniques for Prediction and Decision Trees
    Yoram Singer
    Machine Learning, 36(1999), pp. 183-199
    The Information Bottleneck Method
    Naftali Z. Tishby
    William Bialek
    Proceedings of the 37th Allerton Conference on Communication, Control and Computing, Urbana, Illinois(1999)
    Relating Probabilistic Grammars and Automata
    Steven Abney
    David McAllester
    37th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California(1999), pp. 542-549
    Similarity-Based Models of Word Cooccurrence Probabilities
    Ido Dagan
    Lillian Lee
    Machine Learning, 34(1999), pp. 43-69
    Distributional Similarity Models: Clustering vs. Nearest Neighbors
    Lillian Lee
    37th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California(1999), pp. 33-40
    Quantifiers, Anaphora, and Intensionality
    Mary Dalrymple
    John Lamping
    Vijay Saraswat
    Semantics and Syntax in Lexical Functional Grammar, MIT Press, Cambridge, Massachusetts(1999), pp. 39-89
    AT&T at TREC-8
    Amit Singhal
    Steven P. Abney
    Donald Hindle
    TREC(1999)
    Multimedia Standards: Present and Future
    ICMCS, Vol. 1(1999), pp. 145-146
    Finding Information in Audio: A New Paradigm for Audio Browsing and Retrieval
    Julia Hirschberg
    Steve Whittaker
    Don Hindle
    Amit Singhal
    Accessing Information in Spoken Audio: Proceedings of the ESCA ETRW Workshop, Cambridge, England(1999), pp. 117-122
    Declarative Programming for a Messy World
    ICLP(1999), pp. 3-5
    Dynamic Compilation of Weighted Context-Free Grammars
    36th Meeting of the Association for Computational Linguistics (ACL '98), Proceedings of the Conference, Montréal, Québec, Canada(1998), pp. 891-897
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Second International Workshop on Implementing Automata (WIA '97), Springer-Verlag, Berlin-NY(1998), pp. 144-158
    Modelling Divergent Production: A multi-domain approach
    ECAI(1998), pp. 131-132
    AT&T at TREC-7
    Amit Singhal
    John Choi
    Donald Hindle
    David D. Lewis
    TREC(1998), pp. 186-198
    Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition
    Don Hindle
    Andrej Ljolje
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, Washington(1998)
    SCAN - Speech Content Based Audio Navigator: A Systems Overview
    John Choi
    Don Hindle
    Julia Hirschberg
    Ivan Magrin-Chagnolleau
    Christine Nakatani
    Amit Singhal
    Steve Whittaker
    Proceedings of the Fifth International Conference on Spoken Language Processing, Sydney(1998)
    Speech Recognition by Composition of Weighted Finite Automata
    Finite-State Language Processing, MIT Press, Cambridge, Massachusetts(1997), pp. 431-453
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Workshop on Implementing Automata (WIA '97), University of Western Ontario, London, Ontario, Canada(1997)
    Similarity-Based Methods For Word Sense Disambiguation
    Ido Dagan
    Lillian Lee
    35th Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California(1997), pp. 56-63
    Similarity-Based Methods For Word Sense Disambiguation
    Ido Dagan
    Lillian Lee
    arXiv(1997)
    Aggregate and Mixed-Order Markov Models for Statistical Language Processing
    Lawrence Saul
    Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Somerset, NJ. Distributed by Morgan Kaufmann, San Francisco, CA(1997), pp. 81-89
    Transducer Composition for Context-Dependent Network Expansion
    Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece(1997)
    Finite-State Approximation of Phrase-Structure Grammars
    Rebecca N. Wright
    Finite-State Language Processing, MIT Press, Cambridge, Massachusetts(1997), pp. 149-173
    Quantifiers, Anaphora, and Intensionality
    Mary Dalrymple
    John Lamping
    Vijay A. Saraswat
    Journal of Logic, Language, and Information, 6, no. 3(1997), pp. 219-273
    AT&T at TREC-6: SDR Track
    Amit Singhal
    John Choi
    Donald Hindle
    TREC(1997), pp. 227-232
    Intensional Verbs Without Type-Raising or Lexical Ambiguity
    Mary Dalrymple
    John Lamping
    Vijay Saraswat
    Logic, Language and Computation (Volume 1), CSLI Publications, Stanford, California(1996), pp. 167-182
    Interactions of Scope and Ellipsis
    Stuart M. Shieber
    Mary Dalrymple
    Linguistics and Philosophy, 19(1996), pp. 527-552
    Weighted Automata in Text and Speech Processing
    Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language, John Wiley and Sons, Chichester, Budapest, Hungary(1996)
    Language, Computation and Artificial Intelligence
    ACM Computing Surveys, 28(1996), pp. 9
    Rational Power Series in Text and Speech Processing
    Graduate course, University of Pennsylvania, Department of Computer Science, Philadelphia, PA(1996)
    A Deductive Account of Quantification in LFG
    Mary Dalrymple
    John Lamping
    Vijay Saraswat
    Quantifiers, Deduction, and Context, CSLI Publications, Stanford, California(1996), pp. 33-57
    Ellipsis and Higher-Order Unification
    Mary Dalrymple
    Stuart M. Shieber
    arXiv(1995)
    Linear Logic for Meaning Assembly
    Mary Dalrymple
    John Lamping
    Vijay A. Saraswat
    arXiv(1995)
    The AT&T 60,000 Word Speech-to-Text System
    Andrej Ljolje
    Don Hindle
    Eurospeech'95: ESCA 4th European Conference on Speech Communication and Technology, Madrid, Spain(1995), pp. 207-210
    Principles and Implementation of Deductive Parsing
    Stuart M. Shieber
    Yves Schabes
    Journal of Logic Programming, 24(1995), pp. 3-36
    Design of a Linguistic Postprocessor using Variable Memory Length Markov Models
    Isabelle Guyon
    Proceedings of the Third International Conference on Document Analysis and Recognition, IEEE Computer Society Press, Los Alamitos, California(1995), pp. 454-457
    Beyond Word N-Grams
    Yoram Singer
    Naftali Z. Tishby
    Proceedings of the Third Workshop on Very Large Corpora, Association for Computational Linguistics, Columbus, Ohio(1995), pp. 95-106
    Frequencies vs. Biases: Machine Learning Problems in Natural Language Processing - Abstract
    ICML(1994), pp. 380
    Frequencies vs Biases: Machine Learning Problems in Natural Language Processing (Extended Abstract)
    COLT(1994), pp. 12
    Similarity-Based Estimation of Word Cooccurrence Probabilities
    Ido Dagan
    Lillian Lee
    32nd Annual Meeting of the Association for Computational Linguistics, Morgan Kaufmann, San Francisco, California(1994), pp. 272-278
    Weighted Rational Transductions and their Application to Human Language Processing
    Human Language Technology Workshop, Morgan Kaufmann, San Francisco, California(1994), pp. 262-267
    Introduction to Special Issue on Natural Language Processing
    Barbara J. Grosz
    Artificial Intelligence, 63(1993), pp. 1-15
    Distributional Clustering of English Words
    Naftali Z. Tishby
    Lillian Lee
    31st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Columbus, Ohio(1993), pp. 183-190
    A spoken language translator for restricted-domain context-free languages
    David B. Roe
    Pedro J. Moreno
    Alejandro Macarrón
    Speech Communication, 11(1992), pp. 311-319
    Efficient Grammar Processing for a Spoken Language Translation System
    David B. Roe
    Pedro J. Moreno
    Alejandro Macarrón
    Proceedings of ICASSP, IEEE, San Francisco, California(1992), pp. 213-216
    Inside-Outside Reestimation from Partially Bracketed Corpora
    Yves Schabes
    30th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Newark, Delaware(1992), pp. 128-135
    Empirical Properties of Finite State Approximations for Phrase Structure Grammars
    David B. Roe
    Proceedings of the International Conference on Spoken Language Processing, Banff, Alberta(1992), pp. 261-264
    Quantifier Scoping
    Douglas B. Moran
    The Core Language Engine, MIT Press, Cambridge, Massachusetts(1992), pp. 149-172
    Finite-State Approximation of Phrase-Structure Grammars
    Rebecca N. Wright
    29th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Berkeley, California(1991), pp. 246-255
    Semantic Interpretation as Higher-Order Deduction
    Logics in AI: European Workshop JELIA'90, Springer-Verlag, Berlin, Germany, Amsterdam, The Netherlands(1991), pp. 78-96
    Incremental Interpretation
    Martha E. Pollack
    Artificial Intelligence, 50(1991), pp. 37-82
    Deductive Interpretation
    Natural Language and Speech, Springer-Verlag(1991), pp. 116-133
    Ellipsis and Higher-Order Unification
    Mary Dalrymple
    Stuart M. Shieber
    Linguistics and Philosophy, 14(1991), pp. 399-452
    Toward a Spoken Language Translator for Restricted-Domain Context-Free Languages
    David B. Roe
    Pedro J. Moreno
    Alejandro Macarrón
    EUROSPEECH 91 -- 2nd European Conference on Speech Communication and Technology, Genova, Italy(1991), pp. 1063-1066
    Semantic-Head-Driven Generation
    Stuart M. Shieber
    Gertjan van Noord
    Robert C. Moore
    Computational Linguistics, 16(1990), pp. 30-42
    Categorial Semantics and Scoping
    Computational Linguistics, 16(1990), pp. 1-10
    Finite-State Approximations of Grammars
    Proceedings of the Second Speech and Natural Language Workshop(1990), pp. 20-25
    Prolog and Natural-Language Analysis: into the Third Decade
    Logic Programming: Proceedings of the 1990 North American Conference, MIT Press, Cambridge, Massachusetts, Austin, Texas(1990), pp. 813-832
    Synergistic Use of Direct Manipulation and Natural Language
    Phil R. Cohen
    Mary Dalrymple
    Douglas B. Moran
    J. W. Sullivan
    R. A. Gargan, Jr.
    J. L. Schlossberg
    S. W. Tyler
    Proceedings of CHI'89, Austin, Texas(1989)
    Integrating Speech and Natural Language Processing
    Robert C. Moore
    Hy Murveit
    First Speech and Natural Language Workshop(1989), pp. 243-247
    A Semantic-Head-Driven Generation Algorithm for Unification-Based Formalisms
    Stuart M. Shieber
    Gertjan van Noord
    Robert C. Moore
    27th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of British Columbia, Vancouver, Canada(1989), pp. 7-17
    A Calculus for Semantic Composition and Scoping
    27th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, University of British Columbia, Vancouver, Canada(1989), pp. 152-160
    An Integrated Framework for Semantic and Pragmatic Interpretation
    Martha E. Pollack
    26th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Buffalo, New York(1988), pp. 75-86
    TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces
    Barbara J. Grosz
    Douglas E. Appelt
    Paul A. Martin
    Artificial Intelligence, 32(1987), pp. 173-243
    Grammars and Logics of Partial Information
    Logic Programming: Proceedings of the Fourth International Conference, MIT Press, Cambridge, Massachusetts, Melbourne, Australia(1987), pp. 989-1013
    Prolog and Natural-Language Analysis
    Stuart M. Shieber
    Center for the Study of Language and Information, Stanford, California(1987)
    A Sheaf-Theoretic Model of Concurrency
    Luis F. Monteiro
    Symposium on Logic in Computer Science, IEEE Computer Society Press, Cambridge, Massachusetts(1986), pp. 66-76
    TEAM: An Experimental Transportable Natural-Language Interface
    Paul A. Martin
    Douglas E. Appelt
    Barbara J. Grosz
    FJCC(1986), pp. 260-267
    Can Drawing Be Liberated from the von Neumann Style?
    Logic Programming and Its Applications, Ablex, Norwood, New Jersey(1986), pp. 175-187
    A Structure-Sharing Representation for Unification-Based Grammar Formalisms
    23rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Chicago, Illinois(1985), pp. 137-144
    A New Characterization of Attachment Preferences
    Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, Cambridge University Press, Cambridge, England(1985), pp. 307-319
    An Overview of Automated Reasoning and Related Fields
    L. Wos
    Robert Hong
    Robert S. Boyer
    J Strother Moore
    W. W. Bledsoe
    L. J. Henschen
    Bruce G. Buchanan
    Graham Wrightson
    Cordell Green
    Journal of Automated Reasoning, 1(1985), pp. 5-48
    The Semantics of Grammar Formalisms Seen as Computer Languages
    Stuart M. Shieber
    Proceedings of COLING 84, Association for Computational Linguistics, Stanford, California(1984), pp. 123-129
    Transportability and Generality in a Natural-Language Interface System
    Paul A. Martin
    Douglas E. Appelt
    Proceedings of the Eighth International Joint Conference on Artificial Intelligence(1983), pp. 573-581
    Parsing as Deduction
    David H. D. Warren
    21st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Cambridge, Massachusetts(1983), pp. 137-144
    A Fact Dependency System for the Logic Programmer
    Peter S. G. Swinson
    Aart Bijl
    Computer-Aided Design, 14(1983), pp. 235-243
    Can Drawing Be Liberated from the von Neumann Style?
    Databases for Business and Office Applications(1983), pp. 184-190
    An Efficient Easily Adaptable System for Interpreting Natural Language Queries
    David H. D. Warren
    Computational Linguistics, 8(1982), pp. 110-122
    Extraposition Grammars
    Computational Linguistics, 7(1981), pp. 243-256
    Definite Clause Grammars for Language Analysis---a Survey of the Formalism and a Comparison with Augmented Transition Networks
    David H. D. Warren
    Artificial Intelligence, 13(1980), pp. 231-278
    Prolog -- The Language and its Implementation Compared with Lisp
    David H. D. Warren
    Luis M. Pereira
    Proceedings of the Symposium on Artificial Intelligence and Programming Languages, Rochester, New York(1977), pp. 109-115