# Mehryar Mohri

Mehryar Mohri leads the Learning Theory Team at Google Research. The team has extensive expertise in areas including statistical learning theory, optimization, decision making under uncertainty, and reinforcement learning, as well as theory and algorithms more broadly.

## Authored Publications

A Field Guide to Federated Optimization

Jianyu Wang

Gauri Joshi

Maruan Al-Shedivat

Galen Andrew

A. Salman Avestimehr

Katharine Daly

Deepesh Data

Suhas Diggavi

Hubert Eichner

Advait Gadhikar

Antonious M. Girgis

Filip Hanzely

Chaoyang He

Samuel Horvath

Martin Jaggi

Tara Javidi

Sai Praneeth Karimireddy

Jakub Konečný

Sanmi Koyejo

Tian Li

Peter Richtarik

Virginia Smith

Mahdi Soltanolkotabi

Weikang Song

Sebastian Stich

Ameet Talwalkar

Hongyi Wang

Blake Woodworth

Honglin Yuan

Mi Zhang

Tong Zhang

Chunxiang (Jake) Zheng

Chen Zhu

arXiv (2021)

Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating, and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
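
The local-update/server-average pattern at the heart of federated optimization can be illustrated in a few lines. This is a minimal toy sketch, not code from the paper: each client objective is an assumed quadratic 0.5·||w − c||², and all names are illustrative.

```python
import numpy as np

def local_sgd(w, c, lr=0.1, steps=5):
    """Run a few local SGD steps on one client's quadratic loss
    0.5 * ||w - c||^2, whose gradient is (w - c)."""
    for _ in range(steps):
        w = w - lr * (w - c)
    return w

def fedavg(clients, rounds=50, lr=0.1, steps=5):
    """Each round, every client trains locally from the server model,
    then the server averages the resulting models (equal weights)."""
    w = np.zeros_like(clients[0])
    for _ in range(rounds):
        local_models = [local_sgd(w.copy(), c, lr, steps) for c in clients]
        w = np.mean(local_models, axis=0)
    return w

# Heterogeneous clients: each pulls the model toward its own optimum c_k;
# with this toy objective, FedAvg converges to the mean of the client optima.
clients = [np.array([0.0, 0.0]), np.array([2.0, 2.0]), np.array([4.0, 4.0])]
w = fedavg(clients)
```

With heterogeneous client optima, the gap between the local optima and the averaged model is exactly the "client drift" tension that federated optimization methods are designed around.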

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

Martin Jaggi

Sai Praneeth Karimireddy

Sebastian Stich

ICML 2021 (2020)

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients, which gives rise to the client drift phenomenon. In this work, we propose a general algorithmic framework, Mime, which (i) mitigates client drift and (ii) adapts arbitrary centralized optimization algorithms such as SGD and Adam to the federated learning setting. Mime uses a combination of control variates and server-level statistics (e.g., momentum) at every client-update step to ensure that each local update mimics that of the centralized method run on i.i.d. data. We prove a reduction result showing that Mime can translate the convergence of a generic algorithm in the centralized setting into convergence in the federated setting. Further, we show for the first time that multiple local steps can lead to faster convergence in the cross-device FL setting. Our thorough theoretical and empirical analyses establish Mime's superiority over other baselines.

Learning GANs and Ensembles Using Discrepancy

Ningshan Zhang

Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019

Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions. The choice of that divergence is therefore critical. We argue that the divergence must take into account the hypothesis set and the loss function used in a subsequent learning task, where the data generated by a GAN serves for training. Taking that structural information into account is also important to derive generalization guarantees. Thus, we propose to use the discrepancy measure, which was originally introduced for the closely related problem of domain adaptation and which precisely takes into account the hypothesis set and the loss function. We show that discrepancy admits favorable properties for training GANs and prove explicit generalization guarantees. We present efficient algorithms using discrepancy for two tasks: training a GAN directly, namely DGAN, and mixing previously trained generative models, namely EDGAN. Our experiments on toy examples and several benchmark datasets show that DGAN is competitive with other GANs and that EDGAN outperforms existing GAN ensembles, such as AdaGAN.

Advances and Open Problems in Federated Learning

Brendan Avent

Aurélien Bellet

Mehdi Bennis

Arjun Nitin Bhagoji

Graham Cormode

Rachel Cummings

Rafael G.L. D'Oliveira

Salim El Rouayheb

David Evans

Josh Gardner

Adrià Gascón

Phillip B. Gibbons

Marco Gruteser

Zaid Harchaoui

Chaoyang He

Lie He

Zhouyuan Huo

Justin Hsu

Martin Jaggi

Tara Javidi

Gauri Joshi

Mikhail Khodak

Jakub Konečný

Aleksandra Korolova

Farinaz Koushanfar

Sanmi Koyejo

Tancrède Lepoint

Yang Liu

Prateek Mittal

Richard Nock

Ayfer Özgür

Rasmus Pagh

Ramesh Raskar

Dawn Song

Weikang Song

Sebastian U. Stich

Ziteng Sun

Florian Tramèr

Praneeth Vepakomma

Jianyu Wang

Li Xiong

Qiang Yang

Felix X. Yu

Han Yu

arXiv (2019)

Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges.

Adaptation Based on Generalized Discrepancy

Journal of Machine Learning Research, vol. 20 (2019), pp. 1-30

We present a new algorithm for domain adaptation improving upon the discrepancy minimization (DM) algorithm, previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than that used by the DM algorithm, called generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization.

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

Charles Weill

Vitaly Kuznetsov

Scott Yang

Scott Yak

Hanna Mazzawi

Eugen Hotaj

Ghassen Jerfel

Vladimir Macko

(2019)

AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at https://github.com/tensorflow/adanet.

Relative deviation learning bounds and generalization with unbounded loss functions

Spencer Greenberg

Annals of Mathematics and Artificial Intelligence (2019)

We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. We then illustrate how to apply these results in a sample application: the analysis of importance weighting.
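
The importance-weighting application mentioned above can be made concrete with a toy computation (illustrative only; the distributions and variable names are my own, not from the paper): reweight samples drawn from a source Gaussian q by w(x) = p(x)/q(x) to estimate an expectation under a shifted target p. Note that these weights are unbounded, which is exactly the regime where unbounded-loss bounds are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu):
    """Density of N(mu, 1) at x."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Source samples x ~ q = N(0, 1); target distribution p = N(1, 1).
x = rng.normal(0.0, 1.0, size=200_000)
w = gauss_pdf(x, 1.0) / gauss_pdf(x, 0.0)   # importance weights p(x)/q(x)

# Importance-weighted estimate of the target mean E_p[x] (true value: 1.0).
est = np.mean(w * x)
```

Here w(x) = exp(x − 0.5) grows without bound, so the estimator's variance is driven by rare large-weight samples, the phenomenon the relative deviation analysis addresses.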

Online learning with abstention

Scott Yang

Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden (2018)

We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.

Logistic Regression: The Importance of Being Improper

Dylan Foster

Haipeng Luo

Karthik Sridharan

Conference on Learning Theory (2018)

Learning linear predictors with the logistic loss, both in stochastic and online settings, is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing "fast rates" for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012) when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. We also show that the improved dependence on predictor norm is near-optimal.

Leveraging this improved dependency on the predictor norm yields the following applications: (a) we give algorithms for online bandit multiclass learning with the logistic loss with an O*(√n) relative mistake bound across essentially all parameter ranges, thus providing a solution to the COLT 2009 open problem of Abernethy and Rakhlin (2009), and (b) we give an adaptive algorithm for online multiclass boosting with optimal sample complexity, thus partially resolving an open problem of Beygelzimer et al. (2015) and Jung et al. (2017). Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parametric and even nonparametric settings.

AdaNet: Adaptive structural learning of artificial neural networks

Vitaly Kuznetsov

Scott Yang

Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, August 2017

We present new algorithms for adaptively learning artificial neural networks. Our algorithms (AdaNet) adaptively learn both the structure of the network and its weights. They are based on a solid theoretical analysis, including data-dependent generalization guarantees that we prove and discuss in detail. We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset and on the Criteo dataset. The results demonstrate that our algorithm can automatically learn network structures with very competitive performance accuracies when compared with those achieved by neural networks found by standard approaches.

Parameter-Free Online Learning via Model Selection

Dylan Foster

Karthik Sridharan

Neural Information Processing Systems (2017)

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and R^d with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel "multi-scale" algorithm for prediction with expert advice based on random playout, which may be of independent interest.

Structured prediction theory based on factor graph complexity

Vitaly Kuznetsov

Scott Yang

Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)

We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, factor graph complexity, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets along with a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest.

We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to design two new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured Boosting (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.

Boosting with abstention

Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)

We present a new boosting algorithm for the key scenario of binary classification with abstention where the algorithm can abstain from predicting the label of a point, at the price of a fixed cost. At each round, our algorithm selects a pair of functions, a base predictor and a base abstention function. We define convex upper bounds for the natural loss function associated to this problem, which we prove to be calibrated with respect to the Bayes solution. Our algorithm benefits from general margin-based learning guarantees which we derive for ensembles of pairs of base predictor and abstention functions, in terms of the Rademacher complexities of the corresponding function classes. We give convergence guarantees for our algorithm along with a linear-time weak-learning algorithm for abstention stumps. We also report the results of several experiments suggesting that our algorithm provides a significant improvement in practice over two confidence-based algorithms.

Random Composite Forests

Association for the Advancement of Artificial Intelligence (AAAI 2016)

We introduce a broad family of decision trees, Composite Trees, whose leaf classifiers are selected out of a hypothesis set composed of p sub-families with different complexities. We prove new data-dependent learning guarantees for this family in the multi-class setting. These learning bounds provide a quantitative guidance for the choice of the hypotheses at each leaf. Remarkably, they depend on the Rademacher complexities of the sub-families of the predictors and the fraction of sample points correctly classified at each leaf. We further introduce random composite trees and derive learning guarantees for random composite trees which also apply to Random Forests. Using our theoretical analysis, we devise a new algorithm, RCF, that is based on forming an ensemble of random composite trees. We report the results of experiments demonstrating that RCF yields significant performance improvements over both Random Forests and a variant of RCF in several tasks.

Learning with rejection

Proceedings of the 27th International Conference on Algorithmic Learning Theory (ALT 2016), pp. 67-82, Bari, Italy. Springer, Heidelberg, Germany (2016)

We introduce a novel framework for classification with a rejection option that consists of simultaneously learning two functions: a classifier along with a rejection function. We present a full theoretical analysis of this framework including new data-dependent learning bounds in terms of the Rademacher complexities of the classifier and rejection families as well as consistency and calibration results. These theoretical guarantees guide us in designing new algorithms that can exploit different kernel-based hypothesis sets for the classifier and rejection functions. We compare and contrast our general framework with the special case of confidence-based rejection for which we devise alternative loss functions and algorithms as well. We report the results of several experiments showing that our kernel-based algorithms can yield a notable improvement over the best existing confidence-based rejection algorithm.
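
The confidence-based special case that the general framework is compared against can be sketched in a few lines (a toy illustration with made-up scores, thresholds, and cost, not the paper's kernel-based algorithms): abstain whenever the score magnitude falls below a threshold, and pay a fixed cost c for each abstention versus cost 1 for each error.

```python
import numpy as np

def predict_with_rejection(scores, theta):
    """Confidence-based rejection: abstain when |score| <= theta,
    otherwise predict sign(score). Abstention is marked with 0."""
    preds = np.sign(scores)
    preds[np.abs(scores) <= theta] = 0
    return preds

def rejection_loss(preds, labels, c=0.3):
    """Average loss: cost c per abstention, cost 1 per misclassification."""
    abstain = preds == 0
    errors = (~abstain) & (preds != labels)
    return (c * abstain + 1.0 * errors).mean()

scores = np.array([2.0, -1.5, 0.1, -0.05, 0.8])
labels = np.array([1, -1, -1, 1, -1])
preds = predict_with_rejection(scores, theta=0.2)
# Scores 0.1 and -0.05 fall in the abstention band; 0.8 is a confident error.
loss = rejection_loss(preds, labels, c=0.3)  # (0.3 + 0.3 + 1.0) / 5 = 0.32
```

For the rejection loss to be meaningful, the abstention cost c must be below 1/2; otherwise predicting at random is never worse than abstaining.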

Learning with Deep Cascades

Proceedings of the Twenty-Sixth International Conference on Algorithmic Learning Theory (ALT 2015)

We introduce a broad learning model formed by cascades of predictors, Deep Cascades, that is structured as general decision trees in which leaf predictors or node questions may be members of rich function families. We present new detailed data-dependent theoretical guarantees for learning with Deep Cascades with complex leaf predictors or node questions, in terms of the Rademacher complexities of the sub-families composing these sets of predictors and the fraction of sample points correctly classified at each leaf. These general guarantees can guide the design of a variety of different algorithms for deep cascade models, and we give a detailed description of two such algorithms. Our second algorithm uses SVM predictors as node and leaf classifiers, and we report the results of experiments comparing its performance with that of SVM combined with polynomial kernels.

On-line learning algorithms for path experts with non-additive losses

Vitaly Kuznetsov

Manfred K. Warmuth

Proceedings of the 28th Annual Conference on Learning Theory (COLT 2015)

Structural maxent models

Vitaly Kuznetsov

Proceedings of the Thirty-Second International Conference on Machine Learning (ICML 2015)

We present a new class of density estimation models, Structural Maxent models, with feature functions selected from a union of possibly very complex sub-families and yet benefiting from strong learning guarantees. The design of our models is based on a new principle supported by uniform convergence bounds and taking into consideration the complexity of the different sub-families composing the full set of features. We prove new data-dependent learning bounds for our models, expressed in terms of the Rademacher complexities of these sub-families. We also prove a duality theorem, which we use to derive our Structural Maxent algorithm. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments demonstrating that its performance improves on that of existing L1-norm regularized Maxent algorithms. We further similarly define conditional Structural Maxent models for multi-class classification problems. These are conditional probability models also making use of a union of possibly complex feature sub-families. We prove a duality theorem for these models as well, which reveals their connection with existing binary and multi-class deep boosting algorithms.

Adaptation algorithm and theory based on generalized discrepancy

Proceedings of the 21st ACM Conference on Knowledge Discovery and Data Mining (KDD 2015)

Deep boosting

Proceedings of the Thirty-First International Conference on Machine Learning (ICML 2014)

We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data-dependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants.

Learning ensembles of structured prediction rules

Vitaly Kuznetsov

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)

Ensemble methods for structured prediction

Vitaly Kuznetsov

Proceedings of the 31st International Conference on Machine Learning (ICML 2014)

Theoretical Foundations for Learning Kernels in Supervised Kernel PCA

Modern Nonparametrics 3: Automating the Learning Pipeline, Neural Information Processing Systems, Workshop (2014)

This paper presents a novel learning scenario which combines dimensionality reduction, supervised learning, and kernel selection. We carefully define the hypothesis class that addresses this setting and provide an analysis of its Rademacher complexity, thereby providing generalization guarantees. The proposed algorithm uses KPCA to reduce the dimensionality of the feature space, i.e., by projecting data onto the top eigenvectors of the covariance operator in a reproducing kernel Hilbert space. Moreover, it simultaneously learns a linear combination of base kernel functions, which defines a reproducing space, as well as the parameters of a supervised learning algorithm in order to minimize a regularized empirical loss. The bound on the Rademacher complexity of our hypothesis set is shown to be logarithmic in the number of base kernels, which encourages practitioners to combine as many base kernels as possible.

Corporate learning at scale: Lessons from a large online course at Google

Arthur Asuncion

Jac de Haan

Kayur Patel

Lauren Wong

Learning at Scale (2014)

Google Research recently tested a massive online class model for an internal engineering education program, with machine learning as the topic, that blended theoretical concepts and Google-specific software tool tutorials. The goal of this training was to foster engineering capacity to leverage machine learning tools in future products. The course was delivered both synchronously and asynchronously, and students had the choice between studying independently or participating with a group. Since all students are company employees, unlike most publicly offered MOOCs we can continue to measure the students’ behavioral change long after the course is complete. This paper describes the course, outlines the available data set and presents directions for analysis.

Learning kernels using local Rademacher complexity

Marius Kloft

Advances in Neural Information Processing Systems (NIPS 2013), MIT Press.

We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks.

Multi-class classification with maximum margin multiple kernel

Proceedings of the Thirtieth International Conference on Machine Learning (ICML 2013)

Large-scale SVD and manifold learning

Ameet Talwalkar

Journal of Machine Learning Research, vol. 14 (2013), pp. 3129-3152

Algorithms for Learning Kernels Based on Centered Alignment

Journal of Machine Learning Research, vol. 13 (2012), pp. 795-828

Accuracy at the Top

Stephen Boyd

Advances in Neural Information Processing Systems (NIPS 2012)

We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and show how its solution can be obtained by solving a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our algorithm and comparing the results to several other algorithms seeking high precision at the top. In most examples, our algorithm achieves a better performance in precision at the top.
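
The top-τ criterion itself is simple to evaluate, even though optimizing it is the hard part the paper addresses. Below is an illustrative sketch (toy data and names of my own, not the paper's algorithm): score the points, keep the top τ fraction, and measure the fraction of positives among them.

```python
import numpy as np

def precision_at_top(scores, labels, tau=0.1):
    """Fraction of positives among the top tau-quantile of scores."""
    k = max(1, int(np.ceil(tau * len(scores))))
    top = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return np.mean(labels[top] == 1)

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.01])
labels = np.array([1,   1,  -1,   1,  -1,  -1,  -1,    1,  -1,   -1])
p = precision_at_top(scores, labels, tau=0.2)  # top 2 scores are both positive
```

Because only the ranking within the top quantile matters, this criterion is insensitive to errors lower down the list, which is why it differs from overall classification accuracy or AUC.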

A Dual Coordinate Descent Algorithm for SVMs Combined with Rational Kernels

International Journal of Foundations of Computer Science, vol. 22 (2011), pp. 1761-1779

Ensembles of Kernel Predictors

Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011)

General Algorithms for Testing the Ambiguity of Finite Automata and the Double-Tape Ambiguity of Finite-State Transducers

Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 22 (2011), pp. 883-904

Domain adaptation in regression

Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011, Springer, Heidelberg, Germany

Can matrix coherence be efficiently and accurately estimated?

Ameet Talwalkar

Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011)

Ensemble Nystrom

Ameet Talwalkar

Chapter in Ensemble Machine Learning: Theory and Applications, Springer (2011)

Two-Stage Learning Kernel Algorithms

Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)

This paper examines two-stage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignment-based techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definition of kernel alignment and significantly extend that work in several directions: we give a novel and simple concentration bound for alignment between kernel matrices; show the existence of good predictors for kernels with high alignment, both for classification and for regression; give algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP; and report the results of extensive experiments with this alignment-based method in classification and regression tasks, which show an improvement both over the uniform combination of kernels and over other state-of-the-art learning kernel methods.
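
The centered-alignment quantity underlying this line of work can be computed directly. Here is a small sketch (synthetic data and variable names are my own): center both kernel matrices with H = I − (1/m)11ᵀ, then take the normalized Frobenius inner product, here between a linear kernel and the ideal target kernel yyᵀ.

```python
import numpy as np

def center(K):
    """Center a kernel matrix: Kc = H K H with H = I - (1/m) 11^T."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F)."""
    K1c, K2c = center(K1), center(K2)
    num = np.sum(K1c * K2c)
    return num / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

# Alignment of a linear kernel with the ideal target kernel y y^T.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sign(X[:, 0])          # labels determined by the first feature
K = X @ X.T                   # linear kernel
KY = np.outer(y, y)           # target kernel built from the labels
a = centered_alignment(K, KY)
```

By Cauchy-Schwarz the alignment lies in [−1, 1], and a kernel is perfectly aligned with itself; in a two-stage method this quantity is estimated in a first stage and a combination kernel maximizing it is learned in the second.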

Half Transductive Ranking

Bing Bai

Jason Weston

David Grangier

Ronan Collobert

Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)

We study the standard retrieval task of ranking a fixed set of items given a previously unseen query and pose it as the half transductive ranking problem. The task is transductive as the set of items is fixed. Transductive representations (where the vector representation of each example is learned) allow the generation of highly nonlinear embeddings that capture object relationships without relying on a specific choice of features, and require only relatively simple optimization. Unfortunately, they have no direct out-of-sample extension. Inductive approaches on the other hand allow for the representation of unknown queries. We describe algorithms for this setting which have the advantages of both transductive and inductive approaches, and can be applied in unsupervised (either reconstruction-based or graph-based) and supervised ranking setups. We show empirically that our methods give strong performance on all three tasks.

Discriminative Topic Segmentation of Text and Speech

International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)

Learning Bounds for Importance Weighting

Yishay Mansour

Advances in Neural Information Processing Systems (NIPS 2010), MIT Press, Vancouver, Canada

Stability Bounds for Stationary φ-mixing and β-mixing Processes

Journal of Machine Learning Research (JMLR), vol. 11 (2010), pp. 798-814

Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence.

Generalization Bounds for Learning Kernels

Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)

This paper presents several novel generalization bounds for the problem of learning kernels based on a combinatorial analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels using L1 regularization admits only a √(log p) dependency on the number of kernels, which is tight and considerably more favorable than the previous best bound given for the same problem. We also give a novel bound for learning with a non-negative combination of p base kernels with an L2 regularization whose dependency on p is also tight and only in p^(1/4). We present similar results for Lq regularization with other values of q, and outline the relevance of our proof techniques to the analysis of the complexity of the class of linear functions. Experiments with a large number of kernels further validate the behavior of the generalization error as a function of p predicted by our bounds.
View details
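The learning-kernels setting in the abstract above considers hypotheses built from a convex (L1-constrained) combination of p base kernels. The sketch below forms such a combination of Gaussian kernels, with mixture weights chosen by a simple alignment heuristic — an illustrative proxy, not this paper's algorithm — and then trains kernel ridge regression on the combined kernel; all names and parameter values are assumptions.

```python
import numpy as np

def rbf(X, Y, gamma):
    # Gaussian (RBF) kernel matrix between rows of X and Y.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# p base kernels with different bandwidths; the L1 constraint confines the
# mixture weights mu to the simplex {mu >= 0, sum(mu) = 1}.
gammas = [0.1, 0.5, 1.0, 2.0]
kernels = [rbf(X, X, g) for g in gammas]

# Heuristic weights from alignment with the centered target (a proxy for
# learning mu, not the paper's method).
yc = y - y.mean()
scores = np.array([max(yc @ Kb @ yc, 0.0) for Kb in kernels])
mu = scores / scores.sum()
K = sum(m * Kb for m, Kb in zip(mu, kernels))

# Kernel ridge regression with the combined kernel.
lam = 1e-2
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
train_mse = np.mean((K @ alpha - y) ** 2)
print(f"weights: {np.round(mu, 3)}, train MSE: {train_mse:.4f}")
```

The √(log p) dependency in the bound suggests that adding many base kernels under the L1 constraint is statistically cheap, which is what makes combinations over large kernel families practical.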

Algorithms for Learning Kernels Based on Centered Alignment

Preview
Journal of Machine Learning Research, vol. 13 (2012), pp. 795-828

On the Impact of Kernel Approximation on Learning Accuracy

Ameet Talwalkar

Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)

Preview abstract
Kernel approximation is commonly used to scale kernel-based algorithms to applications containing as many as several million instances. This paper analyzes the effect of such approximations in the kernel matrix on the hypothesis generated by several widely used learning algorithms. We give stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms. These bounds help determine the degree of approximation that can be tolerated in the estimation of the kernel matrix. Our analysis is general and applies to arbitrary approximations of the kernel matrix. However, we also give a specific analysis of the Nyström low-rank approximation in this context and report the results of experiments evaluating the quality of the Nyström low-rank kernel approximation when used with ridge regression.
View details
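The Nyström low-rank approximation analyzed in the abstract above admits a compact sketch: sample m columns of the kernel matrix K and form K ≈ C W⁺ Cᵀ, where C collects the sampled columns and W is their m × m intersection block. The numpy code below is a minimal illustration of this construction — not the paper's experimental setup — and compares it to the exact matrix by relative Frobenius error; the data, bandwidth, and sample size are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=0.5):
    # Pairwise Gaussian (RBF) kernel matrix between rows of X and Y.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_approximation(K, m, rng):
    # Uniformly sample m columns; approximate K by C W^+ C^T, where W is
    # the m x m block at the sampled rows/columns and W^+ its pseudo-inverse.
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
K = gaussian_kernel(X, X)
K_hat = nystrom_approximation(K, m=80, rng=rng)
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(f"relative Frobenius error: {err:.3f}")
```

In a downstream learner such as ridge regression, the stability bounds of the paper relate the perturbation of the hypothesis to precisely this kind of norm of the approximation error.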

On Sampling-Based Approximate Spectral Decomposition

Preview
Ameet Talwalkar

International Conference on Machine Learning (ICML) (2009)

Gaussian Margin Machines

Preview
Koby Crammer

Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), Clearwater Beach, Florida, pp. 105-112

N-Way Composition of Weighted Finite-State Transducers

Preview
International Journal of Foundations of Computer Science, vol. 20 (2009), pp. 613-627

Rademacher Complexity Bounds for Non-I.I.D. Processes

Preview
Advances in Neural Information Processing Systems (NIPS 2008), MIT Press, Vancouver, Canada (2009)

Weighted Automata Algorithms

Preview
Handbook of Weighted Automata, Springer (2009)

Multiple Source Adaptation and the Renyi Divergence

Preview
Yishay Mansour

Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada

L2 Regularization for Learning Kernels

Preview
Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada

Domain Adaptation with Multiple Sources

Preview
Yishay Mansour

Advances in Neural Information Processing Systems (NIPS 2008), MIT Press, Vancouver, Canada (2009)

Learning non-linear combinations of kernels

Preview
NIPS 2009, Advances in Neural Information Processing Systems, MIT Press

Efficient and Robust Music Identification with Weighted Finite-State Transducers

Preview
IEEE Transactions on Audio, Speech, and Language Processing, vol. to appear (2009)

Domain Adaptation: Learning Bounds and Algorithms

Preview
Yishay Mansour

Proceedings of The 22nd Annual Conference on Learning Theory (COLT 2009), Omnipress, Montréal, Canada

Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

Preview
Gideon Mann

Ryan McDonald

Nathan Silberman

Daniel Walker IV

Neural Information Processing Systems (NIPS) (2009)

A new quality measure for topic segmentation of text and speech

Preview
Conference of the International Speech Communication Association (Interspeech) (2009)

Polynomial semantic indexing

Preview
Bing Bai

Jason Weston

David Grangier

Ronan Collobert

Kunihiko Sadamasa

Yanjun Qi

Advances in Neural Information Processing Systems (NIPS 2009), MIT Press

Learning sequence kernels

Preview
Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (2008)

A Machine Learning Framework for Spoken-Dialog Classification

Preview
Patrick Haffner

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)

Kernel Methods for Learning Languages

Preview
Leonid Kontorovich

Theoretical Computer Science, vol. 405 (2008), pp. 223-236

Linear-Space Computation of the Edit-Distance between a String and a Finite Automaton

Preview
London Algorithmics 2008: Theory and Practice, College Publications (to appear)

Sample Selection Bias Correction Theory

Preview
Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008), Springer, Heidelberg, Germany, Budapest, Hungary

General Algorithms for Testing the Ambiguity of Finite Automata

Preview
Ashish Rastogi

Proceedings of Twelfth International Conference Developments in Language Theory (DLT 2008), Springer, Heidelberg, Germany, Kyoto, Japan

Learning with weighted transducers

Preview
Proceedings of the Seventh International Workshop Finite-State Methods and Natural Language Processing (2008)

3-Way Composition of Weighted Finite-State Transducers

Preview
Proceedings of the 13th International Conference on Implementation and Application of Automata (CIAA 2008), Springer-Verlag, Heidelberg, Germany, San Francisco, California, pp. 262-273

Speech Recognition with Weighted Finite-State Transducers

Preview
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)

On the Computation of the Relative Entropy of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 19 (2008), pp. 219-242

An Efficient Reduction of Ranking to Classification

Preview
Nir Ailon

Proceedings of The 21st Annual Conference on Learning Theory (COLT 2008), Springer, Heidelberg, Germany, Helsinki, Finland

Stability of Transductive Regression Algorithms

Preview
Dmitry Pechyony

Ashish Rastogi

Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland

Stability Bounds for Non-i.i.d. Processes

Preview
Advances in Neural Information Processing Systems (NIPS 2007), MIT Press, Vancouver, Canada (2008)

OpenFst: a General and Efficient Weighted Finite-State Transducer Library

Preview
Johan Schalkwyk

Wojciech Skut

Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), Springer-Verlag, Heidelberg, Germany, Prague, Czech Republic

Lp Distance and Equivalence of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 18 (2007)

On Transductive Regression

Preview
Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, Vancouver, Canada (2007)

Speech Recognition with Weighted Finite-State Transducers

Preview
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)

Learning Languages with Rational Kernels

Preview
Leonid Kontorovich

Proceedings of The 20th Annual Conference on Computational Learning Theory (COLT 2007), Springer, Heidelberg, Germany, San Diego, California

Lp Distance and Equivalence of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 18 (2007), pp. 761-780

An Alternative Ranking Problem for Search Engines

Preview
Ashish Rastogi

Proceedings of the 6th Workshop on Experimental Algorithms (WEA 2007), Springer-Verlag, Heidelberg, Germany, Rome, Italy, pp. 1-21

L_p Distance and Equivalence of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. to appear (2007)

Robust music identification, detection, and analysis

Preview
Proceedings of the International Conference on Music Information Retrieval (ISMIR) (2007)

On the Computation of the Relative Entropy of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. to appear (2007)

A Machine Learning Framework for Spoken-Dialog Classification

Preview
Patrick Haffner

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)

Magnitude-Preserving Ranking Algorithms

Preview
Ashish Rastogi

Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML 2007), Oregon State University, Corvallis, OR

Factor Automata of Automata and Applications

Preview
Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), Prague, Czech Republic

Learning Linearly Separable Languages

Preview
Leonid Kontorovich

Proceedings of The 17th International Conference on Algorithmic Learning Theory (ALT 2006), Springer, Heidelberg, Germany

Probabilistic Context-Free Grammar Induction Based on Structural Zeros

Preview
Proceedings of the Seventh Meeting of the Human Language Technology conference - North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2006), New York, NY

On a Common Fallacy in Computational Linguistics

Preview
A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday, SKY Journal of Linguistics, Volume 19 (2006), pp. 432-439

Efficient Computation of the Relative Entropy of Probabilistic Automata

Preview
Ashish Rastogi

Proceedings of the 7th Latin American Symposium (LATIN 2006), Springer-Verlag, Heidelberg, Germany, Valdivia, Chile

A Unified Construction of the Glushkov, Follow, and Antimirov Automata

Preview
Proceedings of the 31st International Symposium on Mathematical Foundations of Computer Science (MFCS 2006), Springer-Verlag, Heidelberg, Germany, Stará Lesná, Slovakia, pp. 110-121

On the Computation of Some Standard Distances between Probabilistic Automata

Preview
Ashish Rastogi

Proceedings of the 11th International Conference on Implementation and Application of Automata (CIAA 2006), Springer-Verlag, Heidelberg, Germany, Taipei, Taiwan

The Design Principles and Algorithms of a Weighted Grammar Library

Preview
International Journal of Foundations of Computer Science, vol. 16 (2005)

A General Regression Technique for Learning Transductions

Preview
Jason Weston

Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 2005), Bonn, Germany

Multi-Armed Bandit Algorithms and Empirical Evaluation

Preview
Joannès Vermorel

Proceedings of the 16th European Conference on Machine Learning (ECML 2005), Springer, Heidelberg, Germany, Porto, Portugal

Margin-Based Ranking Meets Boosting in the Middle

Preview
Cynthia Rudin

Robert E. Schapire

Proceedings of The 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, Bertinoro, Italy, pp. 63-78

A Comparison of Classifiers for Detecting Emotion from Speech

Preview
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), Philadelphia, Pennsylvania

Finite-State Transducers in Computational Biology

Preview
Tutorial presented at the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, MI

Statistical Natural Language Processing

Preview
Applied Combinatorics on Words, Cambridge University Press (2005)

Local Grammar Algorithms

Preview
Inquiries into Words, Constraints, and Contexts. Festschrift in Honour of Kimmo Koskenniemi on his 60th Birthday, CSLI Publications, Stanford University (2005), pp. 84-93

Margin-Based Ranking Meets Boosting in the Middle

Preview
Cynthia Rudin

Robert E. Schapire

Proc. of the 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, pp. 63-78

Confidence Intervals for the Area under the ROC Curve

Preview
Advances in Neural Information Processing Systems (NIPS 2004), MIT Press, Vancouver, Canada (2005)

Distribution Kernels Based on Moments of Counts

Preview
Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada

Rational Kernels: Theory and Algorithms

Preview
Patrick Haffner

Journal of Machine Learning Research (JMLR), vol. 5 (2004), pp. 1035-1062

On the Rademacher Complexity of Weighted Automata

Automata and Graph Compression

Learning Weighted Automata

Automata and graph compression

Non-parametric Revenue Optimization for Generalized Second Price Auctions

On the Disambiguation of Weighted Automata

Accelerating Optimization via Adaptive Prediction

Voted Kernel Regularization

Foundations of Coupled Nonlinear Dimensionality Reduction

Learning Theory and Algorithms for revenue optimization in second price auctions with reserve

Revenue Optimization in Posted-Price Auctions with Strategic Buyers

On the Disambiguation of Weighted Automata

Optimal Regret Minimization in Posted-Price Auctions with Strategic Buyers

Generalization Bounds for Time Series Prediction with Non-stationary Processes

Conditional Swap Regret and Conditional Correlated Equilibrium

On the Disambiguation of Finite Automata and Functional Transducers

International Journal of Foundations of Computer Science, vol. 24 (2013), pp. 847-862

Tight Lower Bound on the Probability of a Binomial Exceeding its Expectation

Perceptron Mistake Bounds

Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions

Learning Theory and Algorithms for Revenue Optimization in Second-Price Auctions with Reserve

Foundations of Machine Learning

Sampling Methods for the Nyström Method

Ameet Talwalkar

Journal of Machine Learning Research, vol. 13 (2012), pp. 981-1006

Stability Bounds for Stationary phi-mixing and beta-mixing Processes

Stability Analysis and Learning Bounds for Transductive Regression Algorithms

Multiple Source Adaptation and the Rényi Divergence

New Generalization Bounds for Learning Kernels

Stability Bounds for Stationary Phi-mixing and Beta-mixing Processes

Rademacher Complexity Bounds for Non-I.I.D. Processes

Learning Languages with Rational Kernels

Magnitude-preserving ranking algorithms

An Alternative Ranking Problem for Search Engines

On the Computation of Some Standard Distances Between Probabilistic Automata

On Transductive Regression

Learning Linearly Separable Languages

Efficient Computation of the Relative Entropy of Probabilistic Automata

A Unified Construction of the Glushkov, Follow, and Antimirov Automata

Moment Kernels for Regular Distributions

Weighted Automata in Text and Speech Processing

Multi-armed Bandit Algorithms and Empirical Evaluation

A General Weighted Grammar Library

Ninth International Conference on Automata (CIAA 2004), Kingston, Canada, July 22-24, 2004, Springer-Verlag, Berlin-NY (2005)

A Comparison of Classifiers for Detecting Emotion from Speech

An Optimal Pre-Determinization Algorithm for Weighted Transducers

AUC Optimization vs. Error Rate Minimization

Advances in Neural Information Processing Systems (NIPS 2003), MIT Press, Vancouver, Canada (2004)

Weighted Finite-State Transducer Algorithms: An Overview

Formal Languages and Applications, Springer, Berlin (2004)

A General Weighted Grammar Library

Proceedings of the Ninth International Conference on Automata (CIAA 2004), Kingston, Ontario, Canada

Statistical Modeling for Unit Selection in Speech Synthesis

42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona, Spain

Distribution kernels based on moments of counts

An optimal pre-determinization algorithm for weighted transducers

A General Weighted Grammar Library

General Indexation of Weighted Automata -- Application to Spoken Utterance Retrieval

Murat Saraclar

Proceedings of the annual meeting of the Human Language Technology conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL 2004), Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, Boston, Massachusetts

Rational Kernels: Theory and Algorithms

Patrick Haffner

Journal of Machine Learning Research, vol. 5 (2004), pp. 1035-1062

Confidence Intervals for the Area Under the ROC Curve

A Generalized Construction of Integrated Speech Recognition Transducers

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Montreal, Canada

Statistical Modeling for Unit Selection in Speech Synthesis

42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona, Spain

Positive Definite Rational Kernels

Patrick Haffner

Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.

Weighted Automata Kernels -- General Framework and Algorithms

Patrick Haffner

Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech '03), Special Session Advanced Machine Learning Algorithms for Speech and Language Processing, Geneva, Switzerland (2003)

Generalized Algorithms for Constructing Statistical Language Models

41st Meeting of the Association for Computational Linguistics (ACL 2003), Proceedings of the Conference, Sapporo, Japan

Rational Kernels

Patrick Haffner

Advances in Neural Information Processing Systems (NIPS 2002), MIT Press, Vancouver, Canada (2003)

Finitely Subsequential Transducers

International Journal of Foundations of Computer Science, vol. 14 (2003), pp. 983-994

Edit-Distance of Weighted Automata

Seventh International Conference on Automata (CIAA 2002), Tours, France, Springer, Berlin-NY (2003), pp. 1-23

Weighted automata kernels - general framework and algorithms

Generalized optimization algorithm for speech recognition transducers

Edit-Distance of Weighted Automata: General Definitions and Algorithms

International Journal of Foundations of Computer Science, vol. 14 (2003)

Efficient Algorithms for Testing the Twins Property

Finitely Subsequential Transducers

Voice Signatures

Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2003), St. Thomas, U.S. Virgin Islands

Lattice Kernels for Spoken-Dialog Classification

Patrick Haffner

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong

Learning from Uncertain Data

COLT (2003), pp. 656-670

Generalized Optimization Algorithm for Speech Recognition Transducers

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong

An Efficient Pre-determinization Algorithm

Generalized Algorithms for Constructing Statistical Language Models

Positive Definite Rational Kernels

p-Subsequentiable Transducers

Seventh International Conference on Automata (CIAA 2002), Tours, France, Springer, Berlin-NY (2003), pp. 24-34

Lattice kernels for spoken-dialog classification

Learning from Uncertain Data

Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.

Edit-Distance of Weighted Automata: General Definitions and Algorithms

International Journal of Foundations of Computer Science, vol. 14 (2003)

AUC Optimization vs. Error Rate Minimization

An Efficient Pre-Determinization Algorithm

Eighth International Conference on Automata (CIAA 2003), Santa Barbara, CA, Springer, Berlin-NY, pp. 83-95

A Comparison of Two LVR Search Optimization Techniques

Stephan Kanthak

Hermann Ney

Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado

Weighted Finite-State Transducers in Speech Recognition

Computer Speech and Language, vol. 16 (2002), pp. 69-88

p-Subsequentiable Transducers

A comparison of two LVR search optimization techniques

Semiring Frameworks and Algorithms for Shortest-Distance Problems

Journal of Automata, Languages and Combinatorics, vol. 7 (2002)

p-Subsequentiable Transducers

Proceedings of the Seventh International Conference on Automata (CIAA 2002), Tours, France

An Efficient Algorithm for the N-Best-Strings Problem

Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado

Weighted finite-state transducers in speech recognition

Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers

International Journal of Foundations of Computer Science, vol. 13 (2002), pp. 129-143

Edit-Distance of Weighted Automata

Proceedings of the Seventh International Conference on Automata (CIAA 2002), Tours, France

Rational Kernels

Weighted Finite-State Transducers in Speech Recognition (Tutorial)

On the Determinizability of Weighted Automata and Transducers

Proceedings of the workshop Weighted Automata: Theory and Applications (WATA), Dresden, Germany (2002)

Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers

International Journal of Foundations of Computer Science, vol. 13 (2002), pp. 129-143

Edit-Distance of Weighted Automata

CIAA (2002), pp. 1-23

Weighted Automata Algorithms (Tutorial)

An Efficient Algorithm for the N-Best-Strings Problem

Language Processing with Weighted Transducers

Proceedings of the 8th annual conference Traitement Automatique des Langues Naturelles (TALN 2001), Tours, France

Generic Epsilon-Removal Algorithm for Weighted Automata

5th International Conference on Automata (CIAA 2000), London Ontario, Canada, Springer-Verlag, Berlin-NY (2001), pp. 230-242

A weight pushing algorithm for large vocabulary speech recognition

Weighted Grammar Tools: the GRM Library

Robustness in Language and Speech Technology, Kluwer Academic Publishers, The Netherlands (2001), pp. 165-186

Regular Approximation of Context-Free Grammars through Transformation

Mark-Jan Nederhof

Robustness in Language and Speech Technology, Kluwer Academic Publishers, The Netherlands (2001), pp. 153-163

A Weight Pushing Algorithm for Large Vocabulary Speech Recognition

Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech '01), Aalborg, Denmark (2001)

Generic Epsilon-Removal Algorithm for Weighted Automata

CIAA (2000), pp. 230-242

Minimization Algorithms for Sequential Transducers

Theoretical Computer Science, vol. 234 (2000), pp. 177-201

Context-Free Recognition with Weighted Automata

The Design Principles of a Weighted Finite-State Transducer Library

Theoretical Computer Science, vol. 231 (2000), pp. 17-32

Minimization algorithms for sequential transducers

Theoretical Computer Science, vol. 234 (2000), pp. 177-201

Generic Epsilon-Removal Algorithm for Weighted Automata

Proceedings of the Fifth International Conference on Automata (CIAA 2000), London, Ontario, Canada

The Design Principles of a Weighted Finite-State Transducer Library

Weighted Finite-State Transducers in Speech Recognition

Proceedings of the ISCA Tutorial and Research Workshop, Automatic Speech Recognition: Challenges for the New Millennium (ASR2000), Paris, France

Rapid unit selection from a large speech corpus for concatenative speech synthesis

Network Optimizations for Large Vocabulary Speech Recognition

Comments on "Language modeling for speech recognition" by Frederick Jelinek

Extended Finite State Models of Language, Cambridge University Press, Cambridge (1999)

Network optimizations for large-vocabulary speech recognition

Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition

Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary (1999)

Rapid Unit Selection from a Large Speech Corpus for Concatenative Speech Synthesis

Mark Beutnagel

Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary (1999)

Integrated context-dependent networks in very large vocabulary speech recognition

Context-Free Recognition with Weighted Automata

Proceedings of the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida (1999)

Full expansion of context-dependent networks in large vocabulary speech recognition

Donald Hindle

Andrej Ljolje

Fernando C. N. Pereira

ICASSP (1998), pp. 665-668

Dynamic Compilation of Weighted Context-Free Grammars

Dynamic Compilation of Weighted Context-Free Grammars

36th Meeting of the Association for Computational Linguistics (ACL '98), Proceedings of the Conference, Montréal, Québec, Canada (1998), pp. 891-897

General Algebraic Frameworks and Algorithms for Shortest-Distance Problems

AT&T Labs - Research, 62 pages (1998)

A Rational Design for a Weighted Finite-State Transducer Library

Proceedings of the Second International Workshop on Implementing Automata (WIA '97), Springer-Verlag, Berlin-NY (1998), pp. 144-158

Speech Processing

Graduate course, Columbia University, Department of Computer Science, New York, NY, 515 pages (1998)

VPQ: a spoken language interface to large scale directory information

Bruce Buntschuh

Candace A. Kamm

Giuseppe Di Fabbrizio

Alicia Abella

Shrikanth Narayanan

Ilija Zeljkovic

R. D. Sharp

Jeremy H. Wright

S. Marcus

J. Shaffer

R. Duncan

Jay G. Wilpon

ICSLP (1998)

Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition

Don Hindle

Andrej Ljolje

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, Washington (1998)

Transducer Composition for Context-Dependent Network Expansion

EuroSpeech'97, European Speech Communication Association, Genova, Italy (1997), pp. 1427-1430

String-Matching with Automata

Nordic Journal of Computing, vol. 4 (1997), pp. 217-231

On The Use of Sequential Transducers in Natural Language Processing

Finite-State Language Processing, The MIT Press, Cambridge, Massachusetts (1997)

Finite-State Transducers in Language and Speech Processing

Computational Linguistics, vol. 23 (1997), pp. 269-311

A Rational Design for a Weighted Finite-State Transducer Library

WIA'97: Proceedings of the Workshop on Implementing Automata, Springer-Verlag (1997)

Weighted Determinization and Minimization for Large Vocabulary Speech Recognition

Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece (1997)

Transducer Composition for Context-Dependent Network Expansion

String-Matching with Automata

Nordic Journal of Computing, vol. 4 (1997), pp. 217-231

Weighted determinization and minimization for large vocabulary speech recognition

A Rational Design for a Weighted Finite-State Transducer Library

Proceedings of the Workshop on Implementing Automata (WIA '97), London, Ontario, Canada, University of Western Ontario, London, Ontario, Canada (1997)

An Efficient Compiler for Weighted Rewrite Rules

An Efficient Compiler for Weighted Rewrite Rules

Finite-State Transducers in Language and Speech Processing

Tutorial at the 16th International Conference on Computational Linguistics (COLING-96), COLING, Copenhagen, Denmark (1996)

On some Applications of Finite-State Automata Theory to Natural Language Processing

Journal of Natural Language Engineering, vol. 2 (1996), pp. 1-20

On some applications of finite-state automata theory to natural language processing

Natural Language Engineering, vol. 2 (1996), pp. 61-80

Weighted Automata in Text and Speech Processing

Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language, John Wiley and Sons, Chichester, Budapest, Hungary (1996)

An Efficient Compiler for Weighted Rewrite Rules

34th Meeting of the Association for Computational Linguistics (ACL '96), Proceedings of the Conference, Santa Cruz, California (1996)

Algorithms for Speech Recognition and Language Processing

Rational Power Series in Text and Speech Processing

Graduate course, University of Pennsylvania, Department of Computer Science, Philadelphia, PA (1996)

Computation of French Temporal Expressions to Query Databases

Computation of French Temporal Expressions to Query Databases

Denis Maurel

The First Workshop on the Applications of Natural language Processing to Databases, FWANLPD, Versailles, France (1995)

Matching Patterns of An Automaton

CPM (1995), pp. 286-297

Matching Patterns of an Automaton

Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching (CPM '95), Springer-Verlag, Berlin-NY, Espoo, Finland (1995), pp. 286-297

Minimization of Sequential Transducers

Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM '94), Springer-Verlag, Berlin-NY, Asilomar, California (1994), pp. 151-163

French Temporal Expressions: Recognition, Parsing and Real Computation

Denis Maurel

Proceedings of the 10th Annual Conference of the UW Centre for the New Oxford English Dictionary and Text Research, Waterloo, Ontario, Canada, University of Waterloo (1994)

Combinaisons appropriées dans les constructions complétives

Langages, Larousse: Paris, vol. 115 (1994)

Review of Les Nouvelles Syntaxes, Grammaires d'unification et analyse du français by Anne Abeillé, 1993, Armand Colin, Paris, France

Lingvisticae Investigationes, vol. 18 (1994), pp. 415-418

On some Applications of Finite-State Automata Theory to Natural Language Processing: Representation of Morphological Dictionaries, Compaction, and Indexation

Université Marne-la-Vallée, Institut Gaspard Monge, Noisy-le-Grand (1994)

Syntactic Analysis by Local Grammars Automata: an Efficient Algorithm

CoRR, vol. abs/cmp-lg/9407002 (1994)

Compact Representations by Finite-State Transducers

ACL (1994), pp. 204-209

Reprise par une relative

Proceedings of the International Conference Dépendance et intégration syntaxique, Bordeaux, France, Niemeyer (1994)

Syntactic Analysis by Local Grammars Automata: an Efficient Algorithm

Proceedings of the International Conference on Computational Lexicography (COMPLEX 94), Linguistic Institute, Hungarian Academy of Science: Budapest, Hungary (1994)

Minimization of Sequential Transducers

CPM (1994), pp. 151-163

Compact Representations by Finite-State Transducers

32nd Meeting of the Association for Computational Linguistics (ACL '94), Proceedings of the Conference, Las Cruces, New Mexico (1994), pp. 204-209

Réduction De Complétive À Un Nom Et Article Défini Générique

Lingvisticae Investigationes, vol. 17 (1993), pp. 83-97

Analyse et représentation par automates de structures syntaxiques composées: Application aux complétives (Thesis Abstract)

Lingvisticae Investigationes, vol. 17 (1993), pp. 431-432

La coréférence et l'aspect

Lingvisticae Investigationes, vol. 14 (1990), pp. 403-412