# Corinna Cortes

Corinna Cortes is a VP in Google Research, where she is working on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs - Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna's research work is well-known in particular for her contributions to the theoretical foundations of support vector machines (SVMs), for which she jointly with Vladimir Vapnik received the 2008 Paris Kanellakis Theory and Practice Award, and her work on data-mining in very large data sets for which she was awarded the AT&T Science and Technology Medal in the year 2000. Corinna received her MS degree in Physics from University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her Ph.D. in computer science from the University of Rochester in 1993.

Corinna is also a competitive and a mother of two.

Authored Publications

Google Publications

Other Publications

Sort By

Understanding the Effects of Batching in Online Active Learning

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (2020)

Preview abstract
Online active learning (AL) algorithms often assume immediate access to a label once a query has been made. However, due to practical constraints, the labels of these queried examples are generally only available in ``batches''. In this work, we present a novel analysis for a generic class of batch online AL algorithms and reveal that the effects of batching are in fact mild and only result in an additional term in the label complexity that is linear in the batch size. To our knowledge, this provides the first theoretical justification for such algorithms and we show how they can be applied to batch variants of three canonical online AL algorithms: IWAL, ORIWAL, and DHM. We also conduct an empirical study that corroborates the novel theoretical insights.
View details

Relative deviation learning bounds and generalization with unbounded loss functions

Spencer Greenberg

Annals of Mathematics and Artificial Intelligence (2019)

Preview abstract
We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. We then illustrate how to apply these results in a sample application: the analysis of importance weighting.
View details

Adaptation Based on Generalized Discrepancy

Journal of Machine Learning Research, vol. 20 (2019), pp. 1-30

Preview abstract
We present a new algorithm for domain adaptation improving upon a discrepancy minimization algorithm, (DM), previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than the DM algorithm called generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization.
View details

Learning GANs and Ensembles Using Discrepancy

Ningshan Zhang

Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019

Preview abstract
Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions. The choice of that divergence is therefore critical. We argue that the divergence must take into account the hypothesis set and the loss function used in a subsequent learning task, where the data generated by a GAN serves for training. Taking that structural information into account is also important to derive generalization guarantees. Thus, we propose to use the discrepancy measure, which was originally introduced for the closely related problem of domain adaptation and which precisely takes into account the hypothesis set and the loss function. We show that discrepancy admits favorable properties for training GANs and prove explicit generalization guarantees. We present efficient algorithms using discrepancy for two tasks: training a GAN directly, namely DGAN, and mixing previously trained generative models, namely EDGAN. Our experiments on toy examples and several benchmark datasets show that DGAN is competitive with other GANs and that EDGAN outperforms existing GAN ensembles, such as AdaGAN.
View details

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

Charles Weill

Vitaly Kuznetsov

Scott Yang

Scott Yak

Hanna Mazzawi

Eugen Hotaj

Ghassen Jerfel

Vladimir Macko

(2019)

Preview abstract
AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at https://github.com/tensorflow/adanet.
View details

Online learning with abstention

Scott Yang

Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden (2018)

Preview abstract
We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.
View details

AdaNet: Adaptive structural learning of artificial neural networks

Vitaly Kuznetsov

Scott Yang

Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Sydney, Australia, August 2017. (2017)

Preview abstract
We present new algorithms for adaptively learning
artificial neural networks. Our algorithms
(ADANET) adaptively learn both the structure
of the network and its weights. They are
based on a solid theoretical analysis, including
data-dependent generalization guarantees that we
prove and discuss in detail. We report the results
of large-scale experiments with one of our
algorithms on several binary classification tasks
extracted from the CIFAR-10 dataset and on the
Criteo dataset. The results demonstrate that our
algorithm can automatically learn network structures
with very competitive performance accuracies
when compared with those achieved by neural
networks found by standard approaches.
View details

Structured prediction theory based on factor graph complexity

Vitaly Kuznetsov

Scott Yang

Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)

Preview abstract
We present a general theoretical analysis of structured prediction with a series
of new results. We give new data-dependent margin guarantees for structured
prediction for a very wide family of loss functions and a general family of hypotheses,
with an arbitrary factor graph decomposition. These are the tightest margin
bounds known for both standard multi-class and general structured prediction
problems. Our guarantees are expressed in terms of a data-dependent complexity
measure, factor graph complexity, which we show can be estimated from data and
bounded in terms of familiar quantities for several commonly used hypothesis sets
along with a sparsity measure for features and graphs. Our proof techniques include
generalizations of Talagrand’s contraction lemma that can be of independent
interest.
We further extend our theory by leveraging the principle of Voted Risk Minimization
(VRM) and show that learning is possible even with complex factor graphs. We
present new learning bounds for this advanced setting, which we use to design two
new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured
Boosting (StructBoost). These algorithms can make use of complex features and
factor graphs and yet benefit from favorable learning guarantees. We also report
the results of experiments with VCRF on several datasets to validate our theory.
View details

Learning with rejection

Proceedings of The 27th International Conference on Algorithmic Learning Theory (ALT 2016). pages 67-82, Bari, Italy, 2016. Springer, Heidelberg, Germany. (2016)

Preview abstract
We introduce a novel framework for classification with a rejection
option that consists of simultaneously learning two functions:
a classifier along with a rejection function. We present a full theoretical
analysis of this framework including new data-dependent learning
bounds in terms of the Rademacher complexities of the classifier and
rejection families as well as consistency and calibration results. These
theoretical guarantees guide us in designing new algorithms that can
exploit different kernel-based hypothesis sets for the classifier and rejection
functions. We compare and contrast our general framework with the
special case of confidence-based rejection for which we devise alternative
loss functions and algorithms as well. We report the results of several experiments
showing that our kernel-based algorithms can yield a notable
improvement over the best existing confidence-based rejection algorithm.
View details

Boosting with abstention

Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)

Preview abstract
We present a new boosting algorithm for the key scenario of binary classification
with abstention where the algorithm can abstain from predicting the label of a point,
at the price of a fixed cost. At each round, our algorithm selects a pair of functions,
a base predictor and a base abstention function. We define convex upper bounds
for the natural loss function associated to this problem, which we prove to be
calibrated with respect to the Bayes solution. Our algorithm benefits from general
margin-based learning guarantees which we derive for ensembles of pairs of base
predictor and abstention functions, in terms of the Rademacher complexities of the
corresponding function classes. We give convergence guarantees for our algorithm
along with a linear-time weak-learning algorithm for abstention stumps. We also
report the results of several experiments suggesting that our algorithm provides a
significant improvement in practice over two confidence-based algorithms.
View details

Structural maxent models

Vitaly Kuznetsov

Proceedings of the Thirty-Second International Conference on Machine Learning (ICML 2015)

Preview abstract
We present a new class of density estimation
models, Structural Maxent models, with feature
functions selected from a union of possibly
very complex sub-families and yet benefiting
from strong learning guarantees. The design
of our models is based on a new principle
supported by uniform convergence bounds and
taking into consideration the complexity of the
different sub-families composing the full set of
features. We prove new data-dependent learning
bounds for our models, expressed in terms of the
Rademacher complexities of these sub-families.
We also prove a duality theorem, which we use
to derive our Structural Maxent algorithm. We
give a full description of our algorithm, including
the details of its derivation, and report the results
of several experiments demonstrating that its performance
improves on that of existing L1-norm
regularized Maxent algorithms. We further similarly
define conditional Structural Maxent models
for multi-class classification problems. These
are conditional probability models also making
use of a union of possibly complex feature subfamilies.
We prove a duality theorem for these
models as well, which reveals their connection
with existing binary and multi-class deep boosting
algorithms.
View details

Adaptation algorithm and theory based on generalized discrepancy

Preview
Proceedings of the 21st ACM Conference on Knowledge Discovery and Data Mining (KDD 2015)

On-line learning algorithms for path experts with non-additive losses

Preview
Vitaly Kuznetsov

Manfred K. Warmuth

Proceedings of The 28th Annual Conference on Learning Theory (COLT 2015)

Ensemble methods for structured prediction

Preview
Vitaly Kuznetsov

Proceedings of the 31st International Conference on Machine Learning (ICML 2014)

Deep boosting

Proceedings of the Thirty-First International Conference on Machine Learning (ICML 2014)

Preview abstract
We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data- dependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants.
View details

Learning ensembles of structured prediction rules

Preview
Vitaly Kuznetsov

Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)

Multi-class classification with maximum margin multiple kernel

Preview
Proceedings of the Thirtieth International Conference on Machine Learning (ICML 2013)

Learning kernels using local rademacher complexity

Marius Kloft

Advances in Neural Information Processing Systems (NIPS 2013), MIT Press.

Preview abstract
We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also re- port the results of experiments with both algorithms in both binary and multi-class classification tasks.
View details

Accuracy at the Top

Stephen Boyd

NIPS: Neural Information Processing Systems Foundation (2012)

Preview abstract
We introduce a new notion of classiﬁcation accuracy based on the top τ -quantile
values of a scoring function, a relevant criterion in a number of problems arising
for search engines. We deﬁne an algorithm optimizing a convex surrogate of the
corresponding loss, and show how its solution can be obtained by solving a set
of convex optimization problems. We also present margin-based guarantees for
this algorithm based on the top τ -quantile of the scores of the functions in the
hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our algorithm and comparing the results to several other algorithms seeking high precision at the top. In most examples, our algorithm achieves a better performance in precision at the top.
View details

Algorithms for Learning Kernels Based on Centered Alignment

Preview
Journal of Machine Learning Research, vol. 13 (2012), pp. 795-828

Ensembles of Kernel Predictors

Preview
Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011)

Domain adaptation in regression

Preview
Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011, Springer, Heidelberg, Germany

A Dual Coordinate Descent Algorithm for SVMs Combined with Rational Kernels

Preview
International Journal of Foundations of Computer Science, vol. 22 (2011), pp. 1761-1779

Algorithms for Learning Kernels Based on Centered Alignment

Preview
Journal of Machine Learning Research, vol. 13 (2010), pp. 795-828

Generalization Bounds for Learning Kernels

Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)

Preview abstract
This paper presents several novel generalization
bounds for the problem of learning kernels based
on a combinatorial analysis of the Rademacher
complexity of the corresponding hypothesis sets.
Our bound for learning kernels with a convex
combination of p base kernels using L1 regularization
admits only a √log p dependency on the
number of kernels, which is tight and considerably
more favorable than the previous best bound
given for the same problem. We also give a novel
bound for learning with a non-negative combination
of p base kernels with an L2 regularization
whose dependency on p is also tight and only in
p^(1/4). We present similar results for Lq regularization
with other values of q, and outline the relevance
of our proof techniques to the analysis of
the complexity of the class of linear functions.
Experiments with a large number of kernels further
validate the behavior of the generalization
error as a function of p predicted by our bounds.
View details

Learning Bounds for Importance Weighting

Preview
Yishay Mansour

Advances in Neural Information Processing Systems (NIPS 2010), MIT Press, Vancouver, Canada

Half Transductive Ranking

Bing Bai

Jason Weston

David Grangier

Ronan Collobert

Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)

Preview abstract
We study the standard retrieval task of ranking
a fixed set of items given a previously unseen
query and pose it as the half transductive
ranking problem. The task is transductive
as the set of items is fixed. Transductive
representations (where the vector representation
of each example is learned) allow
the generation of highly nonlinear embeddings
that capture object relationships without
relying on a specific choice of features,
and require only relatively simple optimization.
Unfortunately, they have no direct outof-
sample extension. Inductive approaches
on the other hand allow for the representation
of unknown queries. We describe algorithms
for this setting which have the advantages
of both transductive and inductive approaches,
and can be applied in unsupervised
(either reconstruction-based or graph-based)
and supervised ranking setups. We show empirically
that our methods give strong performance
on all three tasks.
View details

On the Impact of Kernel Approximation on Learning Accuracy

Ameet Talwalkar

Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)

Preview abstract
Kernel approximation is commonly used to scale
kernel-based algorithms to applications containing
as many as several million instances. This
paper analyzes the effect of such approximations
in the kernel matrix on the hypothesis generated
by several widely used learning algorithms. We
give stability bounds based on the norm of the
kernel approximation for these algorithms, including
SVMs, KRR, and graph Laplacian-based
regularization algorithms. These bounds help determine
the degree of approximation that can be
tolerated in the estimation of the kernel matrix.
Our analysis is general and applies to arbitrary
approximations of the kernel matrix. However,
we also give a specific analysis of the Nystr¨om
low-rank approximation in this context and report
the results of experiments evaluating the
quality of the Nystr¨om low-rank kernel approximation
when used with ridge regression.
View details

Two-Stage Learning Kernel Algorithms

Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)

Preview abstract
This paper examines two-stage techniques for
learning kernels based on a notion of alignment.
It presents a number of novel theoretical, algorithmic,
and empirical results for alignmentbased
techniques. Our results build on previous
work by Cristianini et al. (2001), but we adopt
a different definition of kernel alignment and
significantly extend that work in several directions:
we give a novel and simple concentration
bound for alignment between kernel matrices;
show the existence of good predictors for kernels
with high alignment, both for classification
and for regression; give algorithms for learning a
maximum alignment kernel by showing that the
problem can be reduced to a simple QP; and report
the results of extensive experimentswith this
alignment-based method in classification and regression
tasks, which show an improvement both
over the uniformcombination of kernels and over
other state-of-the-art learning kernel methods.
View details

L2 Regularization for Learning Kernels

Preview
Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montr\'eal, Canada

Learning non-linear combinations of kernels

Preview
NIPS 2009, Advances in Neural Information Processing Systems, MIT Press

Invited talk: Can learning kernels help performance?

Preview
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, NY, USA (2009), pp. 1-1

Polynomial semantic indexing

Preview
Bing Bai

Jason Weston

David Grangier

Ronan Collobert

Kunihiko Sadamasa

Yanjun Qi

Advances in Neural Information Processing Systems (NIPS 2009), MIT Press

Kernel Methods for Learning Languages

Preview
Leonid Kontorovich

Theoretical Computer Science, vol. 405 (2008), pp. 223-236

Learning with weighted transducers

Preview
Proceedings of the Seventh International Workshop Finite-State Methods and Natural Language Processing (2008)

Stability of Transductive Regression Algorithms

Preview
Dmitry Pechyony

Ashish Rastogi

Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland

A Machine Learning Framework for Spoken-Dialog Classification

Preview
Patrick Haffner

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)

On the Computation of the Relative Entropy of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 19 (2008), pp. 219-242

Sample Selection Bias Correction Theory

Preview
Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008), Springer, Heidelberg, Germany, Budapest, Hungary

Learning sequence kernels

Preview
Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (2008)

Learning Languages with Rational Kernels

Preview
Leonid Kontorovich

Proceedings of The 20th Annual Conference on Computational Learning Theory (COLT 2007), Springer, Heidelberg, Germany, San Diego, California

An Alternative Ranking Problem for Search Engines

Preview
Ashish Rastogi

Proceedings of the 6th Workshop on Experimental Algorithms (WEA 2007), Springer-Verlag, Heidelberg, Germany, Rome, Italy, pp. 1-21

On Transductive Regression

Preview
Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, Vancouver, Canada (2007)

A Machine Learning Framework for Spoken-Dialog Classification

Preview
Patrick Haffner

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)

Lp Distance and Equivalence of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 18 (2007)

Lp Distance and Equivalence of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. 18 (2007), pp. 761-780

L_p Distance and Equivalence of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. to appear (2007)

Magnitude-Preserving Ranking Algorithms

Preview
Ashish Rastogi

Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML 2007), Oregon State University, Corvallis, OR

On the Computation of the Relative Entropy of Probabilistic Automata

Preview
Ashish Rastogi

International Journal of Foundations of Computer Science, vol. to appear (2007)

On the Computation of Some Standard Distances between Probabilistic Automata

Preview
Ashish Rastogi

Proceedings of the 11th International Conference on Implementation and Application of Automata (CIAA 2006), Springer-Verlag, Heidelberg, Germany, Taipei, Taiwan

Efficient Computation of the Relative Entropy of Probabilistic Automata

Preview
Ashish Rastogi

Proceedings of the 7th Latin American Symposium (LATIN 2006), Springer-Verlag, Heidelberg, Germany, Valdivia, Chile

Learning Linearly Separable Languages

Preview
Leonid Kontorovich

Proceedings of The 17th International Conference on Algorithmic Learning Theory (ALT 2006), Springer, Heidelberg, Germany

A General Regression Technique for Learning Transductions

Preview
Jason Weston

Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 2005), Bonn, Germany

Finite-State Transducers in Computational Biology

Preview
Tutorial presented at the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, MI

Margin-Based Ranking Meets Boosting in the Middle

Preview
Cynthia Rudin

Robert E. Schapire

Proceedings of The 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, Bertinoro, Italy, pp. 63-78

Confidence Intervals for the Area under the ROC Curve

Preview
Advances in Neural Information Processing Systems (NIPS 2004), MIT Press, Vancouver, Canada (2005)

Margin-Based Ranking Meets Boosting in the Middle

Preview
Cynthia Rudin

Robert E. Schapire

Proc. of the 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, pp. 63-78

Rational Kernels: Theory and Algorithms

Preview
Patrick Haffner

Journal of Machine Learning Research (JMLR), vol. 5 (2004), pp. 1035-1062

Distribution Kernels Based on Moments of Counts

Preview
Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada

Hancock: A language for analyzing transactional data streams

Preview
Kathleen Fisher

Daryl Pregibon

Anne Rogers

Frederick Smith

ACM Trans. Program. Lang. Syst., vol. 26 (2004), pp. 301-338

Voted Kernel Regularization

Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions

New Generalization Bounds for Learning Kernels

Stability Analysis and Learning Bounds for Transductive Regression Algorithms

An Alternative Ranking Problem for Search Engines

Learning Languages with Rational Kernels

Magnitude-preserving ranking algorithms

Efficient Computation of the Relative Entropy of Probabilistic Automata

On Transductive Regression

On the Computation of Some Standard Distances Between Probabilistic Automata

Learning Linearly Separable Languages

Moment Kernels for Regular Distributions

Rational Kernels: Theory and Algorithms

Patrick Haffner

Journal of Machine Learning Research, vol. 5 (2004), pp. 1035-1062

AUC Optimization vs. Error Rate Minimization

Advances in Neural Information Processing Systems (NIPS 2003), MIT Press, Vancouver, Canada (2004)

Distribution kernels based on moments of counts

Confidence Intervals for the Area Under the ROC Curve

Weighted automata kernels - general framework and algorithms

Lattice Kernels for Spoken-Dialog Classification

Patrick Haffner

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong

Lattice kernels for spoken-dialog classification

AUC Optimization vs. Error Rate Minimization

Rational Kernels

Patrick Haffner

Advances in Neural Information Processing Systems (NIPS 2002), MIT Press, Vancouver, Canada (2003)

Positive Definite Rational Kernels

Weighted Automata Kernels -- General Framework and Algorithms

Patrick Haffner

Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech '03), Special Session Advanced Machine Learning Algorithms for Speech and Language Processing, Geneva, Switzerland (2003)

Positive Definite Rational Kernels

Patrick Haffner

Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.

Rational Kernels

Communities of interest

Signature-Based Methods for Data Streams

Communities of Interest

Hancock: a language for extracting signatures from data streams

Context-Free Recognition with Weighted Automata

Squashing Flat Files Flatter

William DuMouchel

Chris Volinsky

Theodore Johnson

Daryl Pregibon

KDD (1999), pp. 6-15

Context-Free Recognition with Weighted Automata

Proceedings of the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida (1999)

Information Mining Platforms: An Infrastructure for KDD Rapid Deployment

Giga-Mining

Limits on Learning Machine Accuracy Imposed by Data Quality

Boosting Decision Trees

Support-Vector Networks

Boosting and Other Machine Learning Algorithms

Limits in Learning Machine Accuracy Imposed by Data Quality

Learning Curves: Asymptotic Values and Rate of Convergence

Lawrence D. Jackel

Sara A. Solla

Vladimir Vapnik

John S. Denker

NIPS (1993), pp. 327-334