Sampled Connectionist Temporal Classification

Erik McDermott
Kamel Lahouel
ICASSP 2018

Abstract

This article introduces and evaluates Sampled Connectionist
Temporal Classification (sampled CTC), which connects the CTC
criterion to the Cross Entropy (CE) objective through sampling.
Instead of computing the logarithm of the sum of the alignment path
likelihoods, at each training step sampled CTC computes only the CE
loss between a sampled alignment path and the model posteriors. It
is shown that the sampled CTC objective is an unbiased estimator of
an upper bound on the CTC loss, so minimizing the sampled CTC
objective amounts to minimizing this upper bound on the CTC
objective. This definition of the objective has the advantage that
it scales computationally to massive datasets on hardware
accelerators. Sampled CTC is compared with CTC on two large-scale
speech recognition tasks, and it is shown that sampled CTC can
achieve a word error rate (WER) similar to that of the best CTC
baseline in about one fourth of the baseline's training time.
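
The relationship between the two objectives can be illustrated on a toy example. The sketch below is not the authors' implementation: it enumerates alignments by brute force (feasible only for a few frames, where a practical system would work on the CTC lattice), it assumes blank index 0, and it assumes the alignment is sampled in proportion to its path likelihood under the model, which the abstract does not specify. All function names and the example posteriors are hypothetical. The upper-bound property itself does not depend on these choices: any single valid path has a likelihood no larger than the sum over all valid paths, so its CE loss is no smaller than the CTC loss.

```python
import itertools
import math
import random

BLANK = 0  # assumed index of the CTC blank symbol


def collapse(path):
    """CTC collapse: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def valid_alignments(num_frames, num_symbols, label):
    """Brute-force list of all frame-level paths that collapse to `label`."""
    target = tuple(label)
    return [
        p
        for p in itertools.product(range(num_symbols), repeat=num_frames)
        if collapse(p) == target
    ]


def path_log_prob(path, posteriors):
    """Log-likelihood of one path: sum of per-frame log posteriors."""
    return sum(math.log(posteriors[t][s]) for t, s in enumerate(path))


def ctc_loss(posteriors, label):
    """Exact CTC loss: -log of the summed likelihood of all valid paths."""
    paths = valid_alignments(len(posteriors), len(posteriors[0]), label)
    total = sum(math.exp(path_log_prob(p, posteriors)) for p in paths)
    return -math.log(total)


def sampled_ctc_loss(posteriors, label, rng):
    """CE loss against one sampled alignment path.

    Assumption: the path is drawn in proportion to its likelihood
    under the model; the abstract does not state the sampling scheme.
    """
    paths = valid_alignments(len(posteriors), len(posteriors[0]), label)
    weights = [math.exp(path_log_prob(p, posteriors)) for p in paths]
    path = rng.choices(paths, weights=weights, k=1)[0]
    return -path_log_prob(path, posteriors)


# Hypothetical posteriors: 3 frames over the symbols {blank, 1, 2}.
POSTERIORS = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.1, 0.6],
]
LABEL = [1, 2]

if __name__ == "__main__":
    rng = random.Random(0)
    exact = ctc_loss(POSTERIORS, LABEL)
    draws = [sampled_ctc_loss(POSTERIORS, LABEL, rng) for _ in range(1000)]
    print("exact CTC loss:       ", exact)
    print("mean sampled CTC loss:", sum(draws) / len(draws))
```

In this toy setting every sampled loss, and therefore their average, lies at or above the exact CTC loss. Each draw only needs the per-frame log posteriors along one path, which is the source of the computational saving the abstract describes.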