Sampled Connectionist Temporal Classification

Erik McDermott
Kamel Lahouel
ICASSP 2018(2018)


This article introduces and evaluates Sampled Connectionist Temporal Classification (CTC) which connects the CTC criterion to the Cross Entropy (CE) objective through sampling. Instead of com- puting the logarithm of the sum of the alignment path likelihoods, at each training step the sampled CTC only computes the CE loss be- tween the sampled alignment path and model posteriors. It is shown that the sampled CTC objective is an unbiased estimator of an upper bound for the CTC loss, thus minimization of the sampled CTC is equivalent to the minimization of the upper bound of the CTC ob- jective. The definition of the sampled CTC objective has the advan- tage that it is scalable computationally to the massive datasets using accelerated computation machines. The sampled CTC is compared with CTC in two large-scale speech recognition tasks and it is shown that sampled CTC can achieve similar WER performance of the best CTC baseline in about one fourth of the training time of the CTC baseline.