Document image decoding using iterated complete path heuristic
Abstract
The computation time of Document Image Decoding can be significantly reduced by employing
heuristics in the search for the best decoding of a text line. By using a cheap upper bound
on template match scores, up to 99.9% of the potential template matches can be avoided. In the
Iterated Complete Path method, template matches are performed only along the best path found by
dynamic programming on each iteration. When the best path stabilizes, the decoding is optimal
and no more template matches need be performed. Computation can be further reduced in this
scheme by exploiting the incremental nature of the Viterbi iterations. Because only a few trellis
edge weights have changed since the last iteration, most of the backpointers do not need to be
updated. We describe how to quickly identify these backpointers, without forfeiting optimality of
the path. Together these improvements provide a 30x speedup over previous implementations of
Document Image Decoding.