Translation-Inspired OCR
Abstract
Optical character recognition is carried out using techniques
borrowed from statistical machine translation. In particular, the
use of multiple simple feature functions in linear combination,
along with minimum-error-rate training, integrated decoding, and
$N$-gram language modeling, is found to be remarkably effective
across several scripts and languages. Results are presented using
both synthetic and real data in five languages.
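The scoring model can be illustrated with a minimal Python sketch. It assumes a hypothetical optical model that supplies a log-likelihood for each candidate transcription. The names `train_char_bigram_lm`, `linear_score`, and `rescore`, the three features, and the weights are illustrative only and do not come from this work. The weights are set by hand rather than tuned by minimum-error-rate training. The sketch also rescores a fixed candidate list instead of integrating the language model into decoding. It therefore shows only the linear combination of simple feature functions with a character $N$-gram language model (here $N=2$).

```python
import math
from collections import Counter


def train_char_bigram_lm(corpus, alpha=1.0):
    """Train an add-alpha smoothed character bigram model.

    Returns a function mapping a string to its log-probability, with
    boundary markers <s> and </s> added around the text.
    """
    bigrams = Counter()
    contexts = Counter()
    vocab = set()
    for text in corpus:
        symbols = ["<s>"] + list(text) + ["</s>"]
        vocab.update(symbols)
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(text):
        symbols = ["<s>"] + list(text) + ["</s>"]
        total = 0.0
        for prev, cur in zip(symbols, symbols[1:]):
            # Add-alpha smoothing: unseen bigrams (and characters absent
            # from the corpus) still receive a small nonzero probability.
            p = (bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * v)
            total += math.log(p)
        return total

    return logprob


def linear_score(hyp, optical_logprob, lm, weights):
    """Weighted sum of simple feature functions for one hypothesis."""
    features = {
        "optical": optical_logprob,  # from a hypothetical glyph/optical model
        "lm": lm(hyp),               # character N-gram LM log-probability
        "length": len(hyp),          # length bonus or penalty, as in SMT
    }
    return sum(weights[name] * value for name, value in features.items())


def rescore(candidates, lm, weights):
    """Return the text of the highest-scoring (text, optical_logprob) pair."""
    best_text, _ = max(
        candidates, key=lambda c: linear_score(c[0], c[1], lm, weights)
    )
    return best_text


# Toy usage: the optical model slightly prefers the misreading "tne cat",
# but giving the language model weight pulls the decision to "the cat".
lm = train_char_bigram_lm(["the cat sat", "the hat", "that cat"])
candidates = [("tne cat", -4.0), ("the cat", -4.5)]
print(rescore(candidates, lm, {"optical": 1.0, "lm": 0.0, "length": 0.0}))  # -> tne cat
print(rescore(candidates, lm, {"optical": 1.0, "lm": 1.0, "length": 0.0}))  # -> the cat
```

In the approach described above, weights like these would be tuned on held-out data to minimize recognition error rather than chosen by hand.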