Translation-Inspired OCR

Dmitriy Genzel
Nemanja Spasojevic
Michael Jahr
Frank Yung-Fong Tang
ICDAR-2011

Abstract

Optical character recognition is carried out using techniques borrowed from statistical machine translation. In particular, a linear combination of multiple simple feature functions, together with minimum-error-rate training, integrated decoding, and $N$-gram language modeling, is found to be remarkably effective across several scripts and languages. Results are presented on both synthetic and real data in five languages.
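
As a rough illustration of what a linear combination of simple feature functions can look like, the sketch below scores candidate transcriptions as a weighted sum of feature values and picks the highest-scoring one. Everything in it is an assumption made for illustration: the names `score` and `best_hypothesis`, the two toy features, the bigram set, the candidate strings, and the hand-set weights are hypothetical and are not taken from the paper. In the approach the abstract describes, the weights would come from minimum-error-rate training and the language-model feature would be a real $N$-gram model.

```python
from typing import Callable, Dict, List

# A feature function maps a candidate transcription to a real value.
FeatureFn = Callable[[str], float]

# Hypothetical stand-in for an N-gram language model: a tiny set of
# "known" character bigrams. A real system would use log-probabilities.
KNOWN_BIGRAMS = {"th", "he", "ca", "at"}


def lm_feature(hypothesis: str) -> float:
    """Count character bigrams that appear in the toy bigram set."""
    return float(sum(
        1 for i in range(len(hypothesis) - 1)
        if hypothesis[i:i + 2] in KNOWN_BIGRAMS
    ))


def length_feature(hypothesis: str) -> float:
    """Number of characters in the hypothesis."""
    return float(len(hypothesis))


def score(hypothesis: str,
          features: Dict[str, FeatureFn],
          weights: Dict[str, float]) -> float:
    """Linear combination: sum over features of weight * feature value."""
    return sum(weights[name] * fn(hypothesis) for name, fn in features.items())


def best_hypothesis(candidates: List[str],
                    features: Dict[str, FeatureFn],
                    weights: Dict[str, float]) -> str:
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda h: score(h, features, weights))


# Hand-set weights for illustration only; not trained values.
features = {"lm": lm_feature, "length": length_feature}
weights = {"lm": 1.0, "length": -0.1}
candidates = ["the cat", "tne cat", "the c4t"]

print(best_hypothesis(candidates, features, weights))
```

With these toy settings the candidate containing the most known bigrams wins. Because each feature is an independent function, adding or removing a feature only changes the dictionaries, which is part of what makes this kind of model convenient to tune.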
