Translation-Inspired OCR

Dmitriy Genzel; Ashok C. Popat; Nemanja Spasojevic; Michael Jahr; Andrew Senior; Eugene Ie; Frank Yung-Fong Tang

ICDAR-2011

Abstract

Optical character recognition is carried out using techniques
borrowed from statistical machine translation. In particular, the
use of multiple simple feature functions in linear combination,
along with minimum-error-rate training, integrated decoding, and
$N$-gram language modeling, is found to be remarkably effective
across several scripts and languages. Results are presented using
both synthetic and real data in five languages.
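
As a rough illustration of what "multiple simple feature functions in linear combination" can look like, the following minimal sketch scores candidate transcriptions as a weighted sum of features and picks the highest-scoring one. It is not the authors' system: the feature set, the toy bigram table, the `shape_logprob` field, and all weights are hypothetical, and it reranks a fixed candidate list rather than performing integrated decoding. In a real setup the weights would be tuned (for example by minimum-error-rate training) instead of set by hand.

```python
import math

# Hypothetical toy character-bigram log-probabilities (an assumption for
# illustration); unseen bigrams fall back to a small floor probability.
BIGRAM_LOGPROB = {("t", "h"): math.log(0.5), ("h", "e"): math.log(0.6)}
UNSEEN_LOGPROB = math.log(1e-3)


def lm_feature(text):
    """Language-model feature: sum of bigram log-probs over adjacent chars."""
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(text, text[1:]))


def length_feature(text):
    """Length feature: number of characters in the hypothesis."""
    return float(len(text))


def score(candidate, weights):
    """Linear combination of feature values for one candidate.

    `candidate` is a dict with a 'text' string and a 'shape_logprob'
    number standing in for a (hypothetical) image-based recognizer score.
    """
    features = {
        "shape": candidate["shape_logprob"],
        "lm": lm_feature(candidate["text"]),
        "length": length_feature(candidate["text"]),
    }
    return sum(weights[name] * value for name, value in features.items())


def decode(candidates, weights):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(c, weights))


candidates = [
    {"text": "the", "shape_logprob": -1.2},
    {"text": "tbe", "shape_logprob": -1.0},
]
weights = {"shape": 1.0, "lm": 1.0, "length": 0.0}  # hand-set, not trained
best = decode(candidates, weights)
```

With the language-model weight set to zero, the shape score alone prefers "tbe"; adding the language-model feature shifts the choice to "the", which is the kind of trade-off the weights control.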
