Table Detection in Heterogeneous Documents

Faisal Shafait; Ray Smith

Document Analysis Systems 2010, ACM International Conference Proceedings series

Abstract

Detecting tables in document images is important, not only
because tables contain important information but also because
most layout analysis methods fail when tables are present in
the document image. Existing approaches to table detection
mainly focus on detecting tables in single columns of text and
do not work reliably on documents with varying layouts. This
paper presents a practical algorithm for table detection that
works with high accuracy on documents with varying layouts
(company reports, newspaper articles, magazine pages, ...). An
open source implementation of the algorithm is provided as part
of the Tesseract OCR engine. Evaluation of the algorithm on
document images from the publicly available UNLV dataset shows
competitive performance in comparison to the table detection
module of a commercial OCR system.

Research Areas

Machine perception
