Unified Line and Paragraph Detection by Graph Convolutional Networks
Abstract
We formulate the task of detecting lines and paragraphs in
a document into a unified two-level clustering problem. Given a set of
text detection boxes that roughly correspond to words, a text line is a
cluster of boxes and a paragraph is a cluster of lines. These clusters form
a two-level tree that represents a major part of the layout of a document.
We use a graph convolutional network to predict the relations between
text detection boxes and then build both levels of clusters from these
predictions. Experimentally, we demonstrate that the unified approach
can be highly efficient while still achieving state-of-the-art quality for
detecting paragraphs in public benchmarks and real-world images.
a document into a unified two-level clustering problem. Given a set of
text detection boxes that roughly correspond to words, a text line is a
cluster of boxes and a paragraph is a cluster of lines. These clusters form
a two-level tree that represents a major part of the layout of a document.
We use a graph convolutional network to predict the relations between
text detection boxes and then build both levels of clusters from these
predictions. Experimentally, we demonstrate that the unified approach
can be highly efficient while still achieving state-of-the-art quality for
detecting paragraphs in public benchmarks and real-world images.