Michalis Raptis
Authored Publications
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
Winter Conference on Applications of Computer Vision 2024 (2024) (to appear)
We propose Hierarchical Text Spotter (HTS), the first method for the joint task of word-level text spotting and geometric layout analysis.
HTS can annotate text in images with a hierarchical representation of 4 levels: character, word, line, and paragraph.
The proposed HTS is characterized by two novel components:
(1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve polygons of text lines and an affinity matrix for paragraph grouping between detected lines;
(2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and further merges them back into words.
HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks.
Code will be released upon acceptance.
ICDAR 2023 Competition on Hierarchical Text Detection and Recognition
Dmitry Panteleev
ICDAR 2023: International Conference on Document Analysis and Recognition (2023)
We organize a competition on hierarchical text detection and recognition. The competition aims to promote research into deep learning models and systems that can simultaneously perform text detection and recognition and geometric layout analysis. We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule. During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 30 teams were made in the 2 proposed tasks. Considering the number of teams and submissions, we conclude that the HierText competition has been successfully held. In this report, we will also present the competition results and insights from them.
Unified Line and Paragraph Detection by Graph Convolutional Networks
International Workshop on Document Analysis Systems (DAS) (2022)
We formulate the task of detecting lines and paragraphs in a document as a unified two-level clustering problem. Given a set of text detection boxes that roughly correspond to words, a text line is a cluster of boxes and a paragraph is a cluster of lines. These clusters form a two-level tree that represents a major part of the layout of a document. We use a graph convolutional network to predict the relations between text detection boxes and then build both levels of clusters from these predictions. Experimentally, we demonstrate that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.
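The two-level clustering described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the graph convolutional network is not modeled, and the pairwise links are assumed to be already-thresholded predictions. The names `cluster`, `build_layout_tree`, `same_line_links`, and `same_paragraph_links` are hypothetical. The sketch uses union-find to group boxes into lines, then groups lines into paragraphs.

```python
from collections import defaultdict


def cluster(n, links):
    """Group items 0..n-1 into clusters from pairwise 'same cluster' links."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


def build_layout_tree(num_boxes, same_line_links, same_paragraph_links):
    """Return paragraphs as lists of lines, each line a list of box indices.

    Both link lists are hypothetical stand-ins for thresholded model
    predictions over pairs of text detection boxes.
    """
    # Level 1: boxes -> lines.
    lines = cluster(num_boxes, same_line_links)
    box_to_line = {box: i for i, line in enumerate(lines) for box in line}

    # Level 2: lift box-level paragraph links to line-level links,
    # then cluster lines into paragraphs.
    line_links = [(box_to_line[a], box_to_line[b]) for a, b in same_paragraph_links]
    paragraphs = cluster(len(lines), line_links)
    return [[lines[i] for i in paragraph] for paragraph in paragraphs]


if __name__ == "__main__":
    # Five boxes: 0-1 share a line, 2-3 share a line, and box 1 and box 2
    # are predicted to belong to the same paragraph.
    print(build_layout_tree(5, [(0, 1), (2, 3)], [(1, 2)]))
```

Union-find is only one way to turn pairwise predictions into clusters; it treats every predicted link as certain, so a single false positive merges two clusters.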
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Dmitry Panteleev
CVPR 2022 (2022)
Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. The first hierarchical scene text dataset is introduced to enable this novel research task. We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way. Comprehensive experiments show that our unified model achieves better performance than multiple well-designed baseline methods. Additionally, this model achieves state-of-the-art results on multiple scene text detection datasets without the need for complex post-processing. Dataset and code: https://github.com/google-research-datasets/hiertext.
Preview abstract
We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary curved path, making substantial progress on the open problem of reading scene text of irregular shape. We formulate arbitrary shape text detection as an instance segmentation problem; an attention model is then used to decode the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image scale features, we propose a simple yet effective RoI masking step. Finally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which leads to significant improvements in both the detection and recognition accuracy of our model. Our method surpasses the state-of-the-art for end-to-end recognition tasks on the ICDAR15 (straight) benchmark by 4.6%, and on the Total-Text (curved) benchmark by more than 16%.
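The abstract does not spell out the RoI masking step. The Python sketch below assumes it means cropping the feature map to a text instance's bounding box and zeroing features outside that instance's segmentation mask. That reading is an assumption, and the function `roi_mask`, its arguments, and the nested-list data layout are hypothetical, chosen only to keep the example dependency-free.

```python
def roi_mask(features, mask, box):
    """Crop features to a box and zero out positions outside the instance mask.

    features: H x W x C nested lists of numbers (hypothetical layout).
    mask:     H x W nested lists of 0/1 for one text instance.
    box:      (top, left, bottom, right), bottom and right exclusive.
    """
    top, left, bottom, right = box
    return [
        [
            # Multiply every channel by the mask value at this position.
            [value * mask[y][x] for value in features[y][x]]
            for x in range(left, right)
        ]
        for y in range(top, bottom)
    ]


if __name__ == "__main__":
    features = [
        [[1, 2], [3, 4]],
        [[5, 6], [7, 8]],
    ]
    mask = [
        [1, 0],
        [0, 1],
    ]
    print(roi_mask(features, mask, (0, 0, 2, 2)))
```

Under this assumption, the recognizer only sees features belonging to the irregularly shaped instance, which is how it can skip rectification of the text region.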