Test-Time Adaptation for Visual Document Understanding

Sayna Ebrahimi; Sercan Arik; Tomas Pfister

Test-Time Adaptation for Visual Document Understanding

Sayna Ebrahimi

Sercan Arik

Tomas Pfister

Transactions on Machine Learning Research (2023)

Download Google Scholar

Abstract

For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, up to 1.89% in (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Test-Time Adaptation for Visual Document Understanding

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs