Google Research

Discrete Point Based Signatures and Applications to Document Matching

  • Nemanja Spasojevic
  • Guillaume Poncin
  • Dan Bloomberg
ICIAP 2011

Abstract

Document analysis often starts with robust signatures, for instance for document lookup from low-quality photographs, or similarity analysis between scanned books. Signatures based on OCR typically work well, but require good-quality OCR, which is not always available and can be very costly. In this paper we describe a novel scheme for extracting discrete signatures from document images. It operates on points that describe the position of words, typically the centroid. Each point is extracted using one of several techniques and assigned a signature based on its relation to its nearest neighbors. We discuss the benefits of this approach and demonstrate its application to multiple problems, including fast image similarity calculation and document lookup.
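To make the idea of a point-based discrete signature concrete, here is a minimal sketch. It is not the scheme from the paper, whose details the abstract does not give. It assumes word centroids are already available as (x, y) pairs and encodes each point by the quantized directions of its k nearest neighbors. The function names `point_signatures` and `signature_overlap`, the parameters `k` and `bits`, and the angle quantization are all assumptions made for illustration.

```python
import math


def point_signatures(points, k=4, bits=3):
    """Return one integer signature per point (illustrative scheme only).

    points: list of (x, y) word centroids.
    k:      number of nearest neighbours used per point (assumed value).
    bits:   bits used to quantize each neighbour direction (assumed value).
    """
    levels = 1 << bits
    full_turn = 2 * math.pi
    signatures = []
    for i, (x, y) in enumerate(points):
        # Brute-force k nearest neighbours by Euclidean distance.
        others = [p for j, p in enumerate(points) if j != i]
        nearest = sorted(
            others, key=lambda p: math.hypot(p[0] - x, p[1] - y)
        )[:k]
        # Direction of each neighbour seen from the current point.
        angles = sorted(
            math.atan2(py - y, px - x) % full_turn for px, py in nearest
        )
        # Pack the quantized directions into a single integer.
        sig = 0
        for a in angles:
            bucket = min(int(a / full_turn * levels), levels - 1)
            sig = (sig << bits) | bucket
        signatures.append(sig)
    return signatures


def signature_overlap(sigs_a, sigs_b):
    """Jaccard overlap of two signature sets, as a crude similarity score."""
    a, b = set(sigs_a), set(sigs_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    page = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
    shifted = [(x + 100, y + 37) for x, y in page]
    print(signature_overlap(point_signatures(page), point_signatures(shifted)))
```

Because only relative positions enter the computation, this toy signature is unchanged when the whole page is translated. It is not invariant to rotation, and nothing here addresses robustness to missing or spurious points, which a practical scheme would need.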
