Sandeep Tata
Sandeep Tata is a Staff Software Engineer in the Strategic Technologies group in Google Research. He currently leads a team focused on information extraction using machine learning. Prior to Google, Sandeep was a researcher at IBM's Almaden Research Center in the Data Management group. His interests lie broadly at the intersection of large-scale data management and applied machine learning. He earned his PhD in Computer Science from the University of Michigan in 2007.
Authored Publications
FieldSwap: Data Augmentation for Effective Form-Like Document Extraction
Seth Ebner
IEEE 40th International Conference on Data Engineering (ICDE) (2024), pp. 4722-4732
Selective Labeling: How to Radically Lower Data-Labeling Costs for Document Extraction Models
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 3847-3860
VRDU: A Benchmark for Visually-rich Document Understanding
Zilong Wang
Wei Wei
2023 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Glean: Structured Extractions from Templatic Documents
Proceedings of the VLDB Endowment (2021), pp. 997-1005
Data-Efficient Information Extraction from Form-Like Documents
Document Intelligence Workshop @ KDD 2021
Representation Learning for Information Extraction from Form-like Documents
Bodhisattwa Majumder
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 6495-6504
Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design
Nguyen Ha Vo
Proceedings of the 10th Annual Conference on Innovative Data Systems Research (2020)