
Sandeep Tata
Sandeep Tata is a Staff Software Engineer in the Strategic Technologies group at Google Research. He currently leads a team focused on information extraction using machine learning. Prior to Google, Sandeep was a researcher in the Data Management group at IBM's Almaden Research Center. His interests lie broadly at the intersection of large-scale data management and applied machine learning. He earned his PhD in Computer Science from the University of Michigan (2007).
Authored Publications
FieldSwap: Data Augmentation for Effective Form-Like Document Extraction
Seth Ebner
IEEE 40th International Conference on Data Engineering (ICDE) (2024), pp. 4722-4732
VRDU: A Benchmark for Visually-rich Document Understanding
Zilong Wang
Wei Wei
2023 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Selective Labeling: How to Radically Lower Data-Labeling Costs for Document Extraction Models
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, ACL, pp. 3847-3860
Data-Efficient Information Extraction from Form-Like Documents
Document Intelligence Workshop @ KDD 2021
Glean: Structured Extractions from Templatic Documents
Proceedings of the VLDB Endowment (2021), pp. 997-1005
Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design
Nguyen Ha Vo
Proceedings of the 10th Annual Conference on Innovative Data Systems Research (2020)
Improving Recommendation Quality at Google Drive
Suming Jeremiah Chen
Zachary Teal Wilson
Brian Lee Calaci
Ryan Lee Evans
Sean Robert Abraham
Mike Colagrosso
26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2020)