Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design

Ying Sheng; Nguyen Ha Vo; James B. Wendt; Sandeep Tata; Marc Najork

Proceedings of the 10th Annual Conference on Innovative Data Systems Research (2020)

Abstract

This paper presents a case study of migrating a privacy-safe information extraction system for Gmail from a traditional rule-based architecture to a machine-learned Software 2.0 architecture. The key idea is to use the extractions from the existing rule-based system as training data to learn ML models that in turn replace all the machinery of the rule-based system. The resulting system a) delivers better precision and recall, b) is significantly smaller in terms of lines of code, c) has been easier to maintain and improve, and d) has opened up the possibility of leveraging ML advances to build a cross-language extraction system, even though our original training data was only in English. We describe the challenges encountered during this migration around the generation and management of training data and the evaluation of models, and we report on the many traditional "Software 1.0" components we built to address them.

Research Areas

Machine intelligence
