
David Deutscher
Authored Publications
Our Approach to Protecting AI Training Data
Cindy Muya
Jason Novak
Cindee Madison
Reiner Critides
Ben Kamber
Niha Vempati
Jeremy Wiesner
Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043 (2025)
Abstract
Google has over 25 years of experience protecting data from inappropriate access and unauthorized use. In the era of AI, Google has extended these data protection best practices to ensure that the right data is used the right way to train models. This paper presents a number of these best practices, describes how Google applies them in its systems, and explains how Google Cloud customers can use Google Cloud capabilities to implement these practices themselves.
Protecting data requires both technical controls to enable safe data use at scale and governance processes to ensure that companies have visibility and control over how their data is used. This fundamentally requires: understanding data and ensuring it carries sufficient metadata in the form of attributes; controlling the data by implementing policies that allow (or disallow) certain uses based on those attributes; transforming data so it can be used in policy-compliant ways; and maintaining human oversight and governance.
Protecting data for AI inherits these requirements and introduces new ones to account for AI-specific risks, including memorization/recitation and the cost of training foundation models. Meeting these risks requires new capabilities, including enhanced understanding of data and model lineage, as well as an increased ability to control data usage through policy-compliance checks on data at the time a training job is configured, before it is run.
This white paper offers an in-depth look at data protection best practices and Google's data protection capabilities, and is one of a series of publications about Google's Secure AI Framework (SAIF). Building on its secure development practices, Google has developed and deployed a number of capabilities to understand, control, and transform data in its infrastructure so that data is both protected and used appropriately. This involves robust annotation systems that represent metadata and enable granular understanding of data at both the item and dataset level, policy engines that evaluate machine-readable policies on that data using the metadata attributes, and sensors that track how data flows across Google's systems and raise alerts when policy violations occur. Google has also developed de-identification and anonymization systems that transform data to make it policy-compliant and safer to use for AI training.
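The attribute-and-policy pattern the abstract describes can be illustrated with a minimal Python sketch. This is not Google's implementation, and the names (DataItem, Policy, check_training_job) and attribute labels are hypothetical; the sketch only shows the general idea of evaluating a machine-readable policy against metadata attributes at the time a training job is configured, before it runs.

# Hypothetical sketch of attribute-based policy checks at training-job
# configuration time. Names and attributes are illustrative, not Google APIs.
from dataclasses import dataclass, field


@dataclass
class DataItem:
    name: str
    # Metadata attributes describing the item, e.g. provenance or consent.
    attributes: set[str] = field(default_factory=set)


@dataclass
class Policy:
    # Attributes that must be present, and attributes that must be absent,
    # for the item to be usable in AI training under this policy.
    required: set[str]
    forbidden: set[str]

    def allows(self, item: DataItem) -> bool:
        return self.required <= item.attributes and not (self.forbidden & item.attributes)


def check_training_job(dataset: list[DataItem], policy: Policy) -> list[str]:
    """Return the names of items that violate the policy, before training starts."""
    return [item.name for item in dataset if not policy.allows(item)]


if __name__ == "__main__":
    policy = Policy(required={"consented_for_training"}, forbidden={"contains_user_pii"})
    dataset = [
        DataItem("doc-1", {"consented_for_training"}),
        DataItem("doc-2", {"consented_for_training", "contains_user_pii"}),
    ]
    violations = check_training_job(dataset, policy)
    if violations:
        print("Blocked: policy violations in", violations)  # e.g. doc-2
    else:
        print("Training job may proceed.")

In a real system, items that fail such a check would typically be excluded, transformed (for example, de-identified), or escalated for human review rather than simply blocking the job.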
Suggesting (More) Friends Using the Implicit Social Graph
Maayan Roth
Tzvika Barenholz
Assaf Ben-David
Guy Flysher
Ilan Horn
Ari Leichtberg
Ron Merom
International Conference on Machine Learning (ICML) (2011)
Abstract
Although users of online communication tools rarely categorize their contacts into groups such as "family", "co-workers", or "jogging buddies", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their "friends". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail features.
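As a rough illustration of the interaction-based approach the abstract describes (not the paper's actual algorithm), the sketch below treats each implicit group as a set of contacts the user has interacted with together, scores candidate contacts by how strongly their groups overlap a seed set, and breaks ties by a simple interaction-based affinity. The function names and the scoring rule are simplified assumptions for illustration only.

# Hypothetical sketch of interaction-based friend suggestion from an
# implicit social graph. Scoring details are simplified assumptions.
from collections import defaultdict


def contact_affinity(interactions):
    """Estimate affinity to each contact from interaction counts.

    `interactions` maps each implicit group (a frozenset of contacts) to how
    often the user interacted with that exact group.
    """
    affinity = defaultdict(float)
    for group, count in interactions.items():
        for contact in group:
            # Interactions with smaller groups signal stronger ties.
            affinity[contact] += count / len(group)
    return affinity


def suggest_friends(interactions, seed, limit=5):
    """Suggest contacts to add, given a small seed set the user already chose."""
    affinity = contact_affinity(interactions)
    scores = defaultdict(float)
    for group, count in interactions.items():
        overlap = len(group & seed)
        if overlap == 0:
            continue
        for contact in group - seed:
            # Weight candidates by how often and how strongly their implicit
            # groups overlap the seed set.
            scores[contact] += overlap * count
    ranked = sorted(scores, key=lambda c: (scores[c], affinity[c]), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    interactions = {
        frozenset({"alice", "bob"}): 12,
        frozenset({"alice", "bob", "carol"}): 5,
        frozenset({"dave"}): 20,
    }
    print(suggest_friends(interactions, seed={"alice"}))  # e.g. ['bob', 'carol']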
Suggesting Friends Using the Implicit Social Graph
Maayan Roth
Assaf Ben-David
Guy Flysher
Ilan Horn
Ari Leichtberg
Ron Merom
Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2010)
Abstract
Although users of online communication tools rarely categorize their contacts into groups such as "family", "co-workers", or "jogging buddies", they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users' interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their "friends". We introduce an interaction-based metric for estimating a user's affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user's implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features.