Sai Teja Peddinti
Sai Teja Peddinti is a Research Scientist in the Infrastructure Security and Privacy group at Google. He received his PhD in Computer Science from New York University in 2015. His PhD work focused on large-scale, data-driven analysis to understand users' privacy preferences and concerns and to evaluate the effectiveness of privacy solutions. His research interests are in privacy, machine learning, network and cloud security, and cryptography.
Authored Publications
With the increase in the number of privacy regulations, small development teams are forced to make privacy decisions on their own. In this paper, we conduct a mixed-method survey study, including statistical and qualitative analysis, to evaluate the privacy perceptions, practices, and knowledge of members involved in various phases of the Software Development Life Cycle (SDLC). Our survey includes 362 participants from 23 countries, encompassing roles such as product managers, developers, and testers. Our results show diverse definitions of privacy across SDLC roles, emphasizing the need for a holistic privacy approach throughout SDLC. We find that software teams, regardless of their region, are less familiar with privacy concepts (such as anonymization), relying on self-teaching and forums. Most participants are more familiar with GDPR and HIPAA than other regulations, with multi-jurisdictional compliance being their primary concern. Our results advocate the need for role-dependent solutions to address the privacy challenges, and we highlight research directions and educational takeaways to help improve privacy-aware SDLC.
A Decade of Privacy-Relevant Android App Reviews: Large Scale Trends
Omer Akgul
Michelle Mazurek
Benoit Seguin
We present an analysis of 12 million instances of privacy-relevant reviews publicly visible on the Google Play Store that span a 10-year period. By leveraging state-of-the-art NLP techniques, we examine what users have been writing about privacy along multiple dimensions: time, countries, app types, diverse privacy topics, and even across a spectrum of emotions. We find consistent growth of privacy-relevant reviews, and explore topics that are trending (such as Data Deletion and Data Theft), as well as those on the decline (such as privacy-relevant reviews on sensitive permissions). We find that although privacy reviews come from more than 200 countries, 33 countries provide 90% of privacy reviews. We compare countries by examining the distribution of privacy topics each country's users write about, and find that geographic proximity is not a reliable indicator that nearby countries have similar privacy perspectives. We uncover some countries with unique patterns and explore those herein. Surprisingly, it is not uncommon for reviews that discuss privacy to be positive (32%); many users express pleasure about privacy features within apps or about privacy-focused apps. We also uncover some unexpected behaviors, such as the use of reviews to deliver privacy disclaimers to developers. Finally, we demonstrate the value of analyzing app reviews with our approach as a complement to existing methods for understanding users' perspectives about privacy.
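A concentration figure like "33 countries provide 90% of privacy reviews" is the smallest number of countries whose combined review counts reach 90% of the total. A minimal sketch of that computation, assuming per-country counts are already available; the helper name `min_groups_for_share` and the toy counts are hypothetical, not data or code from the study.

```python
def min_groups_for_share(counts: dict[str, int], percent: int) -> int:
    """Smallest number of groups (e.g. countries) whose combined count reaches
    `percent`% of the total. Integer arithmetic avoids float rounding issues."""
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Taking the largest groups first yields the minimum number of groups needed.
    for k, c in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += c
        if covered * 100 >= percent * total:
            return k
    return len(counts)

# Hypothetical toy counts of privacy reviews per country.
toy = {"A": 500, "B": 250, "C": 150, "D": 60, "E": 40}
print(min_groups_for_share(toy, 90))  # → 3
```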
Unveiling Privacy Perspectives about Mobile Health Apps on a Large Scale
PETS workshop: Privacy, Safety and Trust for Mobile Health Apps (2024)
In this paper we study users' opinions about the privacy of their mobile health apps. We look at what they write in app reviews in the 'Health & Fitness' category on the Google Play store. We identified 2832 apps in this category with at least 1K installs. Using NLP/LLM analyses, we find that 76% of these apps have at least some privacy reviews. In total, this yields over 164,000 reviews about privacy, from over 150 countries and in 25 languages. Our analysis identifies the top themes and offers an approximation of how widespread these issues are around the world. We show that the top four themes (Data Sharing and Exposure, Permission Requests, Location Tracking, and Data Collection) are issues of concern in over 70 countries. Our automatically generated thematic summaries reveal aspects that deserve further research: user suspicions (unneeded data collection), user requests (more fine-grained control over data collection and data access), and user behavior (uninstalling apps).
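Two of the figures above, the share of apps with at least one privacy review and the number of countries in which a theme appears, are simple aggregations once reviews have been classified and themed. A minimal sketch, assuming each privacy review is available as a hypothetical `(app_id, country, theme)` tuple; the app ids and country codes are made up, and only the theme names come from the abstract.

```python
from collections import defaultdict

def summarize(apps, privacy_reviews):
    """apps: iterable of app ids; privacy_reviews: list of (app_id, country, theme).
    Returns (share of apps with >= 1 privacy review, {theme: number of distinct countries})."""
    apps = set(apps)
    apps_with_privacy = {app for app, _, _ in privacy_reviews if app in apps}
    share = len(apps_with_privacy) / len(apps) if apps else 0.0
    countries_per_theme = defaultdict(set)
    for _, country, theme in privacy_reviews:
        countries_per_theme[theme].add(country)
    return share, {theme: len(cs) for theme, cs in countries_per_theme.items()}

# Hypothetical toy data.
apps = ["a1", "a2", "a3", "a4"]
reviews = [
    ("a1", "US", "Permission Requests"),
    ("a1", "IN", "Location Tracking"),
    ("a2", "US", "Permission Requests"),
    ("a3", "BR", "Permission Requests"),
]
share, per_theme = summarize(apps, reviews)
print(share, per_theme)  # → 0.75 {'Permission Requests': 2, 'Location Tracking': 1}
```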
Towards Fine-Grained Localization of Privacy Behaviors
Vijayanta Jain
Sepideh Ghanavati
Collin McMillan
2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pp. 258-277
Privacy labels help developers communicate their application's privacy behaviors (i.e., how and why an application uses personal information) to users. However, studies show that developers face several challenges in creating them, and the resulting labels are often inconsistent with their application's privacy behaviors. In this paper, we create a novel methodology called fine-grained localization of privacy behaviors to locate individual statements in source code that encode privacy behaviors and predict their privacy labels. We design and develop an attention-based multi-head encoder model that creates individual representations of multiple methods and uses attention to identify relevant statements that implement privacy behaviors. These statements are then used to predict privacy labels for the application's source code and can help developers write privacy statements that can be used as notices. Our quantitative analysis shows that our approach can achieve high accuracy in identifying privacy labels, ranging from 91.41% to 98.45%. We also evaluate the efficacy of our approach with six software professionals from our university. The results demonstrate that our approach reduces the time and mental effort required by developers to create high-quality privacy statements and can finely localize statements in methods that implement privacy behaviors.
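The localization idea rests on attention weights: statements whose representations score highly against a query receive most of the weight and can be surfaced as the ones implementing a privacy behavior. A minimal single-head, scaled dot-product sketch of that weighting; the paper's model is a multi-head encoder, and the statement strings and 2-D vectors below are hypothetical stand-ins for learned embeddings.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, statement_vecs):
    """Scaled dot-product attention weights of one query over statement vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, vec)) / math.sqrt(d)
              for vec in statement_vecs]
    return softmax(scores)

def top_statements(statements, weights, k=1):
    """Return the k statements with the largest attention weight."""
    ranked = sorted(zip(weights, statements), reverse=True)
    return [s for _, s in ranked[:k]]

# Hypothetical statements and embeddings.
stmts = ["loc = lm.getLastKnownLocation(p);", "Log.d(TAG, msg);", "x = x + 1;"]
vecs = [[2.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
weights = attend([1.0, 0.0], vecs)
print(top_statements(stmts, weights))  # → ['loc = lm.getLastKnownLocation(p);']
```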
Hark: A Deep Learning System for Navigating Privacy Feedback at Scale
Rishabh Khandelwal
2022 IEEE Symposium on Security and Privacy (SP)
Integrating user feedback is one of the pillars for building successful products. However, this feedback is generally collected in an unstructured free-text form, which is challenging to understand at scale. This is particularly demanding in the privacy domain due to the nuances associated with the concept and the limited existing solutions. In this work, we present Hark, a system for discovering and summarizing privacy-related feedback at scale. Hark automates the entire process of summarizing privacy feedback, starting from unstructured text and resulting in a hierarchy of high-level privacy themes and fine-grained issues within each theme, along with representative reviews for each issue. At the core of Hark is a set of new deep learning models trained on different tasks, such as privacy feedback classification, privacy issues generation, and high-level theme creation. We illustrate Hark’s efficacy on a corpus of 626M Google Play reviews. Out of this corpus, our privacy feedback classifier extracts 6M privacy-related reviews (with an AUC-ROC of 0.92). With three annotation studies, we show that Hark’s generated issues are of high accuracy and coverage and that the theme titles are of high quality. We illustrate Hark’s capabilities by presenting high-level insights from 1.3M Android apps.
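The AUC-ROC of 0.92 reported for the privacy feedback classifier can be read as the probability that a randomly chosen privacy-related review receives a higher classifier score than a randomly chosen non-privacy review. A minimal sketch of that pairwise (Mann-Whitney) formulation, with hypothetical scores and labels; it is quadratic in the number of examples and meant only to illustrate the metric, not Hark's evaluation code.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the probability that a random positive outscores a random
    negative; ties count as one half. labels are 1 (privacy) or 0 (not)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: 5 of the 6 positive/negative pairs are ordered correctly.
print(auc_roc([0.9, 0.8, 0.3, 0.6, 0.2], [1, 1, 1, 0, 0]))  # → 0.8333333333333334
```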
Analyzing User Perspectives on Mobile App Privacy at Scale
International Conference on Software Engineering (ICSE) (2022)
In this paper we present a methodology to analyze users’ concerns and perspectives about privacy at scale. We leverage NLP techniques to process millions of mobile app reviews and extract privacy concerns. Our methodology uses a binary classifier that distinguishes between privacy-related and non-privacy-related reviews. We use clustering to gather reviews that discuss similar privacy concerns, and employ summarization metrics to extract representative reviews that summarize each cluster. We apply our methods to 287M reviews for about 2M apps across the 29 categories in Google Play to identify the top privacy pain points in mobile apps. We identified approximately 440K privacy-related reviews. We find that privacy-related reviews occur in all 29 categories, with some issues arising across numerous app categories and other issues surfacing only in a small set of app categories. We show empirical evidence that confirms the dominant privacy themes: concerns about apps requesting unnecessary permissions, collection of personal information, frustration with privacy controls, tracking, and the selling of personal data. As far as we know, this is the first large-scale analysis to confirm these findings based on hundreds of thousands of user inputs. We also observe some unexpected findings, such as users warning each other not to install an app due to privacy issues, users uninstalling apps for privacy reasons, and positive reviews that reward developers for privacy-friendly apps. Finally, we discuss the implications of our method and findings for developers and app stores.
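The last step of the pipeline picks one review per cluster to stand for the whole cluster. A minimal sketch of such a selection rule, choosing the review with the highest mean word overlap with the other reviews in its cluster. The abstract only says "summarization metrics", so the Jaccard token overlap used here is a swapped-in stand-in, and the review texts are made up.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (deliberately simplistic)."""
    return set(text.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def representative(cluster):
    """Return the review with the highest mean Jaccard overlap with the rest of its cluster."""
    if len(cluster) == 1:
        return cluster[0]
    toks = [tokens(r) for r in cluster]
    best_i, best_score = 0, -1.0
    for i, t in enumerate(toks):
        score = sum(jaccard(t, u) for j, u in enumerate(toks) if j != i) / (len(toks) - 1)
        if score > best_score:
            best_i, best_score = i, score
    return cluster[best_i]

# Hypothetical cluster of reviews about location permissions.
cluster = [
    "app asks for location permission",
    "why does this app need location permission",
    "location permission is not needed",
]
print(representative(cluster))  # → app asks for location permission
```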
PAcT: Detecting and Classifying Privacy Behavior of Android Applications
Vijayanta Jain
Sanonda Datta Gupta
Sepideh Ghanavati
Collin McMillan
Proceedings of the 15th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Association for Computing Machinery, New York, NY, USA (2022), pp. 104–118
Interpreting and describing mobile applications' privacy behaviors in order to create consistent and accurate privacy notices is a challenging task for developers. Traditional approaches to creating privacy notices are based on predefined templates or questionnaires and do not rely on any traceable behaviors in code, which may result in inconsistent and inaccurate notices. In this paper, we present an automated approach to detect privacy behaviors in the code of Android applications. We develop the Privacy Action Taxonomy (PAcT), which includes labels for Practice (i.e., how applications use personal information) and Purpose (i.e., why). We annotate ~5,200 code segments based on these labels and create a multi-label, multi-class dataset with ~14,000 labels. We develop and train deep learning models to classify code segments. Across all label types, our highest F1 scores are 79.62% for Practice and 79.02% for Purpose.
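Because each code segment can carry several labels, F1 has to be aggregated over label sets. The abstract does not say whether micro or macro averaging was used; the sketch below assumes micro averaging, pooling true positives, false positives, and false negatives across all segments. The label names are hypothetical and not PAcT's actual taxonomy.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label predictions; gold and pred are
    parallel lists of label sets, one set per code segment."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present
        fp += len(p - g)   # labels predicted but absent
        fn += len(g - p)   # labels present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
gold = [{"Collect", "Share"}, {"Collect"}]
pred = [{"Collect"}, {"Collect", "Share"}]
print(micro_f1(gold, pred))  # precision = recall = 2/3, so F1 ≈ 0.667
```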
PriGen: Towards Automated Translation of Android Applications' Code to Privacy Captions
Vijayanta Jain
Sanonda Datta Gupta
Sepideh Ghanavati
Research Challenges in Information Science, Springer International Publishing (2021), pp. 142-151
Mobile applications are required to give privacy notices to users when they collect or share personal information. Creating consistent and concise privacy notices can be a challenging task for developers. Previous work has attempted to help developers create privacy notices through questionnaires or predefined templates. In this paper, we propose a novel approach and framework, called PriGen, that extends this prior work. PriGen uses static analysis to identify Android applications’ code segments that process personal information (i.e., permission-requiring code segments) and then leverages a Neural Machine Translation model to translate them into privacy captions. We present an initial analysis of our translation task for ~300,000 code segments.
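The first PriGen stage finds code segments that call permission-protected APIs. The sketch below is a toy, line-level, regex-based stand-in for that stage, not PriGen's static analysis. Real analysis works on compiled code and call graphs and handles aliases and reflection, none of which this does. The two-entry `PERMISSION_APIS` mapping is a hypothetical, heavily simplified example, although both `LocationManager` methods do require a location permission.

```python
import re

# Hypothetical, heavily simplified mapping from Android API method names to the
# permissions they require; real tools derive such mappings from the framework.
PERMISSION_APIS = {
    "getLastKnownLocation": "ACCESS_FINE_LOCATION or ACCESS_COARSE_LOCATION",
    "requestLocationUpdates": "ACCESS_FINE_LOCATION or ACCESS_COARSE_LOCATION",
}

# Matches a method call of the form ".name(" and captures the method name.
CALL = re.compile(r"\.(\w+)\s*\(")

def permission_segments(java_source):
    """Return (line_number, method, permission) for each line calling a mapped API."""
    hits = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        for name in CALL.findall(line):
            if name in PERMISSION_APIS:
                hits.append((lineno, name, PERMISSION_APIS[name]))
    return hits

# Hypothetical Java snippet.
src = """Location loc = lm.getLastKnownLocation(LocationManager.GPS_PROVIDER);
Log.d("TAG", "hello");
lm.requestLocationUpdates(provider, 0, 0, listener);"""
for hit in permission_segments(src):
    print(hit)
```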