Gogul Balakrishnan

At Google, I use program analysis to improve the security and privacy of software. Currently, I am working on the Raksha project, with a focus on dataflow analysis for privacy. Until recently, I was part of the Swift for TensorFlow project. Before that, I led a team focused on static program analysis of Android apps to detect vulnerabilities and malware.

Before Google, I was an engineer at Facebook and a research staff member in the Systems Analysis and Verification (SAV) group at NEC Laboratories America, Inc., Princeton, NJ. I received my Ph.D. in Computer Science from the University of Wisconsin-Madison, and my undergraduate degree from the College of Engineering, Guindy. I am from Pollachi, a small and lively town in Tamil Nadu, India. Of late, I have taken a liking to photography.

Authored Publications
    Learning and Evaluating Contextual Embedding of Source Code
    Aditya Kanade
    International Conference on Machine Learning (ICML), Vienna, Austria (2020)
    Abstract: Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come with the development of pre-trained contextual embeddings, such as BERT, which can be fine-tuned for downstream tasks with less labeled data and training budget, while achieving better accuracies. However, there is no attempt yet to obtain a high-quality contextual embedding of source code, and to evaluate it on multiple program-understanding tasks simultaneously; that is the gap that this paper aims to mitigate. Specifically, first, we curate a massive, deduplicated corpus of 6M Python files from GitHub, which we use to pre-train CuBERT, an open-sourced code-understanding BERT model; and, second, we create an open-sourced benchmark that comprises five classification tasks and one program-repair task, akin to code-understanding tasks proposed in the literature before. We fine-tune CuBERT on our benchmark tasks, and compare the resulting models to different variants of Word2Vec token embeddings, BiLSTM and Transformer models, as well as published state-of-the-art models, showing that CuBERT outperforms them all, even with shorter training, and with fewer labeled examples. Future work on source-code embedding can benefit from reusing our benchmark, and comparing against CuBERT models as a strong baseline.
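    The abstract above describes pre-training a BERT model on a large Python corpus and then fine-tuning it on code-understanding tasks such as bug classification. As a rough, illustrative sketch of what such fine-tuning can look like (not the released CuBERT model, benchmark, or API), the snippet below runs one fine-tuning step of a generic BERT-style classifier on a single toy "buggy vs. correct" example using the Hugging Face transformers library; the checkpoint name, label scheme, and code snippet are placeholder assumptions.

```python
# Illustrative sketch: one fine-tuning step of a BERT-style encoder on a
# code-classification example, in the spirit of the benchmark tasks above.
# The checkpoint ("bert-base-uncased"), labels, and snippet are hypothetical
# placeholders, not the released CuBERT artifacts.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A toy example: a snippet with a likely bug (width used twice), labeled 1 = buggy.
snippet = "def area(width, height):\n    return width * width"
inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=128)
labels = torch.tensor([1])

model.train()
outputs = model(**inputs, labels=labels)  # cross-entropy loss over the 2 labels
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```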
    ARC++: Effective Typestate and Lifetime Dependency Analysis
    Xusheng Xiao
    Naoto Maeda
    Aarti Gupta
    Deepak Chhetri
    ISSTA, ACM (2014), pp. 116-126