Emily Johnston

I work on static analysis tools for Java. Before coming to Google, I studied Computer Science and researched Evolutionary Computation at Carleton College. My main research interests are in compilers, programming languages, and static analysis, with an emphasis on usability.

Authored Publications
    DeepDelta: Learning to Repair Compilation Errors
    Ali Mesbah
    Andrew Rice
    Nick Glorioso
    Eddie Aftandilian
    Proceedings of the 2019 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (2019)
    Programmers spend a substantial amount of time manually repairing code that does not compile. We observe that the repairs for any particular error class typically follow a pattern and are highly mechanical. We propose a novel approach that automatically learns these patterns with a deep neural network and suggests program repairs for the most costly classes of build-time compilation failures. We describe how we collect all build errors and the human-authored, in-progress code changes that cause those failing builds to transition to successful builds at Google. We generate an AST diff from the textual code changes and transform it into a domain-specific language called Delta that encodes the change that must be made to make the code compile. We then feed the compiler diagnostic information (as source) and the Delta changes that resolved the diagnostic (as target) into a Neural Machine Translation network for training. For the two most prevalent and costly classes of Java compilation errors, namely missing symbols and mismatched method signatures, our system, called DeepDelta, generates the correct repair changes for 19,314 out of 38,788 (50%) of unseen compilation errors. The correct changes are in the top three suggested fixes 86% of the time on average.
    Analyzing and Repairing Compilation Errors
    Ali Mesbah
    Andrew Rice
    Eddie Aftandilian
    Nick Glorioso
    International Conference on Software Engineering (ICSE), poster track (2019) (to appear)
    Resolving a build failure consumes developer time both in finding a suitable resolution and in rerunning the build. Our goal is to develop automated repair tools that can automatically resolve build errors and therefore improve developer productivity. We collected data on the resolution of Java build failures to discover how long developers spend resolving different kinds of diagnostics at Google. We found that the diagnostic reporting an unresolved symbol consumes 47% of the total time spent resolving broken builds. We found that choice of tool has a significant impact: 26% of command line builds fail whereas only 3% of IDE builds fail. However, the set of most costly diagnostic kinds remains the same for both. We trained a Neural Machine Translation model on the abstract syntax tree changes made when resolving an unresolved symbol failure. This generates a correct fix with a true positive rate of 50%.
    Detecting argument selection defects
    Andrew Rice
    Eddie Aftandilian
    Michael Pradel
    Yulissa Arroyo-Paredes
    SPLASH 2017 OOPSLA
    Identifier names are often used by developers to convey additional information about the meaning of a program over and above the semantics of the programming language itself. We present an algorithm that uses this information to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java programs. We evaluate our algorithm at Google on 200 million lines of internal code and 10 million lines of predominantly open-source external code and find defects even in large, mature projects such as OpenJDK, ASM, and the MySQL JDBC. The precision and recall of the algorithm vary depending on a sensitivity threshold. Higher thresholds increase precision, giving a true positive rate of 85%, reporting 459 true positives and 78 false positives. Lower thresholds increase recall but lower the true positive rate, reporting 2,060 true positives and 1,207 false positives. We show that this is an order of magnitude improvement on previous approaches. By analyzing the defects found, we are able to quantify best practice advice for API design and show that the probability of an argument selection defect increases markedly when methods have more than five arguments.
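
The idea in the last abstract, comparing argument names against parameter names to spot a likely wrong argument, can be illustrated with a small sketch. This is not the algorithm from the paper: the similarity measure (Python's `difflib.SequenceMatcher`), the function names `find_swapped_arguments` and `true_positive_rate`, and the default threshold of 0.5 are all assumptions made for illustration. The sketch only handles the simplest case, two arguments passed in swapped order.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an assumed stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_swapped_arguments(params, args, threshold=0.5):
    """Return index pairs (i, j) where swapping args[i] and args[j] would
    match the parameter names better than the current order by more than
    `threshold`. A higher threshold reports fewer, more certain findings."""
    findings = []
    for i in range(len(args)):
        for j in range(i + 1, len(args)):
            current = similarity(params[i], args[i]) + similarity(params[j], args[j])
            swapped = similarity(params[i], args[j]) + similarity(params[j], args[i])
            if swapped - current > threshold:
                findings.append((i, j))
    return findings


def true_positive_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that are real defects."""
    return true_positives / (true_positives + false_positives)


# Hypothetical call site: resize(width, height) invoked as resize(height, width).
print(find_swapped_arguments(["width", "height"], ["height", "width"]))
# Figures quoted in the abstract for the high and low threshold settings.
print(round(true_positive_rate(459, 78), 2))
print(round(true_positive_rate(2060, 1207), 2))
```

With the figures quoted in the abstract, the higher threshold gives 459 / (459 + 78), about 85%, matching the stated true positive rate. The lower threshold gives 2,060 / (2,060 + 1,207), about 63%, a number the abstract does not state directly.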