Miltos Allamanis
My research interests lie at the intersection of machine learning, programming languages, and software engineering. In particular, I am interested in deep learning models that can "understand" and generate source code, with its complex and highly structured nature.
For a list of publications before joining Google see my personal website or my Google Scholar profile.
Authored Publications
Do Large Code Models Understand Programming Concepts? A Black Box Approach
Ashish Hooda, Aaron Wilson, Kassem Fawaz, Somesh Jha
(2024) (to appear)
Large language models have replicated their success in text generation on coding tasks. While a large body of work has shown that they perform remarkably well on tasks such as code completion and editing, it remains unclear why. We help bridge this gap by exploring the degree to which auto-regressive models understand the logical constructs of the underlying programs. We propose CAPP, a counterfactual testing framework to evaluate whether large code models understand programming concepts. With only black-box access to the model, we use CAPP to evaluate 10 popular large code models on 5 different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.