Global Relational Models of Source Code
Abstract
Models of code can learn distributed representations of a program’s syntax and
semantics to predict many non-trivial properties of a program. Recent state-of-the-art models leverage highly structured representations of programs, such as
trees, graphs and paths therein (e.g., data-flow relations), which are precise and
abundantly available for code. This provides a strong inductive bias towards semantically
meaningful relations, yielding more generalizable representations than
classical sequence-based models. Unfortunately, these models primarily rely on
graph-based message passing to represent relations in code, which makes them
de facto local due to the high cost of message-passing steps, quite in contrast to
modern, global sequence-based models, such as the Transformer. In this work,
we bridge this divide between global and structured models by introducing two
new hybrid model families that are both global and incorporate structural bias:
Graph Sandwiches, which wrap traditional (gated) graph message-passing layers
in sequential message-passing layers; and Graph Relational Embedding Attention
Transformers (GREAT for short), which bias traditional Transformers with
relational information from graph edge types. By studying a popular, non-trivial
program repair task, variable-misuse identification, we explore the relative merits
of traditional and hybrid model families for code representation. Starting with a
graph-based model that already improves upon the prior state-of-the-art for this
task by 20%, we show that our proposed hybrid models improve by an additional
10–15%, while both training faster and using fewer parameters.
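
To make the relational-attention idea concrete, the following is a minimal, illustrative sketch of a single attention head whose logits receive an additive bias derived from a learned embedding of the edge type connecting each pair of tokens. It is not the paper's exact parameterization: names such as num_edge_types and the per-edge-type scalar bias are assumptions made purely for illustration. A Graph Sandwich would instead interleave such sequence layers with gated graph message-passing layers.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelationalAttentionSketch(nn.Module):
        """Sketch of one attention head biased by graph edge types.

        Standard scaled dot-product attention over the token sequence,
        plus an additive bias looked up from the edge type (if any)
        relating each pair of tokens. Illustrative only; the paper's
        formulation may differ.
        """

        def __init__(self, d_model: int, num_edge_types: int):
            super().__init__()
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            # Edge type 0 is reserved here for "no edge"; its bias is learned too.
            self.edge_bias = nn.Embedding(num_edge_types + 1, 1)
            self.scale = d_model ** -0.5

        def forward(self, x, edge_types):
            # x: [batch, seq_len, d_model] token representations
            # edge_types: [batch, seq_len, seq_len] integer edge-type ids (0 = no edge)
            q, k, v = self.q(x), self.k(x), self.v(x)
            logits = torch.einsum("bid,bjd->bij", q, k) * self.scale
            logits = logits + self.edge_bias(edge_types).squeeze(-1)  # structural bias
            attn = F.softmax(logits, dim=-1)
            return torch.einsum("bij,bjd->bid", attn, v)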