Xinyun Chen
My research interests are deep learning for program synthesis, neural-symbolic reasoning, and adversarial machine learning. My research aims to advance the reasoning and generalization capabilities of neural networks and to enable broader real-world deployment of deep learning techniques in challenging and security-critical applications.
More information can be found on my Google Scholar page and personal website.
Research Areas
Authored Publications
LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs
Hongyu Ren
Michihiro Yasunaga
Haitian Sun
Jure Leskovec
ICML 2021
Answering complex natural language questions on knowledge graphs (KGQA) is a challenging task. It requires reasoning over the input natural language question as well as a massive, incomplete, heterogeneous KG. Prior methods obtain an abstract structured query graph/tree from the input question and traverse the KG for answers following the query tree; however, they inherently cannot deal with missing links in the KG. Here we present LEGO, a Latent Execution-Guided reasOning framework that handles this challenge in KGQA. LEGO works iteratively, alternating between (1) a Query Synthesizer, which synthesizes a reasoning action and grows the query tree step by step, and (2) a Latent Space Executor, which executes the reasoning action in the latent embedding space to compensate for the missing information in the KG. To learn the synthesizer without step-wise supervision, we design a generic latent-execution-guided bottom-up search procedure that efficiently finds good execution traces in the vast query space. Experimental results on several KGQA benchmarks demonstrate the effectiveness of our framework compared to the previous state of the art.
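The sketch below illustrates the synthesize/execute loop the abstract describes: at each step a synthesizer picks the next reasoning action, and an executor applies it in a latent embedding space rather than traversing explicit KG edges. Everything here is an illustrative assumption (toy translation-style embeddings, a fixed two-step plan, placeholder names), not the paper's implementation.

# Minimal, hypothetical sketch of a LEGO-style iterative reasoning loop.
import numpy as np

EMB_DIM = 8
rng = np.random.default_rng(0)

# Toy embeddings: each relation acts as a translation vector in latent space
# (a TransE-style operator stands in for the paper's learned executor).
entity_emb = {"anchor_entity": rng.normal(size=EMB_DIM)}
relation_emb = {"directed_by": rng.normal(size=EMB_DIM),
                "born_in": rng.normal(size=EMB_DIM)}

def synthesize_action(question_tokens, step):
    """Hypothetical query synthesizer: chooses the next relation to apply.
    A real model would score candidate actions with a neural network."""
    plan = ["directed_by", "born_in"]  # stand-in for learned step-wise decisions
    return plan[step] if step < len(plan) else None

def execute_in_latent_space(query_emb, relation):
    """Hypothetical latent executor: applies the relation operator to the
    current query embedding instead of following explicit KG edges, so
    reasoning can continue even when a link is missing from the KG."""
    return query_emb + relation_emb[relation]

def answer_question(question_tokens):
    query_emb = entity_emb["anchor_entity"]  # embed the question's anchor entity
    step = 0
    while True:  # alternate synthesize -> execute until the plan terminates
        action = synthesize_action(question_tokens, step)
        if action is None:
            break
        query_emb = execute_in_latent_space(query_emb, action)
        step += 1
    return query_emb  # nearest entity embeddings would be the predicted answers

print(answer_question(["who", "directed", "..."]))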
SpreadsheetCoder: Formula Prediction from Semi-structured Context
Rishabh Singh
Proceedings of the 38th International Conference on Machine Learning (ICML) (2021)
Spreadsheet formula prediction has been an important program synthesis problem with many real-world applications. Previous works typically utilize input-output examples as the specification for spreadsheet formula synthesis, where each input-output pair simulates a separate row in the spreadsheet. However, this formulation does not fully capture the rich context in real-world spreadsheets. First, spreadsheet data entries are organized as tables, thus rows and columns are not necessarily independent of each other. In addition, many spreadsheet tables include headers, which provide high-level descriptions of the cell data. However, previous synthesis approaches do not consider headers as part of the specification. In this work, we present the first approach for synthesizing spreadsheet formulas from tabular context, which includes both headers and semi-structured tabular data. In particular, we propose SpreadsheetCoder, a BERT-based model architecture to represent the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%, which is a considerable improvement over baselines that do not employ rich tabular context. Compared to a rule-based system, SpreadsheetCoder assists 82% more users in composing formulas on Google Sheets.
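To make the "row-based and column-based" tabular context concrete, the sketch below serializes the cells around a target cell in both orientations, pairing each value with its header. The serialization format, the [SEP] delimiter, and the toy table are illustrative assumptions, not the paper's exact input encoding.

# Minimal, hypothetical illustration of row-based and column-based context views.
HEADERS = ["Item", "Price", "Qty", "Total"]
ROWS = [
    ["Pen",    "1.50", "3", "=B2*C2"],
    ["Pencil", "0.75", "4", "?"],  # "?" marks the cell whose formula we want to predict
]

def row_context(target_row, target_col):
    """Row-based view: header/value pairs for the other cells in the target row."""
    cells = [f"{HEADERS[c]} : {ROWS[target_row][c]}"
             for c in range(len(HEADERS)) if c != target_col]
    return " [SEP] ".join(cells)

def column_context(target_row, target_col):
    """Column-based view: header/value pairs for the cells above the target cell."""
    cells = [f"{HEADERS[target_col]} : {ROWS[r][target_col]}"
             for r in range(target_row)]
    return " [SEP] ".join(cells)

target_row, target_col = 1, 3  # the "?" cell
# A real system would tokenize both views, encode them with a BERT-style model,
# and decode a formula such as "=B3*C3"; here we only show the serialized inputs.
print("row-based   :", row_context(target_row, target_col))
print("column-based:", column_context(target_row, target_col))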