- Tianlu Wang
- Xuezhi Wang
- Yao Qin
- Ben Packer
- Kang Lee
- Jilin Chen
- Alex Beutel
- Ed H. Chi
Abstract
NLP models have been shown to suffer from robustness issues: for example, a model's prediction can easily change under small perturbations to the input. In this work, we present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be invariant to task labels. For example, for a main task such as sentiment classification, the controlled attribute can be the category or domain of the text, and a model should perform similarly across categories; for a coreference resolution task, a model's performance should not differ across demographic attributes. Unlike many existing adversarial text generation approaches, we show that our model can generate adversarial texts that are more fluent, more diverse, and have better task-label invariance guarantees. We aim to use this model to generate counterfactual texts that can better improve the robustness of NLP models (e.g., through adversarial training), and we argue that our generation can create more natural attacks.
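To make the notion of a label-invariant attribute concrete, the following is a minimal toy sketch, not the CAT-Gen model itself. It swaps in a hand-written word-substitution table in place of a learned controlled generator, and a deliberately brittle rule-based classifier in place of a trained sentiment model. The names `CATEGORY_SWAPS`, `perturb_attribute`, `toy_sentiment_classifier`, and `flip_rate` are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical lexicon: category words assumed to be irrelevant to sentiment.
# A learned controlled generator would replace this hand-written table.
CATEGORY_SWAPS: Dict[str, Dict[str, str]] = {
    "kitchen": {"book": "blender", "read": "use"},
}


def perturb_attribute(text: str, target_category: str) -> str:
    """Rewrite category words in `text` while leaving sentiment words untouched."""
    swaps = CATEGORY_SWAPS[target_category]
    return " ".join(swaps.get(token, token) for token in text.split())


def toy_sentiment_classifier(text: str) -> int:
    """Return 1 (positive) or 0 (negative).

    Deliberately brittle: it treats the category word 'book' as a positive cue,
    which is the kind of spurious dependence an attribute change can expose.
    """
    tokens = text.split()
    score = 0
    if "great" in tokens:
        score += 1
    if "terrible" in tokens:
        score -= 1
    if "book" in tokens:
        score += 2  # spurious correlation with the category attribute
    return 1 if score > 0 else 0


def flip_rate(
    classifier: Callable[[str], int], texts: List[str], target_category: str
) -> float:
    """Fraction of inputs whose prediction changes after the attribute swap."""
    flips = sum(
        classifier(text) != classifier(perturb_attribute(text, target_category))
        for text in texts
    )
    return flips / len(texts)


reviews = ["great book", "terrible book", "great read"]
rate = flip_rate(toy_sentiment_classifier, reviews, "kitchen")
print(f"prediction flip rate under category change: {rate:.2f}")
```

Because the sentiment of each review is unchanged by the swap, any nonzero flip rate points to a dependence on the category attribute rather than on sentiment. In an adversarial training setup, the perturbed texts would be added back to the training data with their original labels.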