Model-Based Reinforcement Learning for Biological Sequence Design

Christof Angermueller; David Dohan; David Belanger; Ramya Deshpande; Kevin Murphy; Lucy Colwell

ICLR 2020 (2020)

Abstract

Being able to design biological sequences like DNA or proteins to have desired properties would have considerable impact in medical and industrial applications. However, doing so presents a challenging black-box optimization problem that requires multiple rounds of expensive, time-consuming experiments. In response, we propose using reinforcement learning (RL) for biological sequence design. RL is a flexible framework that allows us to optimize generative sequence policies to achieve a variety of criteria, including diversity among high-quality sequences discovered. We use model-based RL to improve sample efficiency, where at each round the policy is trained offline using a simulator fit on functional measurements from prior rounds. To accommodate the growing number of observations across rounds, the simulator model is automatically selected at each round from a pool of diverse models of varying capacity. On the tasks of designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structures, we find that model-based RL is an attractive alternative to existing methods.
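
The round structure described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `oracle` function stands in for a real experiment, the two candidate simulators (`AdditiveModel`, `NearestNeighborModel`) are hypothetical examples of a "pool of models of varying capacity", held-out squared error is one plausible selection criterion, and a simple cross-entropy-style update stands in for whatever policy optimizer the authors actually use.

```python
import random

ALPHABET = "ACGT"
LENGTH = 8

def oracle(seq, target="ACGTACGT"):
    """Toy stand-in for an expensive measurement: fraction of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

class AdditiveModel:
    """Low-capacity simulator: mean observed score per (position, letter)."""
    def fit(self, data):
        sums, counts = {}, {}
        for seq, y in data:
            for i, ch in enumerate(seq):
                sums[(i, ch)] = sums.get((i, ch), 0.0) + y
                counts[(i, ch)] = counts.get((i, ch), 0) + 1
        self.table = {k: sums[k] / counts[k] for k in sums}
        self.default = sum(y for _, y in data) / len(data)
        return self

    def predict(self, seq):
        vals = [self.table.get((i, ch), self.default) for i, ch in enumerate(seq)]
        return sum(vals) / len(vals)

class NearestNeighborModel:
    """Higher-capacity simulator: score of the closest observed sequence (Hamming distance)."""
    def fit(self, data):
        self.data = list(data)
        return self

    def predict(self, seq):
        nearest = min(self.data, key=lambda d: sum(a != b for a, b in zip(seq, d[0])))
        return nearest[1]

def select_simulator(data, rng):
    """Pick the candidate with the lowest held-out squared error, then refit it on all data."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    split = max(1, len(shuffled) // 5)
    held_out, train = shuffled[:split], shuffled[split:]

    def error(cls):
        model = cls().fit(train)
        return sum((model.predict(s) - y) ** 2 for s, y in held_out) / len(held_out)

    best = min([AdditiveModel, NearestNeighborModel], key=error)
    return best().fit(data)

def sample(policy, rng):
    """Draw one sequence from a per-position categorical policy."""
    return "".join(rng.choices(ALPHABET, weights=w)[0] for w in policy)

def train_policy_offline(policy, simulator, rng, steps=20, n=64, lr=0.3):
    """Cross-entropy-style update that queries only the simulator, never the oracle."""
    for _ in range(steps):
        batch = sorted((sample(policy, rng) for _ in range(n)),
                       key=simulator.predict, reverse=True)
        elite = batch[: n // 4]
        for i, weights in enumerate(policy):
            for j, ch in enumerate(ALPHABET):
                freq = sum(s[i] == ch for s in elite) / len(elite)
                # The 0.02 floor keeps every letter reachable so the policy keeps exploring.
                weights[j] = (1 - lr) * weights[j] + lr * max(freq, 0.02)
    return policy

def design(rounds=5, batch_size=20, seed=0):
    """Run several rounds: measure a batch, refit and select a simulator, retrain the policy."""
    rng = random.Random(seed)
    policy = [[0.25] * len(ALPHABET) for _ in range(LENGTH)]
    data = []
    for _ in range(rounds):
        proposals = [sample(policy, rng) for _ in range(batch_size)]
        data += [(s, oracle(s)) for s in proposals]  # the only oracle calls in this round
        simulator = select_simulator(data, rng)
        policy = train_policy_offline(policy, simulator, rng)
    return data
```

Calling `design()` returns every `(sequence, score)` pair measured across rounds, so `max(score for _, score in design())` gives the best score found. The point of the structure is that the oracle is queried only `batch_size` times per round, while the policy takes many update steps against the cheap simulator in between.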

Research Areas

Machine intelligence
