Being able to design biological sequences like DNA or proteins to have desired properties would have considerable impact in medical and industrial applications. However, doing so presents a challenging black-box optimization problem that requires multiple rounds of expensive, time-consuming experiments. In response, we propose using reinforcement learning (RL) for biological sequence design. RL is a flexible framework that allows us to optimize generative sequence policies to achieve a variety of criteria, including diversity among high-quality sequences discovered. We use model-based RL to improve sample efficiency, where at each round the policy is trained offline using a simulator fit on functional measurements from prior rounds. To accommodate the growing number of observations across rounds, the simulator model is automatically selected at each round from a pool of diverse models of varying capacity. On the tasks of designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structures, we find that model-based RL is an attractive alternative to existing methods.