Training robots via reinforcement learning (RL) requires collecting and labeling many real-world episodes, which may be costly and time-consuming. Training models with a large amount of simulation is a cheaper alternative. However, simulations are not perfect, and such models may not transfer to the real world. Techniques developed to close this simulation-to-reality (Sim2Real) gap typically apply randomization to the simulated images or adapt them with an additional Sim2Real model. A generative adversarial network (GAN) may be used to adapt the pixels of the simulated image to be more realistic before use by a deep RL model. We find that the CycleGAN, which enforces cycle consistency between Sim2Real and Real2Sim adaptations, produces better images for RL than a GAN alone. Ultimately, we develop RL-CycleGAN, which includes a CycleGAN that trains jointly with the deep RL model and enforces that the RL model is consistent across all the adaptations. We evaluate RL-CycleGAN on two vision-based robotics grasping tasks and compare it to previous techniques. With 580,000 real episodes and millions of simulated episodes adapted with RL-CycleGAN, we achieve xx% grasp success, while a previous GAN-based approach, GraspGAN, achieves xx% grasp success. With only 5,000 real episodes, RL-CycleGAN and GraspGAN achieve xx% and xx% grasp success, respectively. On a multi-bin grasping task, we show that RL-CycleGAN drastically improves data efficiency, requiring 1/xth the amount of real data to reach the same grasping performance.
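As a rough illustration of the two consistency ideas mentioned above, the following toy sketch computes a cycle-consistency penalty and an RL-consistency penalty for a simulated image. It is not the paper's implementation: the function names, the flattened-list image representation, the squared-error form of both penalties, and the stand-in generators and Q-function are all assumptions made for this example.

```python
from typing import Callable, List, Sequence

Image = List[float]  # hypothetical: a flattened image as a list of pixel values


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    sim: Image,
    sim2real: Callable[[Image], Image],
    real2sim: Callable[[Image], Image],
) -> float:
    """Penalize a sim -> real -> sim round trip that does not return the input."""
    return mse(real2sim(sim2real(sim)), sim)


def rl_consistency_loss(
    sim: Image,
    sim2real: Callable[[Image], Image],
    real2sim: Callable[[Image], Image],
    q_fn: Callable[[Image], float],
) -> float:
    """Penalize Q-value disagreement across the original, adapted, and cycled images.

    Assumption: disagreement is measured as the mean squared difference
    over all pairs of the three Q-values.
    """
    adapted = sim2real(sim)
    cycled = real2sim(adapted)
    q = [q_fn(sim), q_fn(adapted), q_fn(cycled)]
    pairs = [(0, 1), (0, 2), (1, 2)]
    return sum((q[i] - q[j]) ** 2 for i, j in pairs) / len(pairs)


# Toy stand-ins (assumptions): the "generators" shift brightness,
# and the "Q-function" is just the mean pixel value.
def to_real(img: Image) -> Image:
    return [p + 0.1 for p in img]


def to_sim(img: Image) -> Image:
    return [p - 0.1 for p in img]


def toy_q(img: Image) -> float:
    return sum(img) / len(img)


sim_image = [0.2, 0.4, 0.6]
cycle_loss = cycle_consistency_loss(sim_image, to_real, to_sim)
rl_loss = rl_consistency_loss(sim_image, to_real, to_sim, toy_q)
```

In this toy setup the round trip is an exact inverse, so the cycle penalty is essentially zero, while the RL penalty stays positive because the stand-in Q-function changes when the image is brightened. That is the kind of disagreement an RL-consistency term would push the jointly trained models to remove.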