In this paper, we explore deep reinforcement learning algorithms for vision-based robotic grasping. Model-free deep reinforcement learning has been successfully applied to a range of challenging environments, but the proliferation of deep RL methods makes it difficult to discern which particular approach is best suited for a rich, diverse task like grasping. To answer this question, we conduct a detailed simulated study of reinforcement learning methods on a grasping task that emphasizes diversity and off-policy learning. Off-policy learning is important to enable the method to utilize past data of grasping a wide variety of objects, and diversity is important to enable the method to generalize to new objects that were not seen during training. We evaluate a variety of Q-function estimation methods, a method previously proposed for robotic grasping with deep neural network models, and a novel approach based on a combination of Monte Carlo return estimation and an off-policy correction. Our results indicate that several simple methods are surprisingly strong competitors to popular algorithms such as double Q-learning, and our analysis of stability sheds light on the relative tradeoffs between the algorithms.
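To make the Monte Carlo component of the last idea concrete, the following is a minimal sketch of how Monte Carlo return estimates can be blended with one-step bootstrapped targets; the blending weight `lam`, the function names, and the specific interpolation scheme are illustrative assumptions, not the exact estimator or correction used in the paper.

```python
def discounted_returns(rewards, gamma=0.9):
    """Monte Carlo return G_t = sum_k gamma**k * r_{t+k}, computed backward
    over one episode's reward sequence."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def blended_targets(rewards, next_values, gamma=0.9, lam=0.5):
    """Blend MC returns with one-step bootstrapped targets (illustrative):

        target_t = lam * G_t + (1 - lam) * (r_t + gamma * V(s_{t+1}))

    `next_values[t]` is a learned value estimate for s_{t+1} (zero at the
    end of the episode); lam trades the high variance of pure Monte Carlo
    returns against the bias of pure bootstrapping.
    """
    G = discounted_returns(rewards, gamma)
    return [lam * g + (1.0 - lam) * (r + gamma * v)
            for g, r, v in zip(G, rewards, next_values)]
```

A pure Monte Carlo estimate (`lam=1`) is unbiased only for on-policy data, which is why the off-policy setting studied here requires some form of correction or bootstrapping on top of it.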