We investigate methods to efficiently learn diverse strategies in reinforcement learning for a generative structured prediction problem: query reformulation. In the proposed framework an agent consists of multiple specialized sub-agents and a meta-agent that learns to aggregate the answers from sub-agents to produce a final answer. Sub-agents are trained on disjoint partitions of the training data, while the meta-agent is trained on the full training set. Our method makes learning faster, because it is highly parallelizable, and has better generalization performance than strong baselines, such as an ensemble of agents trained on the full data. We evaluate on the tasks of document retrieval and question answering. The improved performance seems due to the increased diversity of reformulation strategies. This suggests that multi-agent, hierarchical approaches might play an important role in structured prediction tasks of this kind. However, we also find that it is not obvious how to characterize diversity in this context, and a first attempt based on clustering did not produce good results. Furthermore, reinforcement learning for the reformulation task is hard in high-performance regimes. At best, it only marginally improves over the state of the art, which highlights the complexity of training models in this framework for end-to-end language understanding problems.