Multi-task neural networks, when trained successfully, can learn to leverage related concepts from different tasks by using weight sharing. Sharing parameters between highly unrelated tasks can hurt both of them, so a strong multi-task model should be able to control the amount of weight sharing between pairs of tasks, and flexibly adapt it to their relatedness. In recent works, routing networks have shown strong performance in a variety of settings, including multi-task learning. However, optimization difficulties often prevent routing models from unlocking their full potential. In this work, we propose a novel routing method, specifically designed for multi-task learning, where routing is optimized jointly with the model parameters by standard backpropagation. We show that it can discover related pairs of tasks, and improve accuracy over strong baselines. In particular, on multi-task learning for the Omniglot dataset our method reduces the state-of-the-art error rate by $17\%$.