The deployment of deep networks on mobile devices requires to efficiently use the scarce computational resources, expressed as either available memory or computing cost. When addressing multiple tasks simultaneously, it is extremely important to share resources across tasks, especially when they all consume the same input data, e.g., audio samples captured by the on-board microphones. In this paper we propose a multi-task model architecture that consists of a shared encoder and multiple task-specific adapters. During training, we learn the model parameters as well as the allocation of the task-specific additional resources across both tasks and layers. A global tuning parameter can be used to obtain different multi-task network configurations finding the desired trade-off between cost and the level of accuracy across tasks. Our results show that this solution significantly outperforms a multi-head model baseline. Interestingly, we observe that the optimal resource allocation depends on both the task intrinsic characteristics as well as on the targeted cost measure (e.g., memory or computing cost).