Routing Networks with Co-training for Continual Learning
Abstract
Many continual learning methods can be characterized as either altering the learning algorithm in a fixed-capacity neural network or dynamically growing the capacity of the network to handle new tasks. We propose to use fixed-capacity sparse routing networks for continual learning. We retain the advantages of architectural solutions to the continual learning problem, in that different paths through the network can be learned for different tasks, while staying within the regime of fixed-capacity networks, which are more realistic for real-world use cases. We find it necessary to develop a new training method for routing networks, which we call co-training, to avoid poorly initialized experts when new tasks are presented. In initial experiments, sparse routing networks with co-training, combined with a small episodic memory replay buffer, outperform densely connected networks on the MNIST-Permutations and MNIST-Rotations benchmarks.