Most deep architectures for image classification, even those trained to classify a large number of diverse categories, learn shared image representations with a single combined model. Intuitively, however, categories that are more visually similar should share more information than those that are very different. While hierarchical deep networks address this problem by learning separate features for subsets of related categories, current implementations require simplified models with fixed architectures specified by heuristic clustering methods. Instead, we propose Blockout, a method for regularization and model selection that learns the model architecture and parameters jointly through end-to-end training. Inspired by dropout, our approach gives a novel parametrization of hierarchical architectures that allows for structure learning using simple back-propagation. We evaluate Blockout on the CIFAR and ImageNet datasets, demonstrating improved classification accuracy, better regularization performance, faster training, and a clear separation of nodes into hierarchical structures.