A capsule is a group of neurons whose outputs represent different properties of the same entity. We describe a version of capsules in which each capsule has a logistic unit to represent the presence of an entity and a 4x4 pose matrix to represent the relationship between that entity and the viewer. A capsule in one layer votes for the pose matrices of many different capsules in the layer above by multiplying its own pose matrix by viewpoint-invariant transformation matrices that represent part-whole relationships. Each of these votes is weighted by an assignment coefficient, and these coefficients are iteratively updated using the EM algorithm so that the output of each capsule is routed to a capsule in the layer above that receives a cluster of similar votes. The whole system is trained discriminatively by unrolling three iterations of EM between each pair of adjacent layers. On the small NORB benchmark, capsules reduce the number of test errors by 30\% compared with the best reported CNN. Capsules are also far more resistant to white-box adversarial attacks.
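The abstract describes voting and routing only in words, so the following is a minimal NumPy sketch of one forward routing step under stated assumptions. Each vote is the lower capsule's 4x4 pose matrix multiplied by a transformation matrix, as the text states. Everything about the EM step beyond "iteratively update assignment coefficients so that clusters of similar votes win" is an assumption made for illustration: each upper capsule is modeled as a diagonal Gaussian over the 16 pose entries, its cost and logistic activation use hypothetical parameters `beta_v`, `beta_a`, and `lam`, and the function names `compute_votes`, `em_routing`, and `logistic` are hypothetical. It is a forward pass only; training by unrolling the iterations would need an autodiff framework.

```python
import numpy as np


def compute_votes(poses, transforms):
    """Vote of lower capsule i for upper capsule j: V[i, j] = M_i @ W[i, j].

    poses:      (n_in, 4, 4) pose matrices of the lower-layer capsules
    transforms: (n_in, n_out, 4, 4) part-whole transformation matrices
    returns:    (n_in, n_out, 16) votes, each 4x4 vote flattened to 16 entries
    """
    votes = np.einsum("iab,ijbc->ijac", poses, transforms)
    return votes.reshape(poses.shape[0], transforms.shape[1], 16)


def logistic(x):
    # Overflow-free logistic: 1 / (1 + exp(-x)) == 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def em_routing(votes, a_in, n_iters=3, beta_v=0.0, beta_a=0.0, lam=1.0, eps=1e-8):
    """Simplified EM routing (hypothetical); cost and activation rules are assumptions.

    votes: (n_in, n_out, 16) votes; a_in: (n_in,) presence probabilities.
    Returns upper-capsule poses (n_out, 16), activations (n_out,), and the
    assignment coefficients R (n_in, n_out) from the final E-step.
    """
    n_in, n_out, _ = votes.shape
    R = np.full((n_in, n_out), 1.0 / n_out)  # uniform initial assignments
    for _ in range(n_iters):
        # M-step: fit a diagonal Gaussian to each upper capsule's weighted votes.
        r = R * a_in[:, None]
        r_sum = r.sum(axis=0) + eps
        mu = np.einsum("ij,ijh->jh", r, votes) / r_sum[:, None]
        diff = votes - mu[None]
        var = np.einsum("ij,ijh->jh", r, diff ** 2) / r_sum[:, None] + eps
        # Tight clusters (small variance) have low cost, so their capsule turns on.
        cost = (beta_v + 0.5 * np.log(var)) * r_sum[:, None]
        a_out = logistic(lam * (beta_a - cost.sum(axis=1)))
        # E-step: R[i, j] proportional to a_j * N(v_ij | mu_j, var_j), in log space.
        log_p = -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff ** 2 / var[None], axis=2)
        log_w = np.log(a_out + eps)[None] + log_p
        log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
        R = np.exp(log_w)
        R /= R.sum(axis=1, keepdims=True)
    return mu, a_out, R


rng = np.random.default_rng(0)
poses = rng.normal(size=(8, 4, 4))           # 8 lower-level capsules
transforms = rng.normal(size=(8, 3, 4, 4))   # 3 upper-level capsules
votes = compute_votes(poses, transforms)
mu, a_out, R = em_routing(votes, a_in=np.ones(8))
print(votes.shape, mu.shape, a_out.shape, R.shape)  # (8, 3, 16) (3, 16) (3,) (8, 3)
```

The E-step is computed in log space because Gaussian densities over 16 dimensions easily underflow or overflow; subtracting the row maximum before exponentiating leaves each row of `R` unchanged after normalization. If all lower capsules cast nearly identical votes for one upper capsule, that capsule's variance collapses, its activation approaches 1, and the assignments concentrate on it, which is the "cluster of similar votes" behavior the abstract describes.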