Optimization of a machine learning model is typically carried out by performing stochastic gradient updates on epochs that consist of randomly ordered training examples. This practice means that eachfraction of an epoch comprises an independent random sample of the training data that may not preserve informative structure present in the full data. We hypothesize that the training can be more effective, allowing each epoch to provide some of the benefits of multiple ones, with more principled, ``self-similar'' arrangements.
Our case study is matrix factorization, commonly used to learn metric embeddings of entities such as videos or words from example associations. We construct arrangements that preserve the weighted Jaccard similarities of rows and columns and experimentally observe that our arrangements yield training acceleration of 3\%-30\% on synthetic and recommendation datasets. Principled arrangements of training examples emerge as a novel and potentially powerful performance knob for SGD that merits further exploration.