A Rate--Distortion View on Model Updates
Abstract
Compressing model updates is critical for reducing communication costs in federated learning. We examine the problem using rate--distortion theory to present a compression method that is near-optimal in many use cases. We empirically show that common transforms applied to model updates in standard compression algorithms, normalization in QSGD and random rotation in DRIVE, yield sub-optimal compressed representations in practice.