Motivated by the need for distributed optimization algorithms with low communication cost, we study communication efficient algorithms to perform distributed mean estimation. We study the scenarios in which each client sends one bit per dimension. We first show that for d dimensional data with n clients, a naive stochastic rounding approach yields a mean squared error Theta(d/n). We then show by applying a structured random rotation of the data (an O(dlogd) algorithm), the error can be reduced to O(logd/n). The methods we show in this paper do not depend on the distribution of the data.