Federated learning has been applied in practice to train machine learning models from decentralized client data on mobile devices. The client population in such large-scale systems is observed to have periodically shifting distributions, which can cause instability in training and degrade final model performance. In this paper, instead of adopting the block-cyclic distribution shift used in previous work, we model the population distribution as a mixture that changes gradually between a daytime subpopulation and a nighttime subpopulation. We verify that this intuitive modification better matches the training behavior observed in practical federated learning systems. We propose multi-branch networks to handle the domain differences between subpopulations, and exploit a federated Expectation-Maximization (EM) algorithm with temporal priors to select a branch for each client and thereby handle the distribution shift. Experiments on image classification with the EMNIST and CIFAR datasets and on next-word prediction with the Stack Overflow dataset show that the proposed algorithm can effectively mitigate the impact of the distribution shift and significantly improve final model performance.
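
To make the idea of branch selection with a temporal prior concrete, the following is a minimal sketch and not the algorithm as specified in this paper. It assumes a two-branch model (branch 0 for daytime, branch 1 for nighttime), a cosine-shaped periodic prior, and a per-branch client loss interpreted as a negative log-likelihood. The names `day_prior` and `select_branch`, the cosine schedule, and the clipping constant `eps` are all hypothetical choices made for illustration.

```python
import math


def day_prior(t: int, period: int) -> float:
    """Hypothetical temporal prior: probability that a client sampled at
    round t belongs to the daytime subpopulation. The smooth cosine
    schedule is an assumption; it equals 1 at t = 0 and 0 at t = period / 2."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * t / period))


def select_branch(losses, t, period, eps=1e-6):
    """E-step-style branch selection for one client (illustrative only).

    losses: [daytime_branch_loss, nighttime_branch_loss], treated here as
    negative log-likelihoods of the client's data under each branch.
    Returns the index of the most probable branch and the posterior.
    """
    # Clip the prior away from 0 and 1 so the posterior stays well defined.
    p_day = min(max(day_prior(t, period), eps), 1.0 - eps)
    priors = [p_day, 1.0 - p_day]
    # Subtract the smallest loss before exponentiating for numerical stability.
    m = min(losses)
    weights = [p * math.exp(-(l - m)) for p, l in zip(priors, losses)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    branch = max(range(len(posterior)), key=lambda k: posterior[k])
    return branch, posterior


if __name__ == "__main__":
    # Midway between day and night the prior is uninformative,
    # so the branch with the lower loss is chosen.
    print(select_branch([2.0, 0.1], t=25, period=100))
```

Under these assumptions, the prior dominates when both branches fit the client's data equally well, and the data likelihood dominates when the prior is close to uniform, which is the qualitative behavior a temporal prior is meant to provide.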