We propose a new model for grammatical error correction (GEC) which builds on a very large multilingual masked language model, covering 101 languages. To adapt our model for the GEC task, we design an unsupervised, language-agnostic pretraining objective that mimics corrections typically contained in labeled data. After finetuning on gold data, we surpass the previous state-of-the-art results on the four evaluated languages (Czech, English, German and Russian). This approach shows the power of large multilingual language models. Due to these models being non-trivial to run on non-cluster infrastructure, we employ our model to clean up the labels in the popular yet noisy Lang-8 dataset. We release this dataset and hope that the community will find it useful for further advancement of GEC.