Large generative language models such as GPT-2 are well known not only for their ability to generate highly realistic text but also for their utility in common downstream tasks. However, how best to leverage these powerful models, and in which settings, remains a nascent research question. In this work, we explore their use in predicting ``language quality'', a notion that captures the coherence and understandability of text. Our key finding is that, when trained in a self-discriminating fashion, large language models emerge as unsupervised predictors of language quality. This enables fast bootstrapping of quality indicators in low-resource settings. We conduct extensive qualitative and quantitative analyses over 500 million web articles, the largest-scale study conducted on this topic.