Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Guillaume Lajoie
Arna Ghosh
Kumar Krishna Agrawal
Komal Kumar Teru
Blake Richards
Melody Zixuan Li
Adam Santoro
2025

Abstract

Complex representational changes in Large Language Models (LLMs) are critical for their capabilities but are often obscured by the standard metrics used to evaluate models during training, such as loss or gradient norms. Here, we examine the representational changes that occur during LLM pretraining by analyzing their high-dimensional representation geometry using spectral methods (αReQ, RankMe). In two different model families (OLMo and Pythia), we uncover non-monotonic learning phases in the geometry of the representations, hidden beneath the near-monotonically decreasing loss and gradient norms. Specifically, we find that the pretraining stage consistently exhibits three distinct phases: (1) a ‘warm-up’ phase, in which the dimensionality of the representations drops drastically; (2) an ‘entropy-seeking’ phase, which expands the effective dimensionality of the representations in all directions; and (3) a ‘compression-seeking’ phase, which reduces the dimensionality by selectively expanding only along the dominant representational axes. This evolving representation geometry governs the trade-off between fitting the training distribution and generalizing beyond it: the models get better at reproducing specific short-context sequences from the data during the entropy-seeking phase, and at generalizing to novel long-context dependencies during the compression-seeking phase. Continued pretraining can lead to additional entropy-seeking and compression-seeking phases. Crucially, we also find that these different phases have implications for downstream fine-tuning. Optimal adaptability for Supervised Fine-Tuning (SFT) emerges significantly earlier than peak zero-shot performance on factual question-answering tasks and aligns with the transition out of the first compression-seeking phase. Furthermore, we observe that SFT often induces an ‘entropy-seeking’ dynamic, whereas Reinforcement Learning from Verifiable Rewards (RLVR) induces a ‘compression-seeking’ dynamic. We investigate the implications of these representational dynamics for the downstream generalization of instruction-tuned models and the exploration capabilities of RLVR-tuned models. Our results demonstrate that spectral methods for analyzing high-dimensional representations can provide new insights into the functionally relevant changes that occur in LLMs over pretraining.
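
To make the two spectral measures named above concrete, here is a minimal illustrative sketch (not the authors' released code) of how RankMe-style effective rank and an eigenspectrum decay exponent in the spirit of αReQ could be computed from a matrix of hidden states. The array `reps`, its shape, and the centering step are assumptions for illustration only.

```python
# Illustrative sketch: two spectral summaries of representation geometry.
# Assumes `reps` is an (n_samples, d_model) array of hidden-state vectors
# extracted from a language model; details may differ from the paper's setup.
import numpy as np


def rankme(reps: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank a la RankMe: exponential of the Shannon entropy of the
    normalized singular-value distribution of the (centered) representations."""
    s = np.linalg.svd(reps - reps.mean(axis=0), compute_uv=False)
    p = s / (s.sum() + eps) + eps
    return float(np.exp(-(p * np.log(p)).sum()))


def alpha_decay(reps: np.ndarray) -> float:
    """Eigenspectrum decay exponent alpha (in the spirit of alpha-ReQ): slope of
    a power-law fit lambda_k ~ k^(-alpha) to the covariance eigenvalues,
    estimated by least squares in log-log space."""
    centered = reps - reps.mean(axis=0)
    eigvals = np.linalg.svd(centered, compute_uv=False) ** 2 / (len(reps) - 1)
    eigvals = eigvals[eigvals > 0]
    ks = np.arange(1, len(eigvals) + 1)
    slope, _ = np.polyfit(np.log(ks), np.log(eigvals), 1)
    return float(-slope)


if __name__ == "__main__":
    # Random data standing in for model activations, for demonstration only.
    rng = np.random.default_rng(0)
    reps = rng.standard_normal((2048, 512))
    print(f"RankMe: {rankme(reps):.1f}, alpha: {alpha_decay(reps):.2f}")
```

Under this reading, the entropy-seeking phase would show up as rising effective rank (and flatter spectra, smaller α), while the compression-seeking phase would show up as falling effective rank with variance concentrating along the leading axes (steeper spectra, larger α).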