Can Large Language Models Explain Their Own Thinking?
Abstract
An Explorable that explains the concept of Patchscopes for an external audience. Patchscopes is an interpretability tool that allows researchers to better understand an LLM's hidden representations through natural-language experiments.
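To make the idea concrete, below is a minimal sketch of the core Patchscopes operation, assuming PyTorch, the Hugging Face Transformers library, and GPT-2 as the model; the prompts, layer choices, and hook details are illustrative assumptions rather than the Explorable's own setup. The sketch captures the hidden state of one prompt's final token and patches it into a second, inspection prompt, letting the model continue in natural language from the patched representation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

SOURCE_LAYER = 6   # which layer's hidden state to read (assumed for illustration)
TARGET_LAYER = 6   # which layer to patch it into (assumed for illustration)

# Source pass: capture the hidden state of the last token of a source prompt.
source_prompt = "The Eiffel Tower is located in the city of"
source_ids = tokenizer(source_prompt, return_tensors="pt").input_ids
with torch.no_grad():
    source_out = model(source_ids, output_hidden_states=True)
# hidden_states[0] is the embedding output, so layer k lives at index k + 1.
source_vec = source_out.hidden_states[SOURCE_LAYER + 1][0, -1]

# Target pass: an inspection prompt whose final placeholder token is
# overwritten with the captured vector, so the model describes it in words.
target_prompt = "Tokyo is a city. Amazon is a company. x"
target_ids = tokenizer(target_prompt, return_tensors="pt").input_ids
patch_position = target_ids.shape[1] - 1   # patch the final token ("x")

def patch_hook(module, inputs, output):
    hidden = output[0]
    # Only patch the full-prompt pass; later cached decoding steps have length 1.
    if hidden.shape[1] > patch_position:
        hidden = hidden.clone()
        hidden[:, patch_position, :] = source_vec
        return (hidden,) + output[1:]
    return output

handle = model.transformer.h[TARGET_LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    generated = model.generate(
        target_ids, max_new_tokens=10, do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
handle.remove()

# Print only the newly generated continuation: the model's natural-language
# "reading" of the patched hidden state.
print(tokenizer.decode(generated[0, target_ids.shape[1]:]))

The design choice here mirrors the general recipe: pick a source representation, pick a target prompt and position to receive it, and let ordinary decoding turn the hidden vector into text that a researcher can read.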