Google Research

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

  • Albert Jiang
  • Konrad Czechowski
  • Mateja Jamnik
  • Piotr Milos
  • Szymon Tworkowski
  • Wenda Li
  • Yuhuai Tony Wu
NeurIPS (2022) (to appear)


In theorem proving, the problem of selecting useful premises that unlock the proof of a given conjecture from a large library is crucially important. This presents a challenge for all theorem provers, including the ones based on language models. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverages the power of automated theorem provers is used for premise selection, while all other tasks are designated to language models. With Thor, we improve the state-of-the-art success rate on the PISA dataset from 33.2\% to 57.0\%, solving 8.2\% of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. In addition, we provide a straightforward protocol to instantiate Thor for any proof system where the hammer method has been implemented.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work