Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
In theorem proving, the problem of selecting useful premises that unlock the proof of a given conjecture from a large library is crucially important. This presents a challenge for all theorem provers, including the ones based on language models. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverages the power of automated theorem provers is used for premise selection, while all other tasks are designated to language models. With Thor, we improve the state-of-the-art success rate on the PISA dataset from 33.2\% to 57.0\%, solving 8.2\% of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. In addition, we provide a straightforward protocol to instantiate Thor for any proof system where the hammer method has been implemented.