Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision

Jingbo Shang
Eric Li
Liangchen Luo
Le Hou
Yuexin Wu
Zihan Wang
2024

Abstract

Large language models often struggle to generate error-free solutions in complex problem-solving scenarios. To address this, recent work has adopted a reasoner-verifier framework, in which a verifier model evaluates the intermediate solution steps produced by a reasoning model. However, obtaining the intermediate annotations needed to train the verifier, also known as process supervision data, is resource-intensive and expensive. In this paper, we introduce Model-induced Process Supervision (MiPS), a novel method for automating data curation. MiPS uses the reasoner itself to generate process supervision data, applying a Monte Carlo approach that samples completions of intermediate solutions from a training set and estimates their accuracy. Our approach significantly improves the performance of PaLM 2 on math and coding tasks (accuracy +0.67% on GSM8K, +4.16% on MATH, and +0.92% on MBPP compared with an output verifier). We address the noise in MiPS through an empirical analysis and suggest diligent choices for the training objective and the step-aggregation function of the verifier.
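To make the Monte Carlo idea concrete, the following minimal Python sketch scores each intermediate solution by the fraction of sampled completions that reach a correct final answer. It is an illustration under stated assumptions, not the procedure used in the paper: the names `mips_step_scores`, `complete`, `is_correct`, and `toy_complete` are hypothetical, and the toy reasoner stands in for sampling from an actual language model.

```python
import random
from typing import Callable, List, Sequence


def mips_step_scores(
    steps: Sequence[str],
    complete: Callable[[Sequence[str], random.Random], str],
    is_correct: Callable[[str], bool],
    num_samples: int = 8,
    seed: int = 0,
) -> List[float]:
    """Estimate, for every solution prefix, how often completions succeed.

    `complete` (hypothetical) samples a final answer given a prefix of steps;
    `is_correct` (hypothetical) checks that answer against the reference.
    """
    rng = random.Random(seed)
    scores = []
    for t in range(1, len(steps) + 1):
        prefix = steps[:t]
        # Monte Carlo estimate: fraction of sampled completions that are correct.
        hits = sum(is_correct(complete(prefix, rng)) for _ in range(num_samples))
        scores.append(hits / num_samples)
    return scores


def toy_complete(prefix: Sequence[str], rng: random.Random) -> str:
    # Toy stand-in for a reasoner: a prefix with a flawed step never recovers,
    # otherwise the correct answer "42" is reached most of the time.
    if any("wrong" in step for step in prefix):
        return "41"
    return "42" if rng.random() < 0.75 else "41"


scores = mips_step_scores(
    ["ok step", "wrong step"],
    toy_complete,
    is_correct=lambda answer: answer == "42",
    num_samples=16,
)
print(scores)
```

In this toy setting the score drops to zero once the flawed step enters the prefix. Such per-step estimates could serve as noisy training targets for a verifier, and the noise from using a small number of samples is the kind of issue the empirical analysis addresses.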