Achieving Master Level Play in 9 x 9 Computer Go

Sylvain Gelly; David Silver

Achieving Master Level Play in 9 x 9 Computer Go

Sylvain Gelly

David Silver

AAAI, 8 (2008), pp. 1537-1540

Download Google Scholar

Abstract

The UCT algorithm uses Monte-Carlo simulation to estimate the value of states in a search tree from the current state. However, the first time a state is encountered, UCT has no knowledge, and is unable to generalise from previous experience. We describe two extensions that address these weaknesses. Our first algorithm, heuristic UCT, incorporates prior knowledge in the form of a value function. The value function can be learned offline, using a linear combination of a million binary features, with weights trained by temporal-difference learning. Our second algorithm, UCT–RAVE, forms a rapid online generalisation based on the value of moves. We applied our algorithms to the domain of 9 × 9 Computer Go, using the program MoGo. Using both heuristic UCT and RAVE, MoGo became the first program to achieve human master level in competitive play.

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Achieving Master Level Play in 9 x 9 Computer Go

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs