
AI project - Gomoku Game Agent


This is a note about the final project of my Artificial Intelligence course. The goal of this project is to develop an AI agent for Gomoku.

Gomoku, also known as five-in-a-row, is a strategy board game traditionally played with Go pieces on a Go board with 15 × 15 intersections. The rules of Gomoku are similar to Go: two players place stones alternately on the intersections of the board. The winner is the first player to get 5 of their stones in a row, either horizontally, vertically, or diagonally.

Project Setup

Rules

  • Gomoku rule: free-style Gomoku (five or more)
  1. Free-style Gomoku requires a row of five or more stones for a win.
  2. On a board of any size, free-style Gomoku is an m,n,k-game, and it is known that the second player cannot win: with perfect play, either the first player wins or the result is a draw.
  • Balanced opening from a set: the AI and we have the same opportunity to play first.

  • The board has 20×20 squares

  • At most 15 seconds per move and 90 seconds per match

  • 10 fixed AIs, 12 matches against each one

Restrictions

  • No more than 3 people per team
  • Uploads for testing are limited to once per day
  • The uploaded file (zipped together with all necessary files and named id.zip) may not exceed 5 MB
  • Only one CPU core
  • No deep neural network (a shallow one is okay but must be less than 5 MB)

Paper Notes

CS221 Project Final Report: Gomoku Game Agent

Source Code: GitHub here

Methods

Takeaway message

  • Implemented a tree-search-based approach with alpha-beta pruning (a sketch follows this list)
  • Focused on developing good heuristic functions, mainly inspired by human players' strategies, as the evaluation function to speed up the computation
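To make the search concrete, here is a minimal sketch of depth-limited minimax with alpha-beta pruning. The names `game_over`, `evaluate`, `legal_moves`, `apply_move`, and `undo_move` are hypothetical placeholders for the board interface and heuristic evaluation, not code from the report.

```python
def alphabeta(board, depth, alpha, beta, maximizing):
    # Hypothetical board interface: game_over(), evaluate(),
    # legal_moves(), apply_move(), undo_move().
    if depth == 0 or game_over(board):
        return evaluate(board)  # heuristic score of the leaf position
    if maximizing:
        value = float("-inf")
        for move in legal_moves(board):
            apply_move(board, move)
            value = max(value, alphabeta(board, depth - 1, alpha, beta, False))
            undo_move(board, move)
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: the minimizer avoids this branch
                break
        return value
    value = float("inf")
    for move in legal_moves(board):
        apply_move(board, move)
        value = min(value, alphabeta(board, depth - 1, alpha, beta, True))
        undo_move(board, move)
        beta = min(beta, value)
        if beta <= alpha:  # alpha cutoff: the maximizer avoids this branch
            break
    return value
```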

Main idea to obtain an optimal agent

  • Extract more features using expert knowledge
  • Leverage advanced regularization techniques for temporal-difference (TD) learning (the basic update is recalled below)
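For context, the basic TD(0) update that such regularization is applied on top of (the standard textbook form, not a formula quoted from the report) is $V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]$, where $\alpha$ is the learning rate, $\gamma$ the discount factor, and $r$ the reward observed on the transition from state $s$ to $s'$. When $V$ is parameterized by weights, a regularization term (e.g. an L2 penalty) is added to this objective.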

ADP with MCTS Algorithm for Gomoku

Methods

Takeaway message

  • We combine a shallow neural network, trained by Adaptive Dynamic Programming (ADP), with the Monte Carlo Tree Search (MCTS) algorithm for Gomoku.
  • The MCTS algorithm is based on Monte Carlo simulation: it runs many simulations and builds a game search tree from their results.
  • ADP and MCTS each estimate a winning rate; the two rates are combined by a weighted sum, and the move with the maximum combined rate is selected.
  • The traditional approach is based on a min-max tree combined with an evaluation function for leaf board situations
  • It uses alpha-beta pruning to speed up the game-tree search
  • Defects
    • A static evaluation function requires complicated manual design and a lot of time to cover the many possible situations
    • It cannot learn anything while playing Gomoku; it only obeys rules defined beforehand and cannot improve through play
    • Time and space complexity grow exponentially with search depth

Adaptive Dynamic Programming (ADP)

  • Core idea: the action decision or the value function can also be described in a continuous form, approximated by a nonlinear function represented by a neural network (a minimal sketch follows this list).
  • Structure: (the network-structure figure from the paper is omitted here)
  • Defects
    • Only effective for games with available expert knowledge, because the network input is designed from pre-determined patterns
    • The neural-network evaluation function is also short-sighted.
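To picture the "continuous form" above, here is a minimal numpy sketch of a one-hidden-layer value network over hand-crafted pattern features. The feature layout, layer sizes, and example values are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

# Hypothetical feature layout (e.g. counts of open threes/fours for each
# player); the paper's actual input-pattern design differs.
N_FEATURES, N_HIDDEN = 8, 32

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEATURES))
W2 = rng.normal(scale=0.1, size=(N_HIDDEN,))

def value(features):
    """Estimated winning probability of the board described by `features`."""
    h = np.tanh(W1 @ features)       # single shallow hidden layer
    z = W2 @ h                       # scalar output
    return 1.0 / (1.0 + np.exp(-z))  # squash to a probability

# Example: evaluate a hypothetical feature vector.
print(value(np.array([2, 0, 1, 0, 3, 1, 0, 0], dtype=float)))
```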

Monte Carlo Tree Search (MCTS)

  • Core idea: take random simulations in the decision space and build a search tree according to the results.
  • Four stages (a sketch of the Selection and Backpropagation stages follows this list):
    • Selection: begin from the root node, and recursively apply the child-selection policy (also known as the Tree Policy) to descend through the tree until reaching the most urgent expandable node.
    • Expansion: add one or more child nodes to expand the tree
    • Simulation: run a simulation from the leaf node using a fixed policy (also called the Default Policy) to produce an outcome
    • Backpropagation: propagate the simulation result back through the selected nodes to update their state values.
  • Defects
    • Additionally needs some complex domain knowledge
    • Must spend a lot of time on simulation to get a satisfactory result
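A minimal sketch of the Selection and Backpropagation stages using the standard UCB1 tree policy; this is the common textbook form, not necessarily the exact policy used in the paper.

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.wins = 0.0      # accumulated simulation reward

def ucb1(child, parent_visits, c=1.414):
    """UCB1 tree policy: balance average reward with exploration."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def select(node):
    """Selection: descend from the root until reaching a node to expand."""
    while node.children:
        node = max(node.children.values(),
                   key=lambda ch: ucb1(ch, node.visits))
    return node

def backpropagate(node, result):
    """Backpropagation: update statistics on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.wins += result
        node = node.parent
```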

ADP-MCTS

  • Process
    • Obtain 5 candidate moves and their winning probabilities from the neural network trained by ADP. We call them ADP winning probabilities.
    • The board situations after each of the 5 candidate moves are used as root nodes of MCTS, from which we obtain 5 winning probabilities, one per candidate. We call them MCTS winning probabilities.
    • To get a more accurate prediction of the winning probability, we take the weighted sum of each ADP winning probability $w_1$ and its corresponding MCTS winning probability $w_2$: $w_p=\lambda w_1 + (1-\lambda)w_2$ (see the sketch after this list)
  • Advantage
    • Saves time: the candidate moves obtained from ADP make the MCTS search space much smaller than before
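A tiny sketch of the weighted-sum move selection described above; the candidate moves, probabilities, and $\lambda = 0.5$ are made-up example values, not numbers from the paper.

```python
def pick_move(candidates, adp_probs, mcts_probs, lam=0.5):
    """Pick the move maximizing w_p = lam * w1 + (1 - lam) * w2."""
    weighted = [lam * w1 + (1 - lam) * w2
                for w1, w2 in zip(adp_probs, mcts_probs)]
    best = max(range(len(candidates)), key=lambda i: weighted[i])
    return candidates[best]

# Example with 5 hypothetical candidate moves and probability estimates.
moves = [(9, 9), (9, 10), (10, 9), (8, 8), (10, 10)]
print(pick_move(moves, [0.61, 0.55, 0.58, 0.40, 0.52],
                       [0.59, 0.62, 0.50, 0.45, 0.48]))
```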

Effective Monte-Carlo Tree Search Strategies for Gomoku AI

Methods

Takeaway message

  • Combines MCTS with the progressive strategies used in the Go program MANGO, but uses different heuristic knowledge and a different roll-out policy (see the formula below)
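For reference, MANGO's progressive bias (as described by Chaslot et al.) adds a heuristic term to the UCT selection value that fades as a node accumulates visits; one common form is $v_i = \bar{X}_i + C\sqrt{\ln n_p / n_i} + H_i/(n_i + 1)$, where $\bar{X}_i$ is the average result of move $i$, $n_i$ its visit count, $n_p$ the parent's visit count, and $H_i$ the heuristic knowledge for the move. This paper substitutes its own Gomoku-specific heuristic knowledge and roll-out policy.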

Programming Details

Environment Setup

  • Python code template: pbrain-pyrandom
    • Includes instructions for installing some Python packages
    • Describes how to use the .py template to build your own AI
  • Download the Gomocup Manager
    • It allows you to play against Gomocup AIs
  • Download some AIs to play against

Source code

  • Edit example.py to design your own AI; the strategy is written in the function brain_turn() (a sketch follows)
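As a starting point, here is a minimal sketch of a greedy brain_turn() that replaces the template's random move. It assumes the pbrain-pyrandom template's helpers (the pisqpipe module, isFree(), pp.do_mymove()); evaluate_move is a hypothetical heuristic you would write yourself.

```python
import pisqpipe as pp  # pipe-protocol helper shipped with the template

def evaluate_move(x, y):
    """Hypothetical heuristic: score placing our stone at (x, y).
    Replace with your own pattern-based evaluation."""
    return 0

def brain_turn():
    if pp.terminateAI:
        return
    best, best_score = None, float("-inf")
    for x in range(pp.width):
        for y in range(pp.height):
            if isFree(x, y):  # isFree is defined in the template
                score = evaluate_move(x, y)
                if score > best_score:
                    best, best_score = (x, y), score
    pp.do_mymove(*best)
```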
