agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

About

The algorithm is composed of:

  • a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
  • a Dual Neural Network (DNN) implemented in the dualnet package.

The algorithm is wrapped into a top-level structure (AZ, for AlphaZero). It applies to any game that can fulfill a specified contract.

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language
  • In the agogo package, each player of the game is an Agent; in a game, two Agents play against each other in an Arena.

  • The game package is loosely coupled with the AlphaZero algorithm: it describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a State of the game. A State is an interface that represents the current game state as well as the allowed interactions. Interactions are performed by a Player object applying a PlayerMove. The implementer's responsibility is to code the game's rules by creating an object that fulfills the State contract and implements the allowed moves (see the sketch after this list).
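
For orientation, here is a minimal sketch of the subset of the State contract that the examples in this README rely on. It is an illustration inferred from the sample code below, not the authoritative definition; the actual interface in the game package declares additional methods (rule checking, scoring, and so on).

// Illustrative subset of the game.State contract, inferred from the
// examples in this README. The real interface declares more methods.
type State interface {
	Board() []Colour          // the current board as a flat slice of stones
	ToMove() Player           // the player whose turn it is
	Apply(m PlayerMove) State // apply a move and return the resulting state
}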

Training process
Applying the algorithm to a game

This package is designed to be extensible, so you can train AlphaZero on any board game that respects the contract of the game package. The trained model can then be saved and used as a player.

The steps to train the algorithm are (a condensed sketch follows the list):

  • Creating a structure that fulfills the State interface (i.e. a game);
  • Creating a configuration for the internal MCTS and NN of your AZ;
  • Creating an AZ structure based on the game and the configuration;
  • Executing the learning process (by calling the Learn method);
  • Saving the trained model (by calling the Save method).
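
Condensed into code, using the tic-tac-toe game shipped with the package (the complete, runnable example is in the Examples section below):

// Condensed training skeleton; see the full example below for the
// GameEncoder (conf.Encoder) and the tuned configuration.
g := mnk.TicTacToe()  // 1. a game fulfilling the State interface
conf := agogo.Config{ // 2. configuration for the internal MCTS and NN
	Name:     "Tic Tac Toe",
	NNConf:   dual.DefaultConf(3, 3, 10),
	MCTSConf: mcts.DefaultConfig(3),
}
conf.Encoder = encodeBoard // defined in the full example below
a := agogo.New(g, conf)    // 3. the AZ structure
a.Learn(5, 30, 200, 30)    // 4. the learning process
a.Save("example.model")    // 5. save the trained model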

The steps to play against the algorithm are (see the sketch after this list):

  • Creating an AZ object;
  • Loading the trained model (by calling the Load method);
  • Switching the agent to inference mode (via the SwitchToInference method);
  • Getting the AI move by calling the Search method, and applying the move to the game manually.
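
And a matching sketch (conf and encodeBoard as in training; the full inference example is below):

// Condensed inference skeleton; error handling elided.
g := mnk.TicTacToe()
a := agogo.New(g, conf)  // 1. an AZ object
a.Load("example.model")  // 2. load the trained model
a.B.SwitchToInference(g) // 3. switch the agent to inference mode
move := a.B.Search(g)    // 4. get the AI move...
g.Apply(game.PlayerMove{ // ...and apply it to the game manually
	Player: a.B.Player,
	Single: move,
})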

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3.

Training

Here is sample code that trains AlphaZero to play the game. The resulting model is saved in the file example.model:

package main

// The import paths follow the package layout described in this README
// (mcts, dualnet, and game/mnk are subpackages of the agogo module).
import (
	"time"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// Empty cells are encoded as 0.001 rather than 0, and a constant layer
// (+1 for Black, -1 for White) encodes whose turn it is.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	// Create the configuration of the neural network
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // a richer board encoding would allow more features (and a larger K)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard

	// Create a new game
	g := mnk.TicTacToe()
	// Create the AlphaZero structure
	a := agogo.New(g, conf)
	// Launch the learning process:
	// 5 iterations, 30 self-play episodes, 200 NN training iterations, 30 arena games
	if err := a.Learn(5, 30, 200, 30); err != nil {
		panic(err)
	}
	// Save the model
	if err := a.Save("example.model"); err != nil {
		panic(err)
	}
}
Inference

package main

import (
	"fmt"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is the same GameEncoder as in the training example: the model
// must see boards encoded exactly as it was trained on.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	if err := a.Load("example.model"); err != nil {
		panic(err)
	}
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	a.B.SwitchToInference(g)
	a.A.SwitchToInference(g)
	// Put X in the center
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥

	// Ask the agent what to play next
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// 1
	g.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EncodeTwoPlayerBoard

func EncodeTwoPlayerBoard(a []game.Colour, prealloc []float32) []float32

EncodeTwoPlayerBoard encodes black as 1, white as -1 for each stone placed
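
A quick usage sketch. The None colour is an assumption here (the Play documentation below mentions it for draws); passing nil for prealloc lets the function allocate the result:

colours := []game.Colour{game.Black, game.White, game.None}
enc := agogo.EncodeTwoPlayerBoard(colours, nil) // nil prealloc: allocate a fresh slice
fmt.Println(enc) // [1 -1 0]: black stone, white stone, empty cell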

func MakeIterator

func MakeIterator(board []float32, m, n int) (retVal [][]float32)

MakeIterator makes a generic iterator of a board

func ReturnIterator

func ReturnIterator(m, n int, it [][]float32)
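
A hedged sketch of how these two functions pair up, assuming MakeIterator views the flat board as m rows of n columns and ReturnIterator recycles the row slices (the pooling behavior is a guess from the name):

board := make([]float32, 9) // a 3×3 board, flattened
it := agogo.MakeIterator(board, 3, 3)
for _, row := range it {
	fmt.Println(row) // each row is a []float32 of length 3
}
agogo.ReturnIterator(3, 3, it) // hand the iterator back when done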

func RotateBoard

func RotateBoard(board []float32, m, n int) ([]float32, error)

func WQEncoder

func WQEncoder(a game.State) []float32

WQEncoder encodes a Go board

Types

type AZ

type AZ struct {
	// state
	Arena
	Statistics
	// contains filtered or unexported fields
}

AZ is the top-level structure and the entry point of the API. It is a wrapper around the MCTS and the Neural Network that compose the algorithm. AZ stands for AlphaZero.

func New

func New(g game.State, conf Config) *AZ

New creates a new AlphaZero structure. It takes a game state (implementing the board, rules, etc.) and a configuration to apply to the MCTS and the neural network.

func (*AZ) Learn

func (a *AZ) Learn(iters, episodes, nniters, arenaGames int) error

Learn learns for iters iterations. In each iteration it self-plays for episodes episodes, and then trains a new NN from the self-play examples.
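
For instance, the training example above budgets its run as follows (the parameter meanings are read off the signature; the arena games are presumably scored against Config.UpdateThreshold to decide whether to keep the new network):

// 5 overall iterations; per iteration: 30 self-play episodes,
// 200 NN training iterations, and 30 arena evaluation games.
if err := a.Learn(5, 30, 200, 30); err != nil {
	log.Fatal(err)
}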

func (*AZ) Load

func (a *AZ) Load(filename string) error

Load the AlphaZero structure from filename.

func (*AZ) Save

func (a *AZ) Save(filename string) error

Save the learning into filename.

func (*AZ) SelfPlay

func (a *AZ) SelfPlay() []Example

SelfPlay plays an episode

type Agent

type Agent struct {
	NN     *dual.Dual
	MCTS   *mcts.MCTS
	Player game.Player
	Enc    GameEncoder

	// Statistics
	Wins float32
	Loss float32
	Draw float32
	sync.Mutex
	// contains filtered or unexported fields
}

An Agent is a player, AI or Human

func (*Agent) Close

func (a *Agent) Close() error

func (*Agent) Infer

func (a *Agent) Infer(g game.State) (policy []float32, value float32)

Infer computes the policy and value for the given game state. It is mainly used to implement an Inferer so that the MCTS search can use it.

func (*Agent) NNOutput

func (a *Agent) NNOutput(g game.State) (policy []float32, value float32, err error)

NNOutput returns the output of the neural network

func (*Agent) Search

func (a *Agent) Search(g game.State) game.Single

Search searches the game state and returns a suggested coordinate.

func (*Agent) SwitchToInference

func (a *Agent) SwitchToInference(g game.State) (err error)

SwitchToInference uses the inference mode neural network.

type Arena

type Arena struct {
	A, B *Agent
	// contains filtered or unexported fields
}

Arena represents a game arena. Arena fulfils the game.MetaState interface.

func MakeArena

func MakeArena(g game.State, a, b Dualer, conf mcts.Config, enc GameEncoder, aug Augmenter, name string) Arena

MakeArena makes an arena given a game.

func NewArena

func NewArena(g game.State, a, b Dualer, conf mcts.Config, enc GameEncoder, aug Augmenter, name string) *Arena

NewArena makes an arena and returns a pointer to the Arena.

func (*Arena) Epoch

func (a *Arena) Epoch() int

Epoch returns the current Epoch

func (*Arena) GameNumber

func (a *Arena) GameNumber() int

GameNumber returns the current game number.

func (*Arena) Log

func (a *Arena) Log(w io.Writer)

Log the MCTS of both players into w

func (*Arena) Name

func (a *Arena) Name() string

Name of the game

func (*Arena) Play

func (a *Arena) Play(record bool, enc OutputEncoder, aug Augmenter) (winner game.Player, examples []Example)

Play plays a game and returns the winner. If it is a draw, the returned colour is None.

func (*Arena) Score

func (a *Arena) Score(p game.Player) float64

Score of the player p

func (*Arena) State

func (a *Arena) State() game.State

State of the game

type Augmenter

type Augmenter func(a Example) []Example

Augmenter takes an example, and creates more examples from it.
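
As an illustration only, here is a hypothetical Augmenter built on RotateBoard. It assumes the encoded board and the policy are each exactly an m×n grid; the tic-tac-toe encoder above appends a player layer, so that layer would have to be split off first. It would be wired in via Config.Augmenter.

// rotate90 is a hypothetical Augmenter: it returns the original example
// plus a copy whose board and policy are rotated by 90°. The value is
// unchanged, since rotating the board does not change the outcome.
func rotate90(e agogo.Example) []agogo.Example {
	rb, err := agogo.RotateBoard(e.Board, 3, 3)
	if err != nil {
		return []agogo.Example{e}
	}
	rp, err := agogo.RotateBoard(e.Policy, 3, 3)
	if err != nil {
		return []agogo.Example{e}
	}
	return []agogo.Example{e, {Board: rb, Policy: rp, Value: e.Value}}
}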

type Config

type Config struct {
	Name            string
	NNConf          dual.Config
	MCTSConf        mcts.Config
	UpdateThreshold float64
	MaxExamples     int // maximum number of examples

	// extensions
	Encoder       GameEncoder
	OutputEncoder OutputEncoder
	Augmenter     Augmenter
}

Config for the AZ structure. It holds attributes that impact the MCTS and the Neural Network, as well as objects that facilitate the interaction with the end user (e.g. the OutputEncoder).

type Dualer

type Dualer interface {
	Dual() *dual.Dual
}

Dualer is an interface for anything that can return a *dual.Dual.

Its sole purpose is to form a monoid-ish data structure for Agent.NN.

type Example

type Example struct {
	Board  []float32
	Policy []float32
	Value  float32
}

Example is a representation of a single training example: an encoded board, the associated policy, and the value of the position.

type ExecLogger

type ExecLogger interface {
	ExecLog() string
}

ExecLogger is anything that can return the execution log.

type GameEncoder

type GameEncoder func(a game.State) []float32

GameEncoder encodes a game state as a slice of floats

type Inferer

type Inferer interface {
	Infer(a []float32) (policy []float32, value float32, err error)
	io.Closer
}

Inferer is anything that can infer given an input.

type OutputEncoder

type OutputEncoder interface {
	Encode(ms game.MetaState) error
	Flush() error
}

OutputEncoder encodes the entire meta state into an arbitrary output format.

An example OutputEncoder is the GifEncoder. Another example would be a logger.
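
A sketch of the logger variant, assuming game.MetaState exposes the Name, Epoch, and GameNumber methods that Arena (which fulfils the interface) provides:

// logEncoder is a hypothetical OutputEncoder that writes one line per
// encoded meta state and has nothing to flush.
type logEncoder struct{ w io.Writer }

func (l logEncoder) Encode(ms game.MetaState) error {
	_, err := fmt.Fprintf(l.w, "%s: epoch %d, game %d\n", ms.Name(), ms.Epoch(), ms.GameNumber())
	return err
}

func (l logEncoder) Flush() error { return nil }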

type Statistics

type Statistics struct {
	Creation []string
	Wins     map[string][]float32
	Losses   map[string][]float32
	Draws    map[string][]float32
}

func (*Statistics) Dump

func (s *Statistics) Dump(filename string) error

Dump the statistics in filename using a CSV format

Directories

Path      Synopsis
cmd
c4
mnk
wq        Package 围碁 implements Go (the board game) related code. 围碁 is a bastardized word.
internal
gtp
