agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

About

The algorithm is composed of:

  • a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
  • a Dual Neural Network (DNN) implemented in the dualnet package.

The algorithm is wrapped into a top-level structure (AZ, for AlphaZero). It applies to any game that can fulfill a specified contract.

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language
  • In the agogo package, each player of the game is an Agent; in a game, two Agents play against each other in an Arena.

  • The game package is loosely coupled with the AlphaZero algorithm: it describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a State of the game. A State is an interface that represents the current game state as well as the allowed interactions. Interactions are performed by a Player object applying a PlayerMove. The implementer's responsibility is to code the game's rules by creating an object that fulfills the State contract and implements the allowed moves (see the sketch after this list).
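
For orientation, here is a minimal sketch of the subset of the State contract that the examples in this README rely on. It is an illustration inferred from the sample code below, not the authoritative definition; the actual interface in the game package declares additional methods (rule checking, scoring, and so on).

// Illustrative subset of the game.State contract, inferred from the
// examples in this README. The real interface declares more methods.
type State interface {
	Board() []Colour          // the current board as a flat slice of stones
	ToMove() Player           // the player whose turn it is
	Apply(m PlayerMove) State // apply a move and return the resulting state
}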

Training process
Applying the algorithm to a game

This package is designed to be extensible, so you can train AlphaZero on any board game that respects the contract of the game package. The trained model can then be saved and used as a player.

The steps to train the algorithm are (a condensed sketch follows the list):

  • Creating a structure that fulfills the State interface (i.e. a game);
  • Creating a configuration for the internal MCTS and NN of your AZ;
  • Creating an AZ structure based on the game and the configuration;
  • Executing the learning process (by calling the Learn method);
  • Saving the trained model (by calling the Save method).
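
Condensed into code, using the tic-tac-toe game shipped with the package (the complete, runnable example is in the Examples section below):

// Condensed training skeleton; see the full example below for the
// GameEncoder (conf.Encoder) and the tuned configuration.
g := mnk.TicTacToe()  // 1. a game fulfilling the State interface
conf := agogo.Config{ // 2. configuration for the internal MCTS and NN
	Name:     "Tic Tac Toe",
	NNConf:   dual.DefaultConf(3, 3, 10),
	MCTSConf: mcts.DefaultConfig(3),
}
conf.Encoder = encodeBoard // defined in the full example below
a := agogo.New(g, conf)    // 3. the AZ structure
a.Learn(5, 30, 200, 30)    // 4. the learning process
a.Save("example.model")    // 5. save the trained model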

The steps to play against the algorithm are (see the sketch after this list):

  • Creating an AZ object;
  • Loading the trained model (by calling the Load method);
  • Switching the agent to inference mode (via the SwitchToInference method);
  • Getting the AI move by calling the Search method, and applying the move to the game manually.
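
And a matching sketch (conf and encodeBoard as in training; the full inference example is below):

// Condensed inference skeleton; error handling elided.
g := mnk.TicTacToe()
a := agogo.New(g, conf)  // 1. an AZ object
a.Load("example.model")  // 2. load the trained model
a.B.SwitchToInference(g) // 3. switch the agent to inference mode
move := a.B.Search(g)    // 4. get the AI move...
g.Apply(game.PlayerMove{ // ...and apply it to the game manually
	Player: a.B.Player,
	Single: move,
})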

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3.

Training

Here is sample code that trains AlphaZero to play the game. The resulting model is saved in the file example.model:

package main

// The import paths follow the package layout described in this README
// (mcts, dualnet, and game/mnk are subpackages of the agogo module).
import (
	"time"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// Empty cells are encoded as 0.001 rather than 0, and a constant layer
// (+1 for Black, -1 for White) encodes whose turn it is.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	// Create the configuration of the neural network
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // a richer board encoding would allow more features (and a larger K)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard

	// Create a new game
	g := mnk.TicTacToe()
	// Create the AlphaZero structure
	a := agogo.New(g, conf)
	// Launch the learning process:
	// 5 iterations, 30 self-play episodes, 200 NN training iterations, 30 arena games
	if err := a.Learn(5, 30, 200, 30); err != nil {
		panic(err)
	}
	// Save the model
	if err := a.Save("example.model"); err != nil {
		panic(err)
	}
}
Inference

package main

import (
	"fmt"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is the same GameEncoder as in the training example: the model
// must see boards encoded exactly as it was trained on.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	if err := a.Load("example.model"); err != nil {
		panic(err)
	}
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	a.B.SwitchToInference(g)
	a.A.SwitchToInference(g)
	// Put X in the center
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥

	// Ask the agent what to play next
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// 1
	g.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EncodeTwoPlayerBoard

func EncodeTwoPlayerBoard(a []game.Colour, prealloc []float32) []float32

EncodeTwoPlayerBoard encodes black as 1, white as -1 for each stone placed
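
A quick usage sketch. The None colour is an assumption here (the Play documentation below mentions it for draws); passing nil for prealloc lets the function allocate the result:

colours := []game.Colour{game.Black, game.White, game.None}
enc := agogo.EncodeTwoPlayerBoard(colours, nil) // nil prealloc: allocate a fresh slice
fmt.Println(enc) // [1 -1 0]: black stone, white stone, empty cell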

func MakeIterator

func MakeIterator(board []float32, m, n int) (retVal [][]float32)

MakeIterator makes a generic iterator of a board

func ReturnIterator

func ReturnIterator(m, n int, it [][]float32)
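
A hedged sketch of how these two functions pair up, assuming MakeIterator views the flat board as m rows of n columns and ReturnIterator recycles the row slices (the pooling behavior is a guess from the name):

board := make([]float32, 9) // a 3×3 board, flattened
it := agogo.MakeIterator(board, 3, 3)
for _, row := range it {
	fmt.Println(row) // each row is a []float32 of length 3
}
agogo.ReturnIterator(3, 3, it) // hand the iterator back when done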

func RotateBoard

func RotateBoard(board []float32, m, n int) ([]float32, error)

func WQEncoder

func WQEncoder(a game.State) []float32

WQEncoder encodes a Go board

Types

type AZ

type AZ struct {
	// state
	Arena
	Statistics
	// contains filtered or unexported fields
}

AZ is the top-level structure and the entry point of the API. It is a wrapper around the MCTS and the Neural Network that compose the algorithm. AZ stands for AlphaZero.

func New

func New(g game.State, conf Config) *AZ

New creates a new AlphaZero structure. It takes a game state (implementing the board, rules, etc.) and a configuration to apply to the MCTS and the neural network.

func (*AZ) Learn

func (a *AZ) Learn(iters, episodes, nniters, arenaGames int) error

Learn learns for iters iterations. In each iteration it self-plays for episodes episodes, and then trains a new NN from the self-play examples.
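
For instance, the training example above budgets its run as follows (the parameter meanings are read off the signature; the arena games are presumably scored against Config.UpdateThreshold to decide whether to keep the new network):

// 5 overall iterations; per iteration: 30 self-play episodes,
// 200 NN training iterations, and 30 arena evaluation games.
if err := a.Learn(5, 30, 200, 30); err != nil {
	log.Fatal(err)
}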

func (*AZ) Load

func (a *AZ) Load(filename string) error

Load the AlphaZero structure from filename.

func (*AZ) Save

func (a *AZ) Save(filename string) error

Save the learning into filename.

func (*AZ) SelfPlay

func (a *AZ) SelfPlay() []Example

SelfPlay plays an episode

type Agent

type Agent struct {
	NN     *dual.Dual
	MCTS   *mcts.MCTS
	Player game.Player
	Enc    GameEncoder

	// Statistics
	Wins float32
	Loss float32
	Draw float32
	sync.Mutex
	// contains filtered or unexported fields
}

An Agent is a player, AI or Human

func (*Agent) Close

func (a *Agent) Close() error

func (*Agent) Infer

func (a *Agent) Infer(g game.State) (policy []float32, value float32)

Infer computes the policy and value for the given game state. It is mainly used to implement an Inferer so that the MCTS search can use it.

func (*Agent) NNOutput

func (a *Agent) NNOutput(g game.State) (policy []float32, value float32, err error)

NNOutput returns the output of the neural network

func (*Agent) Search

func (a *Agent) Search(g game.State) game.Single

Search searches the game state and returns a suggested coordinate.

func (*Agent) SwitchToInference

func (a *Agent) SwitchToInference(g game.State) (err error)

SwitchToInference uses the inference mode neural network.

type Arena

type Arena struct {
	A, B *Agent
	// contains filtered or unexported fields
}

Arena represents a game arena. Arena fulfils the game.MetaState interface.

func MakeArena

func MakeArena(g game.State, a, b Dualer, conf mcts.Config, enc GameEncoder, aug Augmenter, name string) Arena

MakeArena makes an arena given a game.

func NewArena

func NewArena(g game.State, a, b Dualer, conf mcts.Config, enc GameEncoder, aug Augmenter, name string) *Arena

NewArena makes an arena and returns a pointer to the Arena.

func (*Arena) Epoch

func (a *Arena) Epoch() int

Epoch returns the current Epoch

func (*Arena) GameNumber

func (a *Arena) GameNumber() int

GameNumber returns the current game number.

func (*Arena) Log

func (a *Arena) Log(w io.Writer)

Log the MCTS of both players into w

func (*Arena) Name

func (a *Arena) Name() string

Name of the game

func (*Arena) Play

func (a *Arena) Play(record bool, enc OutputEncoder, aug Augmenter) (winner game.Player, examples []Example)

Play plays a game and returns the winner. If it is a draw, the returned colour is None.

func (*Arena) Score

func (a *Arena) Score(p game.Player) float64

Score of the player p

func (*Arena) State

func (a *Arena) State() game.State

State of the game

type Augmenter

type Augmenter func(a Example) []Example

Augmenter takes an example, and creates more examples from it.
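
As an illustration only, here is a hypothetical Augmenter built on RotateBoard. It assumes the encoded board and the policy are each exactly an m×n grid; the tic-tac-toe encoder above appends a player layer, so that layer would have to be split off first. It would be wired in via Config.Augmenter.

// rotate90 is a hypothetical Augmenter: it returns the original example
// plus a copy whose board and policy are rotated by 90°. The value is
// unchanged, since rotating the board does not change the outcome.
func rotate90(e agogo.Example) []agogo.Example {
	rb, err := agogo.RotateBoard(e.Board, 3, 3)
	if err != nil {
		return []agogo.Example{e}
	}
	rp, err := agogo.RotateBoard(e.Policy, 3, 3)
	if err != nil {
		return []agogo.Example{e}
	}
	return []agogo.Example{e, {Board: rb, Policy: rp, Value: e.Value}}
}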

type Config

type Config struct {
	Name            string
	NNConf          dual.Config
	MCTSConf        mcts.Config
	UpdateThreshold float64
	MaxExamples     int // maximum number of examples

	// extensions
	Encoder       GameEncoder
	OutputEncoder OutputEncoder
	Augmenter     Augmenter
}

Config for the AZ structure. It holds attributes that impact the MCTS and the Neural Network, as well as objects that facilitate the interaction with the end user (e.g. the OutputEncoder).

type Dualer

type Dualer interface {
	Dual() *dual.Dual
}

Dualer is an interface for anything that can return a *dual.Dual.

Its sole purpose is to form a monoid-ish data structure for Agent.NN.

type Example

type Example struct {
	Board  []float32
	Policy []float32
	Value  float32
}

Example is a representation of a single training example: an encoded board, the associated policy, and the value of the position.

type ExecLogger

type ExecLogger interface {
	ExecLog() string
}

ExecLogger is anything that can return the execution log.

type GameEncoder

type GameEncoder func(a game.State) []float32

GameEncoder encodes a game state as a slice of floats

type Inferer

type Inferer interface {
	Infer(a []float32) (policy []float32, value float32, err error)
	io.Closer
}

Inferer is anything that can infer given an input.

type OutputEncoder

type OutputEncoder interface {
	Encode(ms game.MetaState) error
	Flush() error
}

OutputEncoder encodes the entire meta state into an arbitrary output format.

An example OutputEncoder is the GifEncoder. Another example would be a logger.
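
A sketch of the logger variant, assuming game.MetaState exposes the Name, Epoch, and GameNumber methods that Arena (which fulfils the interface) provides:

// logEncoder is a hypothetical OutputEncoder that writes one line per
// encoded meta state and has nothing to flush.
type logEncoder struct{ w io.Writer }

func (l logEncoder) Encode(ms game.MetaState) error {
	_, err := fmt.Fprintf(l.w, "%s: epoch %d, game %d\n", ms.Name(), ms.Epoch(), ms.GameNumber())
	return err
}

func (l logEncoder) Flush() error { return nil }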

type Statistics

type Statistics struct {
	Creation []string
	Wins     map[string][]float32
	Losses   map[string][]float32
	Draws    map[string][]float32
}

func (*Statistics) Dump

func (s *Statistics) Dump(filename string) error

Dump the statistics in filename using a CSV format

Directories

Path      Synopsis
cmd
c4
mnk
wq        Package 围碁 implements Go (the board game) related code. 围碁 is a bastardized word.
internal
gtp
