qlearning

package module
v0.0.0-...-09709ec
Published: Sep 22, 2020 License: MIT Imports: 7 Imported by: 0

README

qlearning

The qlearning package provides a series of interfaces and utilities to implement the Q-Learning algorithm in Go.

This project was largely inspired by flappybird-qlearning-bot.

Until a release is tagged, qlearning should be considered highly experimental and mostly a fun toy.

Changes in this version include some refactoring, the ability to store the Q-table to a file, and the ability to select the next action other than purely at random. Another example was also added, solving the N-queens problem (with some reservations).

Installation

$ go get github.com/temorfeouz/qlearning

Quickstart

qlearning provides example implementations in the examples directory of the project.

hangman.go provides a naive implementation of Hangman for use with qlearning.

$ cd $GOPATH/src/github.com/temorfeouz/qlearning/examples
$ go run hangman.go -h
Usage of hangman:
  -debug
        Set debug
  -games int
        Play N games (default 5000000)
  -progress int
        Print progress messages every N games (default 1000)
  -wordlist string
        Path to a wordlist (default "./wordlist.txt")
  -words int
        Use N words from wordlist (default 10000)

By default, running hangman.go plays millions of games against a 10,000-word corpus. That's a bit of overkill for just trying out qlearning. You can run it against a smaller word list for fewer games using the -words and -games flags.

$ go run hangman.go -words 100 -progress 1000 -games 5000
100 words loaded
1000 games played: 92 WINS 908 LOSSES 9% WIN RATE
2000 games played: 447 WINS 1553 LOSSES 36% WIN RATE
3000 games played: 1064 WINS 1936 LOSSES 62% WIN RATE
4000 games played: 1913 WINS 2087 LOSSES 85% WIN RATE
5000 games played: 2845 WINS 2155 LOSSES 93% WIN RATE

Agent performance: 5000 games played, 2845 WINS 2155 LOSSES 57% WIN RATE

"WIN RATE" per progress report is isolated within that cycle, a group of 1000 games in this example. The win rate is meant to show the velocity of learning by the agent. If it is "learning", the win rate should be increasing until reaching convergence.

As you can see, after 5000 games the agent has "learned" to play hangman against a 100-word vocabulary.

Usage

See godocs for the package documentation.

Documentation

Overview

Package qlearning is an experimental set of interfaces and helpers to implement the Q-learning algorithm in Go.

This is highly experimental and should be considered a toy.

See https://github.com/temorfeouz/qlearning/tree/master/examples for implementation examples.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Action

type Action interface {
	String() string
	Apply(State) State
}

Action is an interface wrapping an action that can be applied to the model's current state.

BUG (temorfeouz): A state should apply an action, not the other way around.
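
As a rough sketch, an Action implementation might look like the following. The step type and the toy "reach a target count" domain are purely illustrative and not part of the package; the counter State type it acts on is sketched under State below.

type step struct {
	// delta is how much applying this action changes the counter.
	delta int
}

func (a step) String() string {
	if a.delta > 0 {
		return "up"
	}
	return "down"
}

func (a step) Apply(s qlearning.State) qlearning.State {
	// counter is the hypothetical State type sketched under State below.
	return s.(counter) + counter(a.delta)
}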

type Agent

type Agent interface {
	// Learn updates the model for a given state and action, using the
	// provided Rewarder implementation.
	Learn(*StateAction, Rewarder)

	// Value returns the current Q-value for a State and Action.
	Value(State, Action) float32

	// String returns a string representation of the Agent.
	String() string
}

Agent is an interface for a model's agent and is able to learn from actions and return the current Q-value of an action at a given state.

type Rewarder

type Rewarder interface {
	// Reward calculates the reward value for a given action in a given
	// state.
	Reward(action *StateAction) float32
}

Rewarder is an interface wrapping the ability to provide a reward for the execution of an action in a given state.
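
Continuing the illustrative counter domain from the Action example above, a Rewarder might look like this. Whether Reward scores the state before or after the action is left to the implementer; this sketch applies the action itself. All names here are hypothetical.

type countReward struct {
	// target is the counter value the agent should reach.
	target int
}

func (r countReward) Reward(sa *qlearning.StateAction) float32 {
	next := sa.Action.Apply(sa.State).(counter)
	if int(next) == r.target {
		return 1 // reaching the target is rewarded
	}
	return -0.1 // every other move carries a small cost
}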

type SimpleAgent

type SimpleAgent struct {
	// contains filtered or unexported fields
}

SimpleAgent is an Agent implementation that stores Q-values in a map of maps.

func NewSimpleAgent

func NewSimpleAgent(lr, d float32) *SimpleAgent

NewSimpleAgent creates a SimpleAgent with the provided learning rate and discount factor.
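
For example (the learning rate and discount factor below are arbitrary, illustrative values):

agent := qlearning.NewSimpleAgent(0.7, 1.0) // learning rate 0.7, discount factor 1.0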

func (*SimpleAgent) Export

func (agent *SimpleAgent) Export(w io.Writer)

func (*SimpleAgent) Import

func (agent *SimpleAgent) Import(r io.Reader)
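
Export and Import can be used to persist the Q-table between runs. A minimal sketch, assuming an agent created as above; the file name is arbitrary, and the on-disk format is whatever Export writes (imports of os and log elided):

// Save the learned Q-values to disk.
f, err := os.Create("qtable")
if err != nil {
	log.Fatal(err)
}
agent.Export(f)
f.Close()

// Later, load them back into a (possibly fresh) agent.
f, err = os.Open("qtable")
if err != nil {
	log.Fatal(err)
}
agent.Import(f)
f.Close()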

func (*SimpleAgent) Learn

func (agent *SimpleAgent) Learn(action *StateAction, reward Rewarder)

Learn updates the existing Q-value for the given State and Action using the Rewarder.

See https://en.wikipedia.org/wiki/Q-learning#Algorithm
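
The standard update described there is Q(s,a) ← Q(s,a) + lr * (reward + d * max Q(s',a') - Q(s,a)), where the max is taken over the actions a' available in the next state s', and lr and d are the learning rate and discount factor passed to NewSimpleAgent.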

func (*SimpleAgent) String

func (agent *SimpleAgent) String() string

String returns the current Q-value map as a printed string.

BUG (temorfeouz): This is useless.

func (*SimpleAgent) Value

func (agent *SimpleAgent) Value(state State, action Action) float32

Value gets the current Q-value for a State and Action.

type State

type State interface {

	// String returns a string representation of the given state.
	// Implementers should take care to ensure that this is a consistent
	// hash for a given state.
	String() string

	// Next provides a slice of possible Actions that could be applied to
	// a state.
	Next() []Action
}

State is an interface wrapping the current state of the model.
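
Continuing the illustrative counter domain from the Action example above, a State implementation might look like this (strconv import elided; the counter type is hypothetical, not part of the package):

// counter is a hypothetical State: its value is the current count, and its
// string form serves as the state's hash in the agent's Q-table.
type counter int

func (s counter) String() string {
	return strconv.Itoa(int(s))
}

// Next lists the actions available from this state: step up or step down.
func (s counter) Next() []qlearning.Action {
	return []qlearning.Action{step{delta: 1}, step{delta: -1}}
}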

type StateAction

type StateAction struct {
	State  State
	Action Action
	Value  float32
}

StateAction is a struct pairing an Action with a given State. Additionally, a Value can be associated with the StateAction, which is typically the Q-value.

func NewStateAction

func NewStateAction(state State, action Action, val float32) *StateAction

NewStateAction creates a new StateAction for a State and Action.

func Next

func Next(agent Agent, state State, epsilon float32) *StateAction

Next uses an Agent and State to find the highest scored Action.

In the case of Q-value ties among a set of actions, one of the tied actions is selected at random.
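
Putting the illustrative counter sketches above together, a minimal training loop might look like the following. The epsilon value, episode and step counts, and hyperparameters are arbitrary, and epsilon is assumed to control how often an exploratory (non-greedy) action is chosen.

package main

import (
	"fmt"

	"github.com/temorfeouz/qlearning"
)

// counter, step, and countReward are the hypothetical types sketched above,
// assumed to live in this package.
func main() {
	agent := qlearning.NewSimpleAgent(0.7, 1.0) // learning rate, discount factor
	rewarder := countReward{target: 10}

	for episode := 0; episode < 1000; episode++ {
		var state qlearning.State = counter(0)
		for i := 0; i < 50; i++ {
			// Choose the next action for the current state.
			sa := qlearning.Next(agent, state, 0.1)
			// Update the Q-table for this state/action pair...
			agent.Learn(sa, rewarder)
			// ...then actually take the action.
			state = sa.Action.Apply(state)
			if int(state.(counter)) == rewarder.target {
				break
			}
		}
	}

	// Inspect a learned Q-value, e.g. stepping up from 9 toward the target.
	fmt.Println(agent.Value(counter(9), step{delta: 1}))
}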

Directories

Path              Synopsis
examples          An example implementation of the qlearning interfaces.
  hangman         An example implementation of the qlearning interfaces.
