deepqlearn

package
v1.1.2
Published: May 6, 2021 License: MIT Imports: 6 Imported by: 1

Documentation

Index

Constants

This section is empty.

Variables

var DefaultBrainOptions = BrainOptions{
	TemporalWindow:           1,
	ExperienceSize:           30000,
	StartLearnThreshold:      int(math.Floor(math.Min(30000*0.1, 1000))),
	Gamma:                    0.8,
	LearningStepsTotal:       100000,
	LearningStepsBurnin:      3000,
	EpsilonMin:               0.05,
	EpsilonTestTime:          0.01,
	RandomActionDistribution: nil,
	TDTrainerOptions: convnet.TrainerOptions{
		LearningRate: 0.01,
		Momentum:     0.0,
		BatchSize:    64,
		L2Decay:      0.01,
	},
}
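A minimal construction sketch: copy DefaultBrainOptions, override the fields you care about, and pass the result to NewBrain. The import path and the problem sizes (4 states, 3 actions) are placeholders, not taken from this module.

package main

import (
	"fmt"
	"log"

	// Placeholder import path; substitute this module's actual path.
	"example.com/convnet/deepqlearn"
)

func main() {
	// Start from the package defaults and override a few fields.
	opt := deepqlearn.DefaultBrainOptions
	opt.TemporalWindow = 2 // feed the last two (x,a) pairs to the net
	opt.LearningStepsTotal = 200000
	opt.TDTrainerOptions.LearningRate = 0.001

	// Hypothetical problem sizes: 4 state inputs, 3 discrete actions.
	brain, err := deepqlearn.NewBrain(4, 3, opt)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(brain.NetInputs, brain.Epsilon)
}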

Functions

This section is empty.

Types

type Brain

type Brain struct {
	TemporalWindow           int
	ExperienceSize           int
	StartLearnThreshold      int
	Gamma                    float64
	LearningStepsTotal       int
	LearningStepsBurnin      int
	EpsilonMin               float64
	EpsilonTestTime          float64
	RandomActionDistribution []float64

	NetInputs  int
	NumStates  int
	NumActions int
	WindowSize int

	StateWindow  [][]float64
	ActionWindow []int
	RewardWindow []float64
	NetWindow    [][]float64

	Rand       *rand.Rand
	ValueNet   convnet.Net
	TDTrainer  *convnet.Trainer
	Experience []Experience

	Age                 int
	ForwardPasses       int
	Epsilon             float64
	LatestReward        float64
	LastInputArray      []float64
	AverageRewardWindow *cnnutil.Window
	AverageLossWindow   *cnnutil.Window
	Learning            bool
}

A Brain object does all the magic. Over time it receives some inputs and some rewards, and its job is to set the outputs so as to maximize the expected reward.
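A sketch of the perceive-act-learn loop built from the exported API: Forward picks an action from the current observation, the environment returns a reward, and Backward feeds that reward back so the brain can learn from it. The Env interface is hypothetical and exists only for this sketch.

// Env is a hypothetical environment interface used only for illustration.
type Env interface {
	Observe() []float64      // current state, length NumStates
	Step(action int) float64 // apply the action and return a reward
}

// runEpisode drives the perceive-act-learn loop for a fixed number of steps.
func runEpisode(b *deepqlearn.Brain, env Env, steps int) {
	for i := 0; i < steps; i++ {
		state := env.Observe()     // length must equal NumStates
		action := b.Forward(state) // pick one of NumActions
		reward := env.Step(action)
		b.Backward(reward) // learn from the reward signal
	}
}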

func NewBrain

func NewBrain(numStates, numActions int, opt BrainOptions) (*Brain, error)

func (*Brain) Backward

func (b *Brain) Backward(reward float64)

func (*Brain) Forward

func (b *Brain) Forward(inputArray []float64) int

Forward computes the forward (behavior) pass given the input neuron signals from the body.
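As a rough illustration of what a behavior pass of this kind typically does (an assumption about the internals, not taken from this package's source), the choice reduces to epsilon-greedy selection over the current net input, using only exported fields and methods:

// selectAction is an illustrative epsilon-greedy sketch, not this package's exact code.
func selectAction(b *deepqlearn.Brain, input []float64) int {
	if b.Rand.Float64() < b.Epsilon {
		return b.RandomAction() // explore
	}
	action, _ := b.Policy(b.NetInput(input)) // exploit: greedy w.r.t. the value net
	return action
}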

func (*Brain) NetInput

func (b *Brain) NetInput(xt []float64) []float64

NetInput returns the state vector s = (x,a,x,a,x,a,xt). It is a concatenation of the last WindowSize (x,a) pairs and the current state x.
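An illustrative sketch of that concatenation: the last few (x,a) entries followed by the current state xt. The 1-of-k encoding of past actions is an assumption about the encoding, not something taken from this package's source.

// buildNetInput assembles (x,a,...,x,a,xt) from the state and action windows.
// It assumes the windows already hold at least `pairs` entries.
func buildNetInput(stateWindow [][]float64, actionWindow []int, xt []float64, numActions, pairs int) []float64 {
	var s []float64
	n := len(stateWindow)
	for i := n - pairs; i < n; i++ {
		s = append(s, stateWindow[i]...) // past state x
		a := make([]float64, numActions) // past action a, encoded 1-of-k (assumed)
		a[actionWindow[i]] = 1
		s = append(s, a...)
	}
	return append(s, xt...) // current state xt
}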

func (*Brain) Policy

func (b *Brain) Policy(s []float64) (action int, value float64)

Policy computes the value of doing any action in this state and returns the argmax action and its value.
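The greedy half of that computation reduces to an argmax over the value net's per-action outputs; a generic sketch (it assumes a non-empty slice):

// argmax picks the index of the largest action value and that value.
func argmax(values []float64) (action int, value float64) {
	action, value = 0, values[0]
	for i, v := range values {
		if v > value {
			action, value = i, v
		}
	}
	return action, value
}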

func (*Brain) RandomAction

func (b *Brain) RandomAction() int

RandomAction is a bit of a helper function. It returns a random action. We abstract this away because in the future we may want to do more sophisticated things; for example, some actions could be more or less likely at the "rest"/default state.
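A sketch of how a biased random action can be drawn from a distribution like RandomActionDistribution, assuming the weights sum to 1 and have length equal to the number of actions (rng is a *rand.Rand from math/rand):

// sampleAction draws an action index from a discrete distribution.
func sampleAction(rng *rand.Rand, dist []float64) int {
	p := rng.Float64()
	cum := 0.0
	for i, w := range dist {
		cum += w
		if p < cum {
			return i
		}
	}
	return len(dist) - 1 // guard against floating-point rounding
}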

func (*Brain) String

func (b *Brain) String() string

type BrainOptions

type BrainOptions struct {
	// amount of temporal memory, in number of time steps
	// the ACTUAL input to the net will be (x,a) TemporalWindow times, followed by the current x
	// so to have no information from previous time steps going into the value function, set this to 0
	TemporalWindow int
	// size of the experience replay memory
	ExperienceSize int
	// number of examples in the experience replay memory before we begin learning
	StartLearnThreshold int
	// Gamma is a crucial parameter that controls how much plan-ahead the agent does. In [0,1]
	Gamma float64
	// number of steps we will learn for
	LearningStepsTotal int
	// how many of those steps, at the beginning, perform only random actions
	LearningStepsBurnin int
	// the epsilon value we bottom out on; 0.0 => a purely deterministic policy at the end
	EpsilonMin float64
	// the epsilon to use at test time (i.e. when learning is disabled)
	EpsilonTestTime float64
	// advanced feature: sometimes a random action should be biased towards some values;
	// for example, in Flappy Bird we may want to choose not to flap more often.
	// This should sum to 1 and have length NumActions.
	RandomActionDistribution []float64

	LayerDefs        []convnet.LayerDef
	HiddenLayerSizes []int
	Rand             *rand.Rand

	TDTrainerOptions convnet.TrainerOptions
}
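A configuration sketch for a hypothetical two-action agent (e.g. a Flappy-Bird-style setup), biasing random exploration towards action 0. It assumes HiddenLayerSizes is consulted when LayerDefs is left empty; that behavior, the problem sizes, and the specific values are assumptions for illustration.

func newFlappyBrain() (*deepqlearn.Brain, error) {
	opt := deepqlearn.DefaultBrainOptions
	opt.HiddenLayerSizes = []int{50, 50}               // hidden layer sizes for the value net
	opt.RandomActionDistribution = []float64{0.8, 0.2} // sums to 1, length == numActions
	opt.TDTrainerOptions.BatchSize = 32
	// 8 state inputs and 2 actions are hypothetical sizes.
	return deepqlearn.NewBrain(8, 2, opt)
}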

type Experience

type Experience struct {
	State0  []float64
	Action0 int
	Reward0 float64
	State1  []float64
}

An agent is in State0 and does Action0. The environment then assigns Reward0 and provides the new state, State1. Experience records store all this information, which is used in the Q-learning update step.
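The standard Q-learning target built from one Experience is Reward0 + Gamma * max_a Q(State1, a); a small sketch, where maxQ stands in for a forward pass of the value net and is hypothetical:

// qTarget computes the temporal-difference target for one stored transition.
func qTarget(e deepqlearn.Experience, gamma float64, maxQ func(state []float64) float64) float64 {
	return e.Reward0 + gamma*maxQ(e.State1)
}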
