deepqlearn

package
v1.1.2
Published: May 6, 2021 License: MIT Imports: 6 Imported by: 1

Documentation

Index

Constants

This section is empty.

Variables

var DefaultBrainOptions = BrainOptions{
	TemporalWindow:           1,
	ExperienceSize:           30000,
	StartLearnThreshold:      int(math.Floor(math.Min(30000*0.1, 1000))),
	Gamma:                    0.8,
	LearningStepsTotal:       100000,
	LearningStepsBurnin:      3000,
	EpsilonMin:               0.05,
	EpsilonTestTime:          0.01,
	RandomActionDistribution: nil,
	TDTrainerOptions: convnet.TrainerOptions{
		LearningRate: 0.01,
		Momentum:     0.0,
		BatchSize:    64,
		L2Decay:      0.01,
	},
}
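A minimal construction sketch: copy DefaultBrainOptions, override the fields you care about, and pass the result to NewBrain. The import path and the problem sizes (4 states, 3 actions) are placeholders, not taken from this module.

package main

import (
	"fmt"
	"log"

	// Placeholder import path; substitute this module's actual path.
	"example.com/convnet/deepqlearn"
)

func main() {
	// Start from the package defaults and override a few fields.
	opt := deepqlearn.DefaultBrainOptions
	opt.TemporalWindow = 2 // feed the last two (x,a) pairs to the net
	opt.LearningStepsTotal = 200000
	opt.TDTrainerOptions.LearningRate = 0.001

	// Hypothetical problem sizes: 4 state inputs, 3 discrete actions.
	brain, err := deepqlearn.NewBrain(4, 3, opt)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(brain.NetInputs, brain.Epsilon)
}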

Functions

This section is empty.

Types

type Brain

type Brain struct {
	TemporalWindow           int
	ExperienceSize           int
	StartLearnThreshold      int
	Gamma                    float64
	LearningStepsTotal       int
	LearningStepsBurnin      int
	EpsilonMin               float64
	EpsilonTestTime          float64
	RandomActionDistribution []float64

	NetInputs  int
	NumStates  int
	NumActions int
	WindowSize int

	StateWindow  [][]float64
	ActionWindow []int
	RewardWindow []float64
	NetWindow    [][]float64

	Rand       *rand.Rand
	ValueNet   convnet.Net
	TDTrainer  *convnet.Trainer
	Experience []Experience

	Age                 int
	ForwardPasses       int
	Epsilon             float64
	LatestReward        float64
	LastInputArray      []float64
	AverageRewardWindow *cnnutil.Window
	AverageLossWindow   *cnnutil.Window
	Learning            bool
}

A Brain object does all the magic. Over time it receives some inputs and some rewards, and its job is to set the outputs so as to maximize the expected reward.
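A sketch of the perceive-act-learn loop built from the exported API: Forward picks an action from the current observation, the environment returns a reward, and Backward feeds that reward back so the brain can learn from it. The Env interface is hypothetical and exists only for this sketch.

// Env is a hypothetical environment interface used only for illustration.
type Env interface {
	Observe() []float64      // current state, length NumStates
	Step(action int) float64 // apply the action and return a reward
}

// runEpisode drives the perceive-act-learn loop for a fixed number of steps.
func runEpisode(b *deepqlearn.Brain, env Env, steps int) {
	for i := 0; i < steps; i++ {
		state := env.Observe()     // length must equal NumStates
		action := b.Forward(state) // pick one of NumActions
		reward := env.Step(action)
		b.Backward(reward) // learn from the reward signal
	}
}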

func NewBrain

func NewBrain(numStates, numActions int, opt BrainOptions) (*Brain, error)

func (*Brain) Backward

func (b *Brain) Backward(reward float64)

func (*Brain) Forward

func (b *Brain) Forward(inputArray []float64) int

Forward computes the forward (behavior) pass given the input neuron signals from the body.
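As a rough illustration of what a behavior pass of this kind typically does (an assumption about the internals, not taken from this package's source), the choice reduces to epsilon-greedy selection over the current net input, using only exported fields and methods:

// selectAction is an illustrative epsilon-greedy sketch, not this package's exact code.
func selectAction(b *deepqlearn.Brain, input []float64) int {
	if b.Rand.Float64() < b.Epsilon {
		return b.RandomAction() // explore
	}
	action, _ := b.Policy(b.NetInput(input)) // exploit: greedy w.r.t. the value net
	return action
}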

func (*Brain) NetInput

func (b *Brain) NetInput(xt []float64) []float64

NetInput returns the state vector s = (x,a,x,a,x,a,xt). It is a concatenation of the last WindowSize (x,a) pairs and the current state x.
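An illustrative sketch of that concatenation: the last few (x,a) entries followed by the current state xt. The 1-of-k encoding of past actions is an assumption about the encoding, not something taken from this package's source.

// buildNetInput assembles (x,a,...,x,a,xt) from the state and action windows.
// It assumes the windows already hold at least `pairs` entries.
func buildNetInput(stateWindow [][]float64, actionWindow []int, xt []float64, numActions, pairs int) []float64 {
	var s []float64
	n := len(stateWindow)
	for i := n - pairs; i < n; i++ {
		s = append(s, stateWindow[i]...) // past state x
		a := make([]float64, numActions) // past action a, encoded 1-of-k (assumed)
		a[actionWindow[i]] = 1
		s = append(s, a...)
	}
	return append(s, xt...) // current state xt
}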

func (*Brain) Policy

func (b *Brain) Policy(s []float64) (action int, value float64)

Policy computes the value of doing any action in this state and returns the argmax action and its value.
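The greedy half of that computation reduces to an argmax over the value net's per-action outputs; a generic sketch (it assumes a non-empty slice):

// argmax picks the index of the largest action value and that value.
func argmax(values []float64) (action int, value float64) {
	action, value = 0, values[0]
	for i, v := range values {
		if v > value {
			action, value = i, v
		}
	}
	return action, value
}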

func (*Brain) RandomAction

func (b *Brain) RandomAction() int

RandomAction is a bit of a helper function. It returns a random action. We abstract this away because in the future we may want to do more sophisticated things; for example, some actions could be more or less likely at the "rest"/default state.
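A sketch of how a biased random action can be drawn from a distribution like RandomActionDistribution, assuming the weights sum to 1 and have length equal to the number of actions (rng is a *rand.Rand from math/rand):

// sampleAction draws an action index from a discrete distribution.
func sampleAction(rng *rand.Rand, dist []float64) int {
	p := rng.Float64()
	cum := 0.0
	for i, w := range dist {
		cum += w
		if p < cum {
			return i
		}
	}
	return len(dist) - 1 // guard against floating-point rounding
}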

func (*Brain) String

func (b *Brain) String() string

type BrainOptions

type BrainOptions struct {
	// amount of temporal memory, in number of time steps
	// the ACTUAL input to the net will be (x,a) TemporalWindow times, followed by the current x
	// so to have no information from previous time steps going into the value function, set this to 0
	TemporalWindow int
	// size of the experience replay memory
	ExperienceSize int
	// number of examples in the experience replay memory before we begin learning
	StartLearnThreshold int
	// Gamma is a crucial parameter that controls how much plan-ahead the agent does. In [0,1]
	Gamma float64
	// number of steps we will learn for
	LearningStepsTotal int
	// how many of those steps, at the beginning, perform only random actions
	LearningStepsBurnin int
	// the epsilon value we bottom out on; 0.0 => a purely deterministic policy at the end
	EpsilonMin float64
	// the epsilon to use at test time (i.e. when learning is disabled)
	EpsilonTestTime float64
	// advanced feature: sometimes a random action should be biased towards some values;
	// for example, in Flappy Bird we may want to choose not to flap more often.
	// This should sum to 1 and have length NumActions.
	RandomActionDistribution []float64

	LayerDefs        []convnet.LayerDef
	HiddenLayerSizes []int
	Rand             *rand.Rand

	TDTrainerOptions convnet.TrainerOptions
}
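A configuration sketch for a hypothetical two-action agent (e.g. a Flappy-Bird-style setup), biasing random exploration towards action 0. It assumes HiddenLayerSizes is consulted when LayerDefs is left empty; that behavior, the problem sizes, and the specific values are assumptions for illustration.

func newFlappyBrain() (*deepqlearn.Brain, error) {
	opt := deepqlearn.DefaultBrainOptions
	opt.HiddenLayerSizes = []int{50, 50}               // hidden layer sizes for the value net
	opt.RandomActionDistribution = []float64{0.8, 0.2} // sums to 1, length == numActions
	opt.TDTrainerOptions.BatchSize = 32
	// 8 state inputs and 2 actions are hypothetical sizes.
	return deepqlearn.NewBrain(8, 2, opt)
}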

type Experience

type Experience struct {
	State0  []float64
	Action0 int
	Reward0 float64
	State1  []float64
}

An agent is in State0 and does Action0. The environment then assigns Reward0 and provides the new state, State1. Experience records store all this information, which is used in the Q-learning update step.
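The standard Q-learning target built from one Experience is Reward0 + Gamma * max_a Q(State1, a); a small sketch, where maxQ stands in for a forward pass of the value net and is hypothetical:

// qTarget computes the temporal-difference target for one stored transition.
func qTarget(e deepqlearn.Experience, gamma float64, maxQ func(state []float64) float64) float64 {
	return e.Reward0 + gamma*maxQ(e.State1)
}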
