rl package v1.2.10

Published: Mar 7, 2024 License: BSD-3-Clause Imports: 9 Imported by: 5

README

Reinforcement Learning and Dopamine


Documentation

Overview

Package rl provides core infrastructure for dopamine neuromodulation and reinforcement learning, including the Rescorla-Wagner learning algorithm (RW) and Temporal Differences (TD) learning, and a minimal `ClampDaLayer` that can be used to send an arbitrary DA signal.

  • `da.go` defines a simple `DALayer` interface for getting and setting dopamine values, and a `SendDA` list of layer names with convenience methods and the ability to send dopamine to any layer that implements the `DALayer` interface (see the sketch after this list).

  • The RW and TD DA layers use the `CyclePost` layer-level method to send the DA to other layers, at the end of each cycle, after activation is updated. Thus, DA lags by 1 cycle, which typically should not be a problem.

  • See the separate `pvlv` package for the full biologically based PVLV model built on top of this basic DA infrastructure.
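
For orientation, a minimal sketch of what it takes for a layer to receive dopamine: only GetDA and SetDA methods are needed to satisfy the `DALayer` interface. The MyModLayer name is hypothetical; embedding leabra.Layer mirrors how this package's own layers are built.

type MyModLayer struct {
	leabra.Layer

	// DA is the dopamine value, set each cycle by a sending layer's CyclePost
	DA float32
}

func (ly *MyModLayer) GetDA() float32   { return ly.DA }
func (ly *MyModLayer) SetDA(da float32) { ly.DA = da }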


Constants

This section is empty.

Variables

var KiT_ClampAChLayer = kit.Types.AddType(&ClampAChLayer{}, leabra.LayerProps)
var KiT_ClampDaLayer = kit.Types.AddType(&ClampDaLayer{}, leabra.LayerProps)
var KiT_RWDaLayer = kit.Types.AddType(&RWDaLayer{}, leabra.LayerProps)
var KiT_RWPredLayer = kit.Types.AddType(&RWPredLayer{}, leabra.LayerProps)
var KiT_RWPrjn = kit.Types.AddType(&RWPrjn{}, deep.PrjnProps)
var KiT_TDDaLayer = kit.Types.AddType(&TDDaLayer{}, leabra.LayerProps)
var KiT_TDRewIntegLayer = kit.Types.AddType(&TDRewIntegLayer{}, leabra.LayerProps)
var KiT_TDRewPredLayer = kit.Types.AddType(&TDRewPredLayer{}, leabra.LayerProps)
var KiT_TDRewPredPrjn = kit.Types.AddType(&TDRewPredPrjn{}, leabra.PrjnProps)

Functions

func AddRWLayers

func AddRWLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, da leabra.LeabraLayer)

AddRWLayers adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, an RWPred prediction layer, and a dopamine layer that computes the difference between them. It only generates DA when the Rew layer has external input -- otherwise DA is zero.
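
A usage sketch, assuming the emer/leabra v1 API and import paths (the prefix, relative position, and spacing values here are illustrative choices):

package main

import (
	"github.com/emer/emergent/relpos"
	"github.com/emer/leabra/leabra"
	"github.com/emer/leabra/rl"
)

func main() {
	net := &leabra.Network{}
	net.InitName(net, "RWNet")

	// Adds the reward, prediction, and dopamine layers, each placed
	// behind the previous one with 2 units of spacing.
	rew, rp, da := rl.AddRWLayers(net, "", relpos.Behind, 2)
	_, _, _ = rew, rp, da

	// ... add and connect task layers here, then:
	// net.Defaults()
	// net.Build()
}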

func AddRWLayersPy added in v1.1.15

func AddRWLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer

AddRWLayersPy adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, an RWPred prediction layer, and a dopamine layer that computes the difference between them. It only generates DA when the Rew layer has external input -- otherwise DA is zero. Py is the Python version, which returns the layers as a slice.

func AddTDLayers

func AddTDLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, ri, td leabra.LeabraLayer)

AddTDLayers adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given the class TDRewToInteg -- it should have no learning and a fixed weight of 1.
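
Called the same way as AddRWLayers, but returning four layers (a sketch; net is as in the example above):

// Reward, TD reward prediction, reward integration, and TD dopamine layers:
rew, rp, ri, td := rl.AddTDLayers(net, "TD", relpos.Behind, 2)
_, _, _, _ = rew, rp, ri, td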

func AddTDLayersPy added in v1.1.15

func AddTDLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer

AddTDLayersPy adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given the class TDRewToInteg -- it should have no learning and a fixed weight of 1. Py is the Python version, which returns the layers as a slice.

Types

type AChLayer added in v1.1.0

type AChLayer interface {
	// GetACh returns the acetylcholine level for layer
	GetACh() float32

	// SetACh sets the acetylcholine level for layer
	SetACh(ach float32)
}

AChLayer is an interface for a layer with an acetylcholine neuromodulator value on it.
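
A sketch of how a sender delivers ACh only to layers implementing this interface -- essentially what SendACh.SendACh does internally. Here setACh is a hypothetical helper, and emer.Layer is the generic layer interface from github.com/emer/emergent/emer:

func setACh(ly emer.Layer, ach float32) {
	if al, ok := ly.(AChLayer); ok { // only layers implementing AChLayer
		al.SetACh(ach)
	}
}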

type ClampAChLayer added in v1.1.0

type ClampAChLayer struct {
	leabra.Layer

	// list of layers to send acetylcholine to
	SendACh SendACh `desc:"list of layers to send acetylcholine to"`

	// acetylcholine value for this layer
	ACh float32 `desc:"acetylcholine value for this layer"`
}

ClampAChLayer is an Input layer that just sends its activity as the acetylcholine signal.

func (*ClampAChLayer) Build added in v1.1.0

func (ly *ClampAChLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*ClampAChLayer) CyclePost added in v1.1.0

func (ly *ClampAChLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send ACh, which will then be active for the next cycle of processing.

func (*ClampAChLayer) GetACh added in v1.1.0

func (ly *ClampAChLayer) GetACh() float32

func (*ClampAChLayer) SetACh added in v1.1.0

func (ly *ClampAChLayer) SetACh(ach float32)

type ClampDaLayer

type ClampDaLayer struct {
	leabra.Layer

	// list of layers to send dopamine to
	SendDA SendDA `desc:"list of layers to send dopamine to"`

	// dopamine value for this layer
	DA float32 `desc:"dopamine value for this layer"`
}

ClampDaLayer is an Input layer that just sends its activity as the dopamine signal.

func AddClampDaLayer

func AddClampDaLayer(nt *leabra.Network, name string) *ClampDaLayer

AddClampDaLayer adds a ClampDaLayer of the given name.
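
A sketch of using it to clamp an arbitrary DA signal (net is as in the examples above; the "SNc" name and the target layer are illustrative):

snc := rl.AddClampDaLayer(net, "SNc")
snc.SendDA.Add("RWPred") // receiving layers must implement DALayer

// Clamp activity on snc during processing; its CyclePost then sends
// that activity as DA to the listed layers on the next cycle.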

func (*ClampDaLayer) Build

func (ly *ClampDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*ClampDaLayer) CyclePost

func (ly *ClampDaLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*ClampDaLayer) Defaults added in v1.1.2

func (ly *ClampDaLayer) Defaults()

func (*ClampDaLayer) GetDA

func (ly *ClampDaLayer) GetDA() float32

func (*ClampDaLayer) SetDA

func (ly *ClampDaLayer) SetDA(da float32)

type DALayer

type DALayer interface {
	// GetDA returns the dopamine level for layer
	GetDA() float32

	// SetDA sets the dopamine level for layer
	SetDA(da float32)
}

DALayer is an interface for a layer with a dopamine neuromodulator value on it.

type RWDaLayer

type RWDaLayer struct {
	leabra.Layer

	// list of layers to send dopamine to
	SendDA SendDA `desc:"list of layers to send dopamine to"`

	// name of Reward-representing layer from which this computes DA -- if nothing clamped, no dopamine computed
	RewLay string `desc:"name of Reward-representing layer from which this computes DA -- if nothing clamped, no dopamine computed"`

	// name of RWPredLayer layer that is subtracted from the reward value
	RWPredLay string `desc:"name of RWPredLayer layer that is subtracted from the reward value"`

	// dopamine value for this layer
	DA float32 `inactive:"+" desc:"dopamine value for this layer"`
}

RWDaLayer computes a dopamine (DA) signal based on a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). It computes the difference between r(t) and the RWPred value. r(t) is accessed directly from the Rew layer -- if nothing is externally clamped there, no DA is computed, which is critical for effectively using RW only for PV cases. The RWPred prediction is likewise accessed directly from the RWPred layer to avoid any timing issues.
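
The computed quantity, in sketch form (rwDA, hasRew, rew, and pred are illustrative names; the layer reads these values from the Rew and RWPred layers directly):

// rwDA returns the Rescorla-Wagner dopamine signal: the reward
// prediction error r(t) - RWPred, or 0 when no reward is clamped.
func rwDA(hasRew bool, rew, pred float32) float32 {
	if !hasRew {
		return 0 // no external input on Rew -- no DA
	}
	return rew - pred
}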

func (*RWDaLayer) ActFmG

func (ly *RWDaLayer) ActFmG(ltime *leabra.Time)

func (*RWDaLayer) Build

func (ly *RWDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*RWDaLayer) CyclePost

func (ly *RWDaLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*RWDaLayer) Defaults

func (ly *RWDaLayer) Defaults()

func (*RWDaLayer) GetDA

func (ly *RWDaLayer) GetDA() float32

func (*RWDaLayer) RWLayers

func (ly *RWDaLayer) RWLayers() (*leabra.Layer, *RWPredLayer, error)

RWLayers returns the reward and RWPred layers based on the configured names.

func (*RWDaLayer) SetDA

func (ly *RWDaLayer) SetDA(da float32)

type RWPredLayer

type RWPredLayer struct {
	leabra.Layer

	// default 0.1..0.99 range of predictions that can be represented -- having a truncated range preserves some sensitivity in dopamine at the extremes of good or poor performance
	PredRange minmax.F32 `desc:"default 0.1..0.99 range of predictions that can be represented -- having a truncated range preserves some sensitivity in dopamine at the extremes of good or poor performance"`

	// dopamine value for this layer
	DA float32 `inactive:"+" desc:"dopamine value for this layer"`
}

RWPredLayer computes the reward prediction for a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). Activity is computed as a linear function of the excitatory conductance (which can be negative -- there are no constraints). Use with RWPrjn, which does simple delta-rule learning on the minus-plus difference.

func (*RWPredLayer) ActFmG

func (ly *RWPredLayer) ActFmG(ltime *leabra.Time)

ActFmG computes linear activation for RWPred

func (*RWPredLayer) Defaults

func (ly *RWPredLayer) Defaults()

func (*RWPredLayer) GetDA

func (ly *RWPredLayer) GetDA() float32

func (*RWPredLayer) SetDA

func (ly *RWPredLayer) SetDA(da float32)

type RWPrjn

type RWPrjn struct {
	leabra.Prjn

	// tolerance on DA -- if below this abs value, then DA goes to zero and there is no learning -- prevents prediction from exactly learning to cancel out reward value, retaining a residual valence of signal
	DaTol float32 `desc:"tolerance on DA -- if below this abs value, then DA goes to zero and there is no learning -- prevents prediction from exactly learning to cancel out reward value, retaining a residual valence of signal"`
}

RWPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.Act. Use in RWPredLayer typically to generate reward predictions. Has no weight bounds or limits on sign, etc.
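
The per-synapse weight change, in sketch form (the learning-rate factor is an assumption on top of the documented Da * Send.Act term):

// rwDWt computes the RWPrjn delta-weight for one synapse: the
// receiving layer's DA times the sending unit's activity.
func rwDWt(lrate, da, sendAct float32) float32 {
	return lrate * da * sendAct
}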

func (*RWPrjn) DWt

func (pj *RWPrjn) DWt()

DWt computes the weight change (learning) -- on sending projections.

func (*RWPrjn) Defaults

func (pj *RWPrjn) Defaults()

func (*RWPrjn) WtFmDWt

func (pj *RWPrjn) WtFmDWt()

WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections.

type SendACh added in v1.1.0

type SendACh emer.LayNames

SendACh is a list of layers to send acetylcholine to.

func (*SendACh) Add added in v1.1.0

func (sd *SendACh) Add(laynm ...string)

Add adds the given layer name(s) to the list.

func (*SendACh) AddAllBut added in v1.1.0

func (sd *SendACh) AddAllBut(net emer.Network, excl ...string)

AddAllBut adds all layers in the network except those in the exclude list.

func (*SendACh) AddOne added in v1.1.15

func (sd *SendACh) AddOne(laynm string)

AddOne adds one layer name to the list -- Python version, since varargs are not supported there.

func (*SendACh) SendACh added in v1.1.0

func (sd *SendACh) SendACh(net emer.Network, ach float32)

SendACh sends acetylcholine to the list of layers.

func (*SendACh) Validate added in v1.1.0

func (sd *SendACh) Validate(net emer.Network, ctxt string) error

Validate ensures that the LayNames layers are valid. ctxt is a string for the error message, to provide context.

type SendDA

type SendDA emer.LayNames

SendDA is a list of layers to send dopamine to.

func (*SendDA) Add

func (sd *SendDA) Add(laynm ...string)

Add adds the given layer name(s) to the list.

func (*SendDA) AddAllBut

func (sd *SendDA) AddAllBut(net emer.Network, excl ...string)

AddAllBut adds all layers in the network except those in the exclude list.

func (*SendDA) AddOne added in v1.1.15

func (sd *SendDA) AddOne(laynm string)

AddOne adds one layer name to the list -- Python version, since varargs are not supported there.

func (*SendDA) SendDA

func (sd *SendDA) SendDA(net emer.Network, da float32)

SendDA sends dopamine to the list of layers.

func (*SendDA) Validate

func (sd *SendDA) Validate(net emer.Network, ctxt string) error

Validate ensures that the LayNames layers are valid. ctxt is a string for the error message, to provide context.
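
A sketch of driving a SendDA list by hand (net is a hypothetical emer.Network, log is the standard library logger; in normal use a DA layer's CyclePost calls SendDA for you):

var sd SendDA
sd.Add("Layer1", "Layer2")
if err := sd.Validate(net, "demo"); err != nil {
	log.Println(err) // e.g., a layer name not found in the network
}
sd.SendDA(net, 0.5) // deliver DA = 0.5 to every listed layer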

type TDDaLayer

type TDDaLayer struct {
	leabra.Layer

	// list of layers to send dopamine to
	SendDA SendDA `desc:"list of layers to send dopamine to"`

	// name of TDRewIntegLayer from which this computes the temporal derivative
	RewInteg string `desc:"name of TDRewIntegLayer from which this computes the temporal derivative"`

	// dopamine value for this layer
	DA float32 `desc:"dopamine value for this layer"`
}

TDDaLayer computes a dopamine (DA) signal as the temporal difference (TD) between the TDRewIntegLayer activations in the minus and plus phases.
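
In sketch form (integPlus and integMinus are the RewInteg layer's plus- and minus-phase activations; names illustrative):

// tdDA returns the TD error: [r(t) + discounted V(t+1)] - V(t),
// i.e., RewInteg plus phase minus RewInteg minus phase.
func tdDA(integPlus, integMinus float32) float32 {
	return integPlus - integMinus
}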

func (*TDDaLayer) ActFmG

func (ly *TDDaLayer) ActFmG(ltime *leabra.Time)

func (*TDDaLayer) Build

func (ly *TDDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*TDDaLayer) CyclePost

func (ly *TDDaLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*TDDaLayer) Defaults

func (ly *TDDaLayer) Defaults()

func (*TDDaLayer) GetDA

func (ly *TDDaLayer) GetDA() float32

func (*TDDaLayer) RewIntegLayer

func (ly *TDDaLayer) RewIntegLayer() (*TDRewIntegLayer, error)

func (*TDDaLayer) SetDA

func (ly *TDDaLayer) SetDA(da float32)

type TDRewIntegLayer

type TDRewIntegLayer struct {
	leabra.Layer

	// parameters for reward integration
	RewInteg TDRewIntegParams `desc:"parameters for reward integration"`

	// dopamine value for this layer
	DA float32 `desc:"dopamine value for this layer"`
}

TDRewIntegLayer is the temporal differences reward integration layer. It represents the estimated value V(t) in the minus phase, and the estimated V(t+1) + r(t) in the plus phase (with V(t+1) discounted per TDRewIntegParams). It computes r(t) from (typically fixed) weights from a reward layer, and directly accesses values from the RewPred layer.
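
What the layer's activation represents in each phase, in sketch form (predV is the current RewPred activation -- V(t) in the minus phase, V(t+1) in the plus phase; discount comes from TDRewIntegParams; names illustrative):

func rewInteg(plusPhase bool, rew, predV, discount float32) float32 {
	if plusPhase {
		return rew + discount*predV // r(t) + discounted V(t+1)
	}
	return predV // V(t)
}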

func (*TDRewIntegLayer) ActFmG

func (ly *TDRewIntegLayer) ActFmG(ltime *leabra.Time)

func (*TDRewIntegLayer) Build

func (ly *TDRewIntegLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*TDRewIntegLayer) Defaults

func (ly *TDRewIntegLayer) Defaults()

func (*TDRewIntegLayer) GetDA

func (ly *TDRewIntegLayer) GetDA() float32

func (*TDRewIntegLayer) RewPredLayer

func (ly *TDRewIntegLayer) RewPredLayer() (*TDRewPredLayer, error)

func (*TDRewIntegLayer) SetDA

func (ly *TDRewIntegLayer) SetDA(da float32)

type TDRewIntegParams

type TDRewIntegParams struct {

	// discount factor -- how much to discount the future prediction from RewPred
	Discount float32 `desc:"discount factor -- how much to discount the future prediction from RewPred"`

	// name of TDRewPredLayer to get reward prediction from
	RewPred string `desc:"name of TDRewPredLayer to get reward prediction from"`
}

TDRewIntegParams are the params for the reward integrator layer.

func (*TDRewIntegParams) Defaults

func (tp *TDRewIntegParams) Defaults()

type TDRewPredLayer

type TDRewPredLayer struct {
	leabra.Layer

	// dopamine value for this layer
	DA float32 `inactive:"+" desc:"dopamine value for this layer"`
}

TDRewPredLayer is the temporal differences reward prediction layer. It represents the estimated value V(t) in the minus phase, and computes the estimated V(t+1) based on its learned weights in the plus phase. Use TDRewPredPrjn for DA-modulated learning.

func (*TDRewPredLayer) ActFmG

func (ly *TDRewPredLayer) ActFmG(ltime *leabra.Time)

ActFmG computes linear activation for TDRewPred

func (*TDRewPredLayer) GetDA

func (ly *TDRewPredLayer) GetDA() float32

func (*TDRewPredLayer) SetDA

func (ly *TDRewPredLayer) SetDA(da float32)

type TDRewPredPrjn

type TDRewPredPrjn struct {
	leabra.Prjn
}

TDRewPredPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.ActQ0 (activity on the *previous* timestep). Use in TDRewPredLayer typically to generate reward predictions. Has no weight bounds or limits on sign, etc.
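
In sketch form, contrasted with RWPrjn: the sending activity comes from the previous timestep (ActQ0), so credit goes to the state that preceded the outcome (the learning-rate factor is again an assumption):

// tdDWt computes the TDRewPredPrjn delta-weight for one synapse.
func tdDWt(lrate, da, sendActQ0 float32) float32 {
	return lrate * da * sendActQ0
}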

func (*TDRewPredPrjn) DWt

func (pj *TDRewPredPrjn) DWt()

DWt computes the weight change (learning) -- on sending projections.

func (*TDRewPredPrjn) Defaults

func (pj *TDRewPredPrjn) Defaults()

func (*TDRewPredPrjn) WtFmDWt

func (pj *TDRewPredPrjn) WtFmDWt()

WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections.
