Documentation
Overview ¶
Package rl provides reinforcement learning interfaces and utilities.
Experimental: this package is not yet wired into the main framework (stability: alpha).
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Agent ¶
type Agent interface {
	// Act selects an action given the current state.
	Act(state State) Action
	// Learn updates the agent's parameters from a batch of experiences.
	Learn(batch []Experience) error
}
Agent defines the RL agent contract.
type Environment ¶
type Environment interface {
	// Reset initialises the environment and returns the starting state.
	Reset() State
	// Step advances the environment by one time step.
	// It returns the next state, the scalar reward, a done flag, and any error.
	Step(action Action) (next State, reward float64, done bool, err error)
}
Environment defines the RL environment contract.
type Experience ¶
Experience holds a single transition tuple for replay.
type PPO ¶
type PPO struct {
// contains filtered or unexported fields
}
PPO implements the Proximal Policy Optimization agent with clipped surrogate objective and Generalized Advantage Estimation (GAE).
func (*PPO) Learn ¶
func (p *PPO) Learn(batch []Experience) error
Learn performs PPO updates on the given batch of sequential experiences.
type PPOConfig ¶
type PPOConfig struct {
	StateDim     int     // Dimensionality of the state space.
	ActionDim    int     // Dimensionality of the action space.
	HiddenDim    int     // Width of hidden layers in the policy and value networks.
	ClipRatio    float64 // Clipping range for the surrogate objective.
	Gamma        float64 // Discount factor.
	Lambda       float64 // GAE smoothing parameter.
	NEpochs      int     // Optimisation epochs per update.
	BatchSize    int     // Mini-batch size for learning.
	LearningRate float64 // Learning rate for policy and value networks.
}
PPOConfig holds hyperparameters for the PPO agent.
func DefaultPPOConfig ¶
func DefaultPPOConfig() PPOConfig
DefaultPPOConfig returns a PPOConfig with sensible defaults.
type ReplayBuffer ¶
type ReplayBuffer struct {
// contains filtered or unexported fields
}
ReplayBuffer stores experience tuples for off-policy learning. When the buffer is full the oldest entry is overwritten (FIFO eviction).
func NewReplayBuffer ¶
func NewReplayBuffer(capacity int) *ReplayBuffer
NewReplayBuffer returns a ReplayBuffer with the given capacity. capacity must be > 0.
func (*ReplayBuffer) Add ¶
func (rb *ReplayBuffer) Add(exp Experience)
Add appends an experience, overwriting the oldest entry when full.
func (*ReplayBuffer) Len ¶
func (rb *ReplayBuffer) Len() int
Len returns the number of experiences currently stored.
func (*ReplayBuffer) Sample ¶
func (rb *ReplayBuffer) Sample(batchSize int) []Experience
Sample returns batchSize experiences chosen uniformly at random (with replacement).
func (*ReplayBuffer) SamplePrioritized ¶
func (rb *ReplayBuffer) SamplePrioritized(batchSize int, priorities []float64) []Experience
SamplePrioritized returns batchSize experiences sampled proportionally to the provided priorities slice (one weight per stored experience, index 0 = oldest). priorities must have length equal to rb.Len(); any negative value is treated as 0.
type SAC ¶
type SAC struct {
// contains filtered or unexported fields
}
SAC implements the Soft Actor-Critic algorithm with twin Q-networks and automatic entropy temperature tuning.
func (*SAC) Learn ¶
func (s *SAC) Learn(batch []Experience) error
Learn updates actor, twin critics, and entropy temperature from a batch of experiences.
type SACConfig ¶
type SACConfig struct {
Gamma float64 // Discount factor (default 0.99).
Tau float64 // Soft update coefficient for target networks (default 0.005).
LearningRate float64 // Learning rate for actor and critic networks.
AlphaLR float64 // Learning rate for the entropy temperature parameter.
StateDim int // Dimensionality of the state space.
ActionDim int // Dimensionality of the action space.
HiddenDim int // Width of hidden layers in actor and critic networks.
BatchSize int // Mini-batch size for learning.
InitAlpha float64 // Initial entropy temperature.
TargetEntropy float64 // Target entropy for automatic alpha tuning (typically -ActionDim).
}
SACConfig holds hyperparameters for the SAC agent.