rl

package
v1.38.1
Published: Mar 30, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Experimental: this package is not yet wired into the main framework.

Package rl provides reinforcement learning interfaces and utilities. (Stability: alpha)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Action

type Action = []float64

Action represents a decision made by an agent.

type Agent

type Agent interface {
	// Act selects an action given the current state.
	Act(state State) Action
	// Learn updates the agent's parameters from a batch of experiences.
	Learn(batch []Experience) error
}

Agent defines the RL agent contract.

type Environment

type Environment interface {
	// Reset initialises the environment and returns the starting state.
	Reset() State
	// Step advances the environment by one time step.
	// It returns the next state, the scalar reward, a done flag, and any error.
	Step(action Action) (next State, reward float64, done bool, err error)
}

Environment defines the RL environment contract.
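The contract above can be satisfied in a few lines of Go. The following sketch implements a toy one-dimensional environment; the LineWorld type and its "walk to the goal" dynamics are invented here for illustration, with the State and Action aliases reproduced from this page.

```go
package main

import "fmt"

// Aliases and the Environment interface, copied from this page's docs.
type (
	State  = []float64
	Action = []float64
)

type Environment interface {
	Reset() State
	Step(action Action) (next State, reward float64, done bool, err error)
}

// LineWorld is a toy 1-D environment: the agent starts at 0 and is done
// once its position reaches goal. Each Step moves it by action[0].
type LineWorld struct {
	pos, goal float64
}

func (e *LineWorld) Reset() State {
	e.pos = 0
	return State{e.pos}
}

func (e *LineWorld) Step(action Action) (State, float64, bool, error) {
	if len(action) != 1 {
		return nil, 0, false, fmt.Errorf("want a 1-D action, got %d-D", len(action))
	}
	e.pos += action[0]
	done := e.pos >= e.goal
	reward := -1.0 // per-step cost encourages short episodes
	if done {
		reward = 10.0
	}
	return State{e.pos}, reward, done, nil
}

// runEpisode walks right until the episode ends and returns the final
// position and the total reward.
func runEpisode(env Environment) (pos, total float64) {
	s := env.Reset()
	for {
		next, r, done, err := env.Step(Action{1.0})
		if err != nil {
			panic(err)
		}
		total += r
		s = next
		if done {
			return s[0], total
		}
	}
}

func main() {
	pos, total := runEpisode(&LineWorld{goal: 3})
	fmt.Println(pos, total) // 3 steps: -1 -1 +10 = 8
}
```

Returning an error from Step (rather than panicking) matches the interface's signature and lets a training loop decide how to handle malformed actions.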

type Experience

type Experience struct {
	State     State
	Action    Action
	Reward    float64
	NextState State
	Done      bool
}

Experience holds a single transition tuple for replay.

type PPO

type PPO struct {
	// contains filtered or unexported fields
}

PPO implements the Proximal Policy Optimization agent with clipped surrogate objective and Generalized Advantage Estimation (GAE).

func NewPPO

func NewPPO(cfg PPOConfig) *PPO

NewPPO creates a PPO agent with the given configuration.

func (*PPO) Act

func (p *PPO) Act(state State) Action

Act selects an action by sampling from the Gaussian policy.

func (*PPO) Learn

func (p *PPO) Learn(batch []Experience) error

Learn performs PPO updates on the given batch of sequential experiences.
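Because Learn expects sequential experiences, a typical on-policy loop collects a full trajectory and then hands it to the agent in order. The sketch below shows that collect-then-learn shape; collectEpisode, stubAgent, and countEnv are hypothetical stand-ins (not part of this package), with stubAgent playing the role of *PPO.

```go
package main

import "fmt"

// Types copied from this page's docs.
type (
	State  = []float64
	Action = []float64
)

type Experience struct {
	State     State
	Action    Action
	Reward    float64
	NextState State
	Done      bool
}

type Agent interface {
	Act(state State) Action
	Learn(batch []Experience) error
}

type Environment interface {
	Reset() State
	Step(action Action) (State, float64, bool, error)
}

// collectEpisode rolls out one episode and returns its transitions in
// order, which is the sequential batch an on-policy learner expects.
func collectEpisode(env Environment, agent Agent, maxSteps int) ([]Experience, error) {
	state := env.Reset()
	var batch []Experience
	for i := 0; i < maxSteps; i++ {
		action := agent.Act(state)
		next, reward, done, err := env.Step(action)
		if err != nil {
			return nil, err
		}
		batch = append(batch, Experience{state, action, reward, next, done})
		state = next
		if done {
			break
		}
	}
	return batch, nil
}

// stubAgent always moves +1 and counts learned transitions.
type stubAgent struct{ learned int }

func (a *stubAgent) Act(State) Action           { return Action{1} }
func (a *stubAgent) Learn(b []Experience) error { a.learned += len(b); return nil }

// countEnv terminates every episode after 3 steps.
type countEnv struct{ t int }

func (e *countEnv) Reset() State { e.t = 0; return State{0} }
func (e *countEnv) Step(Action) (State, float64, bool, error) {
	e.t++
	return State{float64(e.t)}, 1, e.t >= 3, nil
}

func main() {
	agent := &stubAgent{}
	env := &countEnv{}
	for epoch := 0; epoch < 5; epoch++ {
		batch, err := collectEpisode(env, agent, 100)
		if err != nil {
			panic(err)
		}
		if err := agent.Learn(batch); err != nil {
			panic(err)
		}
	}
	fmt.Println(agent.learned) // 5 episodes × 3 transitions = 15
}
```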

type PPOConfig

type PPOConfig struct {
	StateDim     int
	ActionDim    int
	HiddenDim    int
	ClipRatio    float64
	Gamma        float64
	Lambda       float64
	NEpochs      int
	BatchSize    int
	LearningRate float64
}

PPOConfig holds hyperparameters for the PPO agent.

func DefaultPPOConfig

func DefaultPPOConfig(stateDim, actionDim int) PPOConfig

DefaultPPOConfig returns a PPOConfig with sensible defaults.

type ReplayBuffer

type ReplayBuffer struct {
	// contains filtered or unexported fields
}

ReplayBuffer stores experience tuples for off-policy learning. When the buffer is full the oldest entry is overwritten (FIFO eviction).

func NewReplayBuffer

func NewReplayBuffer(capacity int) (*ReplayBuffer, error)

NewReplayBuffer returns a ReplayBuffer with the given capacity. capacity must be > 0; otherwise a non-nil error is returned.

func (*ReplayBuffer) Add

func (rb *ReplayBuffer) Add(exp Experience)

Add appends an experience, overwriting the oldest entry when full.

func (*ReplayBuffer) Len

func (rb *ReplayBuffer) Len() int

Len returns the number of experiences currently stored.

func (*ReplayBuffer) Sample

func (rb *ReplayBuffer) Sample(batchSize int) []Experience

Sample returns batchSize experiences chosen uniformly at random (with replacement).
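The documented behavior (FIFO eviction at capacity, uniform sampling with replacement) can be mirrored by a small ring buffer. ReplayBuffer's fields are unexported, so the sketch below is a from-scratch re-implementation of the described semantics, not this package's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
)

type Experience struct {
	Reward float64 // other fields omitted for brevity
}

// ringBuffer mirrors the documented ReplayBuffer behavior: FIFO
// eviction once capacity is reached, uniform sampling with replacement.
type ringBuffer struct {
	data []Experience
	next int // slot that will be overwritten next once the buffer is full
}

func newRingBuffer(capacity int) (*ringBuffer, error) {
	if capacity <= 0 {
		return nil, fmt.Errorf("capacity must be > 0, got %d", capacity)
	}
	return &ringBuffer{data: make([]Experience, 0, capacity)}, nil
}

func (rb *ringBuffer) Add(exp Experience) {
	if len(rb.data) < cap(rb.data) {
		rb.data = append(rb.data, exp)
		return
	}
	rb.data[rb.next] = exp // overwrite the oldest entry
	rb.next = (rb.next + 1) % cap(rb.data)
}

func (rb *ringBuffer) Len() int { return len(rb.data) }

func (rb *ringBuffer) Sample(batchSize int) []Experience {
	out := make([]Experience, batchSize)
	for i := range out {
		out[i] = rb.data[rand.Intn(len(rb.data))]
	}
	return out
}

func main() {
	rb, err := newRingBuffer(2)
	if err != nil {
		panic(err)
	}
	for r := 1.0; r <= 3; r++ {
		rb.Add(Experience{Reward: r})
	}
	// Capacity 2: the entry with reward 1 was evicted; 2 and 3 remain.
	fmt.Println(rb.Len())
}
```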

func (*ReplayBuffer) SamplePrioritized

func (rb *ReplayBuffer) SamplePrioritized(batchSize int, priorities []float64) ([]Experience, error)

SamplePrioritized returns batchSize experiences sampled with probability proportional to the provided priorities (one weight per stored experience, index 0 = oldest). priorities must have length equal to rb.Len(); negative values are treated as 0.
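One common way to implement such proportional sampling is roulette-wheel selection over cumulative weights. The samplePrioritized helper below is a hypothetical sketch of that scheme, clamping negative weights to zero as the documentation describes; it returns indices rather than experiences to keep the example small.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// samplePrioritized picks batchSize indices with probability
// proportional to the given priorities. Negative weights are
// clamped to zero, so those entries are never drawn.
func samplePrioritized(priorities []float64, batchSize int) ([]int, error) {
	cum := make([]float64, len(priorities))
	total := 0.0
	for i, p := range priorities {
		if p < 0 {
			p = 0
		}
		total += p
		cum[i] = total
	}
	if total == 0 {
		return nil, fmt.Errorf("all priorities are zero")
	}
	out := make([]int, batchSize)
	for i := range out {
		u := rand.Float64() * total
		// First index whose cumulative weight reaches u.
		out[i] = sort.SearchFloat64s(cum, u)
	}
	return out, nil
}

func main() {
	idxs, err := samplePrioritized([]float64{0, 1, -0.5, 3}, 1000)
	if err != nil {
		panic(err)
	}
	counts := make(map[int]int)
	for _, i := range idxs {
		counts[i]++
	}
	// Index 3 (weight 3) is drawn roughly three times as often as
	// index 1 (weight 1); index 2 (negative weight) is never drawn.
	fmt.Println(counts[2])
}
```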

type SAC

type SAC struct {
	// contains filtered or unexported fields
}

SAC implements the Soft Actor-Critic algorithm with twin Q-networks and automatic entropy temperature tuning.

func NewSAC

func NewSAC(cfg SACConfig) *SAC

NewSAC creates a new SAC agent with the given configuration.

func (*SAC) Act

func (s *SAC) Act(state State) Action

Act selects an action for the given state using the current policy.

func (*SAC) Alpha

func (s *SAC) Alpha() float64

Alpha returns the current entropy temperature.

func (*SAC) Learn

func (s *SAC) Learn(batch []Experience) error

Learn updates actor, twin critics, and entropy temperature from a batch of experiences.

type SACConfig

type SACConfig struct {
	Gamma         float64 // Discount factor (default 0.99).
	Tau           float64 // Soft update coefficient for target networks (default 0.005).
	LearningRate  float64 // Learning rate for actor and critic networks.
	AlphaLR       float64 // Learning rate for the entropy temperature parameter.
	StateDim      int     // Dimensionality of the state space.
	ActionDim     int     // Dimensionality of the action space.
	HiddenDim     int     // Width of hidden layers in actor and critic networks.
	BatchSize     int     // Mini-batch size for learning.
	InitAlpha     float64 // Initial entropy temperature.
	TargetEntropy float64 // Target entropy for automatic alpha tuning (typically -ActionDim).
}

SACConfig holds hyperparameters for the SAC agent.
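Tau above is the coefficient for Polyak averaging of the target networks: target = tau*online + (1-tau)*target, applied after each learning step. A minimal sketch of that soft update, with softUpdate as a hypothetical helper over flat parameter slices:

```go
package main

import "fmt"

// softUpdate blends online parameters into target parameters in place:
// target = tau*online + (1-tau)*target. With a small tau (e.g. 0.005),
// the target network trails the online network slowly, which stabilizes
// the critic's bootstrap targets.
func softUpdate(target, online []float64, tau float64) {
	for i := range target {
		target[i] = tau*online[i] + (1-tau)*target[i]
	}
}

func main() {
	online := []float64{1, 1}
	target := []float64{0, 0}
	softUpdate(target, online, 0.5)
	fmt.Println(target) // [0.5 0.5]
}
```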

type State

type State = []float64

State represents an observation from the environment.
