Documentation
Overview ¶
Package rl provides reinforcement learning interfaces and utilities.
Experimental: this package is not yet wired into the main framework (stability: alpha).
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Agent ¶
type Agent interface {
	// Act selects an action given the current state.
	Act(state State) Action
	// Learn updates the agent's parameters from a batch of experiences.
	Learn(batch []Experience) error
}
Agent defines the RL agent contract.
type Environment ¶
type Environment interface {
	// Reset initialises the environment and returns the starting state.
	Reset() State
	// Step advances the environment by one time step.
	// It returns the next state, the scalar reward, a done flag, and any error.
	Step(action Action) (next State, reward float64, done bool, err error)
}
Environment defines the RL environment contract.
type Experience ¶
Experience holds a single transition tuple for replay.
type PPO ¶
type PPO struct {
// contains filtered or unexported fields
}
PPO implements the Proximal Policy Optimization agent with clipped surrogate objective and Generalized Advantage Estimation (GAE).
func (*PPO) Learn ¶
func (p *PPO) Learn(batch []Experience) error
Learn performs PPO updates on the given batch of sequential experiences.
type PPOConfig ¶
type PPOConfig struct {
	StateDim     int     // Dimensionality of the state space.
	ActionDim    int     // Dimensionality of the action space.
	HiddenDim    int     // Width of hidden layers in the policy and value networks.
	ClipRatio    float64 // Clipping range for the surrogate objective.
	Gamma        float64 // Discount factor.
	Lambda       float64 // GAE smoothing parameter.
	NEpochs      int     // Optimisation epochs per update.
	BatchSize    int     // Mini-batch size for learning.
	LearningRate float64 // Learning rate for policy and value networks.
}
PPOConfig holds hyperparameters for the PPO agent.
func DefaultPPOConfig ¶
func DefaultPPOConfig() PPOConfig
DefaultPPOConfig returns a PPOConfig with sensible defaults.
type ReplayBuffer ¶
type ReplayBuffer struct {
// contains filtered or unexported fields
}
ReplayBuffer stores experience tuples for off-policy learning. When the buffer is full the oldest entry is overwritten (FIFO eviction).
func NewReplayBuffer ¶
func NewReplayBuffer(capacity int) *ReplayBuffer
NewReplayBuffer returns a ReplayBuffer with the given capacity. capacity must be > 0.
func (*ReplayBuffer) Add ¶
func (rb *ReplayBuffer) Add(exp Experience)
Add appends an experience, overwriting the oldest entry when full.
func (*ReplayBuffer) Len ¶
func (rb *ReplayBuffer) Len() int
Len returns the number of experiences currently stored.
func (*ReplayBuffer) Sample ¶
func (rb *ReplayBuffer) Sample(batchSize int) []Experience
Sample returns batchSize experiences chosen uniformly at random (with replacement).
func (*ReplayBuffer) SamplePrioritized ¶
func (rb *ReplayBuffer) SamplePrioritized(batchSize int, priorities []float64) []Experience
SamplePrioritized returns batchSize experiences sampled proportionally to the provided priorities slice (one weight per stored experience, index 0 = oldest). priorities must have length equal to rb.Len(); any negative value is treated as 0.
type SAC ¶
type SAC struct {
// contains filtered or unexported fields
}
SAC implements the Soft Actor-Critic algorithm with twin Q-networks and automatic entropy temperature tuning.
func (*SAC) Learn ¶
func (s *SAC) Learn(batch []Experience) error
Learn updates actor, twin critics, and entropy temperature from a batch of experiences.
type SACConfig ¶
type SACConfig struct {
Gamma float64 // Discount factor (default 0.99).
Tau float64 // Soft update coefficient for target networks (default 0.005).
LearningRate float64 // Learning rate for actor and critic networks.
AlphaLR float64 // Learning rate for the entropy temperature parameter.
StateDim int // Dimensionality of the state space.
ActionDim int // Dimensionality of the action space.
HiddenDim int // Width of hidden layers in actor and critic networks.
BatchSize int // Mini-batch size for learning.
InitAlpha float64 // Initial entropy temperature.
TargetEntropy float64 // Target entropy for automatic alpha tuning (typically -ActionDim).
}
SACConfig holds hyperparameters for the SAC agent.