Package her

Version: v0.0.0-...-225e849
Published: Oct 22, 2020 License: Apache-2.0 Imports: 13 Imported by: 1

README

Hindsight Experience Replay

Hindsight experience replay allows an agent to learn in environments with sparse rewards and multiple goals.

How it works

HER utilizes UVFAs (universal value function approximators) and works by augmenting experience replays with additional goals. The intuition is that there is valuable information to be learned even when the end goal is not reached, e.g. if I miss a shot in basketball, I can still reason that had the hoop been slightly moved, I would have made it.

HER is a sort of intrinsic curriculum learning in which the agent is able to learn from smaller goals before reaching the larger ones.
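
In code, the relabeling step can be pictured with this package's Event type: at the end of an episode each stored transition is copied and its goal is swapped for a state that was actually reached, so the copy earns a successful reward even though the original goal was missed. The sketch below is illustrative, not the package's implementation; the final-state-as-goal strategy and the Reward field on the embedded outcome are assumptions (see Agent.Hindsight in the documentation below).

// hindsightRelabel is an illustrative sketch of goal relabeling, not this
// package's implementation. It copies an episode's events and swaps each
// goal for the final state that was actually reached.
func hindsightRelabel(episode her.Events, successReward float32) her.Events {
	if len(episode) == 0 {
		return nil
	}
	achieved := episode[len(episode)-1].State // state reached at the end of the episode

	relabeled := episode.Copy()
	for _, e := range relabeled {
		e.Goal = achieved
		e.Reward = successReward // Reward on the embedded envv1.Outcome is assumed here
	}
	return relabeled
}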

Examples

See the experiments folder for example implementations.

Roadmap

  • n>15 on bitflip
  • More hindsight types
  • More environments (push-drag)

Documentation

Overview

Package her is an agent implementation of the Hindsight Experience Replay algorithm.

Index

Constants

This section is empty.

Variables

var DefaultAgentConfig = &AgentConfig{
	Hyperparameters:  DefaultHyperparameters,
	PolicyConfig:     DefaultPolicyConfig,
	Base:             agentv1.NewBase("HER"),
	SuccessfulReward: 0,
	MemorySize:       1e4,
}

DefaultAgentConfig is the default config for a dqn+her agent.

var DefaultFCLayerBuilder = func(x, y *modelv1.Input) []layer.Config {
	return []layer.Config{
		layer.FC{Input: x.Squeeze()[0], Output: 512},
		layer.FC{Input: 512, Output: 512},
		layer.FC{Input: 512, Output: y.Squeeze()[0], Activation: layer.Linear},
	}
}

DefaultFCLayerBuilder is a default fully connected layer builder.

var DefaultHyperparameters = &Hyperparameters{
	Epsilon:              common.DefaultDecaySchedule(),
	Gamma:                0.9,
	UpdateTargetEpisodes: 50,
}

DefaultHyperparameters are the default hyperparameters.

var DefaultPolicyConfig = &PolicyConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(g.WithBatchSize(128), g.WithLearnRate(0.0005)),
	LayerBuilder: DefaultFCLayerBuilder,
	BatchSize:    128,
	Track:        true,
}

DefaultPolicyConfig is the default configuration for a policy.

Functions

func MakePolicy

func MakePolicy(name string, config *PolicyConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)

MakePolicy makes a policy model for the given environment using the provided config.
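
For example, a policy can be built from the default config; env here is assumed to be an already constructed *envv1.Env, and "online" is an arbitrary name:

base := agentv1.NewBase("HER")
policy, err := her.MakePolicy("online", her.DefaultPolicyConfig, base, env)
if err != nil {
	log.Fatal(err)
}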

Types

type Agent

type Agent struct {
	// Base for the agent.
	*agentv1.Base

	// Hyperparameters for the dqn+her agent.
	*Hyperparameters

	Policy       model.Model
	TargetPolicy model.Model
	Epsilon      common.Schedule
	// contains filtered or unexported fields
}

Agent is a dqn+her agent.

func NewAgent

func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)

NewAgent returns a new dqn+her agent.
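
For example, with the default config; env is assumed to be an already constructed *envv1.Env:

agent, err := her.NewAgent(her.DefaultAgentConfig, env)
if err != nil {
	log.Fatal(err)
}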

func (*Agent) Action

func (a *Agent) Action(state, goal *tensor.Dense) (action int, err error)

Action selects the best known action for the given state and goal.

func (*Agent) Hindsight

func (a *Agent) Hindsight(episodeEvents Events) error

Hindsight applies hindsight to the given episode events, augmenting the memory with additional goals.

func (*Agent) Learn

func (a *Agent) Learn() error

Learn the agent.

func (*Agent) Remember

func (a *Agent) Remember(event ...*Event)

Remember events.
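
Put together, an episode loop might look roughly like the sketch below. The resetEnv and stepEnv helpers, the Observation and Done fields on the outcome, and the episode bookkeeping are assumptions about the surrounding environment API, not part of this package; imports are elided.

for episode := 0; episode < numEpisodes; episode++ {
	state, goal := resetEnv(env) // hypothetical helper returning *tensor.Dense state and goal

	episodeEvents := her.Events{}
	for step := 0; step < maxSteps; step++ {
		action, err := agent.Action(state, goal)
		if err != nil {
			log.Fatal(err)
		}
		outcome := stepEnv(env, action) // hypothetical helper returning *envv1.Outcome

		event := her.NewEvent(state, goal, outcome)
		agent.Remember(event)
		episodeEvents = append(episodeEvents, event)

		state = outcome.Observation // assumed field on envv1.Outcome
		if outcome.Done {           // assumed field on envv1.Outcome
			break
		}
	}

	// Relabel the episode with achieved goals, then learn from memory.
	if err := agent.Hindsight(episodeEvents); err != nil {
		log.Fatal(err)
	}
	if err := agent.Learn(); err != nil {
		log.Fatal(err)
	}
}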

type AgentConfig

type AgentConfig struct {
	// Base for the agent.
	Base *agentv1.Base

	// Hyperparameters for the agent.
	*Hyperparameters

	// PolicyConfig for the agent.
	PolicyConfig *PolicyConfig

	// SuccessfulReward is the reward for reaching the goal.
	SuccessfulReward float32

	// MemorySize is the size of the memory.
	MemorySize int
}

AgentConfig is the config for a dqn+her agent.

type Event

type Event struct {
	*envv1.Outcome

	// State by which the action was taken.
	State *tensor.Dense

	// Goal the agent is trying to reach.
	Goal *tensor.Dense
	// contains filtered or unexported fields
}

Event is an event that occurred.

func NewEvent

func NewEvent(state, goal *tensor.Dense, outcome *envv1.Outcome) *Event

NewEvent returns a new event.

func (*Event) Copy

func (e *Event) Copy() *Event

Copy the event.

func (*Event) Print

func (e *Event) Print()

Print the event.

type Events

type Events []*Event

Events that occurred.

func (Events) Copy

func (e Events) Copy() Events

Copy the events.

type Hyperparameters

type Hyperparameters struct {
	// Gamma is the discount factor (0≤γ≤1). It determines how much importance we want to give to future
	// rewards. A high value for the discount factor (close to 1) captures the long-term effective reward,
	// whereas a discount factor of 0 makes our agent consider only immediate rewards, hence making it greedy.
	Gamma float32

	// Epsilon is the rate at which the agent should exploit vs explore.
	Epsilon common.Schedule

	// UpdateTargetEpisodes determines how often the target network updates its parameters.
	UpdateTargetEpisodes int
}

Hyperparameters for the dqn+her agent.
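
Custom hyperparameters can be passed through the agent config; the values below are illustrative, not tuned recommendations:

hp := &her.Hyperparameters{
	Gamma:                0.99,                          // weight future rewards heavily
	Epsilon:              common.DefaultDecaySchedule(), // epsilon-greedy exploration schedule
	UpdateTargetEpisodes: 100,                           // sync the target network every 100 episodes
}

config := *her.DefaultAgentConfig // copy the defaults, then override the hyperparameters
config.Hyperparameters = hp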

type LayerBuilder

type LayerBuilder func(x, y *modelv1.Input) []layer.Config

LayerBuilder builds layers.
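
A custom builder has the same shape as DefaultFCLayerBuilder; for example, a smaller network (layer sizes are illustrative):

var smallFCLayerBuilder her.LayerBuilder = func(x, y *modelv1.Input) []layer.Config {
	return []layer.Config{
		layer.FC{Input: x.Squeeze()[0], Output: 128},
		layer.FC{Input: 128, Output: y.Squeeze()[0], Activation: layer.Linear},
	}
}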

type Memory

type Memory struct {
	// contains filtered or unexported fields
}

Memory for the dqn+her agent.

func NewMemory

func NewMemory(size int) *Memory

NewMemory returns a new Memory store.

func (*Memory) Len

func (m *Memory) Len() int

Len of the memory.

func (*Memory) Remember

func (m *Memory) Remember(events ...*Event)

Remember events.

func (*Memory) Sample

func (m *Memory) Sample(batchsize int) (ret []*Event, err error)

Sample a batch of the given size from memory.
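
The memory can also be used on its own; for example (events is assumed to be an her.Events slice built elsewhere):

memory := her.NewMemory(10000)
memory.Remember(events...)

batch, err := memory.Sample(128)
if err != nil {
	log.Fatal(err)
}
fmt.Println("sampled", len(batch), "of", memory.Len(), "events")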

type PolicyConfig

type PolicyConfig struct {
	// Loss function to evaluate network performance.
	Loss modelv1.Loss

	// Optimizer to optimize the weights with regards to the error.
	Optimizer g.Solver

	// LayerBuilder is a builder of layers.
	LayerBuilder LayerBuilder

	// Batch size to train on.
	BatchSize int

	// Track is whether to track the model.
	Track bool
}

PolicyConfig is the configuration for a policy.
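
A custom config mirrors DefaultPolicyConfig; for example, a smaller batch size and learning rate with the layer builder sketched above (values are illustrative):

policyConfig := &her.PolicyConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(g.WithBatchSize(64), g.WithLearnRate(0.0001)),
	LayerBuilder: smallFCLayerBuilder,
	BatchSize:    64,
	Track:        true,
}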

Directories

Path Synopsis
experiments
