ppo

package
v0.0.0-...-225e849
Published: Oct 22, 2020 License: Apache-2.0 Imports: 14 Imported by: 1

README

Proximal Policy Optimization

In Progress ⚠️ blocked on https://github.com/gorgonia/gorgonia/issues/373

Implementation of the Proximal Policy Optimization algorithm.

How it works

PPO is an on-policy method that aims to solve the step-size problem of policy gradient methods. Policy gradient algorithms are typically very sensitive to step size: too large a step and the agent can fall into an unrecoverable state; too small a step and training takes a very long time. PPO addresses this by ensuring that the agent's policy never deviates too far from the previous policy.

The probability ratio between the new policy and the old policy is clipped so that each policy update remains within a bound:

    r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)

    L^CLIP(θ) = E_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1−ε, 1+ε) Â_t ) ]
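As a rough illustration of the clipping step, the per-timestep surrogate term can be sketched in plain Go. This is a standalone sketch with hypothetical names (clippedSurrogate, r, adv, eps); the package itself builds the equivalent expression as a gorgonia graph inside its Loss type.

package main

import (
	"fmt"
	"math"
)

// clippedSurrogate computes min(r*A, clip(r, 1-eps, 1+eps)*A) for a single
// timestep, where r is the probability ratio and A the advantage estimate.
func clippedSurrogate(r, adv, eps float64) float64 {
	clipped := math.Min(math.Max(r, 1-eps), 1+eps)
	return math.Min(r*adv, clipped*adv)
}

func main() {
	// A ratio far above 1 is clipped to 1+eps, so the update stays bounded.
	fmt.Println(clippedSurrogate(1.8, 1.0, 0.2)) // 1.2
}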

Examples

See the experiments folder for example implementations.

Roadmap

References

Documentation

Overview

Package ppo is an agent implementation of the Proximal Policy Optimization algorithm.

Index

Constants

This section is empty.

Variables

var DefaultActorConfig = &ModelConfig{
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultActorLayerBuilder,
	BatchSize:    20,
}

DefaultActorConfig is the default configuration for the actor (policy) model.

var DefaultActorLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 24},
		layer.FC{Input: 24, Output: 24},
		layer.FC{Input: 24, Output: envv1.PotentialsShape(env.ActionSpace)[0], Activation: layer.Softmax},
	}
}

DefaultActorLayerBuilder is a default fully connected layer builder.

var DefaultAgentConfig = &AgentConfig{
	Hyperparameters: DefaultHyperparameters,
	Base:            agentv1.NewBase("PPO"),
	ActorConfig:     DefaultActorConfig,
	CriticConfig:    DefaultCriticConfig,
}

DefaultAgentConfig is the default config for a PPO agent.

var DefaultCriticConfig = &ModelConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultCriticLayerBuilder,
	BatchSize:    20,
}

DefaultCriticConfig is the default configuration for the critic model.

var DefaultCriticLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 24},
		layer.FC{Input: 24, Output: 24},
		layer.FC{Input: 24, Output: 1, Activation: layer.Tanh},
	}
}

DefaultCriticLayerBuilder is a default fully connected layer builder.

var DefaultHyperparameters = &Hyperparameters{
	Gamma:  0.99,
	Lambda: 0.95,
}

DefaultHyperparameters are the default hyperparameters.

Functions

func GAE

func GAE(values, masks, rewards []*t.Dense, gamma, lambda float32) (returns, advantage *t.Dense, err error)

GAE is generalized advantage estimation.
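As an illustrative sketch of the computation (using plain float32 slices rather than the *tensor.Dense values the package operates on; the function and variable names are hypothetical, and the terminal value is assumed to be zero):

package main

import "fmt"

// gae computes advantages and discounted returns backwards over a trajectory.
// masks are 1 while an episode is running and 0 at terminal steps, which cuts
// the bootstrap across episode boundaries.
func gae(values, masks, rewards []float32, gamma, lambda float32) (returns, advantages []float32) {
	n := len(rewards)
	returns = make([]float32, n)
	advantages = make([]float32, n)
	var running float32
	for i := n - 1; i >= 0; i-- {
		var nextValue float32
		if i+1 < n {
			nextValue = values[i+1]
		}
		// TD residual for this step.
		delta := rewards[i] + gamma*nextValue*masks[i] - values[i]
		// Exponentially weighted sum of residuals.
		running = delta + gamma*lambda*masks[i]*running
		advantages[i] = running
		returns[i] = running + values[i]
	}
	return returns, advantages
}

func main() {
	returns, advantages := gae(
		[]float32{0.5, 0.6, 0.7},
		[]float32{1, 1, 0},
		[]float32{1, 1, 1},
		0.99, 0.95,
	)
	fmt.Println(returns, advantages)
}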

func MakeActor

func MakeActor(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)

MakeActor makes the actor which chooses actions based on the policy.

func MakeCritic

func MakeCritic(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)

MakeCritic makes the critic, which produces a Q-value estimate based on the outcome of the action taken.

func WithClip

func WithClip(val float64) func(*Loss)

WithClip sets the clipping value. Defaults to 0.2.

func WithCriticDiscount

func WithCriticDiscount(val float32) func(*Loss)

WithCriticDiscount sets the critic discount. Defaults to 0.5.

func WithEntropyBeta

func WithEntropyBeta(val float32) func(*Loss)

WithEntropyBeta sets the entropy beta. Defaults to 0.001.

Types

type Agent

type Agent struct {
	// Base for the agent.
	*agentv1.Base

	// Hyperparameters for the agent.
	*Hyperparameters

	// Actor chooses actions.
	Actor modelv1.Model

	// Critic updates params.
	Critic modelv1.Model

	// Memory of the agent.
	Memory *Memory
	// contains filtered or unexported fields
}

Agent is a PPO agent.

func NewAgent

func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)

NewAgent returns a new PPO agent.
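A minimal construction sketch, assuming env is an *envv1.Env that has already been created (environment setup and import paths are omitted):

// env is assumed to be an already-constructed *envv1.Env.
agent, err := ppo.NewAgent(ppo.DefaultAgentConfig, env)
if err != nil {
	log.Fatal(err)
}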

func (*Agent) Action

func (a *Agent) Action(state *tensor.Dense) (action int, event *Event, err error)

Action selects the best known action for the given state.

func (*Agent) Learn

func (a *Agent) Learn(event *Event) error

Learn the agent.
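A sketch of a single interaction step, assuming agent was built with NewAgent, state is the current observation as a *tensor.Dense, and outcome is the *envv1.Outcome obtained by stepping the environment with the chosen action (the environment calls themselves are not shown):

// Choose an action and record the event that produced it.
action, event, err := agent.Action(state)
if err != nil {
	log.Fatal(err)
}

// Step the environment with `action` to obtain `outcome` (not shown), then
// fold the outcome into the event and learn from it.
event.Apply(outcome)
if err := agent.Learn(event); err != nil {
	log.Fatal(err)
}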

type AgentConfig

type AgentConfig struct {
	// Base for the agent.
	Base *agentv1.Base

	// Hyperparameters for the agent.
	*Hyperparameters

	// ActorConfig is the actor model config.
	ActorConfig *ModelConfig

	// CriticConfig is the critic model config.
	CriticConfig *ModelConfig
}

AgentConfig is the config for a PPO agent.

type BatchedEvents

type BatchedEvents struct {
	States, ActionProbs, ActionOneHots, QValues, Masks, Rewards *tensor.Dense
	Len                                                         int
}

BatchedEvents are the events concatenated into batched tensors.

type Event

type Event struct {
	State, ActionProbs, ActionOneHot, QValue, Mask, Reward *tensor.Dense
}

Event is an event that occurred when interacting with an environment.

func NewEvent

func NewEvent(state, actionProbs, actionOneHot, qValue *tensor.Dense) *Event

NewEvent returns a new event.

func (*Event) Apply

func (e *Event) Apply(outcome *envv1.Outcome)

Apply an outcome to an event.

type Events

type Events struct {
	States, ActionProbs, ActionOneHots, QValues, Masks, Rewards []*tensor.Dense
}

Events are the recorded events as slices of tensors, prior to batching.

func (*Events) Batch

func (e *Events) Batch() (events *BatchedEvents, err error)

Batch the events.

type Hyperparameters

type Hyperparameters struct {
	// Gamma is the discount factor (0≤γ≤1). It determines how much importance to give to future
	// rewards: a value close to 1 captures the long-term return, whereas a value of 0 makes the
	// agent consider only the immediate reward, i.e. act greedily.
	Gamma float32

	// Lambda is the GAE smoothing factor, used to reduce variance and stabilize training.
	Lambda float32
}

Hyperparameters for the PPO agent.
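For example, a config that discounts future rewards more aggressively could be assembled as in the sketch below (the values are arbitrary):

config := &ppo.AgentConfig{
	Base: agentv1.NewBase("PPO"),
	Hyperparameters: &ppo.Hyperparameters{
		Gamma:  0.9,  // discount future rewards more heavily than the default 0.99
		Lambda: 0.95, // GAE smoothing factor
	},
	ActorConfig:  ppo.DefaultActorConfig,
	CriticConfig: ppo.DefaultCriticConfig,
}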

type LayerBuilder

type LayerBuilder func(env *envv1.Env) []layer.Config

LayerBuilder builds layers.

type Loss

type Loss struct {
	// contains filtered or unexported fields
}

Loss is the custom PPO loss. It is designed to ensure that the policy is never updated too aggressively.

func NewLoss

func NewLoss(oldProbs, advantages, rewards, values *modelv1.Input, opts ...LossOpt) *Loss

NewLoss returns a new PPO loss.
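A sketch of constructing the loss with non-default options; oldProbs, advantages, rewards, and values are assumed to be *modelv1.Input values defined elsewhere:

loss := ppo.NewLoss(oldProbs, advantages, rewards, values,
	ppo.WithClip(0.2),
	ppo.WithCriticDiscount(0.5),
	ppo.WithEntropyBeta(0.001),
)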

func (*Loss) CloneTo

func (l *Loss) CloneTo(graph *g.ExprGraph, opts ...modelv1.CloneOpt) modelv1.Loss

CloneTo another graph.

func (*Loss) Compute

func (l *Loss) Compute(yHat, y *g.Node) (loss *g.Node, err error)

Compute the loss.

func (*Loss) Inputs

func (l *Loss) Inputs() modelv1.Inputs

Inputs returns any inputs the loss function utilizes.

type LossOpt

type LossOpt func(*Loss)

LossOpt is an option for PPO loss.

type Memory

type Memory struct {
	// contains filtered or unexported fields
}

Memory for the PPO agent.

func NewMemory

func NewMemory() *Memory

NewMemory returns a new Memory store.

func (*Memory) Len

func (m *Memory) Len() int

Len is the number of events in the memory.

func (*Memory) Pop

func (m *Memory) Pop() (e *Events)

Pop the values out of the memory.

func (*Memory) Remember

func (m *Memory) Remember(event *Event) error

Remember an event.

func (*Memory) Reset

func (m *Memory) Reset()

Reset the memory.
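A sketch of the memory lifecycle; state, actionProbs, actionOneHot, and qValue are assumed to be *tensor.Dense values and outcome an *envv1.Outcome produced elsewhere, and the threshold of 20 simply mirrors the default batch size:

memory := ppo.NewMemory()

// Record one interaction.
event := ppo.NewEvent(state, actionProbs, actionOneHot, qValue)
event.Apply(outcome)
if err := memory.Remember(event); err != nil {
	log.Fatal(err)
}

// Once enough events have accumulated, drain the memory and batch the
// events into tensors for an update.
if memory.Len() >= 20 {
	events := memory.Pop()
	batch, err := events.Batch()
	if err != nil {
		log.Fatal(err)
	}
	_ = batch
}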

type ModelConfig

type ModelConfig struct {
	// Loss function to evaluate network performance.
	Loss modelv1.Loss

	// Optimizer to optimize the weights with regards to the error.
	Optimizer g.Solver

	// LayerBuilder is a builder of layers.
	LayerBuilder LayerBuilder

	// BatchSize of the updates.
	BatchSize int

	// Track is whether to track the model.
	Track bool
}

ModelConfig contains the hyperparameters for a model.
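A custom ModelConfig can supply its own LayerBuilder; the sketch below mirrors the default actor builder but widens the hidden layers (the widths and batch size are arbitrary):

wideActorConfig := &ppo.ModelConfig{
	Optimizer: g.NewAdamSolver(),
	BatchSize: 32,
	LayerBuilder: func(env *envv1.Env) []layer.Config {
		return []layer.Config{
			layer.FC{Input: env.ObservationSpaceShape()[0], Output: 64},
			layer.FC{Input: 64, Output: 64},
			layer.FC{Input: 64, Output: envv1.PotentialsShape(env.ActionSpace)[0], Activation: layer.Softmax},
		}
	},
}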

Directories

Path Synopsis
experiments
