dqn

Published: Jul 10, 2024 · License: BSD-3-Clause

README

Deep Q-Network (DQN) Module for Go

This module provides a flexible and efficient implementation of the Deep Q-Network (DQN) algorithm in Go. It's designed for industrial applications, offering robust reinforcement learning capabilities for complex decision-making tasks.

Features

  • Efficient Deep Q-Network implementation in pure Go
  • Flexible neural network architecture with customizable activation functions
  • Experience replay buffer for improved learning stability
  • Epsilon-greedy exploration strategy
  • Easy integration with custom environments
  • Comparison utilities for classical reinforcement learning methods

Installation

To use this module in your Go project, you can install it using go get:

go get github.com/iampaapa/dqn

Make sure you have Go installed (version 1.11+ for module support).

Usage

Here's a basic example of how to use the DQN module:

package main

import (
    "github.com/iampaapa/dqn"
)

func main() {
    // Example hyperparameters; tune these for your task.
    const (
        inputSize   = 4     // dimensionality of the state vector
        hiddenSize  = 32    // neurons in the hidden layer
        outputSize  = 2     // one Q-value per action
        bufferSize  = 10000 // replay buffer capacity
        numActions  = 2
        numEpisodes = 500
    )
    gamma, epsilon, learningRate := 0.99, 0.1, 0.001

    // Initialize a new DQN agent
    agent := dqn.NewDQN(
        inputSize,
        hiddenSize,
        outputSize,
        bufferSize,
        gamma,
        epsilon,
        learningRate,
        dqn.ReLU,
    )

    // Training loop. environment is your own implementation,
    // providing Reset() and Step(action).
    var state []float64
    for episode := 0; episode < numEpisodes; episode++ {
        state = environment.Reset()
        done := false
        for !done {
            action := agent.EpsilonGreedyPolicy(state, numActions)
            var nextState []float64
            var reward int
            nextState, reward, done = environment.Step(action)
            agent.Train(state, nextState, action, reward, done)
            state = nextState
        }
    }

    // Use the trained agent
    action := agent.EpsilonGreedyPolicy(state, numActions)
    _ = action
}
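The example above relies on an `environment` value that the module itself does not provide. As a minimal sketch of what such an environment might look like, here is a toy implementation; the `Environment` interface and `CountdownEnv` type are hypothetical names introduced purely for illustration:

```go
package main

import "fmt"

// Environment is a hypothetical interface matching the calls the usage
// example makes; the dqn module does not define it.
type Environment interface {
	Reset() []float64
	Step(action int) (nextState []float64, reward int, done bool)
}

// CountdownEnv is a toy environment whose episodes end after maxSteps steps.
type CountdownEnv struct {
	step, maxSteps int
}

func (e *CountdownEnv) Reset() []float64 {
	e.step = 0
	return []float64{0}
}

func (e *CountdownEnv) Step(action int) ([]float64, int, bool) {
	e.step++
	done := e.step >= e.maxSteps
	return []float64{float64(e.step)}, 1, done
}

func main() {
	var env Environment = &CountdownEnv{maxSteps: 3}
	state := env.Reset()
	done := false
	total := 0
	for !done {
		var reward int
		state, reward, done = env.Step(0)
		total += reward
	}
	fmt.Println(len(state), total) // 1 3
}
```

Any type with matching `Reset` and `Step` methods can be dropped into the training loop in place of `CountdownEnv`.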

Example: Manufacturing Process Optimization

We've included a comprehensive example of using this DQN module for manufacturing process optimization. This example demonstrates how to:

  1. Create a simulated manufacturing environment
  2. Implement and train a DQN agent
  3. Compare DQN performance with classical Q-learning
  4. Visualize the results

You can find this example in the examples/manufacturing_optimization directory.

To run the example:

cd examples/manufacturing_optimization
go run main.go

This will train both a DQN agent and a Q-learning agent, compare their performance, and generate a performance graph.

Module Structure

The module consists of the following main components:

  • qnetwork.go: Implements the neural network for Q-value approximation
  • replaybuffer.go: Provides an experience replay buffer for improved learning stability
  • train.go: Contains the core DQN algorithm and training loop
  • utils.go: Offers utility functions for data normalization and other helper tasks

Contributing

Contributions to this module are welcome! Please follow these steps to contribute:

  1. Fork the repository
  2. Create a new branch for your feature or bug fix
  3. Commit your changes
  4. Push to your fork and submit a pull request

Please make sure to update tests as appropriate and adhere to the existing coding style.

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.


For more information or if you encounter any issues, please open an issue on the GitHub repository.

Documentation


Functions

func Argmax

func Argmax(arr []float64) int

Argmax returns the index of the maximum value in a slice of float64.

func Max

func Max(arr []float64) float64

Max returns the maximum value in a slice of float64.
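The documented behaviour of Argmax and Max can be illustrated with a self-contained sketch; `argmax` and `maxVal` below are local stand-ins, not the module's exported functions:

```go
package main

import "fmt"

// argmax returns the index of the largest value, mirroring the
// documented semantics of dqn.Argmax.
func argmax(arr []float64) int {
	best := 0
	for i := 1; i < len(arr); i++ {
		if arr[i] > arr[best] {
			best = i
		}
	}
	return best
}

// maxVal returns the largest value, mirroring dqn.Max.
func maxVal(arr []float64) float64 {
	m := arr[0]
	for _, v := range arr[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

func main() {
	q := []float64{0.1, 0.7, 0.3}
	fmt.Println(argmax(q), maxVal(q)) // 1 0.7
}
```

In DQN, Argmax picks the greedy action from a vector of Q-values, while Max supplies the bootstrap target max-Q term.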

func Normalize

func Normalize(state []float64) []float64

Normalize normalizes a state vector.

func ReLU

func ReLU(x float64) float64

ReLU is the rectified linear unit activation function: it returns max(0, x).

func Sigmoid

func Sigmoid(x float64) float64

Sigmoid is the logistic activation function: it returns 1 / (1 + e^(-x)).

func Tanh

func Tanh(x float64) float64

Tanh is the hyperbolic tangent activation function.

Types

type Activation

type Activation func(float64) float64

Activation represents an activation function.
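Because Activation is simply `func(float64) float64`, callers are not limited to the built-in ReLU, Sigmoid, and Tanh. A sketch with a custom leaky ReLU; `LeakyReLU` is not shipped with the module and is shown purely as an illustration:

```go
package main

import "fmt"

// Activation mirrors the module's type: any func(float64) float64 qualifies.
type Activation func(float64) float64

// LeakyReLU is a hypothetical custom activation: identity for positive
// inputs, a small slope (0.01) for negative ones.
func LeakyReLU(x float64) float64 {
	if x > 0 {
		return x
	}
	return 0.01 * x
}

func main() {
	var act Activation = LeakyReLU
	fmt.Println(act(2.0), act(-2.0)) // 2 -0.02
}
```

A value like this could be passed wherever the module expects an Activation, e.g. as the final argument to NewDQN.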

type DQN

type DQN struct {
	// contains filtered or unexported fields
}

DQN represents the Deep Q-Learning algorithm.

func NewDQN

func NewDQN(inputSize, hiddenSize, outputSize, bufferSize int, gamma, epsilon, learningRate float64, activation Activation) *DQN

NewDQN initializes a new DQN instance.

func (*DQN) EpsilonGreedyPolicy

func (d *DQN) EpsilonGreedyPolicy(state []float64, numActions int) int

EpsilonGreedyPolicy selects an action using the epsilon-greedy strategy.

func (*DQN) Train

func (d *DQN) Train(state, nextState []float64, action, reward int, done bool)

Train trains the Q-network.

type Experience

type Experience struct {
	State, NextState []float64
	Action, Reward   int
	Done             bool
}

Experience represents a single experience tuple.

type QNetwork

type QNetwork struct {
	// contains filtered or unexported fields
}

QNetwork represents a simple neural network for Q-value approximation.

func NewQNetwork

func NewQNetwork(inputSize, hiddenSize, outputSize int, activation Activation) *QNetwork

NewQNetwork initializes a new QNetwork with random weights.

func (*QNetwork) Backward

func (q *QNetwork) Backward(state, prediction, target []float64, learningRate float64)

Backward computes gradients and updates the network weights.

func (*QNetwork) Loss

func (q *QNetwork) Loss(predictions, targets []float64) float64

Loss computes the mean squared error loss.
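Mean squared error averages the squared differences between predictions and targets. A standalone sketch of that computation; `mse` is a local stand-in for the method:

```go
package main

import "fmt"

// mse computes the mean squared error between two equal-length slices:
// the average of (prediction - target)^2 over all elements.
func mse(predictions, targets []float64) float64 {
	var sum float64
	for i := range predictions {
		d := predictions[i] - targets[i]
		sum += d * d
	}
	return sum / float64(len(predictions))
}

func main() {
	// (1-1)^2 = 0 and (2-4)^2 = 4, so the mean is 2.
	fmt.Println(mse([]float64{1, 2}, []float64{1, 4})) // 2
}
```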

func (*QNetwork) Predict

func (q *QNetwork) Predict(state []float64) []float64

Predict returns Q-values for a given state.

type ReplayBuffer

type ReplayBuffer struct {
	// contains filtered or unexported fields
}

ReplayBuffer stores experiences for training.

func NewReplayBuffer

func NewReplayBuffer(size int) *ReplayBuffer

NewReplayBuffer initializes a new ReplayBuffer.

func (*ReplayBuffer) Add

func (rb *ReplayBuffer) Add(exp Experience)

Add adds a new experience to the buffer.

func (*ReplayBuffer) Sample

func (rb *ReplayBuffer) Sample(batchSize int) []Experience

Sample returns a batch of experiences.
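The buffer's documented behaviour can be approximated with a self-contained sketch. Note two assumptions made for illustration: that a full buffer evicts its oldest experience on Add, and that Sample draws uniformly at random with replacement; the real implementation lives in replaybuffer.go:

```go
package main

import (
	"fmt"
	"math/rand"
)

// experience mirrors the exported Experience struct.
type experience struct {
	State, NextState []float64
	Action, Reward   int
	Done             bool
}

// replayBuffer is a fixed-capacity store of past experiences.
type replayBuffer struct {
	data []experience
	size int
	rng  *rand.Rand
}

func newReplayBuffer(size int) *replayBuffer {
	return &replayBuffer{size: size, rng: rand.New(rand.NewSource(1))}
}

// add appends an experience, evicting the oldest one when full
// (an assumed policy, shown for illustration).
func (rb *replayBuffer) add(exp experience) {
	if len(rb.data) >= rb.size {
		rb.data = rb.data[1:]
	}
	rb.data = append(rb.data, exp)
}

// sample draws a batch uniformly at random, with replacement.
func (rb *replayBuffer) sample(batchSize int) []experience {
	batch := make([]experience, batchSize)
	for i := range batch {
		batch[i] = rb.data[rb.rng.Intn(len(rb.data))]
	}
	return batch
}

func main() {
	rb := newReplayBuffer(2)
	for a := 0; a < 3; a++ {
		rb.add(experience{Action: a})
	}
	// Capacity is 2, so the oldest experience (Action 0) was evicted.
	fmt.Println(len(rb.data), rb.data[0].Action) // 2 1
}
```

Sampling random past experiences rather than only the latest transition breaks temporal correlations in the training data, which is the stability benefit the README's feature list refers to.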

Directories

examples