bandit

package module
v0.0.0-...-4408542 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 10, 2014 License: BSD-3-Clause Imports: 16 Imported by: 0

README

Bandit

Build Status Coverage Status

A mulitarmed bandit to A/B test go projects, or other languages via an HTTP API. It uses a log-based data flow. Based on John Myles White's Bandit Algorithms for Website Optimization. Full documentation is available on godoc.

You can see a general introduction to [Multiarmed Bandits] 1 here.

Build bandit with make. You need go >= 1.1.1..

Data Flow

A bandit instance is embedded into e.g. an HTTP server. Incoming requests select a variation from an experiment, and then log that selection. Subsequent positive feedback from that selection, e.g. a click, is also logged.

Periodically, bandit-job aggregates selections and rewards from the logs, re- calculates variation distribution, and emits a snapshot file to some shared storage. The bandit polls for updates to that snapshot file and hot-reloads the distribution on change.

+----------+           +----------+                +------------+
| Bandit   |--select-->| Log      |- - periodic - >| bandit-job |
| instance |--reward-->| storage  |                |            |
+----------+           +----------+                +------------+
     ^                                                    |
     |                 +----------+                       |
     '---------poll----| Snapshot |<----------------------'
                       +----------+

bandit-job expects log lines in the following format:

1379257984 BanditSelection shape-20130822:1:8932478932
1379257987 BanditReward shape-20130822:1:8932478932 0.000000

Notice that the reward line includes the variation tag. It is up to you to transport this tag through your system.

Types

A Strategy is used to select arms and update arms with reward information:

type Strategy interface {
  SelectArm() int
  Update(arm int, reward float64)
}

You will probably not use bandits directly. Instead, a Strategy is put to work inside an Experiment. You set up experiments (e.g. signup form buttons) with as many variations as you like (e.g. blue button, red button):

  +--------+            +---------------+      periodic job
  | Strategy | 1 <----- 1 | Snapshot file | <--- aggregates logs
  +--------+            +---------------+      into counters
      1
      ^
      |
      1
+------------+         +---------+
| Experiment | 1 --> * | Variation |
|------------|         |---------|
| name       |         | tag     |
+------------+         | url     |
                       +---------+

Integrating and running experiments

To use a strategy, you first have to define an experiment and its variations. This is currently configured in json with a name, URL, and tag. See experiments.json for an example.

Choose the best integration for your project depending on whether you have a client side javascript application, a go project, or a project in some other language.

Integration with Javascript and the HTTP API

Run bandit-api -port 80 -apiExperiments experiments.json to start the endpoint with the provided test experiments.

In this scenario, the application makes a request to the API endpoint and then a second request to your API.

             Bandit HTTP API
Javascript   Select   Reward   Your API
----------   ------   ------   --------
     |-------->|                  |
     |<--------|                  |
     |                            |
     |--------------------------->|
     |<---------------------------|
     :
   later
     :
     |----------------->|
     |<-----------------|

Get a variation from the HTTP API first:

GET https://api/experiements/widgets?uid=11 HTTP/1.0

The API responds with a variation:

HTTP/1.0 200 OK
Content-Type: text/json

{
  uid: 11,
  experiment: "widgets",
  url: "https://api/widget?color=blue"
  tag: "widget-sauce-flf89"
}

The client can now follow up with a request to the returned widget:

GET https://api/widget?color=blue HTTP/1.0

See the exampe binary and example/index.html for a running example.

Integration in another language using the HTTP API

Launch the HTTP API as above. When you get a request to your endpoint, make a backend request to the HTTP API. Use the returned variation to vary.

Integration with Go projects

First, load an experiment.

experiments := bandit.NewFileOpener("experiments.json")
e, err := bandit.NewExperiment(experiments, "shape-20130822")
if err != nil {
  log.Fatalf("could not construct experiment: %s", err.Error())
}

fmt.Println(e.Variations)

Initialize your own variation code if necessary. Then, serve. In each request, select a variation via the experiment and serve it. Be sure to include the tag in the response, so your clients can pass it back with rewards.

Miscellaneous information

Aggregating Logs

In a production setting logs are aggregated as described in Data Flow. You can use bandit-job as a streaming map reduce job with bandit-job -kind map and bandit-job -kind reduce. You can also run over the logs wiht bandit-job -kind poll. See bandit-job -h for information.

Strategy Algorithms

You can currently choose between Epsilon Greedy, UCB1, Softmax, and Thompson (see, e.g., Chapelle & Li, 2011 ). See the godoc for detailed information.

Snapshots and delayed bandits

You can configure your strategy to get it's internal state from a snapshot like this:

[
  {
    "experiment_name": "shape-20130822",
    "strategy": "softmax",
    "parameters": [0.1],
    "snapshot": "snapshot.tsv",
    "snapshot-poll-seconds": 60,
    "variations": [
      {
        "url": "http://localhost:8080/widget?shape=circle",
        "description": "Everybody likes circles.",
        "ordinal": 1
      },
      {
        "url": "http://localhost:8080/widget?shape=square",
        "description": "Everybody likes squares.",
        "ordinal": 2
      }
    ]
  }
]

Simulation

The bandit/sim package includes the facility to simulate and plot experiemnts. You should run your own simulations before putting experiments into production. See the sim package for details. You can run bandit-plot to see some out of the box simulations.

Status

Version: 0.0.0-alpha.1

The API is currently not stable and is subject to change at any time.

TODO

  • UCB with extensions for delayed rewards
  • Sticky assignments
  • Extend Thompson sampling for different reward distributions

Credits

Developed by

  • Rany Keddo (@purzelrakete)
  • Ozgür Demir (@ozgurdemir)
  • Christoph Sawade

Thanks to for advice and opinions to

  • John Myles White
  • Josh Devins
  • Peter Bourgon
  • Sean Braithwaite

Documentation

Overview

Package bandit implements a multiarmed strategy. Runs A/B tests while controlling the tradeoff between exploring new arms and exploiting the currently best arm.

The project is based on John Myles White's 'Strategy Algorithms for Website Optimization'.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func RewardLine

func RewardLine(experiment Experiment, selected Variation, reward float64) string

RewardLine captures all selected arms. This log can be used in conjunction with reward logs to fully rebuild strategys.

func SelectionLine

func SelectionLine(experiment Experiment, selected Variation) string

SelectionLine captures all selected arms. This log can be used in conjunction with reward logs to fully rebuild strategys.

func TimestampedTagToTag

func TimestampedTagToTag(timestampedTag string) (string, int64, error)

TimestampedTagToTag docodes a timestamped tag in the form <tag>:<timestamp> into a (tag, ts)

Types

type Counters

type Counters struct {
	sync.Mutex
	// contains filtered or unexported fields
}

Counters maintain internal strategy state

func GetSnapshot

func GetSnapshot(o Opener) (Counters, error)

GetSnapshot returns Counters given a snapshot filename.

func NewCounters

func NewCounters(arms int) Counters

NewCounters constructs counters for given arms

func ParseSnapshot

func ParseSnapshot(s io.Reader) (Counters, error)

ParseSnapshot reads in a snapshot file. Snapshot files contain a single line experiment snapshot, for example:

2 0.1 0.5

Tokens are separated by whitespace. The given example encodes an experiment with two variations. First is the number of variations. This is followed by rewards (mean reward for each arm).

func (*Counters) Init

func (c *Counters) Init(snapshot *Counters) error

Init the strategy to a new counter state.

func (*Counters) Reset

func (c *Counters) Reset()

Reset the strategy to initial state.

func (*Counters) Update

func (c *Counters) Update(arm int, reward float64)

Update the running average, where arm is the 1 indexed arm

type Experiment

type Experiment struct {
	Name       string
	Strategy   Strategy
	Variations Variations
}

Experiment is a single experiment. Variations are in ascending ordinal sorting, where ordinals are contiguous and start at 1.

func NewExperiment

func NewExperiment(o Opener, name string) (*Experiment, error)

NewExperiment loads experiment `name` from the experiments source.

func (*Experiment) GetTaggedVariation

func (e *Experiment) GetTaggedVariation(tag string) (Variation, error)

GetTaggedVariation selects the appropriate variation given it's tag

func (*Experiment) GetVariation

func (e *Experiment) GetVariation(ordinal int) (Variation, error)

GetVariation selects the appropriate variation given it's 1 indexed ordinal

func (*Experiment) Select

func (e *Experiment) Select() Variation

Select calls SelectArm on the strategy and returns the associated variation

func (*Experiment) SelectTimestamped

func (e *Experiment) SelectTimestamped(
	timestampedTag string,
	ttl time.Duration) (Variation, string, error)

SelectTimestamped selects the appropriate variation given it's timestampedTag. A timestamped tag is a string in the form <tag>:<timestamp>. If the duration between <timestamp> and the current time is smaller than `d`, the given tagged is used to return variation. If it is larger, Select() is called instead. If the `timestampedTag` argument is the blank string, Select() is called instead.

type Experiments

type Experiments map[string]*Experiment

Experiments is an index of names to experiment

func NewExperiments

func NewExperiments(o Opener) (*Experiments, error)

NewExperiments reads in a json file and converts it to a map of experiments.

func (*Experiments) GetVariation

func (e *Experiments) GetVariation(tag string) (Experiment, Variation, error)

GetVariation returns the Experiment and variation pointed to by a string tag.

type Opener

type Opener interface {
	Open() (io.ReadCloser, error)
}

Opener can be used to reopen underlying file descriptors.

func NewFileOpener

func NewFileOpener(filename string) Opener

NewFileOpener returns an Opener using and underlying file.

func NewHTTPOpener

func NewHTTPOpener(url string) Opener

NewHTTPOpener returns an opener using an underlying URL.

func NewOpener

func NewOpener(ref string) Opener

NewOpener returns an http opener or a file opener depending on `ref`.

type Strategy

type Strategy interface {
	SelectArm() int
	Update(arm int, reward float64)
	Init(*Counters) error
	Reset()
}

Strategy can select arm or update information

func New

func New(arms int, name string, params []float64) (Strategy, error)

New returns an initialized stragtegy given a name like 'softmax'.

func NewDelayed

func NewDelayed(s Strategy, o Opener, poll time.Duration) (Strategy, error)

NewDelayed wraps a strategy and updates internal counters from a snapshot at `poll` interval.

func NewEpsilonGreedy

func NewEpsilonGreedy(arms int, epsilon float64) (Strategy, error)

NewEpsilonGreedy constructs an epsilon greedy strategy.

func NewSoftmax

func NewSoftmax(arms int, τ float64) (Strategy, error)

NewSoftmax constructs a softmax strategy. Softmax explores arms in proportion to their estimated values.

func NewThompson

func NewThompson(arms int, α float64) (Strategy, error)

NewThompson constructs a thompson sampling strategy.

func NewUCB1

func NewUCB1(arms int) Strategy

NewUCB1 returns a UCB1 Strategy

type Variation

type Variation struct {
	Ordinal     int    // 1 indexed arm ordinal
	URL         string // the url associated with this variation, for out of band
	Tag         string // this tag is used throughout the lifecycle of the experiment
	Description string // freitext
}

Variation describes endpoints which are mapped onto strategy arms.

type Variations

type Variations []Variation

Variations is a set of variations sorted by ordinal.

func (Variations) Len

func (v Variations) Len() int

func (Variations) Less

func (v Variations) Less(i, j int) bool

func (Variations) Swap

func (v Variations) Swap(i, j int)

Directories

Path Synopsis
Package http provides an HTTP API for strategy experiments.
Package http provides an HTTP API for strategy experiments.
Package main contains bandit-job, which takes as input a log of selects and rewards of the following format: 1379257984 BanditSelection shape-20130822:1:8932478932 1379257987 BanditReward shape-20130822:1:8932478932 0.000000 Fields are interpreted as follows: (logline-timestamp, kind, tag, reward) Tags are interpreted as: experiment-name:variation-ordinal:pinning-time
Package main contains bandit-job, which takes as input a log of selects and rewards of the following format: 1379257984 BanditSelection shape-20130822:1:8932478932 1379257987 BanditReward shape-20130822:1:8932478932 0.000000 Fields are interpreted as follows: (logline-timestamp, kind, tag, reward) Tags are interpreted as: experiment-name:variation-ordinal:pinning-time

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL