README

Bigmachine

Bigmachine is a toolkit for building self-managing serverless applications in Go. Bigmachine provides an API that lets a driver process form an ad-hoc cluster of machines to which user code is transparently distributed.

User code is exposed through services, which are stateful Go objects associated with each machine. Services expose one or more Go methods that may be dispatched remotely. User services can call remote user services; the driver process may also make service calls.

Programs built using Bigmachine are agnostic to the underlying machine implementation, allowing distributed systems to be easily tested through an in-process implementation, or inspected during development using local Unix processes.

Bigmachine currently supports instantiating clusters of EC2 machines; other systems may be implemented with a relatively compact Go interface.

Help wanted!

A walkthrough of a simple Bigmachine program

Command bigpi is a relatively silly use of cluster computing, but illustrative nonetheless. Bigpi estimates the value of $\pi$ by sampling $N$ random coordinates inside the unit square and counting how many of them, $C \le N$, fall inside the unit circle. Our estimate is then $\pi \approx 4C/N$.

This is inherently parallelizable: we can generate samples across a large number of nodes, and then when we're done, they can be summed up to produce our estimate of $\pi$.
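Before bringing in Bigmachine, the same idea can be sketched on a single host using only the standard library. The names countInside and estimatePi, the per-worker seeds, and the use of goroutines in place of machines are assumptions made for illustration; they are not part of bigpi.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// countInside samples n points in the unit square using its own
// seeded generator and counts those that fall inside the circle of
// radius 0.5 centered in the square.
func countInside(n uint64, seed int64) uint64 {
	r := rand.New(rand.NewSource(seed))
	var c uint64
	for i := uint64(0); i < n; i++ {
		x, y := r.Float64()-0.5, r.Float64()-0.5
		if x*x+y*y < 0.25 {
			c++
		}
	}
	return c
}

// estimatePi splits n samples evenly across the given number of
// goroutines and combines the per-worker counts into 4*C/N.
func estimatePi(n uint64, workers int) float64 {
	per := n / uint64(workers)
	counts := make([]uint64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			counts[w] = countInside(per, int64(w)+1)
		}(w)
	}
	wg.Wait()
	var total uint64
	for _, c := range counts {
		total += c
	}
	// Divide by the number of samples actually taken.
	return 4 * float64(total) / float64(per*uint64(workers))
}

func main() {
	fmt.Printf("pi is roughly %.3f\n", estimatePi(4000000, 4))
}
```

Each worker owns its generator, so no lock is shared between them; only the final sum needs the results of every worker.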

To do this in Bigmachine, we first define a service that samples some $n$ points and reports how many fell inside the unit circle.

type circlePI struct{}

// Sample generates n points inside the unit square and reports
// how many of these fall inside the unit circle.
func (circlePI) Sample(ctx context.Context, n uint64, m *uint64) error {
	r := rand.New(rand.NewSource(rand.Int63()))
	for i := uint64(0); i < n; i++ {
		if i%1e7 == 0 {
			log.Printf("%d/%d", i, n)
		}
		x, y := r.Float64(), r.Float64()
		if (x-0.5)*(x-0.5)+(y-0.5)*(y-0.5) < 0.25 {
			*m++
		}
	}
	return nil
}

The only notable aspect of this code is the signature of Sample, which follows the schema below: methods that follow this convention may be dispatched remotely by Bigmachine, as we shall see soon.

func (service) Name(ctx context.Context, arg argtype, reply *replytype) error
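To make the schema concrete, here is a minimal sketch that checks a method's shape with the standard reflect package. The helper followsSchema and the Describe method are hypothetical, and this is not necessarily how Bigmachine's rpc package validates methods.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
)

var (
	ctxType = reflect.TypeOf((*context.Context)(nil)).Elem()
	errType = reflect.TypeOf((*error)(nil)).Elem()
)

// followsSchema reports whether the named method of v has the shape
// Name(ctx context.Context, arg argtype, reply *replytype) error.
func followsSchema(v interface{}, name string) bool {
	m, ok := reflect.TypeOf(v).MethodByName(name)
	if !ok {
		return false
	}
	t := m.Type // In(0) is the receiver.
	return t.NumIn() == 4 &&
		t.In(1) == ctxType &&
		t.In(3).Kind() == reflect.Ptr &&
		t.NumOut() == 1 &&
		t.Out(0) == errType
}

type circlePI struct{}

// Sample follows the schema.
func (circlePI) Sample(ctx context.Context, n uint64, m *uint64) error { return nil }

// Describe does not: it takes no context, argument, or reply.
func (circlePI) Describe() string { return "circlePI" }

func main() {
	fmt.Println(followsSchema(circlePI{}, "Sample"))   // true
	fmt.Println(followsSchema(circlePI{}, "Describe")) // false
}
```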

Next follows the program's func main. First, we do the regular kind of setup a main might: define some flags, parse them, set up logging. Afterwards, a driver must call driver.Start, which initializes Bigmachine and sets up the process so that it may be bootstrapped properly on remote nodes. (Package driver provides high-level facilities for configuring and bootstrapping Bigmachine; adventurous users may use the lower-level facilities in package bigmachine to accomplish the same.) driver.Start() returns a *bigmachine.B which can be used to start new machines.

func main() {
	var (
		nsamples = flag.Int("n", 1e10, "number of samples to make")
		nmachine = flag.Int("nmach", 5, "number of machines to provision for the task")
	)
	log.AddFlags()
	flag.Parse()
	b := driver.Start()
	defer b.Shutdown()

Next, we start a number of machines (as configured by flag nmach), wait for them to finish launching, and then distribute our sampling among them, using a simple "scatter-gather" RPC pattern. First, let's look at the code that starts the machines and waits for them to be ready.

// Start the desired number of machines,
// each with the circlePI service.
machines, err := b.Start(ctx, *nmachine, bigmachine.Services{
	"PI": circlePI{},
})
if err != nil {
	log.Fatal(err)
}
log.Print("waiting for machines to come online")
for _, m := range machines {
	<-m.Wait(bigmachine.Running)
	log.Printf("machine %s %s", m.Addr, m.State())
	if err := m.Err(); err != nil {
		log.Fatal(err)
	}
}
log.Print("all machines are ready")

Machines are started with (*B).Start, to which we provide the set of services that should be installed on each machine. (The service object provided is serialized and initialized on the remote machine, so it may include any desired parameters.) Start returns a slice of Machine instances, representing each machine that was launched. Machines can be in a number of states. In this case, we keep it simple and just wait for them to enter their running states, after which the underlying machines are fully bootstrapped and the services have been installed and initialized. At this point, all of the machines are ready to receive RPC calls.

The remainder of main distributes a portion of the total samples to be taken to each machine, waits for them to complete, and then prints the estimate with the precision warranted by the number of samples taken. Note that this code further subdivides the work by calling PI.Sample once for each processor available on the underlying machines as defined by Machine.Maxprocs, which depends on the physical machine configuration.

// Number of samples per machine
numPerMachine := uint64(*nsamples) / uint64(*nmachine)

// Divide the total number of samples among all the processors on
// each machine. Aggregate the counts and then report the estimate.
var total uint64
var cores int
g, ctx := errgroup.WithContext(ctx)
for _, m := range machines {
	m := m
	for i := 0; i < m.Maxprocs; i++ {
		cores++
		g.Go(func() error {
			var count uint64
			err := m.Call(ctx, "PI.Sample", numPerMachine/uint64(m.Maxprocs), &count)
			if err == nil {
				atomic.AddUint64(&total, count)
			}
			return err
		})
	}
}
log.Printf("distributing work among %d cores", cores)
if err := g.Wait(); err != nil {
	log.Fatal(err)
}
log.Printf("total=%d nsamples=%d", total, *nsamples)
var (
	pi   = big.NewRat(int64(4*total), int64(*nsamples))
	prec = int(math.Log(float64(*nsamples)) / math.Log(10))
)
fmt.Printf("π = %s\n", pi.FloatString(prec))
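One detail of the code above: both divisions (by the number of machines and by Maxprocs) truncate, so when the sample count is not an exact multiple of the total number of cores, slightly fewer than nsamples points are drawn while the estimate still divides by nsamples. A minimal sketch of a split that always sums to the requested total follows; splitSamples is a hypothetical helper, not part of bigpi.

```go
package main

import "fmt"

// splitSamples divides total into the given number of shares. The
// shares differ by at most one and always sum to total.
func splitSamples(total uint64, parts int) []uint64 {
	shares := make([]uint64, parts)
	base, rem := total/uint64(parts), total%uint64(parts)
	for i := range shares {
		shares[i] = base
		if uint64(i) < rem {
			shares[i]++ // the first rem shares absorb the remainder
		}
	}
	return shares
}

func main() {
	fmt.Println(splitSamples(10, 3)) // [4 3 3]
}
```

With the default flags (1e10 samples, 5 machines) the division is exact whenever Maxprocs divides 2e9, so the walkthrough's output is unaffected.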

We can now build and run our binary like an ordinary Go binary.

$ go build
$ ./bigpi
2019/10/01 16:31:20 waiting for machines to come online
2019/10/01 16:31:24 machine https://localhost:42409/ RUNNING
2019/10/01 16:31:24 machine https://localhost:44187/ RUNNING
2019/10/01 16:31:24 machine https://localhost:41618/ RUNNING
2019/10/01 16:31:24 machine https://localhost:41134/ RUNNING
2019/10/01 16:31:24 machine https://localhost:34078/ RUNNING
2019/10/01 16:31:24 all machines are ready
2019/10/01 16:31:24 distributing work among 5 cores
2019/10/01 16:32:05 total=7853881995 nsamples=10000000000
π = 3.1415527980

Here, Bigmachine distributed computation across logical machines, each corresponding to a single core on the host system. Each machine ran in its own Unix process (with its own address space), and RPC happened through mutually authenticated HTTP/2 connections.

Package driver provides some convenient flags that help configure the Bigmachine runtime. Using these, we can configure Bigmachine to launch machines into EC2 instead:

$ ./bigpi -bigm.system=ec2
2019/10/01 16:38:10 waiting for machines to come online
2019/10/01 16:38:43 machine https://ec2-54-244-211-104.us-west-2.compute.amazonaws.com/ RUNNING
2019/10/01 16:38:43 machine https://ec2-54-189-82-173.us-west-2.compute.amazonaws.com/ RUNNING
2019/10/01 16:38:43 machine https://ec2-34-221-143-119.us-west-2.compute.amazonaws.com/ RUNNING
...
2019/10/01 16:38:43 all machines are ready
2019/10/01 16:38:43 distributing work among 5 cores
2019/10/01 16:40:19 total=7853881995 nsamples=10000000000
π = 3.1415527980

Once the program is running, we can use standard Go tooling to examine its behavior. For example, expvars are aggregated across all of the machines managed by Bigmachine, and the various profiles (CPU, memory, contention, etc.) are available as merged profiles through /debug/bigmachine/pprof. In the first version of bigpi, the CPU profile highlighted a problem: we were using the global rand.Float64, which requires a lock, and the resulting contention was easy to identify:

$ go tool pprof localhost:3333/debug/bigmachine/pprof/profile
Fetching profile over HTTP from http://localhost:3333/debug/bigmachine/pprof/profile
Saved profile in /Users/marius/pprof/pprof.045821636.samples.cpu.001.pb.gz
File: 045821636
Type: cpu
Time: Mar 16, 2018 at 3:17pm (PDT)
Duration: 2.51mins, Total samples = 16.80mins (669.32%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 779.47s, 77.31% of 1008.18s total
Dropped 51 nodes (cum <= 5.04s)
Showing top 10 nodes out of 58
      flat  flat%   sum%        cum   cum%
   333.11s 33.04% 33.04%    333.11s 33.04%  runtime.procyield
   116.71s 11.58% 44.62%    469.55s 46.57%  runtime.lock
    76.35s  7.57% 52.19%    347.21s 34.44%  sync.(*Mutex).Lock
    65.79s  6.53% 58.72%     65.79s  6.53%  runtime.futex
    41.48s  4.11% 62.83%    202.05s 20.04%  sync.(*Mutex).Unlock
    34.10s  3.38% 66.21%    364.36s 36.14%  runtime.findrunnable
       33s  3.27% 69.49%        33s  3.27%  runtime.cansemacquire
    32.72s  3.25% 72.73%     51.01s  5.06%  runtime.runqgrab
    24.88s  2.47% 75.20%     57.72s  5.73%  runtime.unlock
    21.33s  2.12% 77.31%     21.33s  2.12%  math/rand.(*rngSource).Uint64

And after the fix, it looks much healthier:

$ go tool pprof localhost:3333/debug/bigmachine/pprof/profile
...
      flat  flat%   sum%        cum   cum%
    29.09s 35.29% 35.29%     82.43s   100%  main.circlePI.Sample
    22.95s 27.84% 63.12%     52.16s 63.27%  math/rand.(*Rand).Float64
    16.09s 19.52% 82.64%     16.09s 19.52%  math/rand.(*rngSource).Uint64
     9.05s 10.98% 93.62%     25.14s 30.49%  math/rand.(*rngSource).Int63
     4.07s  4.94% 98.56%     29.21s 35.43%  math/rand.(*Rand).Int63
     1.17s  1.42%   100%      1.17s  1.42%  math/rand.New
         0     0%   100%     82.43s   100%  github.com/grailbio/bigmachine/rpc.(*Server).ServeHTTP
         0     0%   100%     82.43s   100%  github.com/grailbio/bigmachine/rpc.(*Server).ServeHTTP.func2
         0     0%   100%     82.43s   100%  golang.org/x/net/http2.(*serverConn).runHandler
         0     0%   100%     82.43s   100%  net/http.(*ServeMux).ServeHTTP

GOOS, GOARCH, and Bigmachine

When using Bigmachine's EC2 machine implementation, the process is bootstrapped onto remote EC2 instances. Currently, the only supported GOOS/GOARCH combination for these is linux/amd64. Because of this, the driver program must also be linux/amd64. However, Bigmachine also understands the fatbin format, so that users can compile fat binaries using the gofat tool. For example, the above can be run on a macOS driver if the binary is built using gofat instead of 'go':

macOS $ GO111MODULE=on go get github.com/grailbio/base/cmd/gofat
go: finding github.com/grailbio/base/cmd/gofat latest
go: finding github.com/grailbio/base/cmd latest
macOS $ gofat build
macOS $ ./bigpi -bigm.system=ec2
...

Documentation

Overview

Package bigmachine implements a vertically integrated stack for distributed computing in Go. Go programs written with bigmachine are transparently distributed across a number of machines as instantiated by the backend used. (Currently supported: EC2, local machines, unit tests.) Bigmachine clusters comprise a driver node and a number of bigmachine nodes (called "machines"). The driver node can create new machines and communicate with them; machines can call each other.

Computing model

On startup, a bigmachine program calls driver.Start, which configures a bigmachine instance based on a set of standard flags and then starts it. (Users desiring a lower-level API can use bigmachine.Start directly.)

import (
	"github.com/grailbio/bigmachine"
	"github.com/grailbio/bigmachine/driver"
	...
)

func main() {
	flag.Parse()
	// Additional configuration and setup.
	b, shutdown := driver.Start()
	defer shutdown()

	// Driver code...
}

When the program is run, driver.Start returns immediately: the program can then interact with the returned bigmachine B to create new machines, define services on those machines, and invoke methods on those services. Bigmachine bootstraps machines by running the same binary, but in these runs, driver.Start never returns; instead it launches a server to handle calls from the driver program and other machines.

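Machines are started by (*B).Start, which takes the number of machines to launch. Machines must be configured with at least one service:

// Start a single machine; Start returns a slice of machines.
machines, err := b.Start(ctx, 1, bigmachine.Services{
	"MyService": &MyService{Param1: value1, ...},
})
// After checking err, pick the machine to call.
m := machines[0]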

Users may then invoke methods on the services provided by the returned machine. A service's methods can be invoked so long as they are of the form:

Func(ctx context.Context, arg argType, reply *replyType) error

See package github.com/grailbio/bigmachine/rpc for more details.

Methods are named by the service and method name, separated by a dot ('.'), e.g.: "MyService.MyMethod":

if err := m.Call(ctx, "MyService.MyMethod", arg, &reply); err != nil {
	log.Print(err)
} else {
	// Examine reply
}

Since service instances must be serialized so that they can be transmitted to the remote machine, and because we do not know the service types a priori, any type that can appear as a service must be registered with gob. This is usually done in an init function in the package that declares the type:

type MyService struct { ... }

func init() {
	// MyService has method receivers
	gob.Register(new(MyService))
}
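The need for registration can be seen with the standard library alone. The sketch below encodes a set of services held as interface values and decodes them again, roughly as a remote process would; the roundTrip helper and the Param1 field are hypothetical, and this is not Bigmachine's actual wire code.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// MyService stands in for a user service; Param1 is a hypothetical
// parameter.
type MyService struct {
	Param1 string
}

func init() {
	// Without this registration, encoding the interface value in
	// roundTrip fails because gob cannot name the concrete type.
	gob.Register(new(MyService))
}

// roundTrip encodes services held as interface values and decodes
// them into a fresh map.
func roundTrip(in map[string]interface{}) (map[string]interface{}, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	out := make(map[string]interface{})
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(map[string]interface{}{
		"MyService": &MyService{Param1: "x"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out["MyService"].(*MyService).Param1)
}
```

Only exported fields survive the round trip, which is why service parameters are carried in exported struct fields.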

Vertical computing

A bigmachine program attempts to appear and act like a single program:

- Each machine's standard output and error are copied to the driver;
- bigmachine provides aggregating profile handlers at /debug/bigmachine/pprof
  so that aggregate profiles may be taken over the full cluster;
- command line flags are propagated from the driver to the machine,
  so that a binary run can be configured in the usual way.

The driver program maintains keepalives to all of its machines. Once this is no longer maintained (e.g., because the driver finished, or crashed, or lost connectivity), the machines become idle and shut down.

Services

A service is any Go value that implements methods of the form given above. Services are instantiated by the user and registered with bigmachine. When a service is registered, bigmachine will also invoke an initialization method on the service if it exists. Per-machine initialization can be performed by this method. The form of the method is:

Init(*B) error

If a non-nil error is returned, the machine is considered failed.

Constants

const RpcPrefix = "/bigrpc/"

RpcPrefix is the path prefix used to serve RPC requests.

Functions

func Init

func Init()

Init initializes bigmachine. It should be called after flag parsing and global setup in bigmachine-based processes. Init is a no-op if the binary is not running as a bigmachine worker; if it is, Init never returns.

func RegisterSystem

func RegisterSystem(name string, system System)

RegisterSystem is used by system implementations to register themselves. RegisterSystem registers the implementation with gob, so that instances can be transmitted over the wire. It also registers the provided System instance as the default to use for the name, in support of bigmachine.Init.

Types

type B

type B struct {
	// contains filtered or unexported fields
}

B is a bigmachine instance. Bs are created by Start and, outside of testing situations, there is exactly one per process.

func Start

func Start(system System, opts ...Option) *B

Start is the main entry point of bigmachine. Start starts a new B using the provided system, returning the instance. B's Shutdown method should be called to tear down the session, usually in a defer statement from the program's main:

func main() {
	// Parse flags, configure system.
	b := bigmachine.Start(system)
	defer b.Shutdown()

	// bigmachine driver code
}

func (*B) Dial

func (b *B) Dial(ctx context.Context, addr string) (*Machine, error)

Dial connects to the machine named by the provided address.

The returned machine is not owned: unlike machines returned by Start, it is not kept alive.

func (*B) HandleDebug

func (b *B) HandleDebug(mux *http.ServeMux)

HandleDebug registers diagnostic http endpoints on the provided ServeMux.

func (*B) HandleDebugPrefix

func (b *B) HandleDebugPrefix(prefix string, mux *http.ServeMux)

HandleDebugPrefix registers diagnostic http endpoints on the provided ServeMux under the provided prefix.
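As a rough standard-library analogy, the sketch below mounts a diagnostic handler under a caller-chosen prefix on a ServeMux. The helper names registerDebug and statusFor and the "/debug/example/" prefix are assumptions; the endpoints Bigmachine actually registers are not listed here.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// registerDebug mounts the standard expvar handler under prefix on
// mux. The "vars" path is illustrative only.
func registerDebug(prefix string, mux *http.ServeMux) {
	mux.Handle(prefix+"vars", expvar.Handler())
}

// statusFor issues an in-memory GET for path and returns the HTTP
// status code the mux produced.
func statusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

// newDebugMux returns a fresh mux with the debug handler mounted.
func newDebugMux(prefix string) *http.ServeMux {
	mux := http.NewServeMux()
	registerDebug(prefix, mux)
	return mux
}

func main() {
	mux := newDebugMux("/debug/example/")
	fmt.Println(statusFor(mux, "/debug/example/vars")) // 200
	fmt.Println(statusFor(mux, "/debug/other"))        // 404
}
```

Taking the mux as a parameter, as HandleDebug and HandleDebugPrefix do, lets the caller decide which server and port expose the endpoints.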

func (*B) IsDriver

func (b *B) IsDriver() bool

IsDriver is true if this is a driver instance (rather than a spawned machine).

func (*B) Machines

func (b *B) Machines() []*Machine

Machines returns a snapshot of the current set of machines known to this B.

func (*B) Shutdown

func (b *B) Shutdown()

Shutdown tears down resources associated with this B. It should be called by the driver to discard a session, usually in a defer:

b := bigmachine.Start(system)
defer b.Shutdown()
// driver code

func (*B) Start

func (b *B) Start(ctx context.Context, n int, params ...Param) ([]*Machine, error)

Start launches up to n new machines and returns them. The machines are configured according to the provided parameters. Each machine must have at least one service exported, or else Start returns an error. The new machines may be in Starting state when they are returned. Start maintains a keepalive to the returned machines, thus tying the machines' lifetime with the caller process.

Start returns at least one machine, or else an error.

func (*B) System

func (b *B) System() System

System returns this B's System implementation.

type DiskInfo

type DiskInfo struct {
	Usage disk.UsageStat
}

A DiskInfo describes system disk usage.

type Environ

type Environ []string

Environ is a machine parameter that amends the process environment of the machine. It is a slice of strings in the form "key=value"; later definitions override earlier ones.
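The override rule can be illustrated with a small standard-library sketch. The helper resolveEnviron is hypothetical and only models the "last definition wins" behavior; it is not Bigmachine's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveEnviron collapses "key=value" entries so that each key
// appears once, keeping the position of its first definition and the
// value of its last.
func resolveEnviron(env []string) []string {
	index := make(map[string]int)
	var out []string
	for _, kv := range env {
		key := kv
		if i := strings.Index(kv, "="); i >= 0 {
			key = kv[:i]
		}
		if j, ok := index[key]; ok {
			out[j] = kv // a later definition overrides an earlier one
			continue
		}
		index[key] = len(out)
		out = append(out, kv)
	}
	return out
}

func main() {
	fmt.Println(resolveEnviron([]string{"A=1", "B=2", "A=3"})) // [A=3 B=2]
}
```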

type Expvar

type Expvar struct {
	Key   string
	Value string
}

An Expvar is a snapshot of an expvar.

type Expvars

type Expvars []Expvar

Expvars is a collection of snapshotted expvars.

func (Expvars) MarshalJSON

func (e Expvars) MarshalJSON() ([]byte, error)

type Info

type Info struct {
	// Goos and Goarch are the operating system and architecture
	// as reported by the Go runtime.
	Goos, Goarch string
	// Digest is the fingerprint of the currently running binary on the machine.
	Digest digest.Digest
}

Info contains system information about a machine.

func LocalInfo

func LocalInfo() Info

LocalInfo returns system information for this process.

type LoadInfo

type LoadInfo struct {
	Averages load.AvgStat
}

A LoadInfo describes system load.

type Machine

type Machine struct {
	// Addr is the address of the machine. It may be used to create
	// machine instances through Dial.
	Addr string

	// Maxprocs is the number of processors available on the machine.
	Maxprocs int

	// NoExec should be set to true if the machine should not exec a
	// new binary. This is meant for testing purposes.
	NoExec bool
	// contains filtered or unexported fields
}

A Machine is a single machine managed by bigmachine. Each machine is a "one-shot" execution of a bigmachine binary. Machines embody a failure detection mechanism, but do not provide fault tolerance. Each machine comprises instances of each registered bigmachine service. A Machine is created by the bigmachine driver binary, but its address can be passed to other Machines which can in turn connect to each other (through Dial).

Machines are created with (*B).Start.

func (*Machine) Call

func (m *Machine) Call(ctx context.Context, serviceMethod string, arg, reply interface{}) error

Call invokes a method named by a service on this machine. The argument and reply must be provided in accordance with bigmachine's RPC mechanism (see package docs or the docs of the rpc package). Call waits to invoke the method until the machine is in running state, and fails fast when it is stopped.

If a machine fails its keepalive, pending calls are canceled.

func (*Machine) Cancel

func (m *Machine) Cancel()

Cancel cancels all pending operations on machine m. The machine is stopped with an error of context.Canceled.

func (*Machine) DiskInfo

func (m *Machine) DiskInfo(ctx context.Context) (info DiskInfo, err error)

DiskInfo returns the machine's disk usage information.

func (*Machine) Err

func (m *Machine) Err() error

Err returns a machine's error. Err is only well-defined when the machine is in Stopped state.

func (*Machine) Hostname

func (m *Machine) Hostname() string

Hostname returns the hostname portion of the machine's address.

func (*Machine) KeepaliveReplyTimes

func (m *Machine) KeepaliveReplyTimes() []time.Duration

KeepaliveReplyTimes returns a buffer of up to the last numKeepaliveReplyTimes keepalive reply latencies, most recent first.

func (*Machine) LoadInfo

func (m *Machine) LoadInfo(ctx context.Context) (info LoadInfo, err error)

LoadInfo returns the machine's current load.

func (*Machine) MemInfo

func (m *Machine) MemInfo(ctx context.Context, readMemStats bool) (info MemInfo, err error)

MemInfo returns the machine's memory usage information. Go runtime memory stats are read if readMemStats is true.

func (*Machine) NextKeepalive

func (m *Machine) NextKeepalive() time.Time

NextKeepalive returns the time at which the next keepalive request is due.

func (*Machine) Owned

func (m *Machine) Owned() bool

Owned tells whether this machine was created and is managed by this bigmachine instance.

func (*Machine) RetryCall

func (m *Machine) RetryCall(ctx context.Context, serviceMethod string, arg, reply interface{}) error

RetryCall invokes Call, and retries on a temporary error.
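The general pattern of retrying on temporary errors can be sketched as follows. The tempError type, the Temporary() convention used to classify errors, the doubling backoff, and the flaky helper are all assumptions for illustration; RetryCall's actual policy and error classification are not specified here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// tempError is a hypothetical error type marking a failure as
// retryable.
type tempError struct{ msg string }

func (e tempError) Error() string   { return e.msg }
func (e tempError) Temporary() bool { return true }

// isTemporary reports whether err advertises itself as temporary.
func isTemporary(err error) bool {
	var t interface{ Temporary() bool }
	return errors.As(err, &t) && t.Temporary()
}

// retry invokes call until it succeeds, fails with a non-temporary
// error, or ctx is done. The wait doubles after each failed attempt.
func retry(ctx context.Context, wait time.Duration, call func(context.Context) error) error {
	for {
		err := call(ctx)
		if err == nil || !isTemporary(err) {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(wait):
			wait *= 2
		}
	}
}

// flaky runs retry against a call that fails temporarily the given
// number of times before succeeding, and reports the attempts made.
func flaky(failures int) (attempts int, err error) {
	err = retry(context.Background(), time.Millisecond, func(context.Context) error {
		attempts++
		if attempts <= failures {
			return tempError{"try again"}
		}
		return nil
	})
	return attempts, err
}

func main() {
	fmt.Println(flaky(2)) // 3 <nil>
}
```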

func (*Machine) State

func (m *Machine) State() State

State returns the machine's current state.

func (*Machine) Wait

func (m *Machine) Wait(state State) <-chan struct{}

Wait returns a channel that is closed once the machine reaches the provided state or greater.
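The "closed once the state is reached or exceeded" contract can be modeled with a mutex and a set of waiter channels. The tracker type below is a hypothetical sketch of that contract, not Machine's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

type state int32

const (
	unstarted state = iota
	starting
	running
	stopped
)

// tracker holds a monotonically increasing state and the channels of
// callers waiting for a given state.
type tracker struct {
	mu      sync.Mutex
	cur     state
	waiters map[state][]chan struct{}
}

// wait returns a channel that is closed once the state is at least s.
func (t *tracker) wait(s state) <-chan struct{} {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := make(chan struct{})
	if t.cur >= s {
		close(c)
		return c
	}
	if t.waiters == nil {
		t.waiters = make(map[state][]chan struct{})
	}
	t.waiters[s] = append(t.waiters[s], c)
	return c
}

// set advances the state, ignoring attempts to move backwards, and
// releases every waiter whose target has been reached.
func (t *tracker) set(s state) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if s <= t.cur {
		return
	}
	t.cur = s
	for target, cs := range t.waiters {
		if target <= s {
			for _, c := range cs {
				close(c)
			}
			delete(t.waiters, target)
		}
	}
}

func main() {
	var tr tracker
	ready := tr.wait(running)
	tr.set(stopped) // skipping past running still releases the waiter
	<-ready
	fmt.Println("released")
}
```

Because a waiter for Running is also released when a machine goes straight to Stopped, callers such as the walkthrough check Err after the wait returns.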

type MemInfo

type MemInfo struct {
	System  mem.VirtualMemoryStat
	Runtime runtime.MemStats
}

A MemInfo describes system and Go runtime memory usage.

type Option

type Option func(b *B)

Option is an option that can be provided when starting a new B. It is a function that can modify the b that will be returned by Start.

func Name

func Name(name string) Option

Name is an option that will name the B. See B.name.

type Param

type Param interface {
	// contains filtered or unexported methods
}

A Param is a machine parameter. Parameters customize machines before they are started.

type Services

type Services map[string]interface{}

Services is a machine parameter that specifies the set of services that should be served by the machine. Each machine should have at least one service. Multiple Services parameters may be passed.

type State

type State int32

State enumerates the possible states of a machine. Machine states proceed monotonically: they can only increase in value.

const (
	// Unstarted indicates the machine has yet to be started.
	Unstarted State = iota
	// Starting indicates that the machine is currently bootstrapping.
	Starting
	// Running indicates that the machine is running and ready to
	// receive calls.
	Running
	// Stopped indicates that the machine was stopped, either because of
	// a failure, or because the driver stopped it.
	Stopped
)

func (State) String

func (m State) String() string

String returns a State's string.

type Supervisor

type Supervisor struct {
	// contains filtered or unexported fields
}

Supervisor is the system service installed on every machine.

func StartSupervisor

func StartSupervisor(ctx context.Context, b *B, system System, server *rpc.Server) *Supervisor

StartSupervisor starts a new supervisor based on the provided arguments.

func (*Supervisor) CPUProfile

func (s *Supervisor) CPUProfile(ctx context.Context, dur time.Duration, prof *io.ReadCloser) error

CPUProfile takes a pprof CPU profile of this process for the provided duration. If a duration is not provided (is 0), a 30-second profile is taken. The profile is returned in the pprof serialized form (which uses protocol buffers under the hood).

func (*Supervisor) DiskInfo

func (s *Supervisor) DiskInfo(ctx context.Context, _ struct{}, info *DiskInfo) error

DiskInfo returns disk usage information on the disk where the temporary directory resides.

func (*Supervisor) Exec

func (s *Supervisor) Exec(ctx context.Context, _ struct{}, _ *struct{}) error

Exec reads a new image from its argument and replaces the current process with it. As a consequence, the currently running machine will die. It is up to the caller to manage this interaction.

func (*Supervisor) Expvars

func (s *Supervisor) Expvars(ctx context.Context, _ struct{}, vars *Expvars) error

Expvars returns a snapshot of this machine's expvars.

func (*Supervisor) GetBinary

func (s *Supervisor) GetBinary(ctx context.Context, _ struct{}, rc *io.ReadCloser) error

GetBinary retrieves the last binary uploaded via Setbinary.

func (*Supervisor) Getpid

func (s *Supervisor) Getpid(ctx context.Context, _ struct{}, pid *int) error

Getpid returns the PID of the supervisor process.

func (*Supervisor) Info

func (s *Supervisor) Info(ctx context.Context, _ struct{}, info *Info) error

Info returns the info struct for this machine.

func (*Supervisor) Keepalive

func (s *Supervisor) Keepalive(ctx context.Context, next time.Duration, reply *keepaliveReply) error

Keepalive maintains the machine keepalive. The next argument indicates the caller's desired keepalive interval (i.e., the amount of time until the keepalive expires from the time of the call); the accepted interval is returned in the reply. In order to maintain the keepalive, the driver should call Keepalive again before the accepted interval expires.
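The request-and-accept exchange resembles a renewable lease. The sketch below models it with a hypothetical lease type; capping the requested interval at a maximum is an assumption about how the accepted interval might differ from the requested one, not a documented Bigmachine rule.

```go
package main

import (
	"fmt"
	"time"
)

// epoch is an arbitrary reference time used by the example.
var epoch = time.Date(2019, 10, 1, 16, 0, 0, 0, time.UTC)

// lease is a hypothetical model of a machine's keepalive state.
type lease struct {
	max     time.Duration // longest interval the machine will accept
	expires time.Time
}

// renew extends the lease by the requested interval, capped at max,
// and returns the interval that was actually accepted.
func (l *lease) renew(now time.Time, requested time.Duration) time.Duration {
	accepted := requested
	if accepted > l.max {
		accepted = l.max
	}
	l.expires = now.Add(accepted)
	return accepted
}

// expired reports whether the lease has lapsed at time now.
func (l *lease) expired(now time.Time) bool {
	return !now.Before(l.expires)
}

func main() {
	l := &lease{max: time.Minute}
	fmt.Println(l.renew(epoch, 5*time.Minute))          // 1m0s
	fmt.Println(l.expired(epoch.Add(30 * time.Second))) // false
	fmt.Println(l.expired(epoch.Add(2 * time.Minute)))  // true
}
```

A driver following this model would schedule its next call from the accepted interval rather than the one it asked for.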

func (*Supervisor) LoadInfo

func (s *Supervisor) LoadInfo(ctx context.Context, _ struct{}, info *LoadInfo) error

LoadInfo returns system load information.

func (*Supervisor) MemInfo

func (s *Supervisor) MemInfo(ctx context.Context, readMemStats bool, info *MemInfo) error

MemInfo returns system and Go runtime memory usage information. Go runtime stats are read if readMemStats is true.

func (*Supervisor) Ping

func (s *Supervisor) Ping(ctx context.Context, seq int, replyseq *int) error

Ping replies immediately with the sequence number provided.

func (*Supervisor) Profile

func (s *Supervisor) Profile(ctx context.Context, req profileRequest, prof *io.ReadCloser) error

Profile returns the named pprof profile for the current process. The profile is returned in protocol buffer format.

func (*Supervisor) Profiles

func (s *Supervisor) Profiles(ctx context.Context, _ struct{}, profiles *[]profileStat) error

Profiles returns the set of available profiles and their counts.

func (*Supervisor) Register

func (s *Supervisor) Register(ctx context.Context, svc service, _ *struct{}) error

Register registers a new service with the machine (server) associated with this supervisor. After registration, the service is also initialized if it implements the method

Init(*B) error

func (*Supervisor) Setargs

func (s *Supervisor) Setargs(ctx context.Context, args []string, _ *struct{}) error

Setargs sets the process's arguments. It should be used before Exec in order to invoke the new image with the appropriate arguments.

func (*Supervisor) Setbinary

func (s *Supervisor) Setbinary(ctx context.Context, binary io.Reader, _ *struct{}) error

Setbinary uploads a new binary to replace the current binary when Supervisor.Exec is called. The two calls are separated so that different timeouts can be applied to upload and exec.

func (*Supervisor) Setenv

func (s *Supervisor) Setenv(ctx context.Context, env []string, _ *struct{}) error

Setenv sets the process's environment. It is applied to newly exec'd images, and should be called before Exec. The provided environment is appended to the default process environment: keys provided here override those that already exist in the environment.

func (*Supervisor) Shutdown

func (s *Supervisor) Shutdown(ctx context.Context, req shutdownRequest, _ *struct{}) error

Shutdown will cause the process to exit asynchronously at a point in the future no sooner than the specified delay.

type System

type System interface {
	// Name is the name of this system. It is used to multiplex multiple
	// system implementations, and thus should be unique among
	// systems.
	Name() string
	// Init is called when the bigmachine starts up in order to
	// initialize the system implementation. If an error is returned,
	// the Bigmachine fails.
	Init(*B) error
	// Main is called to start a machine. The system is expected to
	// take over the process; the bigmachine fails if main returns (and
	// if it does, it should always return with an error).
	Main() error
	// Event logs an event of typ with (key, value) fields given in fieldPairs
	// as k0, v0, k1, v1, ...kn, vn. For example:
	//
	//  s.Event("bigmachine:machineStart", "addr", "https://m0")
	//
	// These semi-structured events are used for analytics.
	Event(typ string, fieldPairs ...interface{})
	// HTTPClient returns an HTTP client that can be used to communicate
	// from drivers to machines as well as between machines.
	HTTPClient() *http.Client
	// ListenAndServe serves the provided handler on an HTTP server that
	// is reachable from other instances in the bigmachine cluster. If addr
	// is the empty string, the default cluster address is used.
	ListenAndServe(addr string, handle http.Handler) error
	// Start launches up to n new machines. The returned machines can
	// be in Unstarted state, but should eventually become available.
	Start(ctx context.Context, n int) ([]*Machine, error)
	// Exit is called to terminate a machine with the provided exit code.
	Exit(int)
	// Shutdown is called on graceful driver exit. It should be used to
	// perform system tear down. It is not guaranteed to be called.
	Shutdown()
	// Maxprocs returns the maximum number of processors per machine,
	// as configured. Returns 0 if it is a dynamic value.
	Maxprocs() int
	// KeepaliveConfig returns the various keepalive timeouts that should
	// be used to maintain keepalives for machines started by this system.
	KeepaliveConfig() (period, timeout, rpcTimeout time.Duration)
	// Tail returns a reader that follows the bigmachine process logs.
	Tail(ctx context.Context, m *Machine) (io.Reader, error)
	// Read returns a reader that reads the contents of the provided filename
	// on the host. This is done outside of the supervisor to support external
	// monitoring of the host.
	Read(ctx context.Context, m *Machine, filename string) (io.Reader, error)
}

A System implements a set of methods to set up a bigmachine and start new machines. Systems are also responsible for providing an HTTP client that can be used to communicate between machines and drivers.
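The Event method above takes its fields as alternating keys and values. A System implementation might normalize them along the following lines; pairsToFields is a hypothetical helper, and rejecting odd-length input or non-string keys is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// pairsToFields converts k0, v0, k1, v1, ... into a map. It rejects
// an odd number of elements and non-string keys.
func pairsToFields(fieldPairs ...interface{}) (map[string]interface{}, error) {
	if len(fieldPairs)%2 != 0 {
		return nil, errors.New("odd number of field pair elements")
	}
	fields := make(map[string]interface{}, len(fieldPairs)/2)
	for i := 0; i < len(fieldPairs); i += 2 {
		key, ok := fieldPairs[i].(string)
		if !ok {
			return nil, fmt.Errorf("key at position %d is not a string", i)
		}
		fields[key] = fieldPairs[i+1]
	}
	return fields, nil
}

func main() {
	fields, err := pairsToFields("addr", "https://m0", "maxprocs", 4)
	fmt.Println(fields, err) // map[addr:https://m0 maxprocs:4] <nil>
}
```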

var Local System = new(localSystem)

Local is a System that instantiates machines by creating new processes on the local machine.

Directories

Path                 Synopsis
cmd/bigoom           Command bigoom causes a bigmachine instance to OOM.
cmd/bigpi            Bigpi is an example bigmachine program that estimates digits of Pi using the Monte Carlo method.
cmd/ec2boot          Command ec2boot is a minimal bigmachine binary that is intended for bootstrapping binaries on EC2.
driver               Package driver provides a convenient API for bigmachine drivers, which includes configuration by flags.
ec2system            Package ec2system implements a bigmachine System that launches machines on dedicated EC2 spot instances.
regress
rpc                  Package rpc implements a simple RPC system for Go methods.
testsystem           Package testsystem implements a bigmachine system that's useful for testing.
internal/authority   Package authority provides an in-process TLS certificate authority, useful for creating and distributing TLS certificates for mutually authenticated HTTPS networking within Bigmachine.
internal/ioutil      Package ioutil contains utilities for performing I/O in bigmachine.
internal/tee         Package tee implements utilities for I/O multiplexing.