ec2cluster

package
v0.0.0-...-90deddd Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 18, 2023 License: Apache-2.0 Imports: 53 Imported by: 2

Documentation

Overview

Package ec2cluster implements support for maintaining elastic clusters of Reflow instances on EC2.

The EC2 instances created launch reflowlet agent processes that are given the user's profile token so that they can set up HTTPS servers that can perform mutual authentication to the reflow driver process and other reflowlets (for transferring objects) and also access external services like caching.

The VM instances are configured to terminate if they are idle on EC2's billing hour boundary. They also terminate on any fatal reflowlet error.

Index

Constants

View Source
const ExpVarCluster = "ec2cluster"

ExpVarCluster is the expvar endpoint for ec2cluster information.

View Source
const ReflowletCloudwatchFlushMs = 5000

Variables

This section is empty.

Functions

func GetAMI

func GetAMI(sess *session.Session) (string, error)

GetAMI gets the AMI ID for the AWS region derived from the given AWS session.

func GetSpotPlacementScores

func GetSpotPlacementScores(ctx context.Context, api ec2iface.EC2API, region, instanceType string) (map[string]int, error)

GetSpotPlacementScores returns spot placement scores for the given instance type in the given region. GetSpotPlacementScores returns a map of each Availability Zone name (within the given region) to the score. Note that the region is stripped from the AZ names (for eg: if region is "us-west-2", then AZ name "us-west-2a" is trimmed to "a").

func InstanceType

func InstanceType(need reflow.Resources, spot bool, maxPrice float64) (string, reflow.Resources)

InstanceType returns the instance type (and the amount of resources it provides) which is most appropriate for the needed resources. `spot` determines whether we should consider instance types that are available as spot instances or not.

func NewSpotProber

func NewSpotProber(capacityFunc capacityFunc, maxProbeDepth int, ttl time.Duration) *spotProber

NewSpotProber returns a spot prober which uses the given capacityFunc to determine capacity, the given maxProbeDepth as the depth to start probing at and uses the ttl as expiration of previously cached probing result.

func OnDemandPrice

func OnDemandPrice(typ, region string) (hourlyPriceUsd float64)

OnDemandPrice returns the on-demand hourly price of the given instance type in the given region.

Types

type CloudFile

type CloudFile struct {
	Path        string `yaml:"path,omitempty"`
	Permissions string `yaml:"permissions,omitempty"`
	Owner       string `yaml:"owner,omitempty"`
	Content     string `yaml:"content,omitempty"`
	Encoding    string `yaml:"encoding,omitempty"`
}

CloudFile is a component of the cloudConfig configuration for CoreOS. It represents a file that will be written to the filesystem.

type CloudUnit

type CloudUnit struct {
	Name    string `yaml:"name,omitempty"`
	Command string `yaml:"command,omitempty"`
	Enable  bool   `yaml:"enable,omitempty"`
	Content string `yaml:"content,omitempty"`
}

CloudUnit is a component of the cloudConfig configuration for CoreOS. It represents a CoreOS unit.

type Cluster

type Cluster struct {
	pool.Mux `yaml:"-"`
	// HTTPClient is used to communicate to the reflowlet servers
	// running on the individual instances. In Cluster, this is done for
	// liveness/health checking.
	HTTPClient *http.Client `yaml:"-"`
	// Logger for cluster events.
	Log *log.Logger `yaml:"-"`
	// EC2 is the EC2 API instance through which EC2 calls are made.
	EC2 ec2iface.EC2API `yaml:"-"`
	// Authenticator authenticates the ECR repository that stores the
	// Reflowlet container.
	Authenticator ecrauth.Interface `yaml:"-"`
	// InstanceTags is the set of EC2 tags attached to instances created by this Cluster.
	InstanceTags map[string]string `yaml:"-"`
	// Labels is the set of labels that should be added as EC2 tags (for informational purpose only).
	Labels pool.Labels `yaml:"-"`
	// Spot is set to true when a spot instance is desired.
	Spot bool `yaml:"spot,omitempty"`
	// InstanceProfile is the EC2 instance profile to use for the cluster instances.
	InstanceProfile string `yaml:"instanceprofile,omitempty"`
	// SecurityGroup is the EC2 security group to use for cluster instances.
	SecurityGroup string `yaml:"securitygroup,omitempty"`
	// Subnets is the list of EC2 subnets ids based on which an appropriate subnet (for each AZ) will be determined.
	// That is, when Subnets is specified, the cluster will use ec2.DescribeSubnets API to determine AZ for each subnet.
	// When requesting a spot instance in a particular AZ, the appropriate subnet will be used.
	// If this list contains duplicate subnets for any AZ, behavior (of which subnet is used) is non-deterministic.
	Subnets []string `yaml:"subnets,omitempty"`
	// BootstrapImage is the URL of the image used for instance bootstrap.
	BootstrapImage string `yaml:"-"`
	// BootstrapExpiry is the maximum duration the bootstrap will wait for a reflowlet image after which it dies.
	BootstrapExpiry time.Duration `yaml:"-"`
	// ReflowVersion is the version of reflow binary compatible with this cluster.
	ReflowVersion string `yaml:"-"`
	// MaxPendingInstances is the maximum number of pending instances permitted.
	MaxPendingInstances int `yaml:"maxpendinginstances"`
	// MaxHourlyCostUSD is the maximum hourly cost of concurrent instances permitted (in USD).
	// A best effort is made to not go above this but races induced by multiple managers can increase the size
	// of the cluster beyond this limit. The limit is applied on maximum bid price and hence is an upper bound
	// on the actual incurred cost (which in practice would be much less).
	MaxHourlyCostUSD float64 `yaml:"maxhourlycostusd"`
	// DiskType is the EBS disk type to use.
	DiskType string `yaml:"disktype"`
	// InstanceSizeToDiskSpace is a mapping of EC2 instance size (e.g. xlarge) to starting disk space in GiB.
	InstanceSizeToDiskSpace InstanceSizeToDiskSpace `yaml:"instancesizetodiskspace"`
	// DiskSlices is the number of EBS volumes that are used. When DiskSlices > 1,
	// they are arranged in a RAID0 array to increase throughput.
	DiskSlices int `yaml:"diskslices"`
	// AMI is the VM image used to launch new instances.
	AMI string `yaml:"ami"`
	// Configuration for this Reflow instantiation. Used to provide configs to
	// EC2 instances.
	Configuration infra.Config `yaml:"-"`
	// AWS session
	Session *session.Session `yaml:"-"`
	// TaskDB implementation (if any) where rows are updated for newly created pools.
	TaskDB taskdb.TaskDB `yaml:"-"`

	// Public SSH keys.
	SshKeys []string `yaml:"sshkeys"`
	// AWS key name for launching instances.
	KeyName string `yaml:"keyname"`
	// Immortal determines whether instances should be made immortal.
	Immortal bool `yaml:"immortal,omitempty"`
	// NodeExporterMetricsPort determines whether to run a prometheus node_exporter daemon
	// on each Reflowlet. Setting a value runs the node_exporter daemon and configures it to
	// output prometheus metrics on the given port. Passing a non-zero value also adds an
	// additional route to the general Reflowlet server, such that metrics are made available
	// via proxy over the existing HTTPS connection and the following Reflow command:
	// $ reflow http https://${EC2_INST_PUBLIC_DNS}:9000/v1/node/metrics
	// If the user wishes to use other scrapers to fetch metrics from the Reflowlet over HTTP,
	// they may additionally choose to expose the port via the AWS settings for their Reflow
	// cluster.
	NodeExporterMetricsPort int `yaml:"nodeexportermetricsport,omitempty"`
	// CloudConfig is merged into the instance's cloudConfig before launching.
	CloudConfig cloudConfig `yaml:"cloudconfig"`
	// SpotProbeDepth is the probing depth for spot instance capacity checks.
	SpotProbeDepth int `yaml:"spotprobedepth,omitempty"`

	// Status is used to report cluster and instance status.
	Status *status.Group `yaml:"-"`

	// InstanceTypes defines the set of allowable EC2 instance types for
	// this cluster. If empty, all verified instance types are permitted.
	InstanceTypes []string `yaml:"instancetypes,omitempty"`
	// Name is the name of the cluster config, which defaults to defaultClusterName.
	// Multiple clusters can be launched/maintained simultaneously by using different names.
	Name string `yaml:"name,omitempty"`
	// contains filtered or unexported fields
}

A Cluster implements a runner.Cluster backed by EC2. The cluster expands with demand. Instances are configured so that they shut down when they are idle on a billing boundary.

No local state is stored; state is inferred from labels managed by EC2. Cluster supports safely sharing state across many processes. In this case, the processes coordinate to maintain a shared cluster, where instances can be used by any of the constituent processes. In the case of Reflow, this means that multiple runs (single or batch) share the same cluster efficiently.

func (*Cluster) Allocate

func (c *Cluster) Allocate(ctx context.Context, req reflow.Requirements, labels pool.Labels) (alloc pool.Alloc, err error)

Allocate reserves an alloc with within the resource requirement boundaries form this cluster. If an existing instance can serve the request, it is returned immediately; otherwise new instance(s) are spun up to handle the allocation.

func (*Cluster) Available

func (c *Cluster) Available(need reflow.Resources, maxPrice float64) (InstanceSpec, bool)

Available returns the cheapest available instance specification that has at least the required resources.

func (*Cluster) CanAllocate

func (c *Cluster) CanAllocate(r reflow.Resources) (bool, error)

CanAllocate returns whether this cluster can allocate the given amount of resources.

func (*Cluster) CheapestInstancePriceUSD

func (c *Cluster) CheapestInstancePriceUSD() float64

func (*Cluster) Config

func (c *Cluster) Config() interface{}

Config implements infra.Provider

func (*Cluster) ExportStats

func (c *Cluster) ExportStats()

ExportStats exports the cluster stats to expvar.

func (*Cluster) GetName

func (c *Cluster) GetName() string

GetName implements runner.Cluster

func (*Cluster) Help

func (*Cluster) Help() string

Help implements infra.Provider

func (*Cluster) Init

func (c *Cluster) Init(tls tls.Certs, sess *session.Session, labels pool.Labels,
	bootstrapimage *infra2.BootstrapImage, reflowVersion *infra2.ReflowVersion, id *infra2.User, logger *log.Logger,
	ssh infra2.Ssh, mclient metrics.Client) error

Init implements infra.Provider

func (*Cluster) InstancePriceUSD

func (c *Cluster) InstancePriceUSD(typ string) float64

func (*Cluster) Launch

func (c *Cluster) Launch(ctx context.Context, spec InstanceSpec) ManagedInstance

Launch launches an EC2 instance based on the given spec and returns a ManagedInstance.

func (*Cluster) MaxAlloc

func (c *Cluster) MaxAlloc() reflow.Resources

MaxAlloc returns the max resources which can be obtained in a single alloc from this cluster.

func (*Cluster) Notify

func (c *Cluster) Notify(waiting, pending reflow.Resources)

func (*Cluster) Probe

func (c *Cluster) Probe(ctx context.Context, instanceType string) (reflow.Resources, time.Duration, error)

Probe attempts to instantiate an EC2 instance of the given type and returns the available resources on it (as per its offers), a duration and an error. In case of a nil error: - the duration represents how long it took for a usable Reflowlet to come up on that instance type. - the resources represents how much actual resources are available/usable on that instance type. Note: Of course the above are based on a single data point. A non-nil error means that the reflowlet failed to come up on this instance type. The error could be due to context deadline, in case we gave up waiting for it to come up.

func (*Cluster) QueryTags

func (c *Cluster) QueryTags() map[string]string

QueryTags returns the list of tags to use to query for instances belonging to this cluster. This includes all InstanceTags that are set on any instance brought up by this cluster, and a "reflowlet:version" tag (set on the instance by the reflowlet once it comes up) to match the ReflowVersion of this cluster.

func (*Cluster) Refresh

func (c *Cluster) Refresh(ctx context.Context) (map[string]string, error)

func (*Cluster) Region

func (c *Cluster) Region() string

Region is the AWS region to use for launching new EC2 instances.

func (*Cluster) Setup

func (c *Cluster) Setup(sess *session.Session) error

Setup sets defaults for any unset ec2 configuration values.

func (*Cluster) Start

func (c *Cluster) Start(ctx context.Context, wg *sync.WaitGroup)

Start initializes the cluster and it should be called before any `pool.Pool` operations are performed on the cluster. Start uses the provided context to maintain the cluster and upon cancellation, the cluster will shutdown. Start can be called multiple times, but only parameters passed to the first call (which started the cluster) are relevant. Start takes a WaitGroup whose counter is updated to reflect background goroutines.

func (*Cluster) Verify

func (c *Cluster) Verify() error

Verify verifies configuration settings.

type InstanceSizeToDiskSpace

type InstanceSizeToDiskSpace map[string]int

InstanceSizeToDiskSpace is a mapping of EC2 instance size (e.g. xlarge) to starting disk space in GiB.

type InstanceSpec

type InstanceSpec struct {
	Type      string
	Resources reflow.Resources
}

InstanceSpec is a specification representing an instance configuration.

func (InstanceSpec) Instance

func (i InstanceSpec) Instance(id string) ManagedInstance

Instance creates a ManagedInstance for this specification with the given id.

type InstanceTypeStat

type InstanceTypeStat struct {
	// InstanceType is an AWS EC2 instance type, i.e. r3.8xlarge
	InstanceType string
	// Count is the number of instances with the specified type.
	Count int
}

InstanceTypeStat is a tuple that stores the count of instances of a specified type within an AWS EC2 cluster.

type ManagedCluster

type ManagedCluster interface {
	// Launch launches an instance with the given specification.
	Launch(ctx context.Context, spec InstanceSpec) ManagedInstance

	// Refresh refreshes the managed cluster and returns a mapping of instance ID to instance type.
	Refresh(ctx context.Context) (map[string]string, error)

	// Available returns any available instance specification that can satisfy the need.
	// The returned InstanceSpec should be subsequently be 'Launch'able.
	Available(need reflow.Resources, maxPrice float64) (InstanceSpec, bool)

	// Notify notifies the managed cluster of the currently waiting and pending
	// amount of resources.
	Notify(waiting, pending reflow.Resources)

	// InstancePriceUSD returns the maximum hourly price bid in USD for the given instance type.
	InstancePriceUSD(typ string) float64

	// CheapestInstancePriceUSD returns the minimum hourly price bid in USD for all known instance types.
	CheapestInstancePriceUSD() float64
}

ManagedCluster is a cluster which can be managed.

type ManagedInstance

type ManagedInstance struct {
	InstanceSpec
	ID string
}

ManagedInstance represents a concrete instance with a given specification and ID.

func (ManagedInstance) Valid

func (m ManagedInstance) Valid() bool

Valid returns whether this is a valid instance (with a non-empty ID)

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager manages the cluster by fulfilling requests to allocate instances asynchronously, while delegating the actual launching of a new instance to the ManagedCluster. Once created, a manager needs to be initialized.

func NewManager

func NewManager(c ManagedCluster, maxHourlyCostUSD float64, maxPendingInstances int, log *log.Logger) *Manager

NewManager creates a manager for the given managed cluster with the specified parameters.

func (*Manager) Allocate

func (m *Manager) Allocate(ctx context.Context, req reflow.Requirements) <-chan struct{}

Allocate requests the Manager to allocate an instance for the given requirements Returns a channel the caller can wait on for confirmation request has been fulfilled.

func (*Manager) SetTimeouts

func (m *Manager) SetTimeouts(refreshInterval, drainTimeout, launchTimeout time.Duration)

SetTimeouts sets the various timeout durations (primarily used for integration testing).

func (*Manager) Start

func (m *Manager) Start(ctx context.Context, wg *sync.WaitGroup)

Start initializes and starts the cluster manager (and its management goroutines)

type OverallStats

type OverallStats struct {
	// InstanceIds is a slice of the instanceIds that are active within the ec2cluster maintained
	// by the current process.
	InstanceIds []string
	// TotalsByType is a slice of InstanceTypeStat tuples that define aggregations of the instances
	// in InstanceIds by instance type.
	TotalsByType []InstanceTypeStat
}

OverallStats is a set of variables that describe instances within an ec2cluster and various aggregations of those instances (i.e. by instance type).

Directories

Path Synopsis
Package volume implements support for maintaining (EBS) volumes on an EC2 instance by watching the disk usage of the underlying disk device and resizing the EBS volumes whenever necessary based on provided parameters.
Package volume implements support for maintaining (EBS) volumes on an EC2 instance by watching the disk usage of the underlying disk device and resizing the EBS volumes whenever necessary based on provided parameters.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL