Documentation ¶
Index ¶
- Constants
- Variables
- func ExplainError(err error) string
- type BuildNodesFn
- type CleanupResourcesFn
- type DynSupervisor
- type ErrKVs
- type Event
- type EventNotifier
- type EventTag
- type HealthReport
- type HealthcheckMonitor
- type Node
- func NewDynSubtree(name string, startFn func(context.Context, Spawner) error, spawnerOpts []Opt, ...) Node
- func NewDynSubtreeWithNotifyStart(name string, runFn func(context.Context, NotifyStartFn, Spawner) error, ...) Node
- func NewWorker(name string, startFn func(context.Context) error, opts ...c.Opt) Node
- func NewWorkerWithNotifyStart(name string, startFn func(context.Context, NotifyStartFn) error, opts ...c.Opt) Node
- func Subtree(subtreeSpec SupervisorSpec, opts ...c.Opt) Node
- type NotifyStartFn
- type Opt
- type Order
- type RestartToleranceReached
- type Spawner
- type Strategy
- type Supervisor
- type SupervisorBuildError
- type SupervisorRestartError
- type SupervisorSpec
- type SupervisorStartError
- type SupervisorTerminationError
Constants ¶
const NodeSepToken = "/"
NodeSepToken is the token used to separate sub-trees and child node names in the supervision tree
Variables ¶
var HealthyReport = HealthReport{}
HealthyReport represents a healthy report
var WithOrder = WithStartOrder
WithOrder is a backwards compatible alias to WithStartOrder
Deprecated: Use WithStartOrder instead
Functions ¶
func ExplainError ¶
ExplainError is a utility function that explains capataz errors in a human-friendly way. Defaults to a call to error.Error() if the underlying error does not come from the capataz library
Types ¶
type BuildNodesFn ¶
type BuildNodesFn = func() ([]Node, CleanupResourcesFn, error)
BuildNodesFn is a function that returns a list of nodes
Check the documentation of NewSupervisorSpec for more details and examples.
func WithNodes ¶
func WithNodes(nodes ...Node) BuildNodesFn
WithNodes allows the registration of child nodes in a SupervisorSpec. Node records passed to this function are going to be supervised by the Supervisor created from a SupervisorSpec.
Check the documentation of NewSupervisorSpec for more details and examples.
type CleanupResourcesFn ¶
type CleanupResourcesFn = func() error
CleanupResourcesFn is a function that cleans up resources that were allocated in a BuildNodesFn function.
Check the documentation of NewSupervisorSpec for more details and examples
type DynSupervisor ¶
type DynSupervisor struct {
// contains filtered or unexported fields
}
DynSupervisor is a supervisor that can spawn workers in a procedural way.
func NewDynSupervisor ¶
NewDynSupervisor creates a DynSupervisor which can start workers at runtime in a procedural manner. It receives a context and the supervisor name (for tracing purposes).
When to use a DynSupervisor?
If you want to run supervised worker routines on dynamic inputs. This is something that a regular Supervisor cannot do, as it needs to know the children nodes at construction time.
Differences to Supervisor ¶
As opposed to a Supervisor, a DynSupervisor:
* Cannot receive node specifications to start them in a static fashion
* Is able to spawn workers dynamically
* In case of a hard crash and following restart, it will start with an empty list of children
func (DynSupervisor) GetName ¶
func (dyn DynSupervisor) GetName() string
GetName returns the name of the Spec used to start this Supervisor
func (*DynSupervisor) Spawn ¶
func (dyn *DynSupervisor) Spawn(nodeFn Node) (func() error, error)
Spawn creates a new worker routine from the given node specification. It either returns a cancel/shutdown callback or an error in the scenario the start of this worker failed. This function blocks until the worker is started.
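The cancel/shutdown callback returned by Spawn can be pictured with a standard-library sketch. This is not the library's implementation: `spawn` and `demoSpawn` are hypothetical helpers, there is no supervision or restart logic, and the sketch does not wait for the worker to report that it started.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// spawn is a hypothetical stand-in for a dynamic spawn operation. It runs fn
// in a new goroutine and returns a callback that cancels the worker and
// waits for it to finish. Calling the callback more than once is safe.
func spawn(parent context.Context, fn func(context.Context) error) func() error {
	ctx, cancel := context.WithCancel(parent)
	done := make(chan error, 1)
	go func() { done <- fn(ctx) }()

	var once sync.Once
	var result error
	return func() error {
		once.Do(func() {
			cancel()
			result = <-done
		})
		return result
	}
}

// demoSpawn starts one worker per input (inputs only known at runtime), then
// stops every worker and returns how many of them stopped without an error.
func demoSpawn(inputs []string) int {
	stops := make([]func() error, 0, len(inputs))
	for range inputs {
		stops = append(stops, spawn(context.Background(), func(ctx context.Context) error {
			<-ctx.Done() // a real worker would process its input here
			return nil
		}))
	}
	stopped := 0
	for _, stop := range stops {
		if stop() == nil {
			stopped++
		}
	}
	return stopped
}

func main() {
	fmt.Println(demoSpawn([]string{"job-1", "job-2", "job-3"}))
}
```

The buffered channel lets the worker goroutine exit even if nobody ever calls the returned callback.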
func (*DynSupervisor) Terminate ¶
func (dyn *DynSupervisor) Terminate() error
Terminate is a synchronous procedure that halts the execution of the whole supervision tree.
func (DynSupervisor) Wait ¶
func (dyn DynSupervisor) Wait() error
Wait blocks the execution of the current goroutine until the Supervisor finishes its execution.
type ErrKVs ¶
type ErrKVs interface {
KVs() map[string]interface{}
}
ErrKVs is a utility interface used to get key-values out of Capataz errors
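One possible way to consume this interface for structured logging is sketched below. The interface is redeclared locally so the example compiles without the library; `demoError` and `kvsFromError` are hypothetical names, and the key `node.name` is made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKVs mirrors the interface documented above. It is redeclared here only
// so that the sketch is self-contained.
type ErrKVs interface {
	KVs() map[string]interface{}
}

// demoError is a hypothetical error type that carries key-values.
type demoError struct{ node string }

func (e *demoError) Error() string { return "node failed: " + e.node }

func (e *demoError) KVs() map[string]interface{} {
	return map[string]interface{}{"node.name": e.node}
}

// kvsFromError returns the key-values of the first error in the chain that
// implements ErrKVs, or an empty map when no error in the chain does.
func kvsFromError(err error) map[string]interface{} {
	var kvErr ErrKVs
	if errors.As(err, &kvErr) {
		return kvErr.KVs()
	}
	return map[string]interface{}{}
}

func main() {
	wrapped := fmt.Errorf("start aborted: %w", &demoError{node: "root/db"})
	fmt.Println(kvsFromError(wrapped))
	fmt.Println(kvsFromError(errors.New("plain error")))
}
```

Using errors.As means the key-values are still found when the error has been wrapped with `%w` by calling code.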
type Event ¶
type Event struct {
// contains filtered or unexported fields
}
Event is a record emitted by the supervision system. The events are used for multiple purposes, from testing to monitoring the healthiness of the supervision system.
func (Event) GetCreated ¶
GetCreated returns a timestamp of the creation of the event by the process
func (Event) GetNodeTag ¶
GetNodeTag returns the c.ChildTag from an Event
func (Event) GetProcessRuntimeName ¶
GetProcessRuntimeName returns the given name of a process that emitted this event
type EventNotifier ¶
type EventNotifier func(Event)
EventNotifier is a function that is used for reporting events from the supervision system.
Check the documentation of WithNotifier for more details.
type EventTag ¶
type EventTag uint32
EventTag specifies the type of Event that gets notified from the supervision system
const (
	// ProcessStarted is an Event that indicates a process started
	ProcessStarted EventTag
	// ProcessTerminated is an Event that indicates a process was stopped by a parent
	// supervisor
	ProcessTerminated
	// ProcessStartFailed is an Event that indicates a process failed to start
	ProcessStartFailed
	// ProcessFailed is an Event that indicates a process reported an error
	ProcessFailed
	// ProcessCompleted is an Event that indicates a process finished without errors
	ProcessCompleted
)
type HealthReport ¶
type HealthReport struct {
// contains filtered or unexported fields
}
HealthReport contains a report for the HealthcheckMonitor
func (HealthReport) GetDelayedRestartProcesses ¶
func (hr HealthReport) GetDelayedRestartProcesses() map[string]bool
GetDelayedRestartProcesses returns the set of delayed restart processes, keyed by process name
func (HealthReport) GetFailedProcesses ¶
func (hr HealthReport) GetFailedProcesses() map[string]bool
GetFailedProcesses returns the set of failed processes, keyed by process name
func (HealthReport) IsHealthyReport ¶
func (hr HealthReport) IsHealthyReport() bool
IsHealthyReport indicates if this is a healthy report
type HealthcheckMonitor ¶
type HealthcheckMonitor struct {
// contains filtered or unexported fields
}
HealthcheckMonitor listens to the events of a supervision tree and assesses whether the supervisor is healthy or not
func NewHealthcheckMonitor ¶
func NewHealthcheckMonitor(
	maxAllowedFailures uint32,
	maxAllowedRestartDuration time.Duration,
) *HealthcheckMonitor
NewHealthcheckMonitor offers a way to monitor a supervision tree health from events emitted by it.
maxAllowedFailures: the threshold beyond which the environment is considered unhealthy.
maxAllowedRestartDuration: the restart threshold, which if exceeded, indicates an unhealthy environment. Any process that fails to restart under the threshold results in an unhealthy report
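To make the two thresholds concrete, here is a simplified model of how they could interact. It is a sketch under assumptions, not the HealthcheckMonitor implementation: the `monitor` type, its methods and the rule "a process counts as failed from its first failure until it starts again" are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// monitor is a hypothetical stand-in that tracks when each process failed.
type monitor struct {
	maxAllowedFailures        uint32
	maxAllowedRestartDuration time.Duration
	failedAt                  map[string]time.Time // process name -> first failure time
}

func newMonitor(maxFailures uint32, maxRestart time.Duration) *monitor {
	return &monitor{
		maxAllowedFailures:        maxFailures,
		maxAllowedRestartDuration: maxRestart,
		failedAt:                  map[string]time.Time{},
	}
}

// processFailed records the first time a process was seen failing.
func (m *monitor) processFailed(name string, at time.Time) {
	if _, seen := m.failedAt[name]; !seen {
		m.failedAt[name] = at
	}
}

// processStarted clears the failure record once the process is running again.
func (m *monitor) processStarted(name string) { delete(m.failedAt, name) }

// isHealthy is false when more processes are failed than allowed, or when any
// failed process has been waiting for a restart longer than the threshold.
func (m *monitor) isHealthy(now time.Time) bool {
	if uint32(len(m.failedAt)) > m.maxAllowedFailures {
		return false
	}
	for _, at := range m.failedAt {
		if now.Sub(at) > m.maxAllowedRestartDuration {
			return false
		}
	}
	return true
}

// demoMonitor reports health shortly after a failure, after the restart
// threshold has passed, and after the process has started again.
func demoMonitor() [3]bool {
	t0 := time.Unix(0, 0)
	m := newMonitor(1, 2*time.Second)
	m.processFailed("root/db", t0)
	early := m.isHealthy(t0.Add(1 * time.Second))
	late := m.isHealthy(t0.Add(3 * time.Second))
	m.processStarted("root/db")
	recovered := m.isHealthy(t0.Add(4 * time.Second))
	return [3]bool{early, late, recovered}
}

func main() {
	fmt.Println(demoMonitor())
}
```

Passing the current time explicitly keeps the model deterministic, which makes this kind of logic easy to test.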
func (*HealthcheckMonitor) GetHealthReport ¶
func (h *HealthcheckMonitor) GetHealthReport() HealthReport
GetHealthReport returns a HealthReport that indicates why the system is unhealthy. Returns an empty (healthy) report if everything is ok.
func (*HealthcheckMonitor) HandleEvent ¶
func (h *HealthcheckMonitor) HandleEvent(ev Event)
HandleEvent is a function that receives supervision events and assesses whether the supervisor sending these events is healthy or not
func (*HealthcheckMonitor) IsHealthy ¶
func (h *HealthcheckMonitor) IsHealthy() bool
IsHealthy returns true when the system is in a healthy state, meaning no processes are restarting at the moment
type Node ¶
type Node func(SupervisorSpec) c.ChildSpec
Node represents a tree node in a supervision tree, it could either be a Subtree or a Worker
func NewDynSubtree ¶
func NewDynSubtree(
	name string,
	startFn func(context.Context, Spawner) error,
	spawnerOpts []Opt,
	opts ...c.Opt,
) Node
NewDynSubtree builds a worker that receives a Spawner which allows it to create more child workers dynamically in a sibling sub-tree.
The runtime subtree is composed of a worker and a supervisor ¶
<name>
|
`- spawner (creates dynamic workers in sibling subtree)
|
`- subtree
   |
   `- <dynamic_worker>
Note: The Spawner is automatically managed by the supervision tree, so clients are not required to terminate it explicitly.
func NewDynSubtreeWithNotifyStart ¶
func NewDynSubtreeWithNotifyStart(
	name string,
	runFn func(context.Context, NotifyStartFn, Spawner) error,
	spawnerOpts []Opt,
	opts ...c.Opt,
) Node
NewDynSubtreeWithNotifyStart accomplishes the same goal as NewDynSubtree with the addition of passing an extra argument (notifyStart callback) to the runFn function parameter.
func NewWorker ¶
NewWorker creates a Node that represents a worker goroutine. It requires two arguments: a name that is used for runtime tracing and a startFn function.
The name argument ¶
A name argument must not be empty nor contain forward slash characters (e.g. /), otherwise, the system will panic[*].
[*] Panicking is preferred over returning an error because an invalid name is considered a programming mistake (ideally it would be a compilation error).
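Because an invalid name panics, callers that build names from external input may want to check them first. The following is a minimal sketch of such a check; `validateName` is a hypothetical helper, not part of the library, and `nodeSepToken` is a local copy of the NodeSepToken constant.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nodeSepToken is a local copy of the documented NodeSepToken constant.
const nodeSepToken = "/"

// validateName is a hypothetical pre-check that applies the two rules stated
// above: the name must not be empty and must not contain a forward slash.
func validateName(name string) error {
	if name == "" {
		return errors.New("node name must not be empty")
	}
	if strings.Contains(name, nodeSepToken) {
		return fmt.Errorf("node name %q must not contain %q", name, nodeSepToken)
	}
	return nil
}

func main() {
	for _, name := range []string{"db-writer", "", "net/listener"} {
		fmt.Printf("%q -> %v\n", name, validateName(name))
	}
}
```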
The startFn argument ¶
The startFn function is where your business logic should be located. This function will be running on a new supervised goroutine.
The startFn function will receive a context.Context record that *must* be used inside your business logic to accept stop signals from its parent supervisor.
Depending on the Shutdown values used with the WithShutdown settings of the worker, if the `startFn` function does not respect the given context, the parent supervisor will either block forever or leak goroutines after a timeout has been reached.
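A startFn that respects its context could look like the sketch below. `tickWorker` has the `func(context.Context) error` shape that NewWorker expects; `runAndStop` and `stopsPromptly` are hypothetical helpers that play the role of a parent asking the worker to stop, using only the standard library.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tickWorker is a hypothetical startFn. It does periodic work and returns as
// soon as its context is cancelled.
func tickWorker(ctx context.Context) error {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			// The parent asked us to stop: return promptly.
			return nil
		case <-ticker.C:
			// Business logic goes here.
		}
	}
}

// runAndStop runs startFn in a goroutine, cancels its context, and reports
// whether the function returned before the timeout elapsed.
func runAndStop(startFn func(context.Context) error, timeout time.Duration) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	done := make(chan error, 1)
	go func() { done <- startFn(ctx) }()
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// stopsPromptly checks that tickWorker honours cancellation within a second.
func stopsPromptly() bool {
	return runAndStop(tickWorker, time.Second)
}

func main() {
	fmt.Println(stopsPromptly())
}
```

A startFn that ignored `ctx.Done()` would make `runAndStop` hit the timeout branch, which is the situation the paragraph above warns about.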
func NewWorkerWithNotifyStart ¶
func NewWorkerWithNotifyStart(
	name string,
	startFn func(context.Context, NotifyStartFn) error,
	opts ...c.Opt,
) Node
NewWorkerWithNotifyStart accomplishes the same goal as NewWorker with the addition of passing an extra argument (notifyStart callback) to the startFn function parameter.
The NotifyStartFn argument ¶
Sometimes you want to consider a goroutine started after certain initialization was done; like doing a read from a Database or API, or some socket is bound, etc. The NotifyStartFn is a callback that allows the spawned worker goroutine to signal when it has officially started.
It is essential to call this callback function in your business logic as soon as you consider the worker is initialized, otherwise the parent supervisor will block and eventually fail with a timeout.
Report a start error on NotifyStartFn ¶
If for some reason, a child node is not able to start correctly (e.g. DB connection fails, network is kaput), the node may call the given NotifyStartFn function with the impending error as a parameter. This will cause the whole supervision system start procedure to abort.
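The two uses of the callback (signal a successful start, or report a start error) can be sketched with the standard library alone. `notifyStartFn` is a local stand-in with the documented shape; `connect`, `dbWorker`, `startAndWait` and `demoStart` are hypothetical, and returning the error after reporting it is an assumption of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// notifyStartFn is a local stand-in: nil means started, non-nil means the
// start failed.
type notifyStartFn func(error)

// connect is a hypothetical initialization step that may fail.
func connect(ok bool) error {
	if !ok {
		return errors.New("db connection failed")
	}
	return nil
}

// dbWorker builds a worker function that notifies its start result as soon
// as initialization is over, and then runs until its context is cancelled.
func dbWorker(dbOK bool) func(context.Context, notifyStartFn) error {
	return func(ctx context.Context, notifyStart notifyStartFn) error {
		if err := connect(dbOK); err != nil {
			notifyStart(err) // report the start failure
			return err
		}
		notifyStart(nil) // initialization done, worker is officially started
		<-ctx.Done()
		return nil
	}
}

// startAndWait runs the worker and blocks until it reports its start result,
// which is roughly what a parent would wait for during startup.
func startAndWait(ctx context.Context, w func(context.Context, notifyStartFn) error) error {
	started := make(chan error, 1)
	go w(ctx, func(err error) { started <- err })
	return <-started
}

// demoStart returns the start result for a healthy or a broken dependency.
func demoStart(dbOK bool) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return startAndWait(ctx, dbWorker(dbOK))
}

func main() {
	fmt.Println(demoStart(true))
	fmt.Println(demoStart(false))
}
```

If the worker never called the callback, `startAndWait` would block forever, which is why a real parent needs a start timeout.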
func Subtree ¶
func Subtree(subtreeSpec SupervisorSpec, opts ...c.Opt) Node
Subtree transforms SupervisorSpec into a Node. This function allows you to insert a black-box sub-system into a bigger supervised system.
Note the subtree SupervisorSpec is going to inherit the event notifier from its parent supervisor.
Example:
// Initialize a SupervisorSpec that doesn't know anything about other
// parts of the system (e.g. is self-contained)
networkingSubsystem := cap.NewSupervisorSpec("net", ...)

// Another self-contained system
filesystemSubsystem := cap.NewSupervisorSpec("fs", ...)

// SupervisorSpec that is started in your main.go
cap.NewSupervisorSpec("root",
	cap.WithNodes(
		cap.Subtree(networkingSubsystem),
		cap.Subtree(filesystemSubsystem),
	),
)
type NotifyStartFn ¶
type NotifyStartFn = c.NotifyStartFn
NotifyStartFn is a function given to worker nodes that allows them to notify the parent supervisor that they are officially started.
The argument contains an error if there was a failure, nil otherwise.
See the documentation of NewWorkerWithNotifyStart for more details
type Opt ¶
type Opt func(*SupervisorSpec)
Opt is a type used to configure a SupervisorSpec
func WithNotifier ¶
func WithNotifier(en EventNotifier) Opt
WithNotifier is an Opt that specifies a callback that gets called whenever the supervision system reports an Event
This function may be used to observe the behavior of all the supervisors in the systems, and it is a great place to hook in monitoring services like logging, error tracing and metrics gatherers
func WithRestartTolerance ¶
WithRestartTolerance is an Opt that specifies how many errors the supervisor should be willing to tolerate before it gives up restarting and fails.
If the tolerance is exceeded, the supervisor fails. If the supervisor is a sub-tree, the error is handled by the grand-parent supervisor (the parent of the failing supervisor), and the tolerance starts over.
Example
// Tolerate 10 errors every 5 seconds
//
// - if there are 11 errors in a 5 second window, it makes the supervisor fail
//
WithRestartTolerance(10, 5 * time.Second)
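The "N errors per window" rule can be modelled with a sliding window over error timestamps. This is a sketch of the idea only; the library's actual bookkeeping may differ, and the `tolerance` type and `surpassed` helper are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// tolerance is a hypothetical model of "maxErrors errors per window".
type tolerance struct {
	maxErrors int
	window    time.Duration
	errs      []time.Time // timestamps of errors still inside the window
}

// record registers an error at the given time and reports whether the number
// of errors inside the window now exceeds maxErrors.
func (t *tolerance) record(at time.Time) bool {
	cutoff := at.Add(-t.window)
	kept := t.errs[:0]
	for _, e := range t.errs {
		if e.After(cutoff) {
			kept = append(kept, e)
		}
	}
	t.errs = append(kept, at)
	return len(t.errs) > t.maxErrors
}

// surpassed replays errors at the given millisecond offsets and reports
// whether the tolerance was exceeded at any point.
func surpassed(maxErrors, windowMs int, offsetsMs ...int) bool {
	t := &tolerance{
		maxErrors: maxErrors,
		window:    time.Duration(windowMs) * time.Millisecond,
	}
	start := time.Unix(0, 0)
	for _, off := range offsetsMs {
		if t.record(start.Add(time.Duration(off) * time.Millisecond)) {
			return true
		}
	}
	return false
}

func main() {
	// Tolerate 2 errors every 5 seconds.
	fmt.Println(surpassed(2, 5000, 0, 1000, 2000)) // three errors within 5s
	fmt.Println(surpassed(2, 5000, 0, 1000, 7000)) // third error is outside the window
}
```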
func WithStartOrder ¶
WithStartOrder is an Opt that specifies the start/stop order of a supervisor's children nodes
Possible values may be:
* LeftToRight -- Start children nodes from left to right, stop them from right to left
* RightToLeft -- Start children nodes from right to left, stop them from left to right
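The relationship between start and stop order described by the two values can be written down as a small sketch. The `order` type and the helper functions are local stand-ins, not the library's Order type.

```go
package main

import "fmt"

// order is a local stand-in for the documented start order values.
type order int

const (
	leftToRight order = iota
	rightToLeft
)

// reversed returns a reversed copy of in.
func reversed(in []string) []string {
	out := make([]string, len(in))
	for i, v := range in {
		out[len(in)-1-i] = v
	}
	return out
}

// startSequence returns the children in the order they would be started.
func startSequence(o order, children []string) []string {
	if o == rightToLeft {
		return reversed(children)
	}
	return append([]string(nil), children...)
}

// stopSequence is always the reverse of the start sequence.
func stopSequence(o order, children []string) []string {
	return reversed(startSequence(o, children))
}

func main() {
	children := []string{"db", "cache", "http"}
	fmt.Println(startSequence(leftToRight, children), stopSequence(leftToRight, children))
	fmt.Println(startSequence(rightToLeft, children), stopSequence(rightToLeft, children))
}
```

Stopping in reverse start order means a child is always stopped before the siblings that were started ahead of it.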
func WithStrategy ¶
WithStrategy is an Opt that specifies how children nodes of a supervisor get restarted when one of the nodes fails
Possible values may be:
* OneForOne -- Only restart the failing child
* OneForAll (Not Implemented Yet) -- Restart the failing child and all its siblings[*]
[*] This option may come in handy when all the other siblings depend on one another to work correctly.
type Order ¶
type Order uint32
Order specifies the order in which a supervisor is going to start its node children. The stop order is the reverse of the start order.
type RestartToleranceReached ¶
type RestartToleranceReached struct {
// contains filtered or unexported fields
}
RestartToleranceReached is an error that gets reported when a supervisor has restarted a child so many times over a period of time that it does not make sense to keep restarting.
func NewRestartToleranceReached ¶
func NewRestartToleranceReached(
	tolerance restartTolerance,
	sourceCh c.Child,
	sourceErr error,
	lastErr error,
) *RestartToleranceReached
NewRestartToleranceReached creates a RestartToleranceReached record
func (*RestartToleranceReached) Error ¶
func (err *RestartToleranceReached) Error() string
func (*RestartToleranceReached) KVs ¶
func (err *RestartToleranceReached) KVs() map[string]interface{}
KVs returns a data bag map that may be used in structured logging
func (*RestartToleranceReached) Unwrap ¶
func (err *RestartToleranceReached) Unwrap() error
Unwrap returns the last error that caused the creation of a RestartToleranceReached error
type Strategy ¶
type Strategy uint32
Strategy specifies how children get restarted when one of them reports an error
type Supervisor ¶
type Supervisor struct {
// contains filtered or unexported fields
}
Supervisor represents the root of a tree of goroutines. A Supervisor may have leaf or sub-tree children, where each of the nodes in the tree represents a goroutine that gets automatic restart abilities as soon as the parent supervisor detects an error has occurred. A Supervisor will always be generated from a SupervisorSpec
func (Supervisor) GetCrashError ¶
func (sup Supervisor) GetCrashError(block bool) (bool, error)
GetCrashError is a non-blocking function that returns a crash error if there is one. The first result indicates if the supervisor is running or not. If the returned error is not nil, the first result will always be true.
func (Supervisor) GetName ¶
func (sup Supervisor) GetName() string
GetName returns the name of the Spec used to start this Supervisor
func (Supervisor) Terminate ¶
func (sup Supervisor) Terminate() error
Terminate is a synchronous procedure that halts the execution of the whole supervision tree.
func (Supervisor) Wait ¶
func (sup Supervisor) Wait() error
Wait blocks the execution of the current goroutine until the Supervisor finishes its execution.
type SupervisorBuildError ¶
type SupervisorBuildError struct {
// contains filtered or unexported fields
}
SupervisorBuildError wraps errors returned from a client provided function that builds the supervisor nodes, enhancing it with supervisor information
func (*SupervisorBuildError) Error ¶
func (err *SupervisorBuildError) Error() string
func (*SupervisorBuildError) KVs ¶
func (err *SupervisorBuildError) KVs() map[string]interface{}
KVs returns a metadata map for structured logging
type SupervisorRestartError ¶
type SupervisorRestartError struct {
// contains filtered or unexported fields
}
SupervisorRestartError wraps an error tolerance surpassed error from a child node, enhancing it with supervisor information and possible termination errors on other siblings
func (*SupervisorRestartError) Error ¶
func (err *SupervisorRestartError) Error() string
Error returns an error message
func (*SupervisorRestartError) KVs ¶
func (err *SupervisorRestartError) KVs() map[string]interface{}
KVs returns a metadata map for structured logging
type SupervisorSpec ¶
type SupervisorSpec struct {
// contains filtered or unexported fields
}
SupervisorSpec represents the specification of a static supervisor; it serves as a template for the construction of a runtime supervision tree. In the SupervisorSpec you can specify settings like:
* The children (workers or sub-trees) you want spawned in your system when it starts
* The order in which the supervised node children get started
* Whether the supervisor restarts a child node (and, if specified, all its siblings as well) when the node fails in unexpected ways
func NewSupervisorSpec ¶
func NewSupervisorSpec(name string, buildNodes BuildNodesFn, opts ...Opt) SupervisorSpec
NewSupervisorSpec creates a SupervisorSpec. It requires the name of the supervisor (for tracing purposes) and some children nodes to supervise.
Monitoring children that do not share resources ¶
This is intended for situations where you need worker goroutines that are self-contained running in the background.
To specify a group of children nodes, you need to use the WithNodes utility function. This function may receive Subtree or Worker nodes.
Example:
cap.NewSupervisorSpec("root",
	// (1)
	// Specify child nodes to spawn when this supervisor starts
	cap.WithNodes(
		cap.Subtree(subtreeSupervisorSpec),
		workerChildSpec,
	),
	// (2)
	// Specify child nodes start from right to left (reversed order) and
	// stop from left to right.
	cap.WithStartOrder(cap.RightToLeft),
)
Monitoring nodes that share resources ¶
Sometimes, you want a group of children nodes to interact with each other via some shared resource that only the workers know about (for example, a gochan, a db datapool, etc.).
You are able to specify a custom function (BuildNodesFn) that allocates and releases these resources.
This function should return:
* The children nodes of the supervision tree
* A function that cleans up the allocated resources (CleanupResourcesFn)
* An error, but only in the scenario where a resource initialization failed
Example:
cap.NewSupervisorSpec("root",
	// (1)
	// Implement a function that returns all nodes to be supervised.
	// When this supervisor gets (re)started, this function will be called.
	// Imagine this function as a factory for its children.
	func() ([]cap.Node, cap.CleanupResourcesFn, error) {
		// In this example, child nodes have a shared resource (a gochan)
		// and it gets passed to their constructors.
		buffer := make(chan MyType)
		nodes := []cap.Node{
			producerWorker(buffer),
			consumerWorker(buffer),
		}

		// We create a function that gets executed when the supervisor
		// shuts down. A CleanupResourcesFn must return an error value.
		cleanup := func() error {
			close(buffer)
			return nil
		}

		// We return the allocated Node records and the cleanup function
		return nodes, cleanup, nil
	},
	// (2)
	cap.WithStartOrder(cap.RightToLeft),
)
Dealing with errors ¶
Given resources can involve IO allocations, using this functionality opens the door to a few error scenarios:
1) Resource allocation returns an error
In this scenario, the supervision start procedure will fail and it will follow the regular shutdown procedure: the already started nodes will be terminated and an error will be returned immediately.
2) Resource cleanup returns an error
In this scenario, the termination procedure will collect the error and report it in the returned SupervisorTerminationError.
3) Resource allocation/cleanup hangs
This library does not handle this scenario. It is the responsibility of the user of the API to implement start timeouts and cleanup timeouts inside the given BuildNodesFn and CleanupResourcesFn functions.
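One way to add such a timeout is to wrap the cleanup function, as in the sketch below. `withTimeout` and `demoCleanup` are hypothetical helpers; the wrapper has the `func() error` shape of a CleanupResourcesFn. Note that a cleanup that really hangs keeps its goroutine alive; the wrapper only stops waiting for it.

```go
package main

import (
	"fmt"
	"time"
)

// withTimeout wraps fn so that the returned function stops waiting after d
// and reports an error instead of blocking forever.
func withTimeout(d time.Duration, fn func() error) func() error {
	return func() error {
		done := make(chan error, 1) // buffered so a late fn can still finish
		go func() { done <- fn() }()
		select {
		case err := <-done:
			return err
		case <-time.After(d):
			return fmt.Errorf("cleanup did not finish within %s", d)
		}
	}
}

// demoCleanup runs a cleanup that either returns at once or hangs, guarded
// by a 50ms timeout.
func demoCleanup(hang bool) error {
	cleanup := withTimeout(50*time.Millisecond, func() error {
		if hang {
			time.Sleep(time.Hour) // simulates a cleanup that never returns
		}
		return nil
	})
	return cleanup()
}

func main() {
	fmt.Println(demoCleanup(false))
	fmt.Println(demoCleanup(true))
}
```

The same wrapper idea applies to the allocation step inside a BuildNodesFn, with the allocated resources returned through the channel instead of a bare error.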
func (SupervisorSpec) GetName ¶
func (spec SupervisorSpec) GetName() string
GetName returns the given name of the supervisor spec (not a runtime name)
func (SupervisorSpec) Start ¶
func (spec SupervisorSpec) Start(startCtx context.Context) (Supervisor, error)
Start creates a Supervisor from this SupervisorSpec.
A Supervisor is a tree of workers and/or sub-trees. The Start algorithm spawns the leaf worker goroutines first and then it will go up into the supervisor sub-trees. Depending on the SupervisorSpec's order, it will do an initialization in pre-order (LeftToRight) or post-order (RightToLeft).
Supervisor Tree Initialization ¶
Once all the child leaves are initialized and running, the supervisor will execute its supervision monitor logic (listening to failures on its children). Invoking this method will block the calling goroutine until all the children and their sub-trees' children have been started.
Failures on Child Initialization ¶
In the scenario that one of the child nodes fails to start (IO error, etc.), the Start algorithm is going to abort the start routine, and is going to stop in reverse order all the child nodes that have been started, finally returning an error value.
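The abort-and-roll-back behaviour described above can be illustrated with a flat list of children. This is a simplified sketch, not the library's Start algorithm: the `child` type, `startAll` and `demoStartFailure` are hypothetical, and stopping is only recorded in a log.

```go
package main

import (
	"errors"
	"fmt"
)

// child is a hypothetical node with a name and a start function.
type child struct {
	name  string
	start func() error
}

// startAll starts children in order. If one fails, the children that were
// already started are stopped in reverse order and an error is returned.
// The returned log records every start and stop that took place.
func startAll(children []child) ([]string, error) {
	var log, started []string
	for _, ch := range children {
		if err := ch.start(); err != nil {
			for i := len(started) - 1; i >= 0; i-- {
				log = append(log, "stop "+started[i])
			}
			return log, fmt.Errorf("child %q failed to start: %w", ch.name, err)
		}
		log = append(log, "start "+ch.name)
		started = append(started, ch.name)
	}
	return log, nil
}

// demoStartFailure starts three children where the third one fails.
func demoStartFailure() ([]string, error) {
	ok := func() error { return nil }
	fail := func() error { return errors.New("io error") }
	return startAll([]child{{"a", ok}, {"b", ok}, {"c", fail}})
}

func main() {
	log, err := demoStartFailure()
	fmt.Println(log)
	fmt.Println(err)
}
```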
type SupervisorStartError ¶
type SupervisorStartError struct {
// contains filtered or unexported fields
}
SupervisorStartError wraps an error reported on the initialization of a child node, enhancing it with supervisor information and possible termination errors on other siblings
func (*SupervisorStartError) Error ¶
func (err *SupervisorStartError) Error() string
Error returns an error message
func (*SupervisorStartError) KVs ¶
func (err *SupervisorStartError) KVs() map[string]interface{}
KVs returns a metadata map for structured logging
type SupervisorTerminationError ¶
type SupervisorTerminationError struct {
// contains filtered or unexported fields
}
SupervisorTerminationError wraps errors returned by a child node that failed to terminate (io errors, timeouts, etc.), enhancing it with supervisor information. Note, the only way to have a valid SupervisorTerminationError is for one of the child nodes to fail or for the supervisor cleanup operation to fail.
func (*SupervisorTerminationError) Error ¶
func (err *SupervisorTerminationError) Error() string
Error returns an error message
func (*SupervisorTerminationError) KVs ¶
func (err *SupervisorTerminationError) KVs() map[string]interface{}
KVs returns a metadata map for structured logging