nexnode

package
v0.0.0-...-5692323
Published: Apr 26, 2024 License: Apache-2.0 Imports: 47 Imported by: 0

README

NATS Execution Engine Node

The NATS execution engine node service runs as a daemon on a host, where it responds to control commands to start, stop, and monitor workloads.

Requirements

The following is a (potentially incomplete) list of what you'll need in order to run the node application.

  • An accessible copy of the firecracker binary in the path of the node service
  • A Linux kernel
  • A root file system with nex-agent configured as a boot service
  • CNI plugins installed (at least) to /opt/cni/bin
    • host-local
    • ptp
    • tc-redirect-tap
  • A CNI configuration file at /etc/cni/conf.d/fcnet.conflist

Running and Testing

First and foremost, the nex node application makes use of the firecracker SDK to start VMs, which requires a number of elevated privileges. When running locally for testing, it's easy enough to just sudo nex-node ..., but in production you might want to craft a bespoke user and group that can do just enough to make firecracker happy.

Because nex node utilizes firecracker VMs, ⚠️ it can only run on 64-bit Linux ⚠️. The workloads you submit to execute on nex node must be statically linked, freestanding Linux ELF binaries. The node service will reject dynamically linked or non-ELF binaries.
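
To make that concrete, below is a minimal sketch of a qualifying workload, assuming it is built with CGO disabled (e.g. CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build) so the resulting ELF is fully static; the program itself is hypothetical and not part of this package.

// main.go: a hypothetical, minimal workload. Built with CGO_ENABLED=0,
// the resulting binary is a statically linked, freestanding Linux ELF.
package main

import (
	"log"
	"time"
)

func main() {
	for {
		// Trivial periodic work so the workload keeps running.
		log.Println("workload heartbeat")
		time.Sleep(10 * time.Second)
	}
}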

As you run this and create warm VMs to reside in the pool, you'll be allocating IP addresses from the firecracker CNI network (defaults to fcnet). This means that eventually you'll run out of IP addresses during development. To clear them, you can purge your /var/lib/cni/networks/{device} directory. This will start IP allocation back at .2 (.1 is the host).

Additionally, while running and testing this you may end up with a large amount of idle/dormant veth devices. To remove them all, you can use the following script:

# Remove all idle/dormant veth devices left behind by firecracker/CNI.
# The sed strips everything after the interface name (including a trailing
# colon on newer ifconfig output) and drops blank and loopback entries.
for name in $(ifconfig -a | sed 's/[: \t].*//;/^\(lo\|\)$/d' | grep veth)
do
    echo "$name"
    sudo ip link delete "$name"
done

⚠️ If you're working in a contributor loop and you modify the rootfs after some firecracker IPs have already been allocated, it can potentially wreck the host networking and you'll notice the agents suddenly unable to communicate. If this happens, just purge the IPs and the veth devices to start over fresh.

Configuration

The nex node service needs to know the size and shape of the firecracker machines to dispense. As a result, it needs a JSON file that describes the cookie cutter from which VMs are stamped. Here's a sample machineconfig.json file:

{
    "kernel_path": "/home/kevin/lab/firecracker/vmlinux-5.10",
    "rootfs_path": "/home/kevin/lab/firecracker/rootfs.ext4",
    "machine_pool_size": 1,
    "cni": {
        "network_name": "fcnet",
        "interface_name": "veth0"
    },
    "machine_template": {
        "vcpu_count": 1,
        "memsize_mib": 256
    },
    "requester_public_keys": []
}

This file tells nex node where to find the kernel and rootfs for the firecracker VMs, as well as the CNI configuration. Finally, if you supply a non-empty value for requester_public_keys, it will serve as an allow-list of public XKeys that can be used to submit requests. XKeys are essentially nkeys that can be used for encryption. Note that the network_name field must exactly match the name of the {network_name}.conflist file in /etc/cni/conf.d.
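
Since this is the node's Go package, a minimal sketch of loading that file and verifying the host looks roughly like the following, using the LoadNodeConfiguration and CheckPrerequisites functions documented below; the import path and the meaning of the readonly flag are assumptions.

package main

import (
	"log"

	// The import path is illustrative; use the module path that actually
	// provides this package.
	nexnode "example.com/nex/internal/node"
)

func main() {
	cfg, err := nexnode.LoadNodeConfiguration("machineconfig.json")
	if err != nil {
		log.Fatalf("failed to load node configuration: %v", err)
	}

	// readonly=true is assumed to mean "check only, don't try to fix anything".
	if err := nexnode.CheckPrerequisites(cfg, true); err != nil {
		log.Fatalf("host is missing prerequisites: %v", err)
	}

	log.Println("node configuration loaded and prerequisites satisfied")
}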

Reference

Here's a look at the fcnet.conflist file that we use as a default. This gives each firecracker VM access to whatever the host can access, and allows the host to make inbound requests. Regardless of your CNI configuration, the agent process must be able to communicate with the host node via the internal NATS server (which defaults to running on port 9222).

{
  "name": "fcnet",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

If you want to add additional constraints and rules to the fcnet definition, consult the CNI documentation, as these are just plain CNI configuration files.

Observing Events

You can monitor events coming out of the node and workloads running within the node by subscribing on the following subject pattern:

$NEX.events.{namespace}.{event_type}

The payload of these events is a CloudEvent envelope containing an inner JSON object for the data field.
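
For example, a small Go subscriber that watches every event in every namespace might look like the sketch below, using the nats.go client and the CloudEvents SDK; the NATS URL is an assumption to adjust for your deployment.

package main

import (
	"encoding/json"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2/event"
	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the NATS server the node uses for its control plane
	// (the URL is an assumption).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Subscribe to all event types in all namespaces.
	_, err = nc.Subscribe("$NEX.events.*.*", func(m *nats.Msg) {
		var evt cloudevents.Event
		if err := json.Unmarshal(m.Data, &evt); err != nil {
			log.Printf("not a CloudEvent: %v", err)
			return
		}
		log.Printf("event %s from %s: %s", evt.Type(), evt.Source(), string(evt.Data()))
	})
	if err != nil {
		log.Fatal(err)
	}

	select {} // block forever
}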

Observing Logs

You can subscribe to log emissions without console access by using the following subject pattern:

$NEX.logs.{namespace}.{host}.{workload}.{vmId}

This gives you the flexibility of monitoring everything from a given host, workload, or vmID. For example, if you want to see an aggregate of all logs emitted by the bankservice workload throughout an entire system, you could just subscribe to:

$NEX.logs.*.*.bankservice.*
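
A sketch of that aggregate subscription with the nats.go client (again, the NATS URL is an assumption):

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Every log line emitted by the "bankservice" workload on any host,
	// in any namespace, from any VM.
	_, err = nc.Subscribe("$NEX.logs.*.*.bankservice.*", func(m *nats.Msg) {
		log.Printf("[%s] %s", m.Subject, string(m.Data))
	})
	if err != nil {
		log.Fatal(err)
	}

	select {}
}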

Documentation

Index

Constants

const (
	EventSubjectPrefix      = "$NEX.events"
	LogSubjectPrefix        = "$NEX.logs"
	WorkloadCacheBucketName = "NEXCACHE"
)

Variables

var (
	VERSION   = "development"
	COMMIT    = ""
	BUILDDATE = ""
)

Functions

func CheckPrerequisites

func CheckPrerequisites(config *models.NodeConfiguration, readonly bool) error

func CmdPreflight

func CmdPreflight(opts *nexmodels.Options, nodeopts *nexmodels.NodeOptions, ctx context.Context, cancel context.CancelFunc, log *slog.Logger) error

func CmdUp

func CmdUp(opts *nexmodels.Options, nodeopts *nexmodels.NodeOptions, ctx context.Context, cancel context.CancelFunc, log *slog.Logger) error

func FullVersion

func FullVersion() string

func LoadNodeConfiguration

func LoadNodeConfiguration(configFilepath string) (*models.NodeConfiguration, error)

Reads the node configuration from the specified configuration file path

func PublishCloudEvent

func PublishCloudEvent(nc *nats.Conn, namespace string, event cloudevents.Event, log *slog.Logger) error

Publishes the given $NEX event to an arbitrary namespace using the given NATS connection.
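
A hedged sketch of the call shape, assuming an existing NATS connection and logger; the helper, import path, event type, source, namespace, and payload are all illustrative:

package example

import (
	"log/slog"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	"github.com/nats-io/nats.go"

	nexnode "example.com/nex/internal/node" // illustrative import path
)

// emitExampleEvent is a hypothetical helper showing the call shape.
func emitExampleEvent(nc *nats.Conn, logger *slog.Logger) error {
	evt := cloudevents.NewEvent()
	evt.SetType("com.example.workload_started") // illustrative event type
	evt.SetSource("nexnode-example")            // illustrative source
	if err := evt.SetData(cloudevents.ApplicationJSON, map[string]string{"workload": "bankservice"}); err != nil {
		return err
	}
	// "default" is an illustrative namespace.
	return nexnode.PublishCloudEvent(nc, "default", evt, logger)
}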

func ReadMemoryStats

func ReadMemoryStats() (*controlapi.MemoryStat, error)

This function only works on Linux, but that's okay since nex-node can only run on 64-bit Linux
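
A minimal sketch of calling it (the import path is illustrative, and the returned struct is simply printed since its fields aren't listed here):

package main

import (
	"log"

	nexnode "example.com/nex/internal/node" // illustrative import path
)

func main() {
	stats, err := nexnode.ReadMemoryStats()
	if err != nil {
		log.Fatalf("failed to read memory stats: %v", err)
	}
	// MemoryStat fields aren't documented here, so just dump the struct.
	log.Printf("host memory: %+v", stats)
}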

func Version

func Version() string

Types

type AgentProxy

type AgentProxy struct {
	// contains filtered or unexported fields
}

func NewAgentProxyWith

func NewAgentProxyWith(agent *processmanager.ProcessInfo) *AgentProxy

type ApiListener

type ApiListener struct {
	// contains filtered or unexported fields
}

The API listener is the command and control interface for the node server

func NewApiListener

func NewApiListener(log *slog.Logger, mgr *WorkloadManager, node *Node) *ApiListener

func (*ApiListener) PublicKey

func (api *ApiListener) PublicKey() string

func (*ApiListener) Start

func (api *ApiListener) Start() error

type HostServices

type HostServices struct {
	// contains filtered or unexported fields
}

The host services server implements select functionality that is exposed to workloads by way of the agent, which makes RPC calls via the internal NATS connection

func NewHostServices

func NewHostServices(mgr *WorkloadManager, nc, ncint *nats.Conn, log *slog.Logger) *HostServices

type Node

type Node struct {
	// contains filtered or unexported fields
}

Nex node process

func NewNode

func NewNode(opts *models.Options, nodeOpts *models.NodeOptions, ctx context.Context, cancelF context.CancelFunc, log *slog.Logger) (*Node, error)

func (*Node) EnterLameDuck

func (n *Node) EnterLameDuck() error

func (*Node) IsLameDuck

func (n *Node) IsLameDuck() bool

func (*Node) PublicKey

func (n *Node) PublicKey() (*string, error)

func (*Node) Start

func (n *Node) Start()

func (*Node) Stop

func (n *Node) Stop()
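
A hedged sketch of the node lifecycle using the constructors and methods above; the import paths are illustrative, and the zero-valued options exist only to show the call shape, as real deployments populate them from CLI flags (see CmdUp):

package main

import (
	"context"
	"log"
	"log/slog"
	"os"

	// Illustrative import paths.
	"example.com/nex/internal/models"
	nexnode "example.com/nex/internal/node"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))

	// Zero-valued options are illustrative only; real deployments populate
	// them from CLI flags and the machine configuration file.
	node, err := nexnode.NewNode(&models.Options{}, &models.NodeOptions{}, ctx, cancel, logger)
	if err != nil {
		log.Fatalf("failed to create node: %v", err)
	}

	// Start is assumed to run until the node is stopped, so run it concurrently.
	go node.Start()

	// ... later, drain and shut down.
	_ = node.EnterLameDuck()
	node.Stop()
}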

type NodeProxy

type NodeProxy struct {
	// contains filtered or unexported fields
}

Use this proxy object with extreme care, as it exposes the private/internal bits of a node instance to callers. It was created only as a way to make writing specs work and should not be used for any other purpose!

func NewNodeProxyWith

func NewNodeProxyWith(node *Node) *NodeProxy

func (*NodeProxy) APIListener

func (n *NodeProxy) APIListener() *ApiListener

func (*NodeProxy) InternalNATS

func (n *NodeProxy) InternalNATS() *server.Server

func (*NodeProxy) InternalNATSConn

func (n *NodeProxy) InternalNATSConn() *nats.Conn

func (*NodeProxy) Log

func (n *NodeProxy) Log() *slog.Logger

func (*NodeProxy) NodeConfiguration

func (n *NodeProxy) NodeConfiguration() *models.NodeConfiguration

func (*NodeProxy) Telemetry

func (n *NodeProxy) Telemetry() *observability.Telemetry

func (*NodeProxy) WorkloadManager

func (n *NodeProxy) WorkloadManager() *WorkloadManager

type WorkloadManager

type WorkloadManager struct {
	// contains filtered or unexported fields
}

The workload manager provides the high-level strategy for the Nex node's workload management. It is responsible for using a process manager interface to manage processes and for maintaining agent clients that communicate with those processes. The workload manager does not know how the agent processes are created, only how to communicate with them via the internal NATS server

func NewWorkloadManager

func NewWorkloadManager(
	ctx context.Context,
	cancel context.CancelFunc,
	nodeKeypair nkeys.KeyPair,
	publicKey string,
	nc, ncint *nats.Conn,
	config *models.NodeConfiguration,
	log *slog.Logger,
	telemetry *observability.Telemetry,
) (*WorkloadManager, error)

Initialize a new workload manager instance to manage and communicate with agents

func (*WorkloadManager) CacheWorkload

func (m *WorkloadManager) CacheWorkload(request *controlapi.DeployRequest) (uint64, *string, error)

func (*WorkloadManager) DeployWorkload

func (w *WorkloadManager) DeployWorkload(request *agentapi.DeployRequest) (*string, error)

Deploy a workload as specified by the given deploy request to an available agent in the configured pool

func (*WorkloadManager) LookupWorkload

func (w *WorkloadManager) LookupWorkload(workloadID string) (*agentapi.DeployRequest, error)

Locates a given workload by its workload ID and returns the deployment request associated with it. Note that this means "pending" workloads are not considered by lookups

func (*WorkloadManager) OnProcessStarted

func (w *WorkloadManager) OnProcessStarted(id string)

Called by the agent process manager when an agent has been warmed and is ready to receive workload deployment instructions

func (*WorkloadManager) RunningWorkloads

func (w *WorkloadManager) RunningWorkloads() ([]controlapi.MachineSummary, error)

Retrieve a list of deployed, running workloads
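
For example, given a manager instance obtained from a running node, a hypothetical helper could log each summary; the import path is illustrative, and the summaries are logged whole since MachineSummary's fields aren't shown here:

package example

import (
	"log/slog"

	nexnode "example.com/nex/internal/node" // illustrative import path
)

// listWorkloads is a hypothetical helper; mgr would come from a running node.
func listWorkloads(mgr *nexnode.WorkloadManager, logger *slog.Logger) {
	summaries, err := mgr.RunningWorkloads()
	if err != nil {
		logger.Error("failed to list workloads", slog.Any("err", err))
		return
	}
	for _, s := range summaries {
		logger.Info("running workload", slog.Any("summary", s))
	}
}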

func (*WorkloadManager) Start

func (w *WorkloadManager) Start()

Start the workload manager, which in turn starts the configured agent process manager

func (*WorkloadManager) Stop

func (w *WorkloadManager) Stop() error

Stop the workload manager, which will in turn stop all managed agents and attempt to clean up all applicable resources.

func (*WorkloadManager) StopWorkload

func (w *WorkloadManager) StopWorkload(id string, undeploy bool) error

Stop a workload, optionally attempting a graceful undeploy prior to termination
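
A minimal sketch of the call, as a hypothetical helper; the workload ID would come from RunningWorkloads or from the original deployment:

package example

import (
	nexnode "example.com/nex/internal/node" // illustrative import path
)

// stopOne is a hypothetical helper wrapping StopWorkload.
func stopOne(mgr *nexnode.WorkloadManager, workloadID string) error {
	// true requests a graceful undeploy before termination.
	return mgr.StopWorkload(workloadID, true)
}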

type WorkloadManagerProxy

type WorkloadManagerProxy struct {
	// contains filtered or unexported fields
}

func NewWorkloadManagerProxyWith

func NewWorkloadManagerProxyWith(manager *WorkloadManager) *WorkloadManagerProxy

func (*WorkloadManagerProxy) Agents

func (*WorkloadManagerProxy) AllAgents

func (*WorkloadManagerProxy) InternalNATSConn

func (m *WorkloadManagerProxy) InternalNATSConn() *nats.Conn

func (*WorkloadManagerProxy) Log

func (m *WorkloadManagerProxy) Log() *slog.Logger

func (*WorkloadManagerProxy) NodeConfiguration

func (m *WorkloadManagerProxy) NodeConfiguration() *models.NodeConfiguration

func (*WorkloadManagerProxy) PoolAgents

func (*WorkloadManagerProxy) Telemetry

Directories

Path Synopsis
lib
