nexnode

package
v0.0.0-...-5692323
Published: Apr 26, 2024 License: Apache-2.0 Imports: 47 Imported by: 0

README

NATS Execution Engine Node

The NATS execution engine node service runs as a daemon on a host, where it responds to control commands to start, stop, and monitor workloads.

Requirements

The following is a (potentially incomplete) list of what you'll need in order to run the node application.

  • An accessible copy of the firecracker binary in the path of the node service
  • A Linux kernel
  • A root file system with nex-agent configured as a boot service
  • CNI plugins installed (at least) to /opt/cni/bin
    • host-local
    • ptp
    • tc-redirect-tap
  • A CNI configuration file at /etc/cni/conf.d/fcnet.conflist

Running and Testing

First and foremost, the nex node application makes use of the firecracker SDK to start VMs, which requires a number of elevated privileges. When running locally for testing, it's easy enough to just sudo nex-node ..., but in production you might want to craft a bespoke user and group that can do just enough to make firecracker happy.

Because nex node utilizes firecracker VMs, ⚠️ it can only run on 64-bit Linux ⚠️. The workloads you submit to execute on nex node must be statically linked, freestanding Linux ELF binaries. The node service will reject dynamically linked or non-ELF binaries.
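
To make that concrete, below is a minimal sketch of a qualifying workload, assuming it is built with CGO disabled (e.g. CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build) so the resulting ELF is fully static; the program itself is hypothetical and not part of this package.

// main.go: a hypothetical, minimal workload. Built with CGO_ENABLED=0,
// the resulting binary is a statically linked, freestanding Linux ELF.
package main

import (
	"log"
	"time"
)

func main() {
	for {
		// Trivial periodic work so the workload keeps running.
		log.Println("workload heartbeat")
		time.Sleep(10 * time.Second)
	}
}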

As you run this and create warm VMs to reside in the pool, you'll be allocating IP addresses from the firecracker CNI network (defaults to fcnet). This means that eventually you'll run out of IP addresses during development. To clear them, you can purge your /var/lib/cni/networks/{device} directory. This will start IP allocation back at .2 (.1 is the host).

Additionally, while running and testing this you may end up with a large amount of idle/dormant veth devices. To remove them all, you can use the following script:

# Remove all idle/dormant veth devices left behind by firecracker/CNI.
# The sed strips everything after the interface name (including a trailing
# colon on newer ifconfig output) and drops blank and loopback entries.
for name in $(ifconfig -a | sed 's/[: \t].*//;/^\(lo\|\)$/d' | grep veth)
do
    echo "$name"
    sudo ip link delete "$name"
done

⚠️ If you're working in a contributor loop and you modify the rootfs after some firecracker IPs have already been allocated, it can potentially wreck the host networking and you'll notice the agents suddenly unable to communicate. If this happens, just purge the IPs and the veth devices to start over fresh.

Configuration

The nex node service needs to know the size and shape of the firecracker machines to dispense. As a result, it needs a JSON file that describes the cookie cutter from which VMs are stamped. Here's a sample machineconfig.json file:

{
    "kernel_path": "/home/kevin/lab/firecracker/vmlinux-5.10",
    "rootfs_path": "/home/kevin/lab/firecracker/rootfs.ext4",
    "machine_pool_size": 1,
    "cni": {
        "network_name": "fcnet",
        "interface_name": "veth0"
    },
    "machine_template": {
        "vcpu_count": 1,
        "memsize_mib": 256
    },
    "requester_public_keys": []
}

This file tells nex node where to find the kernel and rootfs for the firecracker VMs, as well as the CNI configuration. Finally, if you supply a non-empty value for requester_public_keys, it will serve as an allow-list of public XKeys that can be used to submit requests. XKeys are essentially nkeys that can be used for encryption. Note that the network_name field must exactly match the name of the {network_name}.conflist file in /etc/cni/conf.d.
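
Since this is the node's Go package, a minimal sketch of loading that file and verifying the host looks roughly like the following, using the LoadNodeConfiguration and CheckPrerequisites functions documented below; the import path and the meaning of the readonly flag are assumptions.

package main

import (
	"log"

	// The import path is illustrative; use the module path that actually
	// provides this package.
	nexnode "example.com/nex/internal/node"
)

func main() {
	cfg, err := nexnode.LoadNodeConfiguration("machineconfig.json")
	if err != nil {
		log.Fatalf("failed to load node configuration: %v", err)
	}

	// readonly=true is assumed to mean "check only, don't try to fix anything".
	if err := nexnode.CheckPrerequisites(cfg, true); err != nil {
		log.Fatalf("host is missing prerequisites: %v", err)
	}

	log.Println("node configuration loaded and prerequisites satisfied")
}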

Reference

Here's a look at the fcnet.conflist file that we use as a default. This gives each firecracker VM access to whatever the host can access, and allows the host to make inbound requests. Regardless of your CNI configuration, the agent process must be able to communicate with the host node via the internal NATS server (which defaults to running on port 9222).

{
  "name": "fcnet",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

If you want to add additional constraints and rules to the fcnet definition, consult the CNI documentation, as these are just plain CNI configuration files.

Observing Events

You can monitor events coming out of the node and workloads running within the node by subscribing on the following subject pattern:

$NEX.events.{namespace}.{event_type}

The payload of these events is a CloudEvent envelope containing an inner JSON object for the data field.
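
For example, a small Go subscriber that watches every event in every namespace might look like the sketch below, using the nats.go client and the CloudEvents SDK; the NATS URL is an assumption to adjust for your deployment.

package main

import (
	"encoding/json"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2/event"
	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the NATS server the node uses for its control plane
	// (the URL is an assumption).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Subscribe to all event types in all namespaces.
	_, err = nc.Subscribe("$NEX.events.*.*", func(m *nats.Msg) {
		var evt cloudevents.Event
		if err := json.Unmarshal(m.Data, &evt); err != nil {
			log.Printf("not a CloudEvent: %v", err)
			return
		}
		log.Printf("event %s from %s: %s", evt.Type(), evt.Source(), string(evt.Data()))
	})
	if err != nil {
		log.Fatal(err)
	}

	select {} // block forever
}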

Observing Logs

You can subscribe to log emissions without console access by using the following subject pattern:

$NEX.logs.{namespace}.{host}.{workload}.{vmId}

This gives you the flexibility of monitoring everything from a given host, workload, or vmID. For example, if you want to see an aggregate of all logs emitted by the bankservice workload throughout an entire system, you could just subscribe to:

$NEX.logs.*.*.bankservice.*
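
A sketch of that aggregate subscription with the nats.go client (again, the NATS URL is an assumption):

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Every log line emitted by the "bankservice" workload on any host,
	// in any namespace, from any VM.
	_, err = nc.Subscribe("$NEX.logs.*.*.bankservice.*", func(m *nats.Msg) {
		log.Printf("[%s] %s", m.Subject, string(m.Data))
	})
	if err != nil {
		log.Fatal(err)
	}

	select {}
}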

Documentation

Index

Constants

const (
	EventSubjectPrefix      = "$NEX.events"
	LogSubjectPrefix        = "$NEX.logs"
	WorkloadCacheBucketName = "NEXCACHE"
)

Variables

var (
	VERSION   = "development"
	COMMIT    = ""
	BUILDDATE = ""
)

Functions

func CheckPrerequisites

func CheckPrerequisites(config *models.NodeConfiguration, readonly bool) error

func CmdPreflight

func CmdPreflight(opts *nexmodels.Options, nodeopts *nexmodels.NodeOptions, ctx context.Context, cancel context.CancelFunc, log *slog.Logger) error

func CmdUp

func CmdUp(opts *nexmodels.Options, nodeopts *nexmodels.NodeOptions, ctx context.Context, cancel context.CancelFunc, log *slog.Logger) error

func FullVersion

func FullVersion() string

func LoadNodeConfiguration

func LoadNodeConfiguration(configFilepath string) (*models.NodeConfiguration, error)

Reads the node configuration from the specified configuration file path

func PublishCloudEvent

func PublishCloudEvent(nc *nats.Conn, namespace string, event cloudevents.Event, log *slog.Logger) error

Publishes the given $NEX event to an arbitrary namespace using the given NATS connection.
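
A hedged sketch of the call shape, assuming an existing NATS connection and logger; the helper, import path, event type, source, namespace, and payload are all illustrative:

package example

import (
	"log/slog"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	"github.com/nats-io/nats.go"

	nexnode "example.com/nex/internal/node" // illustrative import path
)

// emitExampleEvent is a hypothetical helper showing the call shape.
func emitExampleEvent(nc *nats.Conn, logger *slog.Logger) error {
	evt := cloudevents.NewEvent()
	evt.SetType("com.example.workload_started") // illustrative event type
	evt.SetSource("nexnode-example")            // illustrative source
	if err := evt.SetData(cloudevents.ApplicationJSON, map[string]string{"workload": "bankservice"}); err != nil {
		return err
	}
	// "default" is an illustrative namespace.
	return nexnode.PublishCloudEvent(nc, "default", evt, logger)
}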

func ReadMemoryStats

func ReadMemoryStats() (*controlapi.MemoryStat, error)

This function only works on Linux, but that's okay since nex-node can only run on 64-bit Linux
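
A minimal sketch of calling it (the import path is illustrative, and the returned struct is simply printed since its fields aren't listed here):

package main

import (
	"log"

	nexnode "example.com/nex/internal/node" // illustrative import path
)

func main() {
	stats, err := nexnode.ReadMemoryStats()
	if err != nil {
		log.Fatalf("failed to read memory stats: %v", err)
	}
	// MemoryStat fields aren't documented here, so just dump the struct.
	log.Printf("host memory: %+v", stats)
}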

func Version

func Version() string

Types

type AgentProxy

type AgentProxy struct {
	// contains filtered or unexported fields
}

func NewAgentProxyWith

func NewAgentProxyWith(agent *processmanager.ProcessInfo) *AgentProxy

type ApiListener

type ApiListener struct {
	// contains filtered or unexported fields
}

The API listener is the command and control interface for the node server

func NewApiListener

func NewApiListener(log *slog.Logger, mgr *WorkloadManager, node *Node) *ApiListener

func (*ApiListener) PublicKey

func (api *ApiListener) PublicKey() string

func (*ApiListener) Start

func (api *ApiListener) Start() error

type HostServices

type HostServices struct {
	// contains filtered or unexported fields
}

The host services server implements select functionality that is exposed to workloads by way of the agent, which makes RPC calls via the internal NATS connection

func NewHostServices

func NewHostServices(mgr *WorkloadManager, nc, ncint *nats.Conn, log *slog.Logger) *HostServices

type Node

type Node struct {
	// contains filtered or unexported fields
}

Nex node process

func NewNode

func NewNode(opts *models.Options, nodeOpts *models.NodeOptions, ctx context.Context, cancelF context.CancelFunc, log *slog.Logger) (*Node, error)

func (*Node) EnterLameDuck

func (n *Node) EnterLameDuck() error

func (*Node) IsLameDuck

func (n *Node) IsLameDuck() bool

func (*Node) PublicKey

func (n *Node) PublicKey() (*string, error)

func (*Node) Start

func (n *Node) Start()

func (*Node) Stop

func (n *Node) Stop()
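
A hedged sketch of the node lifecycle using the constructors and methods above; the import paths are illustrative, and the zero-valued options exist only to show the call shape, as real deployments populate them from CLI flags (see CmdUp):

package main

import (
	"context"
	"log"
	"log/slog"
	"os"

	// Illustrative import paths.
	"example.com/nex/internal/models"
	nexnode "example.com/nex/internal/node"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))

	// Zero-valued options are illustrative only; real deployments populate
	// them from CLI flags and the machine configuration file.
	node, err := nexnode.NewNode(&models.Options{}, &models.NodeOptions{}, ctx, cancel, logger)
	if err != nil {
		log.Fatalf("failed to create node: %v", err)
	}

	// Start is assumed to run until the node is stopped, so run it concurrently.
	go node.Start()

	// ... later, drain and shut down.
	_ = node.EnterLameDuck()
	node.Stop()
}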

type NodeProxy

type NodeProxy struct {
	// contains filtered or unexported fields
}

Use this proxy object with extreme care, as it exposes the private/internal bits of a node instance to callers. It was created only as a way to make writing specs work and should not be used for any other purpose!

func NewNodeProxyWith

func NewNodeProxyWith(node *Node) *NodeProxy

func (*NodeProxy) APIListener

func (n *NodeProxy) APIListener() *ApiListener

func (*NodeProxy) InternalNATS

func (n *NodeProxy) InternalNATS() *server.Server

func (*NodeProxy) InternalNATSConn

func (n *NodeProxy) InternalNATSConn() *nats.Conn

func (*NodeProxy) Log

func (n *NodeProxy) Log() *slog.Logger

func (*NodeProxy) NodeConfiguration

func (n *NodeProxy) NodeConfiguration() *models.NodeConfiguration

func (*NodeProxy) Telemetry

func (n *NodeProxy) Telemetry() *observability.Telemetry

func (*NodeProxy) WorkloadManager

func (n *NodeProxy) WorkloadManager() *WorkloadManager

type WorkloadManager

type WorkloadManager struct {
	// contains filtered or unexported fields
}

The workload manager provides the high-level strategy for the Nex node's workload management. It is responsible for using a process manager interface to manage processes and for maintaining agent clients that communicate with those processes. The workload manager does not know how the agent processes are created, only how to communicate with them via the internal NATS server

func NewWorkloadManager

func NewWorkloadManager(
	ctx context.Context,
	cancel context.CancelFunc,
	nodeKeypair nkeys.KeyPair,
	publicKey string,
	nc, ncint *nats.Conn,
	config *models.NodeConfiguration,
	log *slog.Logger,
	telemetry *observability.Telemetry,
) (*WorkloadManager, error)

Initialize a new workload manager instance to manage and communicate with agents

func (*WorkloadManager) CacheWorkload

func (m *WorkloadManager) CacheWorkload(request *controlapi.DeployRequest) (uint64, *string, error)

func (*WorkloadManager) DeployWorkload

func (w *WorkloadManager) DeployWorkload(request *agentapi.DeployRequest) (*string, error)

Deploy a workload as specified by the given deploy request to an available agent in the configured pool

func (*WorkloadManager) LookupWorkload

func (w *WorkloadManager) LookupWorkload(workloadID string) (*agentapi.DeployRequest, error)

Locates a given workload by its workload ID and returns the deployment request associated with it. Note that this means "pending" workloads are not considered by lookups

func (*WorkloadManager) OnProcessStarted

func (w *WorkloadManager) OnProcessStarted(id string)

Called by the agent process manager when an agent has been warmed and is ready to receive workload deployment instructions

func (*WorkloadManager) RunningWorkloads

func (w *WorkloadManager) RunningWorkloads() ([]controlapi.MachineSummary, error)

Retrieve a list of deployed, running workloads
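
For example, given a manager instance obtained from a running node, a hypothetical helper could log each summary; the import path is illustrative, and the summaries are logged whole since MachineSummary's fields aren't shown here:

package example

import (
	"log/slog"

	nexnode "example.com/nex/internal/node" // illustrative import path
)

// listWorkloads is a hypothetical helper; mgr would come from a running node.
func listWorkloads(mgr *nexnode.WorkloadManager, logger *slog.Logger) {
	summaries, err := mgr.RunningWorkloads()
	if err != nil {
		logger.Error("failed to list workloads", slog.Any("err", err))
		return
	}
	for _, s := range summaries {
		logger.Info("running workload", slog.Any("summary", s))
	}
}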

func (*WorkloadManager) Start

func (w *WorkloadManager) Start()

Start the workload manager, which in turn starts the configured agent process manager

func (*WorkloadManager) Stop

func (w *WorkloadManager) Stop() error

Stop the workload manager, which will in turn stop all managed agents and attempt to clean up all applicable resources.

func (*WorkloadManager) StopWorkload

func (w *WorkloadManager) StopWorkload(id string, undeploy bool) error

Stop a workload, optionally attempting a graceful undeploy prior to termination
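
A minimal sketch of the call, as a hypothetical helper; the workload ID would come from RunningWorkloads or from the original deployment:

package example

import (
	nexnode "example.com/nex/internal/node" // illustrative import path
)

// stopOne is a hypothetical helper wrapping StopWorkload.
func stopOne(mgr *nexnode.WorkloadManager, workloadID string) error {
	// true requests a graceful undeploy before termination.
	return mgr.StopWorkload(workloadID, true)
}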

type WorkloadManagerProxy

type WorkloadManagerProxy struct {
	// contains filtered or unexported fields
}

func NewWorkloadManagerProxyWith

func NewWorkloadManagerProxyWith(manager *WorkloadManager) *WorkloadManagerProxy

func (*WorkloadManagerProxy) Agents

func (*WorkloadManagerProxy) AllAgents

func (*WorkloadManagerProxy) InternalNATSConn

func (m *WorkloadManagerProxy) InternalNATSConn() *nats.Conn

func (*WorkloadManagerProxy) Log

func (m *WorkloadManagerProxy) Log() *slog.Logger

func (*WorkloadManagerProxy) NodeConfiguration

func (m *WorkloadManagerProxy) NodeConfiguration() *models.NodeConfiguration

func (*WorkloadManagerProxy) PoolAgents

func (*WorkloadManagerProxy) Telemetry

Directories

Path Synopsis
lib
