autostandby

package
v0.1.1 Latest
Published: Jun 12, 2026 License: MIT Imports: 20 Imported by: 0

README

Auto Standby

This feature automatically puts a Linux VM into Standby after it has stopped serving inbound TCP traffic for a configured amount of time.

What counts as activity

The feature reads host-side conntrack state. It does not consult ingress configuration or TAP byte counters.

A VM is considered active when there is at least one tracked TCP flow where:

  • the original destination is the VM's private IP
  • the VM is the server/responding side of the connection
  • the flow is currently tracked as live by conntrack

That means:

  • inbound client connections keep the VM awake
  • replies to outbound guest requests do not keep the VM awake
  • same-host clients count by default

Idle behavior

Hypeman seeds its controller from a conntrack snapshot on startup, then keeps state current with conntrack netlink events.

  • new inbound TCP flows are tracked from conntrack NEW events
  • TCP teardown is treated as inactivity once conntrack reports a terminal state or the flow disappears
  • connections that were already open when Hypeman started are reconciled against fresh conntrack snapshots until they drain, so restart-seeded traffic can still age out correctly
  • Hypeman also performs a full snapshot sync every 5 minutes by default as a low-frequency consistency check; the controller interval is configurable
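
A snapshot reconciliation step of the kind described above might look roughly like the following. It is a minimal sketch under assumptions: flows are identified by an opaque string key for brevity, and `reconcile` is a hypothetical helper rather than a function exported by this package.

```go
package main

// reconcile drops tracked flows that no longer appear in a fresh conntrack
// snapshot and returns how many were removed. Flows that are still present
// stay tracked, so connections seeded at startup age out only once they
// actually disappear from conntrack.
func reconcile(tracked map[string]struct{}, snapshot []string) int {
	present := make(map[string]struct{}, len(snapshot))
	for _, k := range snapshot {
		present[k] = struct{}{}
	}
	removed := 0
	for k := range tracked {
		if _, ok := present[k]; !ok {
			delete(tracked, k) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}
```

Running the same step on a timer gives a low-frequency consistency check that can correct for any missed netlink events.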

When the active inbound TCP connection count reaches zero, Hypeman starts an idle timer for that instance.

  • if a new inbound TCP connection appears before the timer expires, the timer is cleared
  • if the count stays at zero for the full idle_timeout, Hypeman places the VM into Standby

The idle timestamps are also persisted in instance metadata.

  • if Hypeman restarts and a startup conntrack snapshot shows current inbound connections, the instance is treated as active immediately and any old idle countdown is cleared
  • if Hypeman restarts and the snapshot shows zero current inbound connections, Hypeman resumes the persisted idle countdown

This keeps the restart behavior conservative about current traffic while still allowing long idle windows to carry across control-plane restarts.

Exclusions

Instances can ignore some traffic when deciding whether they are active:

  • ignore_source_cidrs excludes matching client source ranges
  • ignore_destination_ports excludes matching VM destination ports

This is intended for probes, internal callers, or ports that should not keep a VM awake.

Limits

  • Linux only
  • TCP only
  • IPv4 conntrack only
  • Wake-on-traffic is not part of this feature

Status endpoint

Hypeman exposes a diagnostic status endpoint for each instance that reports:

  • whether auto-standby is supported, configured, enabled, and currently eligible
  • how many qualifying inbound TCP connections are currently keeping the VM awake
  • the current idle timer timestamps and next planned standby time
  • the current controller reason, such as active inbound traffic, countdown still running, or observer failure

Wake-on-traffic would require a separate host-owned listener or forwarding layer that can accept a connection while the VM is asleep, trigger restore, and then hand traffic through once the VM is running.

Documentation

Constants

const (
	StateRunning = "Running"
)

Functions

func ActiveInboundCount

func ActiveInboundCount(inst Instance, conns []Connection) (int, time.Duration, error)

ActiveInboundCount returns the number of active inbound TCP connections for an instance and the compiled idle timeout that should be applied to it.

Types

type Connection

type Connection struct {
	OriginalSourceIP        netip.Addr
	OriginalSourcePort      uint16
	OriginalDestinationIP   netip.Addr
	OriginalDestinationPort uint16
	TCPState                TCPState
}

Connection is the normalized network view used by activity classification.

type ConnectionEvent

type ConnectionEvent struct {
	Type       ConnectionEventType
	Connection Connection
	ObservedAt time.Time
}

ConnectionEvent is a single conntrack event delivered from the host observer.

type ConnectionEventType

type ConnectionEventType string

const (
	ConnectionEventNew     ConnectionEventType = "new"
	ConnectionEventDestroy ConnectionEventType = "destroy"
)

type ConnectionKey

type ConnectionKey struct {
	OriginalSourceIP        netip.Addr
	OriginalSourcePort      uint16
	OriginalDestinationIP   netip.Addr
	OriginalDestinationPort uint16
}

type ConnectionSource

type ConnectionSource interface {
	ListConnections(ctx context.Context) ([]Connection, error)
	OpenStream(ctx context.Context) (ConnectionStream, error)
}

ConnectionSource provides startup snapshots and live conntrack events.

type ConnectionStream

type ConnectionStream interface {
	Events() <-chan ConnectionEvent
	Errors() <-chan error
	Close() error
}

ConnectionStream is a live conntrack event stream.

type ConntrackSource

type ConntrackSource struct {
	// contains filtered or unexported fields
}

ConntrackSource reads current IPv4 TCP conntrack entries from the host.

func NewConntrackSource

func NewConntrackSource() *ConntrackSource

NewConntrackSource creates a conntrack-backed connection source.

func (*ConntrackSource) ListConnections

func (s *ConntrackSource) ListConnections(context.Context) ([]Connection, error)

ListConnections returns normalized TCP flows from the host conntrack table.

func (*ConntrackSource) OpenStream

func (s *ConntrackSource) OpenStream(ctx context.Context) (ConnectionStream, error)

OpenStream subscribes to IPv4 conntrack NEW, UPDATE, and DESTROY events.

type Controller

type Controller struct {
	// contains filtered or unexported fields
}

Controller decides when eligible instances should transition to standby.

func NewController

func NewController(store InstanceStore, source ConnectionSource, opts ControllerOptions) *Controller

NewController creates a new event-driven auto-standby controller.

func (*Controller) Describe

func (c *Controller) Describe(inst Instance) StatusSnapshot

Describe returns the current diagnostic view for an instance.

func (*Controller) Run

func (c *Controller) Run(ctx context.Context) error

Run starts the controller and blocks until ctx is cancelled.

type ControllerOptions

type ControllerOptions struct {
	Log                  *slog.Logger
	Meter                metric.Meter
	Tracer               trace.Tracer
	Now                  func() time.Time
	ReconnectDelay       time.Duration
	ReconcileDelay       time.Duration
	SnapshotSyncInterval time.Duration
}

ControllerOptions configures logging, timing, and observability.

type Instance

type Instance struct {
	ID             string
	Name           string
	State          string
	NetworkEnabled bool
	IP             string
	HasVGPU        bool
	AutoStandby    *Policy
	Runtime        *Runtime
}

Instance is the minimal instance view needed by the auto-standby controller.

type InstanceEvent

type InstanceEvent struct {
	Action     InstanceEventAction
	InstanceID string
	Instance   *Instance
}

InstanceEvent carries an instance lifecycle update into the controller.

type InstanceEventAction

type InstanceEventAction string

InstanceEventAction identifies an instance lifecycle change relevant to auto-standby.

const (
	InstanceEventCreate  InstanceEventAction = "create"
	InstanceEventUpdate  InstanceEventAction = "update"
	InstanceEventStart   InstanceEventAction = "start"
	InstanceEventStop    InstanceEventAction = "stop"
	InstanceEventStandby InstanceEventAction = "standby"
	InstanceEventRestore InstanceEventAction = "restore"
	InstanceEventDelete  InstanceEventAction = "delete"
	InstanceEventFork    InstanceEventAction = "fork"
)

type InstanceStore

type InstanceStore interface {
	ListInstances(ctx context.Context) ([]Instance, error)
	StandbyInstance(ctx context.Context, id string) error
	SetRuntime(ctx context.Context, id string, runtime *Runtime) error
	SubscribeInstanceEvents() (<-chan InstanceEvent, func(), error)
}

InstanceStore supplies the controller with instance state, lifecycle events, runtime persistence, and standby actions.

type Metrics

type Metrics struct {
	// contains filtered or unexported fields
}

type Policy

type Policy struct {
	Enabled                bool     `json:"enabled"`
	IdleTimeout            string   `json:"idle_timeout,omitempty"`
	IgnoreSourceCIDRs      []string `json:"ignore_source_cidrs,omitempty"`
	IgnoreDestinationPorts []uint16 `json:"ignore_destination_ports,omitempty"`
}

Policy configures per-instance automatic standby behavior.
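
The struct tags determine the JSON form of a policy. The sketch below uses a local `policy` mirror of the documented fields and tags so that it compiles without importing the package; the `"15m"` value used with it assumes that idle_timeout accepts a Go-style duration string, which this page does not state.

```go
package main

import "encoding/json"

// policy is a local mirror of the documented Policy fields and JSON tags.
type policy struct {
	Enabled                bool     `json:"enabled"`
	IdleTimeout            string   `json:"idle_timeout,omitempty"`
	IgnoreSourceCIDRs      []string `json:"ignore_source_cidrs,omitempty"`
	IgnoreDestinationPorts []uint16 `json:"ignore_destination_ports,omitempty"`
}

// decodePolicy parses a JSON policy document.
func decodePolicy(raw []byte) (policy, error) {
	var p policy
	err := json.Unmarshal(raw, &p)
	return p, err
}

// encodePolicy renders a policy; omitempty drops unset optional fields,
// while "enabled" is always written.
func encodePolicy(p policy) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}
```

A document such as `{"enabled":true,"idle_timeout":"15m","ignore_destination_ports":[9090]}` decodes into the corresponding fields, and a zero-value policy encodes as `{"enabled":false}`.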

func NormalizePolicy

func NormalizePolicy(policy *Policy) (*Policy, error)

NormalizePolicy validates and canonicalizes a policy for storage.
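
The page does not spell out what validation and canonicalization involve. As a purely illustrative guess, a normalizer for fields of this shape might parse the timeout, mask the CIDRs, and sort and de-duplicate the ports. Everything in the sketch below, including the `normalize` name and the choice of canonical forms, is an assumption and not the behavior of NormalizePolicy.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
	"time"
)

// policy is a local stand-in with the documented Policy field names.
type policy struct {
	Enabled                bool
	IdleTimeout            string
	IgnoreSourceCIDRs      []string
	IgnoreDestinationPorts []uint16
}

// normalize returns a validated copy of p in one possible canonical form.
func normalize(p policy) (policy, error) {
	d, err := time.ParseDuration(p.IdleTimeout)
	if err != nil {
		return policy{}, fmt.Errorf("idle_timeout: %w", err)
	}
	if d <= 0 {
		return policy{}, fmt.Errorf("idle_timeout must be positive, got %s", d)
	}
	out := policy{Enabled: p.Enabled, IdleTimeout: d.String()}
	for _, c := range p.IgnoreSourceCIDRs {
		pfx, err := netip.ParsePrefix(c)
		if err != nil {
			return policy{}, fmt.Errorf("ignore_source_cidrs: %w", err)
		}
		// Masked zeroes the host bits, so 10.1.2.3/8 becomes 10.0.0.0/8.
		out.IgnoreSourceCIDRs = append(out.IgnoreSourceCIDRs, pfx.Masked().String())
	}
	ports := append([]uint16(nil), p.IgnoreDestinationPorts...)
	sort.Slice(ports, func(i, j int) bool { return ports[i] < ports[j] })
	for i, port := range ports {
		if i == 0 || port != ports[i-1] {
			out.IgnoreDestinationPorts = append(out.IgnoreDestinationPorts, port)
		}
	}
	return out, nil
}
```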

type Reason

type Reason string

const (
	ReasonUnsupportedPlatform   Reason = "unsupported_platform"
	ReasonPolicyMissing         Reason = "policy_missing"
	ReasonPolicyDisabled        Reason = "policy_disabled"
	ReasonInstanceNotRunning    Reason = "instance_not_running"
	ReasonNetworkDisabled       Reason = "network_disabled"
	ReasonMissingIP             Reason = "missing_ip"
	ReasonHasVGPU               Reason = "has_vgpu"
	ReasonActiveInbound         Reason = "active_inbound_connections"
	ReasonIdleTimeoutNotElapsed Reason = "idle_timeout_not_elapsed"
	ReasonObserverError         Reason = "observer_error"
	ReasonReadyForStandby       Reason = "ready_for_standby"
)

type Runtime

type Runtime struct {
	IdleSince             *time.Time `json:"idle_since,omitempty"`
	LastInboundActivityAt *time.Time `json:"last_inbound_activity_at,omitempty"`
}

Runtime stores persisted and in-memory idle-tracking timestamps.

type Status

type Status string

const (
	StatusUnsupported      Status = "unsupported"
	StatusDisabled         Status = "disabled"
	StatusIneligible       Status = "ineligible"
	StatusActive           Status = "active"
	StatusIdleCountdown    Status = "idle_countdown"
	StatusReadyForStandby  Status = "ready_for_standby"
	StatusStandbyRequested Status = "standby_requested"
	StatusError            Status = "error"
)

type StatusSnapshot

type StatusSnapshot struct {
	Supported             bool
	Configured            bool
	Enabled               bool
	Eligible              bool
	Status                Status
	Reason                Reason
	ActiveInboundCount    int
	IdleTimeout           string
	IdleSince             *time.Time
	LastInboundActivityAt *time.Time
	NextStandbyAt         *time.Time
	CountdownRemaining    *time.Duration
	TrackingMode          string
}

StatusSnapshot is a diagnostic view of the controller's current state for one VM.

type TCPState

type TCPState uint8

TCPState is the conntrack TCP state for a flow.

const (
	TCPStateNone        TCPState = 0
	TCPStateSynSent     TCPState = 1
	TCPStateSynRecv     TCPState = 2
	TCPStateEstablished TCPState = 3
	TCPStateFinWait     TCPState = 4
	TCPStateCloseWait   TCPState = 5
	TCPStateLastAck     TCPState = 6
	TCPStateTimeWait    TCPState = 7
	TCPStateClose       TCPState = 8
	TCPStateListen      TCPState = 9
	TCPStateIgnore      TCPState = 10
	TCPStateRetrans     TCPState = 11
)

func (TCPState) Active

func (s TCPState) Active() bool

Active reports whether the TCP state should keep a VM awake.
