federation

package
v0.0.52
Published: Dec 8, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package federation provides multi-cluster client management for the MCP Kubernetes server.

This package enables the MCP server to operate across multiple Kubernetes clusters in a federated environment, specifically designed for Giant Swarm's Management Cluster and Workload Cluster architecture using Cluster API (CAPI).

Architecture Overview

The federation package implements a "Hub-and-Spoke" model where:

  • The Management Cluster (MC) acts as the central hub containing CAPI resources
  • Workload Clusters (WC) are dynamically discovered and accessed through kubeconfig secrets
  • All operations are executed under the authenticated user's identity via Kubernetes impersonation

Core Components

ClusterClientManager is the primary interface for multi-cluster operations:

manager, err := federation.NewManager(localClient, localDynamic, logger)

// Get a client for a specific workload cluster
client, err := manager.GetClient(ctx, "my-cluster", userInfo)

// List all available clusters
clusters, err := manager.ListClusters(ctx, userInfo)

Security Model

The federation package enforces security through:

  • User impersonation: All cluster operations use Impersonate-User headers
  • RBAC delegation: Authorization is delegated to each cluster's RBAC policies
  • No credential exposure: Admin credentials are used only to discover clusters, read kubeconfig secrets, and establish TLS connections; user operations never run under them

Thread Safety

All operations in this package are thread-safe. The ClusterClientManager uses internal synchronization to handle concurrent access from multiple tool handlers.
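
The synchronization itself is not documented here. As a rough illustration only, a cache guarded by sync.RWMutex with a double-checked fill is one common way to get this behavior. The clientCache type, its methods, and the string stand-in for a client are all hypothetical and are not taken from the package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errManagerClosed mirrors the message of the package's ErrManagerClosed.
var errManagerClosed = errors.New("federation manager is closed")

// clientCache is a hypothetical stand-in for the manager's internal state.
// The string value stands in for a real Kubernetes client.
type clientCache struct {
	mu      sync.RWMutex
	clients map[string]string
	closed  bool
}

func newClientCache() *clientCache {
	return &clientCache{clients: make(map[string]string)}
}

// get returns the cached entry for key, building it once if it is missing.
func (c *clientCache) get(key string, build func() string) (string, error) {
	// Fast path: readers share the lock.
	c.mu.RLock()
	if c.closed {
		c.mu.RUnlock()
		return "", errManagerClosed
	}
	if v, ok := c.clients[key]; ok {
		c.mu.RUnlock()
		return v, nil
	}
	c.mu.RUnlock()

	// Slow path: take the exclusive lock and re-check, because another
	// goroutine may have filled the entry or closed the cache meanwhile.
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return "", errManagerClosed
	}
	if v, ok := c.clients[key]; ok {
		return v, nil
	}
	v := build()
	c.clients[key] = v
	return v, nil
}

// close drops all cached entries and makes later calls fail.
func (c *clientCache) close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.closed = true
	c.clients = nil
}

func main() {
	cache := newClientCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = cache.get("my-cluster", func() string { return "client-for-my-cluster" })
		}()
	}
	wg.Wait()

	cache.close()
	_, err := cache.get("my-cluster", func() string { return "unused" })
	fmt.Println(errors.Is(err, errManagerClosed))
}
```

The re-check under the write lock keeps two goroutines that miss at the same time from building the same entry twice.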

Error Handling

The package defines sentinel errors for common failure scenarios, including:

  • ErrClusterNotFound: The requested cluster doesn't exist or is inaccessible
  • ErrKubeconfigSecretNotFound: CAPI kubeconfig secret is missing
  • ErrKubeconfigInvalid: Secret contains malformed kubeconfig data
  • ErrConnectionFailed: Network or TLS issues connecting to the cluster

Example Usage

// Initialize the manager with the Management Cluster client
manager, err := federation.NewManager(localClient, localDynamic, logger)
if err != nil {
	return err
}
defer manager.Close()

// Get user info from OAuth token
userInfo := &federation.UserInfo{
	Email:  "user@example.com",
	Groups: []string{"developers"},
}

// Get a client for a workload cluster with user impersonation
client, err := manager.GetClient(ctx, "production-cluster", userInfo)
if err != nil {
	return fmt.Errorf("failed to get cluster client: %w", err)
}

// Use the client for Kubernetes operations
pods, err := client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{})

Integration with MCP Server

The federation package is designed to integrate with the ServerContext pattern:

serverCtx, err := server.NewServerContext(ctx,
	server.WithK8sClient(k8sClient),
	server.WithFederationManager(federationManager),
)

Tool handlers can then access the federation manager to perform multi-cluster operations.

Constants

const (
	// ImpersonateUserHeader is the header name for the impersonated user.
	ImpersonateUserHeader = "Impersonate-User"

	// ImpersonateGroupHeader is the header name for impersonated groups.
	ImpersonateGroupHeader = "Impersonate-Group"

	// ImpersonateExtraHeaderPrefix is the prefix for extra impersonation headers.
	ImpersonateExtraHeaderPrefix = "Impersonate-Extra-"
)

These constants are the header names used for Kubernetes user impersonation.
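
To make the header layout concrete, here is a minimal sketch of how a user's identity could map onto these headers. It uses only net/http. The userInfo type and the impersonationHeaders helper are local, hypothetical names; the package's own client wiring is not shown on this page and may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// Local copies of the documented header names.
const (
	impersonateUserHeader        = "Impersonate-User"
	impersonateGroupHeader       = "Impersonate-Group"
	impersonateExtraHeaderPrefix = "Impersonate-Extra-"
)

// userInfo is a local copy of the documented UserInfo shape.
type userInfo struct {
	Email  string
	Groups []string
	Extra  map[string][]string
}

// impersonationHeaders builds the request headers for one user.
// Extra keys are used verbatim in this sketch; a production client would
// also escape characters that are not legal in header names.
func impersonationHeaders(u userInfo) http.Header {
	h := http.Header{}
	h.Set(impersonateUserHeader, u.Email)
	for _, g := range u.Groups {
		h.Add(impersonateGroupHeader, g) // one header value per group
	}
	for k, vs := range u.Extra {
		// Assign directly so the key keeps its original case instead of
		// being canonicalized by Header.Add.
		h[impersonateExtraHeaderPrefix+k] = append([]string(nil), vs...)
	}
	return h
}

func main() {
	h := impersonationHeaders(userInfo{
		Email:  "user@example.com",
		Groups: []string{"developers", "sre"},
		Extra:  map[string][]string{"tenant": {"acme"}},
	})
	fmt.Println(h.Get(impersonateUserHeader))
	fmt.Println(h.Values(impersonateGroupHeader))
	fmt.Println(h["Impersonate-Extra-tenant"])
}
```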

const (
	// MaxEmailLength is the maximum allowed length for an email address.
	MaxEmailLength = 254

	// MaxGroupNameLength is the maximum allowed length for a group name.
	MaxGroupNameLength = 256

	// MaxGroupCount is the maximum number of groups allowed per user.
	MaxGroupCount = 100

	// MaxExtraKeyLength is the maximum allowed length for an extra header key.
	MaxExtraKeyLength = 256

	// MaxExtraValueLength is the maximum allowed length for an extra header value.
	MaxExtraValueLength = 1024

	// MaxExtraCount is the maximum number of extra headers allowed.
	MaxExtraCount = 50

	// MaxClusterNameLength is the maximum allowed length for a cluster name.
	// Kubernetes names are limited to 253 characters.
	MaxClusterNameLength = 253
)

Validation constants for security limits.

const CAPISecretKey = "value"

CAPISecretKey is the key within the kubeconfig secret that contains the actual kubeconfig YAML data.

const CAPISecretSuffix = "-kubeconfig"

CAPISecretSuffix is the suffix used by CAPI for kubeconfig secrets. The full secret name is: ${CLUSTER_NAME}-kubeconfig
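
The two constants combine as follows. This is a small sketch rather than the package's code: kubeconfigSecretName and kubeconfigFromSecret are hypothetical helpers, and a plain map stands in for the Data field of a Kubernetes Secret.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented constants.
const (
	capiSecretSuffix = "-kubeconfig"
	capiSecretKey    = "value"
)

// kubeconfigSecretName derives the secret name for a cluster.
func kubeconfigSecretName(clusterName string) string {
	return clusterName + capiSecretSuffix
}

// kubeconfigFromSecret pulls the kubeconfig bytes out of secret data.
// data stands in for the Data field of a Kubernetes Secret.
func kubeconfigFromSecret(data map[string][]byte) ([]byte, error) {
	raw, ok := data[capiSecretKey]
	if !ok || len(raw) == 0 {
		return nil, errors.New("kubeconfig data is invalid")
	}
	return raw, nil
}

func main() {
	fmt.Println(kubeconfigSecretName("production-cluster"))

	kc, err := kubeconfigFromSecret(map[string][]byte{
		"value": []byte("apiVersion: v1\nkind: Config\n"),
	})
	fmt.Println(len(kc), err)
}
```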

Variables

var (
	// ErrClusterNotFound indicates that the requested cluster does not exist
	// or the user does not have permission to access it.
	ErrClusterNotFound = errors.New("cluster not found")

	// ErrKubeconfigSecretNotFound indicates that the CAPI kubeconfig secret
	// for the cluster is missing from the Management Cluster.
	ErrKubeconfigSecretNotFound = errors.New("kubeconfig secret not found")

	// ErrKubeconfigInvalid indicates that the kubeconfig secret contains
	// malformed or unparseable kubeconfig data.
	ErrKubeconfigInvalid = errors.New("kubeconfig data is invalid")

	// ErrConnectionFailed indicates a network or TLS error when attempting
	// to connect to the target cluster.
	ErrConnectionFailed = errors.New("failed to connect to cluster")

	// ErrImpersonationFailed indicates that the user impersonation could not
	// be configured on the cluster client.
	ErrImpersonationFailed = errors.New("failed to configure user impersonation")

	// ErrManagerClosed indicates that the ClusterClientManager has been closed
	// and can no longer be used.
	ErrManagerClosed = errors.New("federation manager is closed")
)

Sentinel errors for common federation failure scenarios. These errors can be checked using errors.Is() for programmatic error handling.
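
A short sketch of that check, using local copies of two sentinels so it runs on its own. The lookup and classify functions are hypothetical; the point is that errors.Is still matches after a caller adds context with %w.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of two documented sentinels.
var (
	errClusterNotFound  = errors.New("cluster not found")
	errConnectionFailed = errors.New("failed to connect to cluster")
)

// lookup is a hypothetical call that wraps a sentinel with extra context.
func lookup(name string) error {
	if name == "" {
		return nil
	}
	return fmt.Errorf("lookup %q: %w", name, errClusterNotFound)
}

// classify maps an error to a coarse category. errors.Is follows the
// wrapping chain, so added context does not hide the sentinel.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errClusterNotFound):
		return "not-found"
	case errors.Is(err, errConnectionFailed):
		return "unreachable"
	default:
		return "internal"
	}
}

func main() {
	fmt.Println(classify(lookup("missing")))
	fmt.Println(classify(lookup("")))
}
```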

var (
	// ErrUserInfoRequired indicates that user information is required but was not provided.
	ErrUserInfoRequired = fmt.Errorf("user information is required for cluster operations")

	// ErrInvalidEmail indicates that the email address format is invalid.
	ErrInvalidEmail = fmt.Errorf("invalid email address format")

	// ErrInvalidGroupName indicates that a group name is invalid.
	ErrInvalidGroupName = fmt.Errorf("invalid group name")

	// ErrInvalidExtraHeader indicates that an extra header key or value is invalid.
	ErrInvalidExtraHeader = fmt.Errorf("invalid extra header")

	// ErrInvalidClusterName indicates that a cluster name is invalid.
	ErrInvalidClusterName = fmt.Errorf("invalid cluster name")
)

Validation errors.

Functions

func AnonymizeEmail

func AnonymizeEmail(email string) string

AnonymizeEmail returns a hashed representation of an email for logging purposes. This allows correlation of log entries without exposing PII.

func AnonymizeUserInfo

func AnonymizeUserInfo(user *UserInfo) map[string]interface{}

AnonymizeUserInfo returns anonymized user identifiers for logging. Returns a map with "user_hash" and "group_count" for safe logging.
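
The hashing scheme is not specified on this page. The sketch below assumes SHA-256 over the lowercased, trimmed address, truncated to 16 hex characters; those choices, and the nil handling, are illustrative assumptions and may not match what the package does. Only the "user_hash" and "group_count" keys come from the documentation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// userInfo is a trimmed local copy of the documented UserInfo shape.
type userInfo struct {
	Email  string
	Groups []string
}

// anonymizeEmail returns a short, stable hash of the normalized email.
// SHA-256, the normalization, and the 16-character truncation are
// assumptions of this sketch.
func anonymizeEmail(email string) string {
	normalized := strings.ToLower(strings.TrimSpace(email))
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])[:16]
}

// anonymizeUserInfo returns fields that are safe to attach to a log entry.
func anonymizeUserInfo(u *userInfo) map[string]interface{} {
	if u == nil {
		return map[string]interface{}{"user_hash": "", "group_count": 0}
	}
	return map[string]interface{}{
		"user_hash":   anonymizeEmail(u.Email),
		"group_count": len(u.Groups),
	}
}

func main() {
	u := &userInfo{Email: "user@example.com", Groups: []string{"developers"}}
	fmt.Println(anonymizeUserInfo(u))
}
```

Because the hash is deterministic, two log entries for the same user carry the same "user_hash" and can be correlated without storing the address.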

func ValidateClusterName

func ValidateClusterName(name string) error

ValidateClusterName validates a cluster name against Kubernetes naming conventions.
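
The exact rules are not listed here. One plausible reading, sketched below, is the Kubernetes DNS subdomain rule (RFC 1123) plus the documented 253-character limit. Treat the regular expression and the rejection of empty names as assumptions: GetClient accepts an empty name to mean the Management Cluster, so the package may handle that case differently.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// maxClusterNameLength mirrors the documented MaxClusterNameLength.
const maxClusterNameLength = 253

var errInvalidClusterName = errors.New("invalid cluster name")

// dns1123Subdomain allows lowercase alphanumerics, '-' and '.', where each
// dot-separated label starts and ends with an alphanumeric character.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateClusterName is a hypothetical validator, not the package's code.
func validateClusterName(name string) error {
	if name == "" {
		return fmt.Errorf("%w: name is empty", errInvalidClusterName)
	}
	if len(name) > maxClusterNameLength {
		return fmt.Errorf("%w: longer than %d characters", errInvalidClusterName, maxClusterNameLength)
	}
	if !dns1123Subdomain.MatchString(name) {
		return fmt.Errorf("%w: not a DNS subdomain name", errInvalidClusterName)
	}
	return nil
}

func main() {
	for _, name := range []string{"production-cluster", "Prod_Cluster", "-leading-dash"} {
		fmt.Printf("%q: %v\n", name, validateClusterName(name))
	}
}
```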

func ValidateUserInfo

func ValidateUserInfo(user *UserInfo) error

ValidateUserInfo validates the UserInfo struct for security. Returns ErrUserInfoRequired if user is nil. Returns a ValidationError if any field fails validation.
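
As a sketch of how the documented limits and error values could fit together: the checks below cover only the email and groups, and the individual rules (an "@" must be present, no CR or LF characters) are assumptions rather than the package's actual validation. All names are local, lowercase stand-ins.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Local copies of three documented limits.
const (
	maxEmailLength     = 254
	maxGroupNameLength = 256
	maxGroupCount      = 100
)

var (
	errUserInfoRequired = errors.New("user information is required for cluster operations")
	errInvalidEmail     = errors.New("invalid email address format")
	errInvalidGroupName = errors.New("invalid group name")
)

type userInfo struct {
	Email  string
	Groups []string
}

// validationError names the failing field and wraps a sentinel error.
type validationError struct {
	Field  string
	Reason string
	Err    error
}

func (e *validationError) Error() string { return fmt.Sprintf("%s: %s", e.Field, e.Reason) }
func (e *validationError) Unwrap() error { return e.Err }

// validateUserInfo is a hypothetical validator built on the limits above.
func validateUserInfo(u *userInfo) error {
	if u == nil {
		return errUserInfoRequired
	}
	if u.Email == "" || len(u.Email) > maxEmailLength ||
		!strings.Contains(u.Email, "@") || strings.ContainsAny(u.Email, "\r\n") {
		return &validationError{Field: "email", Reason: "malformed or too long", Err: errInvalidEmail}
	}
	if len(u.Groups) > maxGroupCount {
		return &validationError{Field: "groups", Reason: "too many groups", Err: errInvalidGroupName}
	}
	for _, g := range u.Groups {
		// Rejecting CR/LF keeps values from injecting extra header lines.
		if g == "" || len(g) > maxGroupNameLength || strings.ContainsAny(g, "\r\n") {
			return &validationError{Field: "groups", Reason: "empty, too long, or contains a line break", Err: errInvalidGroupName}
		}
	}
	return nil
}

func main() {
	err := validateUserInfo(&userInfo{Email: "not-an-email"})
	fmt.Println(err, errors.Is(err, errInvalidEmail))
	fmt.Println(validateUserInfo(nil))
}
```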

Types

type ClusterClientManager

type ClusterClientManager interface {
	// GetClient returns a Kubernetes client for the target cluster,
	// configured to impersonate the provided user.
	// If clusterName is empty, returns the local (Management Cluster) client.
	//
	// The returned client has Impersonate-User and Impersonate-Group headers
	// configured based on the UserInfo, ensuring all operations are executed
	// under the authenticated user's identity.
	GetClient(ctx context.Context, clusterName string, user *UserInfo) (kubernetes.Interface, error)

	// GetDynamicClient returns a dynamic client for the target cluster,
	// useful for working with CRDs like CAPI resources.
	// If clusterName is empty, returns the local (Management Cluster) dynamic client.
	//
	// Like GetClient, the returned client is configured for user impersonation.
	GetDynamicClient(ctx context.Context, clusterName string, user *UserInfo) (dynamic.Interface, error)

	// ListClusters returns a list of available workload clusters.
	// The list is filtered based on the user's RBAC permissions - only clusters
	// the user has access to view will be returned.
	//
	// This method queries CAPI Cluster resources on the Management Cluster.
	ListClusters(ctx context.Context, user *UserInfo) ([]ClusterSummary, error)

	// GetClusterSummary returns detailed information about a specific cluster.
	// Returns ErrClusterNotFound if the cluster doesn't exist or the user
	// doesn't have permission to access it.
	GetClusterSummary(ctx context.Context, clusterName string, user *UserInfo) (*ClusterSummary, error)

	// Close releases all cached clients and resources.
	// After Close is called, all other methods will return ErrManagerClosed.
	Close() error
}

ClusterClientManager manages Kubernetes clients for multi-cluster operations. It retrieves clients for both the local Management Cluster and remote Workload Clusters, with support for user impersonation.

All methods are thread-safe and can be called concurrently from multiple tool handlers.

type ClusterNotFoundError

type ClusterNotFoundError struct {
	ClusterName string
	Namespace   string
	Reason      string
}

ClusterNotFoundError provides detailed context about a cluster lookup failure.

func (*ClusterNotFoundError) Error

func (e *ClusterNotFoundError) Error() string

Error implements the error interface.

func (*ClusterNotFoundError) Unwrap

func (e *ClusterNotFoundError) Unwrap() error

Unwrap returns the underlying sentinel error for use with errors.Is().

func (*ClusterNotFoundError) UserFacingError

func (e *ClusterNotFoundError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This prevents leaking internal cluster names and namespace structure.
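
The split between the detailed Error text and the sanitized message might be used as sketched below. The type is a local copy of the documented fields; the message strings and the userMessage helper are hypothetical and are not the package's wording.

```go
package main

import (
	"errors"
	"fmt"
)

var errClusterNotFound = errors.New("cluster not found")

// clusterNotFoundError is a local copy of the documented struct.
type clusterNotFoundError struct {
	ClusterName string
	Namespace   string
	Reason      string
}

// Error carries full detail and is meant for logs only.
func (e *clusterNotFoundError) Error() string {
	return fmt.Sprintf("cluster %q in namespace %q not found: %s", e.ClusterName, e.Namespace, e.Reason)
}

// Unwrap exposes the sentinel so errors.Is keeps working.
func (e *clusterNotFoundError) Unwrap() error { return errClusterNotFound }

// UserFacingError omits the cluster name and namespace.
func (e *clusterNotFoundError) UserFacingError() string {
	return "cluster not found or access denied"
}

// userMessage returns the sanitized message when any error in the chain
// offers one, and a generic message otherwise.
func userMessage(err error) string {
	var uf interface{ UserFacingError() string }
	if errors.As(err, &uf) {
		return uf.UserFacingError()
	}
	return "internal error"
}

func main() {
	err := fmt.Errorf("get client: %w", &clusterNotFoundError{
		ClusterName: "prod", Namespace: "org-acme", Reason: "no such resource",
	})
	fmt.Println("log: ", err)
	fmt.Println("user:", userMessage(err))
	fmt.Println(errors.Is(err, errClusterNotFound))
}
```

Matching on a small interface rather than a concrete type lets the same helper serve every error type that offers a UserFacingError method.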

type ClusterPhase

type ClusterPhase string

ClusterPhase represents the lifecycle phase of a CAPI cluster.

const (
	// ClusterPhasePending indicates the cluster is awaiting provisioning.
	ClusterPhasePending ClusterPhase = "Pending"

	// ClusterPhaseProvisioning indicates the cluster is being created.
	ClusterPhaseProvisioning ClusterPhase = "Provisioning"

	// ClusterPhaseProvisioned indicates the cluster is fully operational.
	ClusterPhaseProvisioned ClusterPhase = "Provisioned"

	// ClusterPhaseDeleting indicates the cluster is being deleted.
	ClusterPhaseDeleting ClusterPhase = "Deleting"

	// ClusterPhaseFailed indicates the cluster encountered a fatal error.
	ClusterPhaseFailed ClusterPhase = "Failed"

	// ClusterPhaseUnknown indicates the cluster phase cannot be determined.
	ClusterPhaseUnknown ClusterPhase = "Unknown"
)

Standard CAPI cluster phases.
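
When the phase arrives as a raw string from a Cluster resource's status, it helps to normalize unrecognized values to Unknown. A minimal sketch follows; parsePhase is a hypothetical helper and the constants are local copies of the documented values.

```go
package main

import "fmt"

type clusterPhase string

// Local copies of the documented phase values.
const (
	phasePending      clusterPhase = "Pending"
	phaseProvisioning clusterPhase = "Provisioning"
	phaseProvisioned  clusterPhase = "Provisioned"
	phaseDeleting     clusterPhase = "Deleting"
	phaseFailed       clusterPhase = "Failed"
	phaseUnknown      clusterPhase = "Unknown"
)

// parsePhase maps a raw phase string to a known phase and falls back to
// Unknown for anything it does not recognize, including the empty string.
func parsePhase(raw string) clusterPhase {
	switch p := clusterPhase(raw); p {
	case phasePending, phaseProvisioning, phaseProvisioned, phaseDeleting, phaseFailed:
		return p
	default:
		return phaseUnknown
	}
}

func main() {
	for _, raw := range []string{"Provisioned", "provisioned", ""} {
		fmt.Printf("%q -> %s\n", raw, parsePhase(raw))
	}
}
```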

type ClusterSummary

type ClusterSummary struct {
	// Name is the unique identifier of the cluster within its namespace.
	// This corresponds to the Cluster API Cluster resource name.
	Name string `json:"name"`

	// Namespace is the organization namespace on the Management Cluster
	// where the CAPI Cluster resource is located.
	Namespace string `json:"namespace"`

	// Provider indicates the infrastructure provider (e.g., "aws", "azure", "vsphere").
	// This is extracted from the CAPI infrastructure reference.
	Provider string `json:"provider,omitempty"`

	// Release is the Giant Swarm release version running on the cluster.
	// Format follows semver, e.g., "19.3.0".
	Release string `json:"release,omitempty"`

	// KubernetesVersion is the Kubernetes version running on the cluster.
	// Format follows semver, e.g., "1.28.5".
	KubernetesVersion string `json:"kubernetesVersion,omitempty"`

	// Status indicates the current lifecycle phase of the cluster.
	// Common values: "Provisioned", "Provisioning", "Deleting", "Failed".
	Status string `json:"status"`

	// Ready indicates whether the cluster is fully operational and
	// ready to accept workloads.
	Ready bool `json:"ready"`

	// ControlPlaneReady indicates whether the control plane components
	// are healthy and operational.
	ControlPlaneReady bool `json:"controlPlaneReady"`

	// InfrastructureReady indicates whether the underlying infrastructure
	// (VMs, networks, etc.) is provisioned and healthy.
	InfrastructureReady bool `json:"infrastructureReady"`

	// NodeCount is the current number of worker nodes in the cluster.
	// This may differ from the desired count during scaling operations.
	NodeCount int `json:"nodeCount,omitempty"`

	// CreatedAt is the timestamp when the cluster was initially created.
	CreatedAt time.Time `json:"createdAt"`

	// Labels contains the Kubernetes labels applied to the Cluster resource.
	// These often include organization, team, or environment tags.
	Labels map[string]string `json:"labels,omitempty"`

	// Annotations contains the Kubernetes annotations on the Cluster resource.
	// May include operational metadata or external references.
	Annotations map[string]string `json:"annotations,omitempty"`
}

ClusterSummary provides basic information about a workload cluster. It is returned by ListClusters and GetClusterSummary and contains metadata useful for cluster selection and display.
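
The struct tags determine what a client sees on the wire. The sketch below uses a trimmed local copy of the struct (CreatedAt, Labels, and other fields are left out for brevity) to show which zero values survive encoding.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterSummary is a trimmed copy of the documented fields and tags.
type clusterSummary struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Provider  string `json:"provider,omitempty"`
	Status    string `json:"status"`
	Ready     bool   `json:"ready"`
	NodeCount int    `json:"nodeCount,omitempty"`
}

// encode renders a summary as compact JSON.
func encode(s clusterSummary) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(clusterSummary{
		Name:      "demo",
		Namespace: "org-acme",
		Status:    "Provisioning",
	}))
	// Prints: {"name":"demo","namespace":"org-acme","status":"Provisioning","ready":false}
}
```

Note that "ready":false is always emitted, while a NodeCount of zero is dropped by omitempty, so a consumer cannot tell zero nodes apart from an unset count.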

type ConnectionError

type ConnectionError struct {
	ClusterName string
	Host        string
	Reason      string
	Err         error
}

ConnectionError provides detailed context about cluster connection failures.

func (*ConnectionError) Error

func (e *ConnectionError) Error() string

Error implements the error interface.

func (*ConnectionError) Unwrap

func (e *ConnectionError) Unwrap() error

Unwrap returns the underlying error for use with errors.Is() and errors.As().

func (*ConnectionError) UserFacingError

func (e *ConnectionError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This prevents leaking internal host URLs and network topology.

type KubeconfigError

type KubeconfigError struct {
	ClusterName string
	SecretName  string
	Namespace   string
	Reason      string
	Err         error
	// NotFound indicates the kubeconfig secret was not found (vs other errors like invalid data).
	// When true, Unwrap() returns ErrKubeconfigSecretNotFound instead of ErrKubeconfigInvalid.
	NotFound bool
}

KubeconfigError provides detailed context about kubeconfig retrieval failures.

func (*KubeconfigError) Error

func (e *KubeconfigError) Error() string

Error implements the error interface.

func (*KubeconfigError) Unwrap

func (e *KubeconfigError) Unwrap() error

Unwrap returns ErrKubeconfigSecretNotFound when NotFound is true and ErrKubeconfigInvalid otherwise, for use with errors.Is() and errors.As().

func (*KubeconfigError) UserFacingError

func (e *KubeconfigError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This prevents leaking internal secret names and namespace structure.
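
The NotFound flag described in the struct comment lets one error type map to two sentinels. A minimal sketch of that pattern, with a local copy of the type and a hypothetical isMissingSecret helper; the Error format string is made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errKubeconfigSecretNotFound = errors.New("kubeconfig secret not found")
	errKubeconfigInvalid        = errors.New("kubeconfig data is invalid")
)

// kubeconfigError is a local copy of the documented struct.
type kubeconfigError struct {
	ClusterName string
	SecretName  string
	Namespace   string
	Reason      string
	Err         error
	NotFound    bool
}

func (e *kubeconfigError) Error() string {
	return fmt.Sprintf("kubeconfig for cluster %q (secret %s/%s): %s",
		e.ClusterName, e.Namespace, e.SecretName, e.Reason)
}

// Unwrap selects the sentinel from the NotFound flag, following the
// behavior described in the field comment.
func (e *kubeconfigError) Unwrap() error {
	if e.NotFound {
		return errKubeconfigSecretNotFound
	}
	return errKubeconfigInvalid
}

// isMissingSecret reports whether err means the secret was absent, as
// opposed to present but malformed.
func isMissingSecret(err error) bool {
	return errors.Is(err, errKubeconfigSecretNotFound)
}

func main() {
	missing := &kubeconfigError{ClusterName: "prod", SecretName: "prod-kubeconfig",
		Namespace: "org-acme", Reason: "secret not found", NotFound: true}
	malformed := &kubeconfigError{ClusterName: "prod", SecretName: "prod-kubeconfig",
		Namespace: "org-acme", Reason: "cannot parse YAML"}

	fmt.Println(isMissingSecret(missing), isMissingSecret(malformed))
}
```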

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager implements ClusterClientManager for CAPI-based multi-cluster federation.

func NewManager

func NewManager(localClient kubernetes.Interface, localDynamic dynamic.Interface, logger *slog.Logger) (*Manager, error)

NewManager creates a new Manager, which implements ClusterClientManager, using the provided local clients.

Parameters:

  • localClient: Kubernetes clientset for the Management Cluster
  • localDynamic: Dynamic client for the Management Cluster (for CAPI CRDs)
  • logger: Structured logger for operational messages (can be nil)

The local clients should be configured with admin credentials for the Management Cluster. These credentials are only used to:

  • Read CAPI Cluster resources for discovery
  • Read kubeconfig Secrets for Workload Cluster access
  • Establish TLS connections to Workload Clusters

All actual operations are executed under user impersonation.

func (*Manager) Close

func (m *Manager) Close() error

Close releases all cached clients and resources.

func (*Manager) GetClient

func (m *Manager) GetClient(ctx context.Context, clusterName string, user *UserInfo) (kubernetes.Interface, error)

GetClient returns a Kubernetes client for the target cluster. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation). Returns ErrInvalidClusterName if the cluster name fails validation.

func (*Manager) GetClusterSummary

func (m *Manager) GetClusterSummary(ctx context.Context, clusterName string, user *UserInfo) (*ClusterSummary, error)

GetClusterSummary returns information about a specific cluster. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation). Returns ErrInvalidClusterName if the cluster name fails validation.

func (*Manager) GetDynamicClient

func (m *Manager) GetDynamicClient(ctx context.Context, clusterName string, user *UserInfo) (dynamic.Interface, error)

GetDynamicClient returns a dynamic client for the target cluster. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation). Returns ErrInvalidClusterName if the cluster name fails validation.

func (*Manager) ListClusters

func (m *Manager) ListClusters(ctx context.Context, user *UserInfo) ([]ClusterSummary, error)

ListClusters returns all available workload clusters. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation).

type UserInfo

type UserInfo struct {
	// Email is the user's email address from the OAuth token's email claim.
	// This is used as the Impersonate-User header value.
	Email string

	// Groups contains the user's group memberships from OAuth claims.
	// These are passed via Impersonate-Group headers for RBAC evaluation.
	Groups []string

	// Extra contains additional claims from the OAuth token that should be
	// propagated to the Kubernetes API server via Impersonate-Extra headers.
	// Common examples include organization IDs, tenant identifiers, or custom claims.
	Extra map[string][]string
}

UserInfo contains the authenticated user's identity information extracted from the OAuth token. This information is used to configure Kubernetes user impersonation headers.

type ValidationError

type ValidationError struct {
	Field  string
	Value  string // Sanitized value (may be truncated or anonymized)
	Reason string
	Err    error
}

ValidationError provides detailed context about a validation failure.

func (*ValidationError) Error

func (e *ValidationError) Error() string

Error implements the error interface.

func (*ValidationError) Unwrap

func (e *ValidationError) Unwrap() error

Unwrap returns the underlying error for use with errors.Is() and errors.As().

func (*ValidationError) UserFacingError

func (e *ValidationError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users.
