federation

package

v0.0.78 (latest) | Published: Dec 9, 2025 | License: Apache-2.0 | Imports: 27 | Imported by: 0

Repository

github.com/giantswarm/mcp-kubernetes

Documentation ¶

Overview ¶

Package federation provides multi-cluster client management for the MCP Kubernetes server.

This package enables the MCP server to operate across multiple Kubernetes clusters in a federated environment, specifically designed for Giant Swarm's Management Cluster and Workload Cluster architecture using Cluster API (CAPI).

Architecture Overview ¶

The federation package implements a "Hub-and-Spoke" model where:

The Management Cluster (MC) acts as the central hub containing CAPI resources
Workload Clusters (WC) are dynamically discovered and accessed through kubeconfig secrets
All operations are executed under the authenticated user's identity

Core Components ¶

ClusterClientManager is the primary interface for multi-cluster operations:

// Create an OAuthClientProvider for OAuth downstream mode
oauthProvider, err := federation.NewOAuthClientProviderFromInCluster()
if err != nil {
	return err
}

// Configure the token extractor to get OAuth tokens from context
oauthProvider.SetTokenExtractor(oauth.GetAccessTokenFromContext)

manager, err := federation.NewManager(oauthProvider,
	federation.WithManagerLogger(logger),
)
if err != nil {
	return err
}
defer manager.Close()

// Get a client for a specific workload cluster
client, err := manager.GetClient(ctx, "my-cluster", userInfo)

// List all available clusters
clusters, err := manager.ListClusters(ctx, userInfo)

Security Model ¶

The federation package enforces security through defense in depth:

ClientProvider: Creates per-user Kubernetes clients for Management Cluster access. When OAuth downstream is enabled, each user's OAuth token is used for authentication.
MC RBAC Enforcement: Users must have permission to list CAPI Cluster resources and read kubeconfig secrets on the Management Cluster.
WC RBAC Enforcement: Operations on Workload Clusters use impersonation headers, delegating authorization to each cluster's RBAC policies.
User Isolation in Caching: Cached clients are keyed by (cluster, user) pairs.

The two RBAC layers (MC and WC) ensure that users can only access clusters they are permitted to see on the MC, and can only perform operations that their identity allows on each WC.

Client Caching ¶

The package implements a thread-safe client cache with TTL-based eviction to optimize performance. Key security properties:

User Isolation: Each cached client is keyed by both cluster name AND user email, ensuring user A can never retrieve a client configured for user B.
TTL Expiration: Cached clients expire after a configurable TTL (default: 10 minutes). Set this to be less than or equal to your OAuth token lifetime.
Manual Invalidation: Use DeleteByCluster() when cluster credentials are rotated, or implement token revocation callbacks to remove specific user entries.
PII Protection: User emails are anonymized in logs using SHA-256 hashing.

Metrics ¶

The cache exposes the following metrics for monitoring:

mcp_client_cache_hits_total: Cache hit count (by cluster)
mcp_client_cache_misses_total: Cache miss count (by cluster)
mcp_client_cache_evictions_total: Eviction count (by reason: expired, lru, manual)
mcp_client_cache_entries: Current cache size (gauge)

Note: The "cluster" label on hit/miss metrics may have high cardinality in environments with many clusters. Monitor your metrics backend capacity.

Thread Safety ¶

All operations in this package are thread-safe. The ClusterClientManager uses internal synchronization to handle concurrent access from multiple tool handlers.

User Impersonation ¶

The package implements Kubernetes User Impersonation to propagate authenticated user identity to Workload Clusters. Instead of executing operations with admin credentials, all API calls include impersonation headers that cause the Kubernetes API server to evaluate RBAC policies for the authenticated user.

The impersonation configuration sets the following headers:

Impersonate-User: <user-email>           (e.g., "jane@giantswarm.io")
Impersonate-Group: <group-1>             (e.g., "github:org:giantswarm")
Impersonate-Group: <group-2>             (e.g., "platform-team")
Impersonate-Extra-agent: mcp-kubernetes  (audit trail identifier)
Impersonate-Extra-sub: <subject-id>      (OAuth subject claim)

The "agent: mcp-kubernetes" extra header is automatically added to all impersonated requests. This enables audit log correlation to identify operations performed via the MCP server, distinguishing them from direct kubectl access.
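
As a rough illustration of the header set described above, the sketch below builds the headers with net/http. The userInfo type and impersonationHeaders function are hypothetical; the package itself configures impersonation on a rest.Config, and client-go applies its own encoding of extra keys that this sketch ignores.

```go
package main

import (
	"fmt"
	"net/http"
)

// userInfo is a local stand-in type with the fields the documentation mentions.
type userInfo struct {
	Email  string
	Groups []string
	Extra  map[string][]string
}

// impersonationHeaders builds the impersonation headers for one user.
// The agent entry is written last with Set, so a user-supplied "agent"
// extra is replaced rather than appended.
func impersonationHeaders(u userInfo) http.Header {
	h := http.Header{}
	h.Set("Impersonate-User", u.Email)
	for _, g := range u.Groups {
		h.Add("Impersonate-Group", g) // groups pass through unchanged
	}
	for key, values := range u.Extra {
		for _, v := range values {
			h.Add("Impersonate-Extra-"+key, v)
		}
	}
	h.Set("Impersonate-Extra-agent", "mcp-kubernetes")
	return h
}

func main() {
	h := impersonationHeaders(userInfo{
		Email:  "jane@giantswarm.io",
		Groups: []string{"github:org:giantswarm", "platform-team"},
		Extra:  map[string][]string{"sub": {"abc123"}, "agent": {"spoofed"}},
	})
	fmt.Println(h.Values("Impersonate-Group"))
	fmt.Println(h.Values("Impersonate-Extra-agent"))
}
```

Writing the agent value after the user extras is what keeps the audit marker intact in this sketch, even when the incoming claims contain an "agent" key.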

Group Mapping Behavior ¶

OAuth groups are passed through to Kubernetes impersonation headers WITHOUT transformation. This ensures consistency between MCP-mediated access and direct kubectl access with the same identity. Common group formats:

GitHub: "github:org:myorg", "github:team:platform"
Azure AD: "azure:group:abc123-def456"
LDAP: "ldap:group:cn=admins,dc=example,dc=com"
System: "system:authenticated", "system:masters"

Administrators should configure Workload Cluster RBAC policies to match the exact group strings provided by their identity provider through Dex.

OAuth Provider Trust Boundary ¶

The OAuth provider (e.g., Dex with GitHub/Azure AD/LDAP connectors) is a critical trust boundary in this architecture. The MCP server trusts the OAuth provider to:

Accurately identify users (email claim)
Correctly enumerate group memberships (groups claim)
Not return privileged system groups unless the user is actually a member

Security implications:

If an OAuth provider is compromised and returns false group claims (e.g., "system:masters"), users could gain unintended cluster-admin privileges.
This is consistent with direct kubectl access: the same risk exists when users authenticate directly to clusters via OIDC.
Defense: Configure your OAuth provider with appropriate access controls, audit logs, and avoid mapping external groups directly to "system:masters".

The agent header ("Impersonate-Extra-agent: mcp-kubernetes") is immutable and cannot be overridden by user-supplied OAuth claims. This ensures the audit trail always correctly identifies MCP-mediated access, even if other claims are manipulated.

RBAC Requirements for Impersonation ¶

For impersonation to work, the admin credentials in the CAPI kubeconfig secret must have permission to impersonate users and groups on the Workload Cluster:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: impersonate-all
rules:
  - apiGroups: [""]
    resources: ["users", "groups", "serviceaccounts"]
    verbs: ["impersonate"]
  - apiGroups: ["authentication.k8s.io"]
    resources: ["userextras/agent"]
    verbs: ["impersonate"]

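CAPI-generated admin credentials typically have these permissions by default. If you use a narrower role instead, note that Kubernetes authorizes each extra key separately, so other keys sent by this package (for example "sub" or "trace-id") need their own userextras rules.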

Pre-flight Access Checks ¶

The package provides SubjectAccessReview (SAR) pre-flight checks to verify user permissions before attempting operations:

// Check if user can delete pods before attempting
result, err := manager.CheckAccess(ctx, "prod-cluster", user, &federation.AccessCheck{
	Verb:      "delete",
	Resource:  "pods",
	Namespace: "production",
})
if err != nil {
	return err // Check failed (API error, etc.)
}
if !result.Allowed {
	return fmt.Errorf("permission denied: %s", result.Reason)
}
// Proceed with delete...

Or use the convenience method that returns an error if denied:

if err := manager.CheckAccessAllowed(ctx, cluster, user, check); err != nil {
	return err // Either check failed or access denied
}

The "can_i" tool (in internal/tools/access) exposes this functionality to AI agents, allowing them to query permissions before attempting operations.

Error Handling ¶

The package defines specific error types for common failure scenarios:

ErrClusterNotFound: The requested cluster doesn't exist or is inaccessible
ErrKubeconfigSecretNotFound: CAPI kubeconfig secret is missing
ErrKubeconfigInvalid: Secret contains malformed kubeconfig data
ErrConnectionFailed: Network or TLS issues connecting to the cluster
ErrImpersonationFailed: User impersonation could not be configured
ErrAccessDenied: User lacks RBAC permissions for the operation
ErrAccessCheckFailed: The SAR check itself failed (not denied, but error)
ErrInvalidAccessCheck: Invalid AccessCheck parameters

All user-facing errors return a generic message ("cluster access denied or unavailable") to prevent information leakage that could enable cluster enumeration attacks.

Example Usage ¶

// Create an OAuthClientProvider for OAuth downstream mode
oauthProvider, err := federation.NewOAuthClientProviderFromInCluster()
if err != nil {
	return err
}

// Configure the token extractor (uses OAuth middleware's context storage)
oauthProvider.SetTokenExtractor(oauth.GetAccessTokenFromContext)

// Initialize the manager with the OAuth provider
manager, err := federation.NewManager(oauthProvider,
	federation.WithManagerLogger(logger),
	federation.WithManagerCacheConfig(federation.CacheConfig{
		TTL:        10 * time.Minute,
		MaxEntries: 1000,
	}),
)
if err != nil {
	return err
}
defer manager.Close()

// Get user info from OAuth token (set by OAuth middleware)
userInfo := &federation.UserInfo{
	Email:  "user@example.com",
	Groups: []string{"developers"},
}

// Get a client for a workload cluster with user impersonation
client, err := manager.GetClient(ctx, "production-cluster", userInfo)
if err != nil {
	return fmt.Errorf("failed to get cluster client: %w", err)
}

// Use the client for Kubernetes operations
pods, err := client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{})
if err != nil {
	return fmt.Errorf("failed to list pods: %w", err)
}

Integration with MCP Server ¶

The federation package is designed to integrate with the ServerContext pattern:

serverCtx, err := server.NewServerContext(ctx,
	server.WithK8sClient(k8sClient),
	server.WithFederationManager(federationManager),
)

Tool handlers can then access the federation manager to perform multi-cluster operations.

Package federation provides OAuth client provider implementation.

Index ¶

Constants
Variables
func AnonymizeEmail(email string) string
func AnonymizeUserInfo(user *UserInfo) map[string]interface{}
func ApplyConnectivityConfig(config *rest.Config, cc ConnectivityConfig)
func CheckConnectivity(ctx context.Context, clusterName string, config *rest.Config, ...) error
func CheckConnectivityWithRetry(ctx context.Context, clusterName string, config *rest.Config, ...) error
func ConfigWithImpersonation(config *rest.Config, user *UserInfo) *rest.Config
func ConfigWithImpersonationAndTraceID(config *rest.Config, user *UserInfo, traceID string) *rest.Config
func GetEndpointType(host string) string
func UserHashAttr(email string) slog.Attr
func ValidateAccessCheck(check *AccessCheck) error
func ValidateClusterName(name string) error
func ValidateUserInfo(user *UserInfo) error
type AccessCheck
type AccessCheckError
- func (e *AccessCheckError) Error() string
- func (e *AccessCheckError) Is(target error) bool
- func (e *AccessCheckError) Unwrap() error
- func (e *AccessCheckError) UserFacingError() string
type AccessCheckResult
type AccessDeniedError
- func (e *AccessDeniedError) Error() string
- func (e *AccessDeniedError) Is(target error) bool
- func (e *AccessDeniedError) Unwrap() error
- func (e *AccessDeniedError) UserFacingError() string
type AmbiguousClusterError
- func (e *AmbiguousClusterError) Error() string
- func (e *AmbiguousClusterError) UserFacingError() string
type CacheConfig
- func DefaultCacheConfig() CacheConfig
type CacheMetricsRecorder
type CacheStats
type ClientCache
- func NewClientCache(opts ...ClientCacheOption) *ClientCache
- func (c *ClientCache) Close() error
- func (c *ClientCache) Delete(ctx context.Context, clusterName, userEmail string)
- func (c *ClientCache) DeleteByCluster(ctx context.Context, clusterName string)
- func (c *ClientCache) Get(ctx context.Context, clusterName, userEmail string) *cachedClient
- func (c *ClientCache) GetOrCreate(ctx context.Context, clusterName, userEmail string, ...) (kubernetes.Interface, dynamic.Interface, error)
- func (c *ClientCache) Set(ctx context.Context, clusterName, userEmail string, ...)
- func (c *ClientCache) Size() int
- func (c *ClientCache) Stats() CacheStats
type ClientCacheOption
- func WithCacheConfig(config CacheConfig) ClientCacheOption
- func WithCacheLogger(logger *slog.Logger) ClientCacheOption
- func WithCacheMetrics(metrics CacheMetricsRecorder) ClientCacheOption
type ClientProvider
type ClusterClientManager
type ClusterDiscoveryError
- func (e *ClusterDiscoveryError) Error() string
- func (e *ClusterDiscoveryError) Is(target error) bool
- func (e *ClusterDiscoveryError) Unwrap() error
- func (e *ClusterDiscoveryError) UserFacingError() string
type ClusterInfo
type ClusterListOptions
type ClusterNotFoundError
- func (e *ClusterNotFoundError) Error() string
- func (e *ClusterNotFoundError) Unwrap() error
- func (e *ClusterNotFoundError) UserFacingError() string
type ClusterPhase
type ClusterSummary
- func (cs *ClusterSummary) ClusterAge() time.Duration
- func (cs *ClusterSummary) Description() string
- func (cs *ClusterSummary) IsGiantSwarmCluster() bool
- func (cs *ClusterSummary) Organization() string
type ConnectionError
- func (e *ConnectionError) Error() string
- func (e *ConnectionError) Is(target error) bool
- func (e *ConnectionError) Unwrap() error
- func (e *ConnectionError) UserFacingError() string
type ConnectivityConfig
- func DefaultConnectivityConfig() ConnectivityConfig
- func HighLatencyConnectivityConfig() ConnectivityConfig
type ConnectivityTimeoutError
- func (e *ConnectivityTimeoutError) Error() string
- func (e *ConnectivityTimeoutError) Is(target error) bool
- func (e *ConnectivityTimeoutError) Unwrap() error
- func (e *ConnectivityTimeoutError) UserFacingError() string
type ImpersonationError
- func (e *ImpersonationError) Error() string
- func (e *ImpersonationError) Is(target error) bool
- func (e *ImpersonationError) Unwrap() error
- func (e *ImpersonationError) UserFacingError() string
type KubeconfigError
- func (e *KubeconfigError) Error() string
- func (e *KubeconfigError) Is(target error) bool
- func (e *KubeconfigError) Unwrap() error
- func (e *KubeconfigError) UserFacingError() string
type Manager
- func NewManager(clientProvider ClientProvider, opts ...ManagerOption) (*Manager, error)
- func (m *Manager) CheckAccess(ctx context.Context, clusterName string, user *UserInfo, check *AccessCheck) (*AccessCheckResult, error)
- func (m *Manager) CheckAccessAllowed(ctx context.Context, clusterName string, user *UserInfo, check *AccessCheck) error
- func (m *Manager) CheckClusterConnectivity(ctx context.Context, clusterName string, user *UserInfo) error
- func (m *Manager) Close() error
- func (m *Manager) GetClient(ctx context.Context, clusterName string, user *UserInfo) (kubernetes.Interface, error)
- func (m *Manager) GetClusterSummary(ctx context.Context, clusterName string, user *UserInfo) (*ClusterSummary, error)
- func (m *Manager) GetDynamicClient(ctx context.Context, clusterName string, user *UserInfo) (dynamic.Interface, error)
- func (m *Manager) GetKubeconfigForCluster(ctx context.Context, clusterName string, user *UserInfo) (*rest.Config, error)
- func (m *Manager) GetKubeconfigForClusterValidated(ctx context.Context, clusterName string, user *UserInfo) (*rest.Config, error)
- func (m *Manager) ListClusters(ctx context.Context, user *UserInfo) ([]ClusterSummary, error)
- func (m *Manager) ResolveCluster(ctx context.Context, namePattern string, user *UserInfo) (*ClusterSummary, error)
- func (m *Manager) Stats() ManagerStats
type ManagerOption
- func WithConnectivityConfig(config ConnectivityConfig) ManagerOption
- func WithConnectivityValidation(enabled bool) ManagerOption
- func WithManagerCacheConfig(config CacheConfig) ManagerOption
- func WithManagerCacheMetrics(metrics CacheMetricsRecorder) ManagerOption
- func WithManagerConnectionValidationTimeout(timeout time.Duration) ManagerOption
- func WithManagerLogger(logger *slog.Logger) ManagerOption
type ManagerStats
type OAuthAuthMetricsRecorder
type OAuthClientProvider
- func NewOAuthClientProvider(config *OAuthClientProviderConfig) (*OAuthClientProvider, error)
- func NewOAuthClientProviderFromInCluster() (*OAuthClientProvider, error)
- func (p *OAuthClientProvider) GetClientsForUser(ctx context.Context, user *UserInfo) (kubernetes.Interface, dynamic.Interface, *rest.Config, error)
- func (p *OAuthClientProvider) SetMetrics(metrics OAuthAuthMetricsRecorder)
- func (p *OAuthClientProvider) SetTokenExtractor(extractor TokenExtractor)
type OAuthClientProviderConfig
- func DefaultOAuthClientProviderConfig() *OAuthClientProviderConfig
type StaticClientProvider
- func (p *StaticClientProvider) GetClientsForUser(_ context.Context, _ *UserInfo) (kubernetes.Interface, dynamic.Interface, *rest.Config, error)
type TLSError
- func (e *TLSError) Error() string
- func (e *TLSError) Is(target error) bool
- func (e *TLSError) Unwrap() error
- func (e *TLSError) UserFacingError() string
type TokenExtractor
type UserInfo
type ValidationError
- func (e *ValidationError) Error() string
- func (e *ValidationError) Unwrap() error
- func (e *ValidationError) UserFacingError() string

Constants ¶

const (
	// LabelClusterName is the standard CAPI cluster name label.
	LabelClusterName = "cluster.x-k8s.io/cluster-name"

	// LabelGiantSwarmCluster is Giant Swarm's cluster label.
	LabelGiantSwarmCluster = "giantswarm.io/cluster"

	// LabelGiantSwarmOrganization is the organization/tenant label.
	LabelGiantSwarmOrganization = "giantswarm.io/organization"

	// LabelGiantSwarmRelease is the Giant Swarm release version label.
	LabelGiantSwarmRelease = "release.giantswarm.io/version"

	// AnnotationClusterDescription is the cluster description annotation.
	AnnotationClusterDescription = "cluster.giantswarm.io/description"
)

Giant Swarm specific label keys for CAPI clusters.

const (
	// ProviderAWS indicates an AWS CAPI cluster.
	ProviderAWS = "aws"

	// ProviderAzure indicates an Azure CAPI cluster.
	ProviderAzure = "azure"

	// ProviderVSphere indicates a vSphere CAPI cluster.
	ProviderVSphere = "vsphere"

	// ProviderGCP indicates a GCP CAPI cluster.
	ProviderGCP = "gcp"

	// ProviderUnknown is used when the provider cannot be determined.
	ProviderUnknown = "unknown"
)

Common infrastructure provider references.

const (
	// ImpersonateUserHeader is the header name for the impersonated user.
	ImpersonateUserHeader = "Impersonate-User"

	// ImpersonateGroupHeader is the header name for impersonated groups.
	ImpersonateGroupHeader = "Impersonate-Group"

	// ImpersonateExtraHeaderPrefix is the prefix for extra impersonation headers.
	ImpersonateExtraHeaderPrefix = "Impersonate-Extra-"
)

ImpersonationHeaders contains the header names used for Kubernetes user impersonation.

const (
	// ImpersonationAgentName is the identifier used in Impersonate-Extra-agent headers.
	// This allows audit logs to identify that operations were performed via mcp-kubernetes.
	ImpersonationAgentName = "mcp-kubernetes"

	// ImpersonationAgentExtraKey is the key used for the agent identifier in extra headers.
	// This appears as "Impersonate-Extra-agent: mcp-kubernetes" in HTTP requests.
	ImpersonationAgentExtraKey = "agent"
)

Impersonation agent identification for audit trails.

const (
	// MaxEmailLength is the maximum allowed length for an email address.
	MaxEmailLength = 254

	// MaxGroupNameLength is the maximum allowed length for a group name.
	MaxGroupNameLength = 256

	// MaxGroupCount is the maximum number of groups allowed per user.
	MaxGroupCount = 100

	// MaxExtraKeyLength is the maximum allowed length for an extra header key.
	MaxExtraKeyLength = 256

	// MaxExtraValueLength is the maximum allowed length for an extra header value.
	MaxExtraValueLength = 1024

	// MaxExtraCount is the maximum number of extra headers allowed.
	MaxExtraCount = 50

	// MaxClusterNameLength is the maximum allowed length for a cluster name.
	// Kubernetes names are limited to 253 characters.
	MaxClusterNameLength = 253
)

Validation constants for security limits.

const CAPISecretKey = "value"

CAPISecretKey is the key within the kubeconfig secret that contains the actual kubeconfig YAML data (standard CAPI convention).

const CAPISecretKeyAlternate = "kubeconfig"

CAPISecretKeyAlternate is an alternate key used by some CAPI providers for storing kubeconfig data in secrets.

const CAPISecretSuffix = "-kubeconfig"

CAPISecretSuffix is the suffix used by CAPI for kubeconfig secrets. The full secret name is: ${CLUSTER_NAME}-kubeconfig
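
The three secret constants above combine roughly as in the following sketch. The function names are hypothetical, and the lookup order (standard key first, then the alternate key) is an assumption; the data map stands in for the Data field of a Kubernetes Secret.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented constant values.
const (
	secretSuffix       = "-kubeconfig" // CAPISecretSuffix
	secretKey          = "value"       // CAPISecretKey
	secretKeyAlternate = "kubeconfig"  // CAPISecretKeyAlternate
)

// kubeconfigSecretName derives the secret name from the cluster name.
func kubeconfigSecretName(cluster string) string {
	return cluster + secretSuffix
}

// kubeconfigFromSecret returns the kubeconfig bytes from a secret's data map,
// trying the standard key before the alternate one (assumed order).
func kubeconfigFromSecret(data map[string][]byte) ([]byte, error) {
	for _, key := range []string{secretKey, secretKeyAlternate} {
		if v, ok := data[key]; ok && len(v) > 0 {
			return v, nil
		}
	}
	return nil, errors.New("kubeconfig data not found in secret")
}

func main() {
	fmt.Println(kubeconfigSecretName("my-cluster"))
	raw, err := kubeconfigFromSecret(map[string][]byte{"kubeconfig": []byte("apiVersion: v1")})
	fmt.Println(string(raw), err)
}
```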

const DefaultConnectionValidationTimeout = 10 * time.Second

DefaultConnectionValidationTimeout is the default timeout for validating cluster connectivity. This can be overridden using WithManagerConnectionValidationTimeout.

const ImpersonationTraceIDKey = "trace-id"

ImpersonationTraceIDKey is the key used for trace ID in impersonation extra headers. This appears as "Impersonate-Extra-trace-id: <trace_id>" in HTTP requests.

const UserExtraOAuthTokenKey = "oauth_token"

UserExtraOAuthTokenKey is the key used in UserInfo.Extra to store OAuth tokens as a fallback mechanism for testing or alternative authentication flows.

Variables ¶

var (
	// ErrClusterNotFound indicates that the requested cluster does not exist
	// or the user does not have permission to access it.
	ErrClusterNotFound = errors.New("cluster not found")

	// ErrKubeconfigSecretNotFound indicates that the CAPI kubeconfig secret
	// for the cluster is missing from the Management Cluster.
	ErrKubeconfigSecretNotFound = errors.New("kubeconfig secret not found")

	// ErrKubeconfigInvalid indicates that the kubeconfig secret contains
	// malformed or unparseable kubeconfig data.
	ErrKubeconfigInvalid = errors.New("kubeconfig data is invalid")

	// ErrConnectionFailed indicates a network or TLS error when attempting
	// to connect to the target cluster.
	ErrConnectionFailed = errors.New("failed to connect to cluster")

	// ErrImpersonationFailed indicates that the user impersonation could not
	// be configured on the cluster client.
	ErrImpersonationFailed = errors.New("failed to configure user impersonation")

	// ErrManagerClosed indicates that the ClusterClientManager has been closed
	// and can no longer be used.
	ErrManagerClosed = errors.New("federation manager is closed")

	// ErrAccessDenied indicates that the user does not have permission to
	// perform the requested operation. This is returned after a SubjectAccessReview
	// determines the user lacks the required RBAC permissions.
	ErrAccessDenied = errors.New("access denied")

	// ErrAccessCheckFailed indicates that the access check itself failed
	// (e.g., due to API server errors), not that access was denied.
	ErrAccessCheckFailed = errors.New("access check failed")

	// ErrInvalidAccessCheck indicates that the AccessCheck parameters are invalid.
	ErrInvalidAccessCheck = errors.New("invalid access check parameters")

	// ErrClusterUnreachable indicates that the cluster API server is not reachable.
	// This typically occurs due to network issues such as:
	//   - VPC peering not configured
	//   - Security group rules blocking access
	//   - DNS resolution failures
	ErrClusterUnreachable = errors.New("cluster unreachable")

	// ErrTLSHandshakeFailed indicates that the TLS handshake with the cluster failed.
	// Common causes include:
	//   - Certificate signed by unknown authority
	//   - Expired certificate
	//   - Certificate hostname mismatch
	ErrTLSHandshakeFailed = errors.New("TLS handshake failed")

	// ErrConnectionTimeout indicates that the connection to the cluster timed out.
	// This can happen when:
	//   - The cluster is behind a firewall
	//   - Network latency is too high
	//   - The cluster is not running
	ErrConnectionTimeout = errors.New("connection timeout")
)

Sentinel errors for common federation failure scenarios. These errors can be checked using errors.Is() for programmatic error handling.

var (
	// ErrUserInfoRequired indicates that user information is required but was not provided.
	ErrUserInfoRequired = fmt.Errorf("user information is required for cluster operations")

	// ErrUserEmailRequired indicates that the user's email is required but not present.
	// The email is used as the Impersonate-User header value for Kubernetes RBAC.
	ErrUserEmailRequired = fmt.Errorf("user email is required for impersonation")

	// ErrInvalidEmail indicates that the email address format is invalid.
	ErrInvalidEmail = fmt.Errorf("invalid email address format")

	// ErrInvalidGroupName indicates that a group name is invalid.
	ErrInvalidGroupName = fmt.Errorf("invalid group name")

	// ErrInvalidExtraHeader indicates that an extra header key or value is invalid.
	ErrInvalidExtraHeader = fmt.Errorf("invalid extra header")

	// ErrInvalidClusterName indicates that a cluster name is invalid.
	ErrInvalidClusterName = fmt.Errorf("invalid cluster name")
)

Validation errors.

var (
	// CAPIClusterGVR is the GroupVersionResource for CAPI Cluster objects.
	CAPIClusterGVR = schema.GroupVersionResource{
		Group:    "cluster.x-k8s.io",
		Version:  "v1beta1",
		Resource: "clusters",
	}
)

CAPI resource identifiers and conventions for cluster lookup.

var ErrCAPICRDNotInstalled = fmt.Errorf("CAPI CRDs not installed")

ErrCAPICRDNotInstalled indicates that CAPI CRDs are not installed on the cluster.

var ErrInvalidHealthCheckPath = errors.New("invalid health check path")

ErrInvalidHealthCheckPath indicates that the health check path is invalid.

var ValidKubernetesVerbs = map[string]bool{
	"get":              true,
	"list":             true,
	"watch":            true,
	"create":           true,
	"update":           true,
	"patch":            true,
	"delete":           true,
	"deletecollection": true,
	"impersonate":      true,
	"bind":             true,
	"escalate":         true,
	"*":                true,
}

ValidKubernetesVerbs is the set of valid Kubernetes API verbs that can be used in access checks. This is exported for use in validation and UI components.

Functions ¶

func AnonymizeEmail ¶

func AnonymizeEmail(email string) string

AnonymizeEmail returns a hashed representation of an email for logging purposes. This allows correlation of log entries without exposing PII.
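
A minimal sketch of this kind of anonymization, assuming a plain hex-encoded SHA-256 digest (the overview mentions SHA-256 hashing). Whether the real function truncates, prefixes, or normalizes its input is not stated here, so the anonymize function below is only an approximation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// anonymize returns the hex-encoded SHA-256 digest of the email.
// Identical inputs always map to the same digest, which is what
// makes log correlation possible without storing the address.
func anonymize(email string) string {
	sum := sha256.Sum256([]byte(email))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(anonymize("jane@giantswarm.io"))
}
```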

func AnonymizeUserInfo ¶

func AnonymizeUserInfo(user *UserInfo) map[string]interface{}

AnonymizeUserInfo returns anonymized user identifiers for logging. Returns a map with "user_hash" and "group_count" for safe logging.

func ApplyConnectivityConfig ¶ added in v0.0.60

func ApplyConnectivityConfig(config *rest.Config, cc ConnectivityConfig)

ApplyConnectivityConfig applies the connectivity configuration to a rest.Config. This modifies the config in place to use the specified timeouts and rate limits.

Note: This function modifies the provided config. If you need to preserve the original config, use rest.CopyConfig() first.

func CheckConnectivity ¶ added in v0.0.60

func CheckConnectivity(ctx context.Context, clusterName string, config *rest.Config, cc ConnectivityConfig) error

CheckConnectivity verifies that the MCP server can reach the Workload Cluster API server. This should be called before caching a client to detect network issues early.

The check performs a GET request to the health endpoint (default: /healthz) which doesn't require authentication, making it suitable for connectivity validation.

Network Topology ¶

This check validates the complete network path from the MCP server to the target cluster, including:

DNS resolution
TCP connectivity (VPC peering, Transit Gateway, konnectivity)
TLS handshake (certificate validation)
HTTP/2 connection establishment

Error Types ¶

Returns different error types based on the failure:

ErrConnectionTimeout: TCP connection timed out
ErrTLSHandshakeFailed: TLS/certificate issues
ErrClusterUnreachable: General connectivity failure

func CheckConnectivityWithRetry ¶ added in v0.0.60

func CheckConnectivityWithRetry(ctx context.Context, clusterName string, config *rest.Config, cc ConnectivityConfig) error

CheckConnectivityWithRetry performs connectivity check with retry logic. This is useful for handling transient network issues during cluster discovery.

The function uses exponential backoff between retries:

attempt 1: immediate
attempt 2: wait RetryBackoff
attempt 3: wait RetryBackoff * 2
attempt 4: wait RetryBackoff * 4
...

Returns the last error if all retry attempts fail.

func ConfigWithImpersonation ¶ added in v0.0.54

func ConfigWithImpersonation(config *rest.Config, user *UserInfo) *rest.Config

ConfigWithImpersonation returns a copy of the config with impersonation configured. This is used to create per-user clients from the base kubeconfig credentials.

Audit Trail ¶

This function automatically adds the "agent: mcp-kubernetes" extra header to all impersonated requests. This allows Kubernetes audit logs to identify that operations were performed via the MCP server, providing a clear audit trail.

Security: The agent header is immutable and added AFTER user extras. Any attempt by a user to override it via OAuth extra claims will be ignored. This ensures audit trail integrity even if other user claims are manipulated.

The resulting HTTP headers will include:

Impersonate-User: <user.Email>
Impersonate-Group: <user.Groups[0]>
Impersonate-Group: <user.Groups[1]>
...
Impersonate-Extra-agent: mcp-kubernetes
Impersonate-Extra-<key>: <value>  (for each entry in user.Extra)

Security ¶

This function panics if config is non-nil but user is nil. This is a deliberate security measure: silently returning a non-impersonated config when impersonation was expected could lead to privilege escalation. The panic indicates a programming error that must be fixed.

Nil handling:

If config is nil, returns nil (nothing to configure)
If user is nil with non-nil config, panics (programming error - use ValidateUserInfo first)

func ConfigWithImpersonationAndTraceID ¶ added in v0.0.67

func ConfigWithImpersonationAndTraceID(config *rest.Config, user *UserInfo, traceID string) *rest.Config

ConfigWithImpersonationAndTraceID returns a copy of the config with impersonation configured, including trace ID for distributed tracing correlation.

This function extends ConfigWithImpersonation by adding the trace ID to the impersonation extra headers. This allows Kubernetes audit logs to be correlated with OpenTelemetry traces, bridging the "audit gap" when the MCP server acts as a proxy.

Audit Trail Enhancement ¶

The resulting HTTP headers will include:

Impersonate-User: <user.Email>
Impersonate-Group: <user.Groups[...]>
Impersonate-Extra-agent: mcp-kubernetes
Impersonate-Extra-trace-id: <traceID>
Impersonate-Extra-<key>: <value>  (for each entry in user.Extra)

Kubernetes Audit Log on WC will show:

{
  "user": {
    "username": "jane@giantswarm.io",
    "extra": {
      "agent": ["mcp-kubernetes"],
      "trace-id": ["abc123def456"]
    }
  }
}

Security ¶

Both agent and trace-id are added AFTER user extras to ensure they cannot be overridden by manipulated OAuth claims.

Parameters ¶

config: The base REST config (typically from cluster kubeconfig)
user: User identity information for impersonation
traceID: OpenTelemetry trace ID (empty string if tracing is disabled)

func GetEndpointType ¶ added in v0.0.60

func GetEndpointType(host string) string

GetEndpointType attempts to classify the endpoint type based on the host URL. This is informational and used for logging/debugging purposes.

Returns one of:

"private" - Private IP address (VPC peering required)
"public" - Public DNS/IP address
"konnectivity" - Konnectivity proxy endpoint
"unknown" - Cannot determine endpoint type

func UserHashAttr ¶ added in v0.0.54

func UserHashAttr(email string) slog.Attr

UserHashAttr returns a slog attribute with the anonymized user email. This is a convenience function to reduce repetition in logging calls and ensure consistent attribute naming across the codebase.

Usage:

m.logger.Debug("Operation completed", UserHashAttr(user.Email))
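
One way such an attribute could be built is sketched below. The hashing scheme (a truncated SHA-256) and the attribute key `user_hash` are assumptions for illustration; the package's actual anonymization format is not documented here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"log/slog"
	"os"
)

// hashEmail returns a short, stable, non-reversible identifier for an email.
// Truncating to 8 bytes (16 hex characters) is an assumption of this sketch.
func hashEmail(email string) string {
	sum := sha256.Sum256([]byte(email))
	return hex.EncodeToString(sum[:8])
}

// userHashAttr wraps the hash in a slog attribute under one fixed key so that
// every log line uses the same attribute name.
func userHashAttr(email string) slog.Attr {
	return slog.String("user_hash", hashEmail(email))
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))
	logger.Info("Operation completed", userHashAttr("jane@giantswarm.io"))
}
```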

func ValidateAccessCheck ¶ added in v0.0.57

func ValidateAccessCheck(check *AccessCheck) error

ValidateAccessCheck validates the AccessCheck parameters. Returns ErrInvalidAccessCheck if the check is invalid.

func ValidateClusterName ¶

func ValidateClusterName(name string) error

ValidateClusterName validates a cluster name against Kubernetes naming conventions.
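
The exact rules applied are not spelled out here. As a rough sketch, assuming the convention in question is the DNS-1123 label format that Kubernetes uses for many resource names, a validator could look like this (`validateName` is a hypothetical name):

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// dns1123Label matches lowercase alphanumerics and '-', starting and ending
// with an alphanumeric character.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateName applies DNS-1123 label rules. Using this format for cluster
// names is an assumption of the sketch.
func validateName(name string) error {
	switch {
	case name == "":
		return errors.New("cluster name is required")
	case len(name) > 63:
		return fmt.Errorf("cluster name exceeds 63 characters (got %d)", len(name))
	case !dns1123Label.MatchString(name):
		return fmt.Errorf("cluster name %q must consist of lowercase alphanumerics or '-' and start and end with an alphanumeric", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"my-cluster", "My_Cluster", "-leading-dash", ""} {
		fmt.Printf("%q -> %v\n", name, validateName(name))
	}
}
```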

func ValidateUserInfo ¶

func ValidateUserInfo(user *UserInfo) error

ValidateUserInfo validates the UserInfo struct for security. Returns ErrUserInfoRequired if user is nil. Returns a ValidationError if any field fails validation.

Types ¶

type AccessCheck ¶ added in v0.0.57

type AccessCheck struct {
	// Verb is the Kubernetes API verb to check (e.g., "get", "list", "create", "delete", "patch", "watch").
	// This is required.
	Verb string

	// Resource is the Kubernetes resource type to check (e.g., "pods", "deployments", "secrets").
	// This is required.
	Resource string

	// APIGroup is the API group for the resource (e.g., "", "apps", "batch").
	// Use "" for core API resources like pods and services.
	APIGroup string

	// Namespace is the namespace to check permissions in.
	// Leave empty for cluster-scoped resources or cluster-wide checks.
	Namespace string

	// Name is the specific resource name to check.
	// Leave empty to check permissions for all resources of the type.
	Name string

	// Subresource is the subresource to check (e.g., "logs", "exec", "portforward" for pods).
	// Leave empty for the main resource.
	Subresource string
}

AccessCheck describes a permission check to perform against a Kubernetes cluster. This is used with SubjectAccessReview to verify if the authenticated user can perform a specific action before attempting the operation.

Usage ¶

AccessCheck is typically used for pre-flight checks before destructive operations:

check := &AccessCheck{
	Verb:      "delete",
	Resource:  "pods",
	APIGroup:  "", // core API group
	Namespace: "production",
}
result, err := manager.CheckAccess(ctx, "my-cluster", user, check)
if err != nil {
	return err
}
if !result.Allowed {
	return fmt.Errorf("permission denied: %s", result.Reason)
}

type AccessCheckError ¶ added in v0.0.57

type AccessCheckError struct {
	// ClusterName is the cluster where the check was attempted.
	ClusterName string

	// Check contains the access check parameters.
	Check *AccessCheck

	// Reason describes what went wrong during the check.
	Reason string

	// Err is the underlying error.
	Err error
}

AccessCheckError provides context when the access check itself fails. This is different from AccessDeniedError: it means we couldn't determine whether access is allowed, not that access is denied.

func (*AccessCheckError) Error ¶ added in v0.0.57

func (e *AccessCheckError) Error() string

Error implements the error interface.

func (*AccessCheckError) Is ¶ added in v0.0.57

func (e *AccessCheckError) Is(target error) bool

Is implements custom error matching for errors.Is().

func (*AccessCheckError) Unwrap ¶ added in v0.0.57

func (e *AccessCheckError) Unwrap() error

Unwrap returns the underlying error.

func (*AccessCheckError) UserFacingError ¶ added in v0.0.57

func (e *AccessCheckError) UserFacingError() string

UserFacingError returns a message suitable for displaying to end users.

type AccessCheckResult ¶ added in v0.0.57

type AccessCheckResult struct {
	// Allowed indicates whether the requested action is permitted.
	Allowed bool

	// Denied indicates whether the requested action was explicitly denied.
	// This is different from !Allowed - a request can be neither allowed nor denied
	// (e.g., when no policy matches).
	Denied bool

	// Reason provides a human-readable explanation of the decision.
	// This may include information about which RBAC rule matched or why access was denied.
	Reason string

	// EvaluationError contains any error that occurred during policy evaluation.
	// A non-empty EvaluationError typically means the result is inconclusive.
	EvaluationError string
}

AccessCheckResult contains the result of a SubjectAccessReview check.
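
Because Allowed and Denied are independent flags, callers have more than two outcomes to handle. A minimal sketch of one conservative interpretation follows. `accessResult` is a local stand-in for AccessCheckResult, and treating any EvaluationError as inconclusive is an assumption of this sketch.

```go
package main

import "fmt"

// accessResult is a stand-in (assumption) mirroring AccessCheckResult.
type accessResult struct {
	Allowed         bool
	Denied          bool
	Reason          string
	EvaluationError string
}

// decide maps the flags to one of four outcomes. An evaluation error is
// checked first so that a doubtful result is never reported as allowed.
func decide(r accessResult) string {
	switch {
	case r.EvaluationError != "":
		return "inconclusive"
	case r.Allowed:
		return "allowed"
	case r.Denied:
		return "denied"
	default:
		// Neither allowed nor denied: no policy matched.
		return "no-opinion"
	}
}

func main() {
	fmt.Println(decide(accessResult{Allowed: true}))
	fmt.Println(decide(accessResult{Denied: true, Reason: "explicit deny"}))
	fmt.Println(decide(accessResult{}))
	fmt.Println(decide(accessResult{EvaluationError: "webhook timeout"}))
}
```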

type AccessDeniedError ¶ added in v0.0.57

type AccessDeniedError struct {
	// ClusterName is the cluster where the permission check was performed.
	ClusterName string

	// UserEmail is the email of the user (for logging only, anonymized in Error()).
	UserEmail string

	// Verb is the action that was denied (e.g., "delete", "create").
	Verb string

	// Resource is the resource type for which access was denied.
	Resource string

	// APIGroup is the API group of the resource.
	APIGroup string

	// Namespace is the namespace where access was denied (empty for cluster-scoped).
	Namespace string

	// Name is the specific resource name if checked (empty for type-level checks).
	Name string

	// Reason provides details about why access was denied (from Kubernetes).
	Reason string
}

AccessDeniedError provides detailed context about a permission denial. This error is returned when a SubjectAccessReview determines the user lacks permission to perform an operation.

Usage ¶

AccessDeniedError provides actionable information about what permission is missing:

if errors.Is(err, federation.ErrAccessDenied) {
	var accessErr *federation.AccessDeniedError
	if errors.As(err, &accessErr) {
		fmt.Printf("You need %s permission on %s/%s in namespace %s\n",
			accessErr.Verb, accessErr.APIGroup, accessErr.Resource, accessErr.Namespace)
	}
}

func (*AccessDeniedError) Error ¶ added in v0.0.57

func (e *AccessDeniedError) Error() string

Error implements the error interface.

func (*AccessDeniedError) Is ¶ added in v0.0.57

func (e *AccessDeniedError) Is(target error) bool

Is implements custom error matching for errors.Is(). This allows AccessDeniedError to match against ErrAccessDenied.
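
The general pattern behind a custom Is method can be sketched with local stand-ins. `errAccessDenied`, `accessDeniedError`, `wrap`, and `isDenied` below are illustrative names, not the package's own definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// errAccessDenied is a stand-in sentinel playing the role of ErrAccessDenied.
var errAccessDenied = errors.New("access denied")

// accessDeniedError is a reduced stand-in for AccessDeniedError.
type accessDeniedError struct {
	Verb, Resource, Namespace string
}

func (e *accessDeniedError) Error() string {
	return fmt.Sprintf("access denied: cannot %s %s in namespace %q", e.Verb, e.Resource, e.Namespace)
}

// Is lets errors.Is match the sentinel even though the struct does not wrap it.
func (e *accessDeniedError) Is(target error) bool {
	return target == errAccessDenied
}

// wrap adds call-site context while keeping the error chain intact.
func wrap(err error) error {
	return fmt.Errorf("delete pod: %w", err)
}

// isDenied reports whether err, or anything it wraps, is an access denial.
func isDenied(err error) bool {
	return errors.Is(err, errAccessDenied)
}

func main() {
	err := wrap(&accessDeniedError{Verb: "delete", Resource: "pods", Namespace: "production"})
	fmt.Println(isDenied(err))

	var denied *accessDeniedError
	if errors.As(err, &denied) {
		fmt.Println(denied.Verb, denied.Resource, denied.Namespace)
	}
}
```

With this pattern, errors.Is answers "what kind of failure was this?" and errors.As recovers the structured fields.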

func (*AccessDeniedError) Unwrap ¶ added in v0.0.57

func (e *AccessDeniedError) Unwrap() error

Unwrap returns nil as there is no underlying error.

func (*AccessDeniedError) UserFacingError ¶ added in v0.0.57

func (e *AccessDeniedError) UserFacingError() string

UserFacingError returns a message suitable for displaying to end users. This provides enough context for the user to understand what permission they need without exposing internal system details.

type AmbiguousClusterError ¶ added in v0.0.59

type AmbiguousClusterError struct {
	Pattern string
	// Matches contains the clusters that matched the pattern, used to provide
	// helpful feedback to users about which clusters they might have meant.
	Matches []ClusterSummary
}

AmbiguousClusterError is returned when a cluster name pattern matches multiple clusters.

func (*AmbiguousClusterError) Error ¶ added in v0.0.59

func (e *AmbiguousClusterError) Error() string

Error implements the error interface.

func (*AmbiguousClusterError) UserFacingError ¶ added in v0.0.59

func (e *AmbiguousClusterError) UserFacingError() string

UserFacingError returns a user-friendly error message.

type CacheConfig ¶ added in v0.0.53

type CacheConfig struct {
	// TTL is the time-to-live for cached clients. After this duration,
	// entries are eligible for eviction.
	//
	// Security note: Set this to be less than or equal to your OAuth token
	// lifetime to ensure cached clients don't outlive user authorization.
	//
	// Default: 10 minutes.
	TTL time.Duration

	// MaxEntries is the maximum number of entries the cache can hold.
	// When exceeded, least recently accessed entries are evicted.
	//
	// Each unique (clusterName, userEmail) pair creates one cache entry.
	// Monitor the mcp_client_cache_entries metric to tune this value.
	//
	// Default: 1000.
	MaxEntries int

	// CleanupInterval is how often the background cleanup runs to remove
	// expired entries.
	//
	// Default: 1 minute.
	CleanupInterval time.Duration
}

CacheConfig holds configuration options for the ClientCache.

Security Considerations ¶

The TTL setting has security implications: cached clients may persist after a user's OAuth token is invalidated or revoked. To mitigate this:

Set TTL to be less than or equal to your OAuth token lifetime
Use DeleteByCluster() when cluster credentials are rotated
Use Delete() when a user's access should be immediately revoked

Capacity Planning ¶

Cache entries are keyed by (clusterName, userEmail) pairs. With the default MaxEntries of 1000, this could represent:

1000 users accessing 1 cluster each, or
100 users accessing 10 clusters each, or
10 users accessing 100 clusters each

Monitor the mcp_client_cache_entries metric and adjust MaxEntries based on your actual usage patterns. LRU eviction ensures the most active users/clusters are retained when capacity is exceeded.
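
The TTL and capacity guidance above reduces to simple arithmetic. The helpers below are hypothetical and only illustrate the reasoning; they are not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultTTL        = 10 * time.Minute // documented default TTL
	defaultMaxEntries = 1000             // documented default MaxEntries
	shortTokenLife    = 5 * time.Minute  // example token lifetime (assumption)
)

// effectiveTTL clamps the desired TTL so cached clients never outlive the
// OAuth token lifetime.
func effectiveTTL(desired, tokenLifetime time.Duration) time.Duration {
	if tokenLifetime > 0 && desired > tokenLifetime {
		return tokenLifetime
	}
	return desired
}

// requiredEntries estimates the worst-case number of cache entries when every
// user accesses clustersPerUser clusters (one entry per pair).
func requiredEntries(users, clustersPerUser int) int {
	return users * clustersPerUser
}

// fits reports whether the worst case stays within maxEntries.
func fits(users, clustersPerUser, maxEntries int) bool {
	return requiredEntries(users, clustersPerUser) <= maxEntries
}

func main() {
	fmt.Println("TTL:", effectiveTTL(defaultTTL, shortTokenLife))
	fmt.Println("100 users x 10 clusters fits:", fits(100, 10, defaultMaxEntries))
	fmt.Println("200 users x 10 clusters fits:", fits(200, 10, defaultMaxEntries))
}
```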

func DefaultCacheConfig ¶ added in v0.0.53

func DefaultCacheConfig() CacheConfig

DefaultCacheConfig returns a CacheConfig with sensible defaults.

type CacheMetricsRecorder ¶ added in v0.0.53

type CacheMetricsRecorder interface {
	// RecordCacheHit records a cache hit event.
	RecordCacheHit(ctx context.Context, clusterName string)

	// RecordCacheMiss records a cache miss event.
	RecordCacheMiss(ctx context.Context, clusterName string)

	// RecordCacheEviction records a cache eviction event.
	RecordCacheEviction(ctx context.Context, reason string)

	// SetCacheSize sets the current cache size gauge.
	SetCacheSize(ctx context.Context, size int)
}

CacheMetricsRecorder defines the interface for recording cache metrics. This allows decoupling from the concrete instrumentation implementation.
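
Because the interface depends only on standard library types, it is straightforward to satisfy in tests. The sketch below defines a local copy of the interface (so the example is self-contained) and a hypothetical in-memory recorder. The eviction reason `"expired"` is an illustrative value, not a documented one.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// cacheMetricsRecorder is a local copy of the interface shown above.
type cacheMetricsRecorder interface {
	RecordCacheHit(ctx context.Context, clusterName string)
	RecordCacheMiss(ctx context.Context, clusterName string)
	RecordCacheEviction(ctx context.Context, reason string)
	SetCacheSize(ctx context.Context, size int)
}

// memoryRecorder keeps counters in memory; useful for unit tests.
type memoryRecorder struct {
	mu        sync.Mutex
	hits      map[string]int
	misses    map[string]int
	evictions map[string]int
	size      int
}

func newMemoryRecorder() *memoryRecorder {
	return &memoryRecorder{
		hits:      map[string]int{},
		misses:    map[string]int{},
		evictions: map[string]int{},
	}
}

func (m *memoryRecorder) RecordCacheHit(_ context.Context, cluster string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.hits[cluster]++
}

func (m *memoryRecorder) RecordCacheMiss(_ context.Context, cluster string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.misses[cluster]++
}

func (m *memoryRecorder) RecordCacheEviction(_ context.Context, reason string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.evictions[reason]++
}

func (m *memoryRecorder) SetCacheSize(_ context.Context, size int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.size = size
}

// Compile-time check that memoryRecorder satisfies the interface.
var _ cacheMetricsRecorder = (*memoryRecorder)(nil)

// demoRecorder records a small, fixed sequence of events.
func demoRecorder() *memoryRecorder {
	ctx := context.Background()
	rec := newMemoryRecorder()
	rec.RecordCacheMiss(ctx, "my-cluster")
	rec.RecordCacheHit(ctx, "my-cluster")
	rec.RecordCacheHit(ctx, "my-cluster")
	rec.RecordCacheEviction(ctx, "expired")
	rec.SetCacheSize(ctx, 1)
	return rec
}

func main() {
	rec := demoRecorder()
	fmt.Println(rec.hits, rec.misses, rec.evictions, rec.size)
}
```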

type CacheStats ¶ added in v0.0.53

type CacheStats struct {
	// Size is the current number of entries in the cache.
	Size int

	// MaxEntries is the maximum capacity.
	MaxEntries int

	// TTL is the configured time-to-live.
	TTL time.Duration

	// OldestEntry is the age of the oldest entry (if any).
	OldestEntry time.Duration

	// NewestEntry is the age of the newest entry (if any).
	NewestEntry time.Duration
}

CacheStats holds a snapshot of cache statistics, as returned by ClientCache.Stats.

type ClientCache ¶ added in v0.0.53

type ClientCache struct {
	// contains filtered or unexported fields
}

ClientCache provides thread-safe caching of Kubernetes clients with TTL-based eviction and memory management.

The cache is keyed by a composite of cluster name and user email to ensure that clients configured for different users are never shared.

func NewClientCache ¶ added in v0.0.53

func NewClientCache(opts ...ClientCacheOption) *ClientCache

NewClientCache creates a new ClientCache with the provided options. The cache automatically starts a background goroutine for cleanup.

func (*ClientCache) Close ¶ added in v0.0.53

func (c *ClientCache) Close() error

Close stops the background cleanup goroutine and clears the cache. After Close is called, all cache operations become no-ops.

func (*ClientCache) Delete ¶ added in v0.0.53

func (c *ClientCache) Delete(ctx context.Context, clusterName, userEmail string)

Delete removes a cached client for the given cluster and user. This is useful for invalidating cache entries when credentials change.

func (*ClientCache) DeleteByCluster ¶ added in v0.0.53

func (c *ClientCache) DeleteByCluster(ctx context.Context, clusterName string)

DeleteByCluster removes all cached clients for the given cluster. This is useful when cluster credentials are rotated.

func (*ClientCache) Get ¶ added in v0.0.53

func (c *ClientCache) Get(ctx context.Context, clusterName, userEmail string) *cachedClient

Get retrieves a cached client for the given cluster and user. Returns nil if no valid cached client exists. This method is thread-safe and records cache hit/miss metrics.

func (*ClientCache) GetOrCreate ¶ added in v0.0.53

func (c *ClientCache) GetOrCreate(
	ctx context.Context,
	clusterName, userEmail string,
	factory func(ctx context.Context) (kubernetes.Interface, dynamic.Interface, *rest.Config, error),
) (kubernetes.Interface, dynamic.Interface, error)

GetOrCreate retrieves a cached client or creates a new one using the provided factory. This method uses singleflight to prevent thundering herd when multiple goroutines request the same client simultaneously.

The factory function is called only on cache miss and is guaranteed to be called at most once per unique key, even under high concurrency.
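
The "at most once per key" guarantee can be illustrated with the standard library alone. The sketch below swaps singleflight for a per-key sync.Once, which is a different mechanism from the one the package uses. It also omits TTL handling and caches errors permanently, so it is only a model of the concurrency behavior. All names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheKey is the composite (cluster, user) key.
type cacheKey struct{ cluster, user string }

// call holds the one-time result for a key.
type call struct {
	once sync.Once
	val  string
	err  error
}

type onceCache struct {
	mu    sync.Mutex
	calls map[cacheKey]*call
}

func newOnceCache() *onceCache {
	return &onceCache{calls: make(map[cacheKey]*call)}
}

// getOrCreate runs factory at most once per key. Concurrent callers for the
// same key block inside once.Do until the first call finishes.
func (c *onceCache) getOrCreate(cluster, user string, factory func() (string, error)) (string, error) {
	key := cacheKey{cluster, user}
	c.mu.Lock()
	cl, ok := c.calls[key]
	if !ok {
		cl = &call{}
		c.calls[key] = cl
	}
	c.mu.Unlock()

	cl.once.Do(func() { cl.val, cl.err = factory() })
	return cl.val, cl.err
}

// countFactoryCalls starts n goroutines requesting the same key and returns
// how many times the factory actually ran.
func countFactoryCalls(n int) int {
	c := newOnceCache()
	var mu sync.Mutex
	calls := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.getOrCreate("my-cluster", "jane@giantswarm.io", func() (string, error) {
				mu.Lock()
				calls++
				mu.Unlock()
				return "client", nil
			})
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println("factory calls:", countFactoryCalls(50))
}
```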

func (*ClientCache) Set ¶ added in v0.0.53

func (c *ClientCache) Set(ctx context.Context, clusterName, userEmail string, clientset kubernetes.Interface, dynamicClient dynamic.Interface, restConfig *rest.Config)

Set stores a client in the cache for the given cluster and user. This method is thread-safe.

func (*ClientCache) Size ¶ added in v0.0.53

func (c *ClientCache) Size() int

Size returns the current number of entries in the cache.

func (*ClientCache) Stats ¶ added in v0.0.53

func (c *ClientCache) Stats() CacheStats

Stats returns current cache statistics for monitoring.

type ClientCacheOption ¶ added in v0.0.53

type ClientCacheOption func(*ClientCache)

ClientCacheOption is a functional option for configuring ClientCache.

func WithCacheConfig ¶ added in v0.0.53

func WithCacheConfig(config CacheConfig) ClientCacheOption

WithCacheConfig sets the cache configuration.

func WithCacheLogger ¶ added in v0.0.53

func WithCacheLogger(logger *slog.Logger) ClientCacheOption

WithCacheLogger sets the logger for the cache.

func WithCacheMetrics ¶ added in v0.0.53

func WithCacheMetrics(metrics CacheMetricsRecorder) ClientCacheOption

WithCacheMetrics sets the metrics recorder for the cache.

type ClientProvider ¶ added in v0.0.54

type ClientProvider interface {
	// GetClientsForUser returns Kubernetes clients authenticated as the specified user.
	// The returned clients use the user's OAuth token for authentication, ensuring
	// all operations are performed with the user's RBAC permissions.
	//
	// Parameters:
	//   - ctx: Context for the request (may contain OAuth token)
	//   - user: User identity information from OAuth claims
	//
	// Returns:
	//   - kubernetes.Interface: Clientset for typed API access
	//   - dynamic.Interface: Dynamic client for CRD access (e.g., CAPI resources)
	//   - *rest.Config: REST config for creating additional clients
	//   - error: Any error during client creation
	GetClientsForUser(ctx context.Context, user *UserInfo) (kubernetes.Interface, dynamic.Interface, *rest.Config, error)
}

ClientProvider creates Kubernetes clients scoped to a specific user's identity. This interface enables per-request client creation when OAuth downstream is enabled, ensuring that each user's RBAC permissions are enforced on the Management Cluster.

Security Model ¶

When OAuth downstream is enabled:

Each request carries the user's OAuth access token
GetClientsForUser creates clients authenticated as that user
All Management Cluster operations (including kubeconfig secret retrieval) are performed with the user's identity, enforcing their RBAC permissions

This provides defense in depth: users must have RBAC permission to read kubeconfig secrets on the Management Cluster, AND their impersonated identity must have permissions on the Workload Cluster.

type ClusterClientManager ¶

type ClusterClientManager interface {
	// GetClient returns a Kubernetes client for the target cluster,
	// configured to impersonate the provided user.
	// If clusterName is empty, returns the local (Management Cluster) client.
	//
	// The returned client has Impersonate-User and Impersonate-Group headers
	// configured based on the UserInfo, ensuring all operations are executed
	// under the authenticated user's identity.
	GetClient(ctx context.Context, clusterName string, user *UserInfo) (kubernetes.Interface, error)

	// GetDynamicClient returns a dynamic client for the target cluster,
	// useful for working with CRDs like CAPI resources.
	// If clusterName is empty, returns the local (Management Cluster) dynamic client.
	//
	// Like GetClient, the returned client is configured for user impersonation.
	GetDynamicClient(ctx context.Context, clusterName string, user *UserInfo) (dynamic.Interface, error)

	// ListClusters returns a list of available workload clusters.
	// The list is filtered based on the user's RBAC permissions - only clusters
	// the user has access to view will be returned.
	//
	// This method queries CAPI Cluster resources on the Management Cluster.
	ListClusters(ctx context.Context, user *UserInfo) ([]ClusterSummary, error)

	// GetClusterSummary returns detailed information about a specific cluster.
	// Returns ErrClusterNotFound if the cluster doesn't exist or the user
	// doesn't have permission to access it.
	GetClusterSummary(ctx context.Context, clusterName string, user *UserInfo) (*ClusterSummary, error)

	// CheckAccess verifies if the user can perform the specified action on a cluster.
	// This performs a SelfSubjectAccessReview to check permissions without actually
	// attempting the operation.
	//
	// Pre-flight checks improve user experience by failing fast with clear error
	// messages and reduce noise in Kubernetes audit logs from failed requests.
	//
	// Parameters:
	//   - ctx: Context for the request
	//   - clusterName: Target cluster (empty for local/management cluster)
	//   - user: Authenticated user info for impersonation
	//   - check: Describes the action to check (verb, resource, namespace, etc.)
	//
	// Returns:
	//   - *AccessCheckResult: Contains Allowed/Denied status and reason
	//   - error: Non-nil if the check itself failed (not the same as denied)
	//
	// Example:
	//
	//	result, err := manager.CheckAccess(ctx, "prod-cluster", user, &AccessCheck{
	//		Verb:      "delete",
	//		Resource:  "pods",
	//		Namespace: "production",
	//	})
	//	if err != nil {
	//		return err // Check failed
	//	}
	//	if !result.Allowed {
	//		return fmt.Errorf("permission denied: %s", result.Reason)
	//	}
	CheckAccess(ctx context.Context, clusterName string, user *UserInfo, check *AccessCheck) (*AccessCheckResult, error)

	// Close releases all cached clients and resources.
	// After Close is called, all other methods will return ErrManagerClosed.
	Close() error

	// Stats returns current cache and manager statistics for monitoring.
	// This is useful for health endpoints and operational dashboards.
	Stats() ManagerStats
}

ClusterClientManager manages Kubernetes clients for multi-cluster operations. It retrieves clients for both the local Management Cluster and remote Workload Clusters, with support for user impersonation.

All methods are thread-safe and can be called concurrently from multiple tool handlers.

type ClusterDiscoveryError ¶ added in v0.0.59

type ClusterDiscoveryError struct {
	Reason string
	Err    error
}

ClusterDiscoveryError provides context about cluster discovery failures.

func (*ClusterDiscoveryError) Error ¶ added in v0.0.59

func (e *ClusterDiscoveryError) Error() string

Error implements the error interface.

func (*ClusterDiscoveryError) Is ¶ added in v0.0.59

func (e *ClusterDiscoveryError) Is(target error) bool

Is implements custom error matching for errors.Is().

func (*ClusterDiscoveryError) Unwrap ¶ added in v0.0.59

func (e *ClusterDiscoveryError) Unwrap() error

Unwrap returns the underlying error.

func (*ClusterDiscoveryError) UserFacingError ¶ added in v0.0.59

func (e *ClusterDiscoveryError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users.

type ClusterInfo ¶ added in v0.0.54

type ClusterInfo struct {
	// Name is the cluster name.
	Name string

	// Namespace is the namespace where the cluster resource and its kubeconfig secret reside.
	Namespace string
}

ClusterInfo contains information about a CAPI cluster needed for kubeconfig retrieval.

type ClusterListOptions ¶ added in v0.0.59

type ClusterListOptions struct {
	// Namespace filters clusters to a specific namespace (organization).
	// If empty, all namespaces are searched.
	Namespace string

	// LabelSelector filters clusters by label selector expression.
	// Uses standard Kubernetes label selector syntax.
	LabelSelector string

	// Provider filters clusters by infrastructure provider.
	Provider string

	// Status filters clusters by phase.
	Status ClusterPhase

	// ReadyOnly filters to only include ready clusters.
	ReadyOnly bool
}

ClusterListOptions provides options for filtering cluster listings.

type ClusterNotFoundError ¶

type ClusterNotFoundError struct {
	ClusterName string
	Namespace   string
	Reason      string
}

ClusterNotFoundError provides detailed context about a cluster lookup failure.

func (*ClusterNotFoundError) Error ¶

func (e *ClusterNotFoundError) Error() string

Error implements the error interface.

func (*ClusterNotFoundError) Unwrap ¶

func (e *ClusterNotFoundError) Unwrap() error

Unwrap returns the underlying sentinel error for use with errors.Is().

func (*ClusterNotFoundError) UserFacingError ¶

func (e *ClusterNotFoundError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This prevents leaking internal cluster names and namespace structure.

Security: Returns a generic message that doesn't reveal whether the cluster exists, preventing cluster enumeration attacks.
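
The split between a detailed internal message and a generic user-facing one can be sketched as follows. `clusterNotFoundError` is a local stand-in, and the wording of the generic message is an assumption rather than the package's actual text.

```go
package main

import "fmt"

// clusterNotFoundError is a stand-in (assumption) mirroring ClusterNotFoundError.
type clusterNotFoundError struct {
	ClusterName string
	Namespace   string
	Reason      string
}

// Error carries full detail and is meant for server-side logs only.
func (e *clusterNotFoundError) Error() string {
	return fmt.Sprintf("cluster %q not found in namespace %q: %s", e.ClusterName, e.Namespace, e.Reason)
}

// UserFacingError returns the same text for every cluster, so a caller cannot
// tell "does not exist" apart from "exists but is not accessible".
func (e *clusterNotFoundError) UserFacingError() string {
	return "cluster not found or access denied"
}

func main() {
	missing := &clusterNotFoundError{ClusterName: "prod-1", Namespace: "org-acme", Reason: "no CAPI Cluster resource"}
	hidden := &clusterNotFoundError{ClusterName: "prod-2", Namespace: "org-other", Reason: "forbidden"}

	fmt.Println("log: ", missing.Error())
	fmt.Println("user:", missing.UserFacingError())
	fmt.Println("indistinguishable:", missing.UserFacingError() == hidden.UserFacingError())
}
```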

type ClusterPhase ¶

type ClusterPhase string

ClusterPhase represents the lifecycle phase of a CAPI cluster.

const (
	// ClusterPhasePending indicates the cluster is awaiting provisioning.
	ClusterPhasePending ClusterPhase = "Pending"

	// ClusterPhaseProvisioning indicates the cluster is being created.
	ClusterPhaseProvisioning ClusterPhase = "Provisioning"

	// ClusterPhaseProvisioned indicates the cluster is fully operational.
	ClusterPhaseProvisioned ClusterPhase = "Provisioned"

	// ClusterPhaseDeleting indicates the cluster is being deleted.
	ClusterPhaseDeleting ClusterPhase = "Deleting"

	// ClusterPhaseFailed indicates the cluster encountered a fatal error.
	ClusterPhaseFailed ClusterPhase = "Failed"

	// ClusterPhaseUnknown indicates the cluster phase cannot be determined.
	ClusterPhaseUnknown ClusterPhase = "Unknown"
)

Standard CAPI cluster phases.

type ClusterSummary ¶

type ClusterSummary struct {
	// Name is the unique identifier of the cluster within its namespace.
	// This corresponds to the Cluster API Cluster resource name.
	Name string `json:"name"`

	// Namespace is the organization namespace on the Management Cluster
	// where the CAPI Cluster resource is located.
	Namespace string `json:"namespace"`

	// Provider indicates the infrastructure provider (e.g., "aws", "azure", "vsphere").
	// This is extracted from the CAPI infrastructure reference.
	Provider string `json:"provider,omitempty"`

	// Release is the Giant Swarm release version running on the cluster.
	// Format follows semver, e.g., "19.3.0".
	Release string `json:"release,omitempty"`

	// KubernetesVersion is the Kubernetes version running on the cluster.
	// Format follows semver, e.g., "1.28.5".
	KubernetesVersion string `json:"kubernetesVersion,omitempty"`

	// Status indicates the current lifecycle phase of the cluster.
	// Common values: "Provisioned", "Provisioning", "Deleting", "Failed".
	Status string `json:"status"`

	// Ready indicates whether the cluster is fully operational and
	// ready to accept workloads.
	Ready bool `json:"ready"`

	// ControlPlaneReady indicates whether the control plane components
	// are healthy and operational.
	ControlPlaneReady bool `json:"controlPlaneReady"`

	// InfrastructureReady indicates whether the underlying infrastructure
	// (VMs, networks, etc.) is provisioned and healthy.
	InfrastructureReady bool `json:"infrastructureReady"`

	// NodeCount is the current number of worker nodes in the cluster.
	// This may differ from the desired count during scaling operations.
	NodeCount int `json:"nodeCount,omitempty"`

	// CreatedAt is the timestamp when the cluster was initially created.
	CreatedAt time.Time `json:"createdAt"`

	// Labels contains the Kubernetes labels applied to the Cluster resource.
	// These often include organization, team, or environment tags.
	Labels map[string]string `json:"labels,omitempty"`

	// Annotations contains the Kubernetes annotations on the Cluster resource.
	// May include operational metadata or external references.
	Annotations map[string]string `json:"annotations,omitempty"`
}

ClusterSummary provides basic information about a workload cluster. This is returned by ListClusters and contains metadata useful for cluster selection and display purposes.

func (*ClusterSummary) ClusterAge ¶ added in v0.0.59

func (cs *ClusterSummary) ClusterAge() time.Duration

ClusterAge returns the age of a cluster as a duration.

func (*ClusterSummary) Description ¶ added in v0.0.59

func (cs *ClusterSummary) Description() string

Description returns the cluster description from annotations.

func (*ClusterSummary) IsGiantSwarmCluster ¶ added in v0.0.59

func (cs *ClusterSummary) IsGiantSwarmCluster() bool

IsGiantSwarmCluster returns true if this cluster has Giant Swarm labels.

func (*ClusterSummary) Organization ¶ added in v0.0.59

func (cs *ClusterSummary) Organization() string

Organization returns the Giant Swarm organization for this cluster.

type ConnectionError ¶

type ConnectionError struct {
	ClusterName string
	Host        string
	Reason      string
	Err         error
}

ConnectionError provides detailed context about cluster connection failures.

func (*ConnectionError) Error ¶

func (e *ConnectionError) Error() string

Error implements the error interface.

func (*ConnectionError) Is ¶ added in v0.0.54

func (e *ConnectionError) Is(target error) bool

Is implements custom error matching for errors.Is(). This allows ConnectionError to match against ErrConnectionFailed.

func (*ConnectionError) Unwrap ¶

func (e *ConnectionError) Unwrap() error

Unwrap returns the underlying error for use with errors.Is() and errors.As().

func (*ConnectionError) UserFacingError ¶

func (e *ConnectionError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This prevents leaking internal host URLs and network topology.

Security: Returns a generic message consistent with other cluster errors to prevent error response differentiation attacks.

type ConnectivityConfig ¶ added in v0.0.60

type ConnectivityConfig struct {
	// ConnectionTimeout is the maximum time to wait for the initial TCP connection
	// to the cluster API server. This applies to the TCP dial phase only.
	//
	// Default: 5 seconds.
	ConnectionTimeout time.Duration

	// RequestTimeout is the maximum time to wait for individual API requests
	// to complete. This includes TLS handshake, sending the request, and receiving
	// the response.
	//
	// Default: 30 seconds.
	RequestTimeout time.Duration

	// RetryAttempts is the number of times to retry a failed connection before
	// giving up. This helps with transient network issues.
	//
	// Default: 3.
	RetryAttempts int

	// RetryBackoff is the initial backoff duration between retry attempts.
	// Subsequent retries use exponential backoff (backoff * 2^attempt).
	//
	// Default: 1 second.
	RetryBackoff time.Duration

	// HealthCheckPath is the API path used for health checks.
	// This path is validated to prevent path injection attacks.
	// Default: "/healthz" (standard Kubernetes health endpoint).
	HealthCheckPath string

	// QPS is the queries per second limit for the Kubernetes client.
	// This controls client-side rate limiting to prevent overwhelming the
	// target cluster's API server.
	//
	// # Operational Considerations
	//
	// The default value of 50 QPS is tuned for AI agent workloads, which
	// typically make burst requests when exploring cluster resources.
	// Consider adjusting this value based on:
	//   - Number of concurrent users/agents
	//   - Target cluster API server capacity
	//   - Workload patterns (batch operations vs. interactive queries)
	//
	// For shared clusters with many users, consider lowering this value
	// to ensure fair resource allocation.
	//
	// Default: 50 (suitable for single-user AI agent workloads).
	QPS float32

	// Burst is the maximum burst size for throttled requests.
	// This allows short bursts of requests above the QPS limit, which is
	// useful for AI agents that often need to fetch multiple resources
	// in quick succession (e.g., listing pods then fetching their logs).
	//
	// # Operational Considerations
	//
	// The default value of 100 allows agents to handle burst scenarios
	// like initial cluster exploration or responding to user queries
	// that require multiple API calls.
	//
	// The burst value should typically be 2x the QPS value. Lower values
	// may cause request throttling during legitimate burst scenarios.
	// Higher values may impact cluster API server performance.
	//
	// Default: 100 (allows burst operations while protecting API servers).
	Burst int
}

ConnectivityConfig holds configuration options for cluster connectivity. These settings control how the federation manager establishes and validates connections to workload clusters.

Default Values ¶

If not specified, the following defaults are used:

ConnectionTimeout: 5 seconds
RequestTimeout: 30 seconds
RetryAttempts: 3
RetryBackoff: 1 second (exponential backoff with factor 2)
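
The backoff formula `backoff * 2^attempt` can be worked through in a few lines. The sketch assumes a zero-based attempt index and adds an arbitrary cap on the exponent to avoid overflow. Neither detail is confirmed by the documentation above, and `backoffFor` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseBackoff   = time.Second // documented default RetryBackoff
	retryAttempts = 3           // documented default RetryAttempts
)

// backoffFor returns initial * 2^attempt for a zero-based attempt index.
// The exponent is capped at 16 (an assumption) to avoid integer overflow.
func backoffFor(initial time.Duration, attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 16 {
		attempt = 16
	}
	return initial << uint(attempt)
}

func main() {
	var total time.Duration
	for attempt := 0; attempt < retryAttempts; attempt++ {
		wait := backoffFor(baseBackoff, attempt)
		total += wait
		fmt.Printf("attempt %d: wait %v\n", attempt, wait)
	}
	fmt.Println("total wait:", total)
}
```

When tuning RetryAttempts for unstable networks, note that each additional attempt doubles the wait before it, so the total waiting time grows quickly.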

Network Topology Considerations ¶

Giant Swarm deployments often span multiple VPCs and use various connectivity methods (VPC peering, Transit Gateway, konnectivity). Tune these values based on your network topology:

For high-latency networks (cross-region): increase ConnectionTimeout
For konnectivity proxies: increase RequestTimeout
For unstable networks: increase RetryAttempts

func DefaultConnectivityConfig ¶ added in v0.0.60

func DefaultConnectivityConfig() ConnectivityConfig

DefaultConnectivityConfig returns a ConnectivityConfig with sensible defaults. These defaults are suitable for typical VPC-peered deployments with single-user AI agent workloads.

Rate Limiting Defaults ¶

The default rate limiting values (QPS: 50, Burst: 100) are chosen to:

Allow efficient AI agent operations without throttling
Protect target cluster API servers from excessive load
Support burst scenarios like initial cluster exploration

For multi-tenant deployments or shared clusters, consider using lower values to ensure fair resource allocation across users.
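
To see how QPS and Burst interact, a toy token bucket helps. This is a simplified model for intuition only, not the rate limiter used by the Kubernetes client, and every name in it is hypothetical. Time is passed in explicitly so the simulation is deterministic.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// tokenBucket refills at qps tokens per second up to burst tokens.
type tokenBucket struct {
	qps    float64
	burst  float64
	tokens float64
	last   time.Duration // simulated time of the last update
}

func newTokenBucket(qps float64, burst int) *tokenBucket {
	return &tokenBucket{qps: qps, burst: float64(burst), tokens: float64(burst)}
}

// allow refills the bucket for the elapsed time, then spends one token if
// one is available.
func (b *tokenBucket) allow(now time.Duration) bool {
	b.tokens = math.Min(b.burst, b.tokens+(now-b.last).Seconds()*b.qps)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// countAllowed issues n requests at the same simulated instant and returns
// how many pass.
func countAllowed(b *tokenBucket, now time.Duration, n int) int {
	allowed := 0
	for i := 0; i < n; i++ {
		if b.allow(now) {
			allowed++
		}
	}
	return allowed
}

// simulateDefaults sends 150 requests at t=0 and another 150 at t=1s using
// the documented defaults (QPS 50, Burst 100).
func simulateDefaults() (atStart, afterOneSecond int) {
	b := newTokenBucket(50, 100)
	atStart = countAllowed(b, 0, 150)
	afterOneSecond = countAllowed(b, time.Second, 150)
	return atStart, afterOneSecond
}

func main() {
	first, second := simulateDefaults()
	fmt.Println("allowed at t=0:", first)
	fmt.Println("allowed at t=1s:", second)
}
```

In this model the first batch admits 100 requests (the full burst) and the second admits 50 (one second of refill), which is why Burst bounds short spikes while QPS bounds sustained load.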

func HighLatencyConnectivityConfig ¶ added in v0.0.60

func HighLatencyConnectivityConfig() ConnectivityConfig

HighLatencyConnectivityConfig returns a ConnectivityConfig optimized for high-latency networks such as cross-region deployments or konnectivity proxies.

type ConnectivityTimeoutError ¶ added in v0.0.60

type ConnectivityTimeoutError struct {
	// ClusterName is the target cluster that timed out.
	ClusterName string

	// Host is the API server endpoint that couldn't be reached.
	Host string

	// Timeout is the duration waited before giving up (if known).
	Timeout time.Duration

	// Err is the underlying error that caused the timeout.
	Err error
}

ConnectivityTimeoutError provides detailed context about a connection timeout. This error indicates that the TCP connection or HTTP request timed out before completing. It's typically caused by network issues such as:

Firewall rules blocking the connection
No route to the target network
High network latency
Target cluster not running

Troubleshooting ¶

When encountering this error, verify:

VPC peering or Transit Gateway is properly configured
Security group rules allow traffic on port 6443
The target cluster is healthy and running
DNS resolution is working correctly
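
Before working through that checklist, it can help to confirm that a failure really is a timeout. A sketch using only standard library checks is shown below. `isTimeout` and the sample errors are hypothetical, and the package's own classification logic may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// Sample errors for demonstration (assumptions, not package values).
var (
	errDialDeadline = fmt.Errorf("dial tcp: %w", context.DeadlineExceeded)
	errRefused      = errors.New("connection refused")
)

// isTimeout reports whether err represents a timeout, either a context
// deadline or a network error whose Timeout method returns true.
func isTimeout(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

func main() {
	fmt.Println("deadline exceeded:", isTimeout(errDialDeadline))
	fmt.Println("connection refused:", isTimeout(errRefused))
}
```

A refused connection usually points at the target cluster or a closed port, whereas a timeout more often indicates routing or firewall issues such as those listed above.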

func (*ConnectivityTimeoutError) Error ¶ added in v0.0.60

func (e *ConnectivityTimeoutError) Error() string

Error implements the error interface.

func (*ConnectivityTimeoutError) Is ¶ added in v0.0.60

func (e *ConnectivityTimeoutError) Is(target error) bool

Is implements custom error matching for errors.Is().

func (*ConnectivityTimeoutError) Unwrap ¶ added in v0.0.60

func (e *ConnectivityTimeoutError) Unwrap() error

Unwrap returns the underlying error.

func (*ConnectivityTimeoutError) UserFacingError ¶ added in v0.0.60

func (e *ConnectivityTimeoutError) UserFacingError() string

UserFacingError returns a message suitable for displaying to end users. This provides actionable guidance without exposing internal network details.

type ImpersonationError ¶ added in v0.0.56

type ImpersonationError struct {
	// ClusterName is the target cluster where impersonation failed.
	ClusterName string

	// UserEmail is the email of the user being impersonated (for logging only).
	UserEmail string

	// GroupCount is the number of groups in the impersonation request.
	GroupCount int

	// Reason describes what went wrong.
	Reason string

	// Err is the underlying error that caused the failure.
	Err error
}

ImpersonationError provides detailed context about impersonation failures. This error is returned when the MCP server cannot impersonate a user on a target cluster, typically due to RBAC configuration issues.

Common Causes ¶

1. Missing impersonation RBAC permissions on the workload cluster:

The admin credentials used by the MCP server need permission to
impersonate users and groups on the target cluster.

2. Invalid user identity data:

The OAuth-derived user info contains data that cannot be used
for impersonation (e.g., malformed email, invalid group names).

3. Cluster API server rejecting impersonation:

The workload cluster's API server may have policies that prevent
impersonation of certain users or groups.

func (*ImpersonationError) Error ¶ added in v0.0.56

func (e *ImpersonationError) Error() string

Error implements the error interface.

func (*ImpersonationError) Is ¶ added in v0.0.56

func (e *ImpersonationError) Is(target error) bool

Is implements custom error matching for errors.Is(). This allows ImpersonationError to match against ErrImpersonationFailed.

func (*ImpersonationError) Unwrap ¶ added in v0.0.56

func (e *ImpersonationError) Unwrap() error

Unwrap returns the underlying error for use with errors.Is() and errors.As().

func (*ImpersonationError) UserFacingError ¶ added in v0.0.56

func (e *ImpersonationError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This provides actionable guidance without exposing internal details.

Unlike cluster-related errors that use a generic message to prevent enumeration, impersonation errors indicate a configuration issue that the user's administrator needs to address.

type KubeconfigError ¶

type KubeconfigError struct {
	ClusterName string
	SecretName  string
	Namespace   string
	Reason      string
	Err         error
	// NotFound indicates the kubeconfig secret was not found (vs other errors like invalid data).
	// When true, Is() matches ErrKubeconfigSecretNotFound; otherwise it matches ErrKubeconfigInvalid.
	NotFound bool
}

KubeconfigError provides detailed context about kubeconfig retrieval failures.

Error Matching Semantics ¶

This error type implements both Is() and Unwrap() with distinct behaviors:

Is() matches against sentinel errors (ErrKubeconfigSecretNotFound, ErrKubeconfigInvalid) based on the NotFound field. This allows callers to use errors.Is() to distinguish between "secret not found" and "secret found but invalid" scenarios.
Unwrap() returns the underlying cause (Err field), allowing errors.Is() to also match against the root cause (e.g., a Kubernetes API error).

Example usage:

if errors.Is(err, federation.ErrKubeconfigSecretNotFound) {
    // Handle missing secret
} else if errors.Is(err, federation.ErrKubeconfigInvalid) {
    // Handle malformed kubeconfig
}

func (*KubeconfigError) Error ¶

func (e *KubeconfigError) Error() string

Error implements the error interface.

func (*KubeconfigError) Is ¶ added in v0.0.54

func (e *KubeconfigError) Is(target error) bool

Is implements custom error matching for errors.Is(). This allows KubeconfigError to match against our sentinel errors:

ErrKubeconfigSecretNotFound: matches when NotFound is true
ErrKubeconfigInvalid: matches when NotFound is false (i.e., the secret exists but contains invalid data, missing keys, or unparseable content)

Note: The underlying error (Err field) is matched via Unwrap(), not Is().
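The interplay between Is() and Unwrap() described above can be reproduced with a small stand-alone sketch. The types below are simplified local mirrors, not the package's own definitions: the sentinel values, the errAPINotFound root cause, and the classify helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package sentinels (assumed values, for illustration only).
var (
	ErrKubeconfigSecretNotFound = errors.New("kubeconfig secret not found")
	ErrKubeconfigInvalid        = errors.New("kubeconfig invalid")
)

// errAPINotFound is a hypothetical root cause standing in for a Kubernetes API error.
var errAPINotFound = errors.New("secrets \"demo-kubeconfig\" not found")

// KubeconfigError is a simplified mirror of the documented type.
type KubeconfigError struct {
	ClusterName string
	Reason      string
	Err         error
	NotFound    bool
}

func (e *KubeconfigError) Error() string {
	return fmt.Sprintf("kubeconfig error for cluster %q: %s", e.ClusterName, e.Reason)
}

// Is picks the sentinel to match from the NotFound flag.
func (e *KubeconfigError) Is(target error) bool {
	if e.NotFound {
		return target == ErrKubeconfigSecretNotFound
	}
	return target == ErrKubeconfigInvalid
}

// Unwrap exposes the root cause so errors.Is can keep walking the chain.
func (e *KubeconfigError) Unwrap() error { return e.Err }

// classify reports which sentinel, if any, err matches.
func classify(err error) string {
	switch {
	case errors.Is(err, ErrKubeconfigSecretNotFound):
		return "not-found"
	case errors.Is(err, ErrKubeconfigInvalid):
		return "invalid"
	default:
		return "other"
	}
}

func main() {
	missing := &KubeconfigError{ClusterName: "demo", Reason: "secret missing", Err: errAPINotFound, NotFound: true}
	wrapped := fmt.Errorf("get client: %w", missing)

	fmt.Println(classify(wrapped))                  // sentinel matched through Is()
	fmt.Println(errors.Is(wrapped, errAPINotFound)) // root cause matched through Unwrap()
}
```

Both checks succeed on the same error value: the sentinel match comes from Is(), the root-cause match from Unwrap().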

func (*KubeconfigError) Unwrap ¶

func (e *KubeconfigError) Unwrap() error

Unwrap returns the underlying error for use with errors.Is() and errors.As().

func (*KubeconfigError) UserFacingError ¶

func (e *KubeconfigError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users. This prevents leaking internal secret names and namespace structure.

Security: Returns a generic message regardless of whether the secret was not found vs. invalid data. This prevents attackers from determining cluster existence based on error response differentiation.
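One way a caller might consume UserFacingError is sketched below: log the full error, return only the sanitized text. The userFacing interface, the kubeconfigError stand-in, the message strings, and the messageFor helper are all hypothetical and not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// userFacing is a hypothetical interface capturing the UserFacingError method
// that the documented error types share.
type userFacing interface {
	UserFacingError() string
}

// kubeconfigError is a local stand-in; its message text is invented for illustration.
type kubeconfigError struct {
	secretName string
	namespace  string
}

// Error carries internal detail that is meant for server logs only.
func (e *kubeconfigError) Error() string {
	return fmt.Sprintf("secret %s/%s: kubeconfig unavailable", e.namespace, e.secretName)
}

// UserFacingError returns the same generic text for every failure mode.
func (e *kubeconfigError) UserFacingError() string {
	return "cluster is not available or access is denied"
}

// messageFor returns the sanitized message when the error chain provides one,
// and a generic fallback otherwise.
func messageFor(err error) string {
	var uf userFacing
	if errors.As(err, &uf) {
		return uf.UserFacingError()
	}
	return "an internal error occurred"
}

func main() {
	err := fmt.Errorf("list pods: %w", &kubeconfigError{secretName: "demo-kubeconfig", namespace: "org-acme"})
	fmt.Println("log: ", err)             // full detail, internal only
	fmt.Println("user:", messageFor(err)) // sanitized text for the caller
}
```

Because errors.As walks the wrap chain, the handler does not need to know which concrete error type produced the message.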

type Manager ¶

type Manager struct {
	// contains filtered or unexported fields
}

Manager implements ClusterClientManager for CAPI-based multi-cluster federation.

func NewManager ¶

func NewManager(clientProvider ClientProvider, opts ...ManagerOption) (*Manager, error)

NewManager creates a new ClusterClientManager with the provided ClientProvider.

Security Model ¶

The ClientProvider is responsible for creating per-user Kubernetes clients. This ensures that ALL Management Cluster operations (including kubeconfig secret retrieval) are performed with the user's RBAC permissions.

When OAuth downstream is enabled:

Each user's OAuth token is used to authenticate with the Management Cluster
Users can only access kubeconfig secrets they have RBAC permission to read
This provides defense in depth: MC RBAC + WC RBAC both enforced

Parameters:

clientProvider: Creates per-user clients for Management Cluster access
opts: Functional options for configuration

Example with OAuth downstream:

provider, err := federation.NewOAuthClientProviderFromInCluster()
if err != nil {
    return err
}
manager, err := federation.NewManager(provider,
    federation.WithManagerLogger(logger),
)

func (*Manager) CheckAccess ¶ added in v0.0.57

func (m *Manager) CheckAccess(ctx context.Context, clusterName string, user *UserInfo, check *AccessCheck) (*AccessCheckResult, error)

CheckAccess verifies if the user can perform the specified action on a cluster. This performs a SelfSubjectAccessReview to check permissions without actually attempting the operation.

Security Model ¶

The check is performed as the impersonated user, not the admin credentials. This means the SelfSubjectAccessReview evaluates the actual permissions the user would have when performing the operation.

Error Handling ¶

Returns (nil, error) if the check itself failed (e.g., API server error)
Returns (*AccessCheckResult, nil) if the check completed successfully
The result.Allowed field indicates if the operation would be permitted

Performance Considerations ¶

SAR checks add ~50-100ms latency. For repeated operations, consider:

Caching results for short periods (30s-60s)
Making checks optional via configuration
Batching checks when possible
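The short-lived caching suggestion could look roughly like the sketch below. It is not part of the package: accessKey, sarCache, and the check callback are hypothetical names, and the callback stands in for a real call to CheckAccess.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// accessKey identifies one permission question (hypothetical type).
type accessKey struct {
	Cluster, User, Verb, Resource, Namespace string
}

type cachedDecision struct {
	allowed bool
	expires time.Time
}

// sarCache remembers access decisions for a short TTL.
type sarCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[accessKey]cachedDecision
}

func newSARCache(ttl time.Duration) *sarCache {
	return &sarCache{ttl: ttl, entries: make(map[accessKey]cachedDecision)}
}

// allowed returns a cached decision if it is still fresh; otherwise it calls
// check and stores the result. Errors from check are returned and never cached.
func (c *sarCache) allowed(key accessKey, check func() (bool, error)) (bool, error) {
	c.mu.Lock()
	if d, ok := c.entries[key]; ok && time.Now().Before(d.expires) {
		c.mu.Unlock()
		return d.allowed, nil
	}
	c.mu.Unlock()

	ok, err := check() // slow path: one real access review
	if err != nil {
		return false, err
	}

	c.mu.Lock()
	c.entries[key] = cachedDecision{allowed: ok, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return ok, nil
}

func main() {
	cache := newSARCache(30 * time.Second)
	key := accessKey{Cluster: "prod", User: "jane@example.com", Verb: "delete", Resource: "pods", Namespace: "production"}

	calls := 0
	check := func() (bool, error) { calls++; return true, nil }

	_, _ = cache.allowed(key, check)
	_, _ = cache.allowed(key, check)
	fmt.Println("access reviews performed:", calls)
}
```

A cached "allowed" result can outlive a permission change by up to the TTL, which is why the window should stay short. The target cluster's API server still enforces RBAC on the actual operation.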

func (*Manager) CheckAccessAllowed ¶ added in v0.0.57

func (m *Manager) CheckAccessAllowed(ctx context.Context, clusterName string, user *UserInfo, check *AccessCheck) error

CheckAccessAllowed is a convenience method that performs an access check and returns a clear error if access is denied.

This is useful for pre-flight checks before destructive operations:

if err := manager.CheckAccessAllowed(ctx, cluster, user, &AccessCheck{
	Verb:      "delete",
	Resource:  "pods",
	Namespace: "production",
}); err != nil {
	return err // Either check failed or access denied
}
// Proceed with delete...

func (*Manager) CheckClusterConnectivity ¶ added in v0.0.60

func (m *Manager) CheckClusterConnectivity(ctx context.Context, clusterName string, user *UserInfo) error

CheckClusterConnectivity validates connectivity to a workload cluster. This is a public method that allows callers to explicitly check connectivity without caching the client.

Use Cases ¶

This method is useful for:

Debugging network issues between MC and WC
Implementing health checks for cluster lists
Pre-validating clusters before batch operations

Example:

err := manager.CheckClusterConnectivity(ctx, "prod-cluster", user)
if err != nil {
    log.Printf("Cluster unreachable: %v", err)
}
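For the "health checks for cluster lists" use case, probing clusters concurrently keeps total latency close to that of the slowest cluster. The sketch below is stand-alone and hypothetical: connectivityChecker has the shape of CheckClusterConnectivity with the user already bound, and fakeCheck replaces the real network call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

// connectivityChecker is a hypothetical function type; in real code it would
// wrap manager.CheckClusterConnectivity for a fixed user.
type connectivityChecker func(ctx context.Context, clusterName string) error

// checkAll probes every cluster concurrently, each with its own timeout,
// and returns the error (or nil) recorded per cluster.
func checkAll(ctx context.Context, clusters []string, check connectivityChecker, perCluster time.Duration) map[string]error {
	results := make(map[string]error, len(clusters))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, name := range clusters {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			cctx, cancel := context.WithTimeout(ctx, perCluster)
			defer cancel()
			err := check(cctx, name)
			mu.Lock()
			results[name] = err
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

// unreachable returns the sorted names of clusters whose check failed.
func unreachable(results map[string]error) []string {
	var names []string
	for name, err := range results {
		if err != nil {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// fakeCheck is a stand-in checker that fails for a single cluster name.
func fakeCheck(_ context.Context, name string) error {
	if name == "staging" {
		return errors.New("connection timed out")
	}
	return nil
}

// runDemo wires the pieces together with the fake checker.
func runDemo() []string {
	clusters := []string{"prod", "staging", "dev"}
	return unreachable(checkAll(context.Background(), clusters, fakeCheck, 5*time.Second))
}

func main() {
	fmt.Println("unreachable:", runDemo())
}
```

For large cluster lists, a bounded worker pool may be preferable to one goroutine per cluster.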

func (*Manager) Close ¶

func (m *Manager) Close() error

Close releases all cached clients and resources.

func (*Manager) GetClient ¶

func (m *Manager) GetClient(ctx context.Context, clusterName string, user *UserInfo) (kubernetes.Interface, error)

GetClient returns a Kubernetes client for the target cluster. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation). Returns ErrInvalidClusterName if the cluster name fails validation.

func (*Manager) GetClusterSummary ¶

func (m *Manager) GetClusterSummary(ctx context.Context, clusterName string, user *UserInfo) (*ClusterSummary, error)

GetClusterSummary returns information about a specific cluster. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation). Returns ErrInvalidClusterName if the cluster name fails validation. Returns ErrClusterNotFound if the cluster doesn't exist or the user doesn't have permission to access it.

The method queries CAPI Cluster resources using a field selector for efficiency, and returns detailed metadata including provider, release, Kubernetes version, and status information.

func (*Manager) GetDynamicClient ¶

func (m *Manager) GetDynamicClient(ctx context.Context, clusterName string, user *UserInfo) (dynamic.Interface, error)

GetDynamicClient returns a dynamic client for the target cluster. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation). Returns ErrInvalidClusterName if the cluster name fails validation.

func (*Manager) GetKubeconfigForCluster ¶ added in v0.0.54

func (m *Manager) GetKubeconfigForCluster(ctx context.Context, clusterName string, user *UserInfo) (*rest.Config, error)

GetKubeconfigForCluster retrieves the kubeconfig secret for a CAPI cluster and returns a rest.Config suitable for creating clients.

Security Model ¶

This method uses the user's credentials for ALL Management Cluster operations:

Finds the Cluster resource using user's dynamic client (RBAC enforced)
Fetches the kubeconfig secret using user's client (RBAC enforced)
Parses the kubeconfig into a rest.Config

The user must have RBAC permission to:

List/Get Cluster resources (cluster.x-k8s.io/v1beta1)
Get Secrets in the cluster's namespace

This provides defense in depth: users can only access kubeconfig secrets they have permission to read on the Management Cluster.

Security notes:

Never logs kubeconfig contents (sensitive credential data)
All user-facing errors are sanitized to prevent information leakage

func (*Manager) GetKubeconfigForClusterValidated ¶ added in v0.0.54

func (m *Manager) GetKubeconfigForClusterValidated(ctx context.Context, clusterName string, user *UserInfo) (*rest.Config, error)

GetKubeconfigForClusterValidated retrieves the kubeconfig and validates that the resulting config can establish a connection to the cluster.

This is useful when you want to ensure the credentials are valid before caching or using them for operations.

func (*Manager) ListClusters ¶

func (m *Manager) ListClusters(ctx context.Context, user *UserInfo) ([]ClusterSummary, error)

ListClusters returns all available workload clusters. Returns ErrUserInfoRequired if user is nil (to prevent privilege escalation).

The results are filtered based on the user's RBAC permissions - only clusters in namespaces the user can access will be returned.

This method queries CAPI Cluster resources (cluster.x-k8s.io/v1beta1) on the Management Cluster and extracts metadata including:

Provider (AWS, Azure, vSphere, etc.)
Giant Swarm release version
Kubernetes version
Cluster status and readiness

Returns ClusterDiscoveryError if CAPI CRDs are not installed.

func (*Manager) ResolveCluster ¶ added in v0.0.59

func (m *Manager) ResolveCluster(ctx context.Context, namePattern string, user *UserInfo) (*ClusterSummary, error)

ResolveCluster finds a cluster by name pattern, handling ambiguity. If the pattern matches exactly one cluster, returns its details. If the pattern matches multiple clusters, returns an AmbiguousClusterError. If no clusters match, returns ErrClusterNotFound.
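The three outcomes can be illustrated with a stand-alone sketch. The matching rule is an assumption, since the documentation does not say how patterns are compared: here an exact name wins, and otherwise a substring match is used. errClusterNotFound and ambiguousError are local stand-ins for ErrClusterNotFound and AmbiguousClusterError, and the fields of ambiguousError are invented.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errClusterNotFound is a local stand-in for the package's ErrClusterNotFound.
var errClusterNotFound = errors.New("cluster not found")

// ambiguousError is a stand-in for AmbiguousClusterError; its fields are assumed.
type ambiguousError struct {
	Pattern string
	Matches []string
}

func (e *ambiguousError) Error() string {
	return fmt.Sprintf("pattern %q matches %d clusters: %s", e.Pattern, len(e.Matches), strings.Join(e.Matches, ", "))
}

// resolve returns the single matching name, an ambiguity error, or not-found.
// Assumed rule: an exact name wins; otherwise substring matches are collected.
func resolve(pattern string, names []string) (string, error) {
	var matches []string
	for _, n := range names {
		if n == pattern {
			return n, nil
		}
		if strings.Contains(n, pattern) {
			matches = append(matches, n)
		}
	}
	switch len(matches) {
	case 0:
		return "", errClusterNotFound
	case 1:
		return matches[0], nil
	default:
		return "", &ambiguousError{Pattern: pattern, Matches: matches}
	}
}

// outcome turns the result of resolve into a short label.
func outcome(pattern string, names []string) string {
	name, err := resolve(pattern, names)
	var amb *ambiguousError
	switch {
	case err == nil:
		return "found:" + name
	case errors.As(err, &amb):
		return "ambiguous"
	case errors.Is(err, errClusterNotFound):
		return "not-found"
	default:
		return "error"
	}
}

func main() {
	names := []string{"prod-eu", "prod-us", "staging"}
	for _, p := range []string{"stag", "prod", "qa"} {
		fmt.Printf("%-5s -> %s\n", p, outcome(p, names))
	}
}
```

Callers can use errors.As on the ambiguity error to show the candidate names and ask the user to pick one.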

func (*Manager) Stats ¶ added in v0.0.67

func (m *Manager) Stats() ManagerStats

Stats returns current cache and manager statistics for monitoring. This is useful for health endpoints and operational dashboards.
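A health endpoint built on these statistics might look like the sketch below. managerStats mirrors the documented ManagerStats fields. The handler, the JSON field names, and the choice of 503 for a closed manager are assumptions, not package behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// managerStats mirrors the documented ManagerStats fields.
type managerStats struct {
	CacheSize       int
	CacheMaxEntries int
	CacheTTL        time.Duration
	Closed          bool
}

// healthHandler serves the stats as JSON and answers 503 once the manager is closed.
func healthHandler(stats func() managerStats) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		s := stats()
		w.Header().Set("Content-Type", "application/json")
		if s.Closed {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(map[string]interface{}{
			"cacheSize":       s.CacheSize,
			"cacheMaxEntries": s.CacheMaxEntries,
			"cacheTTL":        s.CacheTTL.String(),
			"closed":          s.Closed,
		})
	}
}

// statusFor runs the handler once in-process and returns the HTTP status code.
func statusFor(s managerStats) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	healthHandler(func() managerStats { return s })(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(managerStats{CacheSize: 3, CacheMaxEntries: 1000, CacheTTL: 10 * time.Minute}))
	fmt.Println(statusFor(managerStats{Closed: true}))
}
```

In a real server the stats callback would be manager.Stats, registered with something like http.Handle("/healthz", ...).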

type ManagerOption ¶ added in v0.0.53

type ManagerOption func(*Manager)

ManagerOption is a functional option for configuring Manager.
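For readers unfamiliar with the functional options pattern, here is a minimal stand-alone sketch. manager, option, and the two with... constructors are local stand-ins with invented field names. Only the defaults (validation disabled, 10s validation timeout) are taken from the documentation of the real options.

```go
package main

import (
	"fmt"
	"time"
)

// manager and option are local stand-ins for Manager and ManagerOption.
type manager struct {
	validateConnectivity bool
	validationTimeout    time.Duration
}

type option func(*manager)

// withConnectivityValidation mirrors the shape of WithConnectivityValidation.
func withConnectivityValidation(enabled bool) option {
	return func(m *manager) { m.validateConnectivity = enabled }
}

// withValidationTimeout mirrors the shape of WithManagerConnectionValidationTimeout.
func withValidationTimeout(d time.Duration) option {
	return func(m *manager) { m.validationTimeout = d }
}

// newManager sets defaults first, then applies options in order,
// so a later option overrides an earlier one.
func newManager(opts ...option) *manager {
	m := &manager{validationTimeout: 10 * time.Second}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

func main() {
	def := newManager()
	tuned := newManager(withConnectivityValidation(true), withValidationTimeout(30*time.Second))
	fmt.Println(def.validateConnectivity, def.validationTimeout)
	fmt.Println(tuned.validateConnectivity, tuned.validationTimeout)
}
```

The pattern lets NewManager gain new settings without changing its signature, and callers only mention the settings they want to change.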

func WithConnectivityConfig ¶ added in v0.0.60

func WithConnectivityConfig(config ConnectivityConfig) ManagerOption

WithConnectivityConfig sets the connectivity configuration for the Manager. This controls how the manager establishes and validates connections to workload clusters, including timeouts and retry behavior.

Network Topology Considerations ¶

Configure this based on your network topology:

For VPC-peered clusters: use DefaultConnectivityConfig()
For cross-region or konnectivity: use HighLatencyConnectivityConfig()

Example:

manager, err := federation.NewManager(provider,
    federation.WithConnectivityConfig(federation.HighLatencyConnectivityConfig()),
)

func WithConnectivityValidation ¶ added in v0.0.60

func WithConnectivityValidation(enabled bool) ManagerOption

WithConnectivityValidation enables connectivity validation before caching clients. When enabled, the manager will verify that a workload cluster is reachable before caching the client. This catches network issues early but adds latency to the first request for each cluster.

Trade-offs ¶

Enabled: Catches network issues early, better error messages, slight latency increase
Disabled: Faster first request, but network errors surface during actual operations

Default: false (disabled)

func WithManagerCacheConfig ¶ added in v0.0.53

func WithManagerCacheConfig(config CacheConfig) ManagerOption

WithManagerCacheConfig sets the cache configuration for the Manager. This option can be combined with WithManagerCacheMetrics.

func WithManagerCacheMetrics ¶ added in v0.0.53

func WithManagerCacheMetrics(metrics CacheMetricsRecorder) ManagerOption

WithManagerCacheMetrics sets the metrics recorder for the cache. This option can be combined with WithManagerCacheConfig.

func WithManagerConnectionValidationTimeout ¶ added in v0.0.54

func WithManagerConnectionValidationTimeout(timeout time.Duration) ManagerOption

WithManagerConnectionValidationTimeout sets the timeout for validating connections to workload clusters. This is useful for high-latency environments where the default timeout (10s) may be insufficient.

The timeout applies to health checks performed when using GetKubeconfigForClusterValidated.

func WithManagerLogger ¶ added in v0.0.53

func WithManagerLogger(logger *slog.Logger) ManagerOption

WithManagerLogger sets the logger for the Manager.

type ManagerStats ¶ added in v0.0.67

type ManagerStats struct {
	// CacheSize is the current number of cached client entries.
	CacheSize int

	// CacheMaxEntries is the maximum cache capacity.
	CacheMaxEntries int

	// CacheTTL is the configured time-to-live for cache entries.
	CacheTTL time.Duration

	// Closed indicates whether the manager has been closed.
	Closed bool
}

ManagerStats provides statistics about the manager for monitoring and health checks.

type OAuthAuthMetricsRecorder ¶ added in v0.0.68

type OAuthAuthMetricsRecorder interface {
	// RecordOAuthDownstreamAuth records an OAuth downstream authentication attempt.
	// result should be one of: "success", "fallback", "failure", "no_token"
	RecordOAuthDownstreamAuth(ctx context.Context, result string)
}

OAuthAuthMetricsRecorder provides an interface for recording OAuth authentication metrics. This is used by OAuthClientProvider to track authentication success/failure rates.

type OAuthClientProvider ¶ added in v0.0.68

type OAuthClientProvider struct {
	// contains filtered or unexported fields
}

OAuthClientProvider implements ClientProvider for OAuth downstream authentication. It creates per-user Kubernetes clients using the user's OAuth bearer token, ensuring all API operations are performed with the user's RBAC permissions.

Security Model ¶

When OAuth downstream is enabled, users authenticate to mcp-kubernetes via OAuth (e.g., through Dex or Google). Their OAuth token is then used directly for all Kubernetes API calls to both the Management Cluster and Workload Clusters.

This means:

The service account's RBAC permissions are NOT used for API operations
Each user can only perform actions they are authorized for via their own RBAC
Audit logs show the actual user identity, not the service account

The service account is only used for:

Pod lifecycle (mounting the projected token for potential fallback)
Network connectivity to the API server

Usage ¶

config := &OAuthClientProviderConfig{
    ClusterHost: "https://kubernetes.default.svc",
    CACertFile:  "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
    QPS:         50,
    Burst:       100,
    Timeout:     30 * time.Second,
}
provider, err := NewOAuthClientProvider(config)
if err != nil {
    return err
}
manager, err := NewManager(provider)
if err != nil {
    return err
}
defer manager.Close()

func NewOAuthClientProvider ¶ added in v0.0.68

func NewOAuthClientProvider(config *OAuthClientProviderConfig) (*OAuthClientProvider, error)

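NewOAuthClientProvider creates a new OAuthClientProvider from the supplied OAuthClientProviderConfig, which provides the cluster host, CA certificate file, rate limits, and request timeout. To build a provider from the standard in-cluster settings instead, use NewOAuthClientProviderFromInCluster.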

func NewOAuthClientProviderFromInCluster ¶ added in v0.0.68

func NewOAuthClientProviderFromInCluster() (*OAuthClientProvider, error)

NewOAuthClientProviderFromInCluster creates an OAuthClientProvider using the in-cluster configuration. This is the typical way to create the provider when running inside a Kubernetes pod.

func (*OAuthClientProvider) GetClientsForUser ¶ added in v0.0.68

func (p *OAuthClientProvider) GetClientsForUser(ctx context.Context, user *UserInfo) (kubernetes.Interface, dynamic.Interface, *rest.Config, error)

GetClientsForUser returns Kubernetes clients authenticated with the user's OAuth token. The user's OAuth token is extracted from context using the configured TokenExtractor and used as the bearer token for all API requests.

This method creates fresh clients for each call. The federation Manager handles caching of these clients per (cluster, user) pair.

Metrics ¶

If metrics are configured via SetMetrics, this method records authentication outcomes:

"success": Token extracted from context successfully
"fallback": Token obtained from user.Extra (testing/alternative flows)
"no_token": No token available in context or user.Extra
"failure": Client creation failed after token extraction

func (*OAuthClientProvider) SetMetrics ¶ added in v0.0.68

func (p *OAuthClientProvider) SetMetrics(metrics OAuthAuthMetricsRecorder)

SetMetrics sets the metrics recorder for tracking authentication success/failure. This should be called during initialization to enable metrics collection.
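A recorder passed to SetMetrics only needs the single RecordOAuthDownstreamAuth method. The sketch below is a hypothetical in-memory implementation, useful in tests. The interface is mirrored locally so the example is self-contained, and countingRecorder and recordAll are invented names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// oauthAuthMetricsRecorder mirrors the documented OAuthAuthMetricsRecorder interface.
type oauthAuthMetricsRecorder interface {
	RecordOAuthDownstreamAuth(ctx context.Context, result string)
}

// countingRecorder is a hypothetical recorder that counts outcomes in memory.
type countingRecorder struct {
	mu     sync.Mutex
	counts map[string]int
}

func newCountingRecorder() *countingRecorder {
	return &countingRecorder{counts: make(map[string]int)}
}

// RecordOAuthDownstreamAuth increments the counter for the given result label.
func (r *countingRecorder) RecordOAuthDownstreamAuth(_ context.Context, result string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[result]++
}

// count returns how often a result label has been recorded.
func (r *countingRecorder) count(result string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.counts[result]
}

// Compile-time check that countingRecorder satisfies the interface.
var _ oauthAuthMetricsRecorder = (*countingRecorder)(nil)

// recordAll feeds a sequence of result labels to a recorder.
func recordAll(r oauthAuthMetricsRecorder, results ...string) {
	for _, res := range results {
		r.RecordOAuthDownstreamAuth(context.Background(), res)
	}
}

func main() {
	rec := newCountingRecorder()
	recordAll(rec, "success", "success", "fallback", "no_token")
	for _, label := range []string{"success", "fallback", "no_token", "failure"} {
		fmt.Printf("%-8s %d\n", label, rec.count(label))
	}
}
```

A production recorder would more likely increment a counter labeled by result in whatever metrics system the deployment uses.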

func (*OAuthClientProvider) SetTokenExtractor ¶ added in v0.0.68

func (p *OAuthClientProvider) SetTokenExtractor(extractor TokenExtractor)

SetTokenExtractor sets the token extractor function for the provider. This should be called after creating the provider to configure how tokens are extracted from context.

Security: Immutable After First Set ¶

This method can only be called once per provider instance. Subsequent calls will be ignored and a warning will be logged. This prevents runtime swapping of the token extractor which could lead to authentication bypass or confusion about which tokens are being used.

If you need to change the extractor, create a new OAuthClientProvider instance.
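The set-once behavior can be sketched as follows. This is an illustration of the idea, not the package's implementation: provider is a local stand-in, and unlike the documented method, setTokenExtractor here returns a bool so the effect of each call is visible.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"
)

// tokenExtractor mirrors the documented TokenExtractor signature.
type tokenExtractor func(ctx context.Context) (string, bool)

// provider is a local stand-in for OAuthClientProvider.
type provider struct {
	mu        sync.Mutex
	extractor tokenExtractor
}

// setTokenExtractor keeps the first extractor and ignores later calls with a warning.
// It reports whether the call took effect.
func (p *provider) setTokenExtractor(e tokenExtractor) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.extractor != nil {
		log.Print("token extractor already set; ignoring subsequent call")
		return false
	}
	p.extractor = e
	return true
}

// token runs the configured extractor against a background context.
func (p *provider) token() (string, bool) {
	p.mu.Lock()
	e := p.extractor
	p.mu.Unlock()
	if e == nil {
		return "", false
	}
	return e(context.Background())
}

// fixedToken returns an extractor that always yields the same token (demo helper).
func fixedToken(tok string) tokenExtractor {
	return func(context.Context) (string, bool) { return tok, true }
}

func main() {
	p := &provider{}
	fmt.Println(p.setTokenExtractor(fixedToken("first")))
	fmt.Println(p.setTokenExtractor(fixedToken("second")))
	tok, _ := p.token()
	fmt.Println(tok)
}
```

Guarding the field with a mutex keeps the first-writer-wins rule intact even if two goroutines race during initialization.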

type OAuthClientProviderConfig ¶ added in v0.0.68

type OAuthClientProviderConfig struct {
	// ClusterHost is the Kubernetes API server URL (e.g., "https://kubernetes.default.svc").
	ClusterHost string

	// CACertFile is the path to the CA certificate for TLS verification.
	CACertFile string

	// QPS is the queries per second rate limit for the Kubernetes client.
	QPS float32

	// Burst is the burst limit for the Kubernetes client.
	Burst int

	// Timeout is the request timeout for API calls.
	Timeout time.Duration
}

OAuthClientProviderConfig contains configuration for creating an OAuthClientProvider.

func DefaultOAuthClientProviderConfig ¶ added in v0.0.68

func DefaultOAuthClientProviderConfig() *OAuthClientProviderConfig

DefaultOAuthClientProviderConfig returns a configuration with sensible defaults.

type StaticClientProvider ¶ added in v0.0.54

type StaticClientProvider struct {
	Clientset     kubernetes.Interface
	DynamicClient dynamic.Interface
	RestConfig    *rest.Config
}

StaticClientProvider is a simple ClientProvider that returns pre-configured clients. This is useful for testing and for scenarios where per-user client creation is not needed (e.g., when using service account authentication without OAuth downstream).

Note: When using StaticClientProvider, all users share the same client, so RBAC differentiation between users is not enforced at the client level. Use this only when appropriate for your security model.

func (*StaticClientProvider) GetClientsForUser ¶ added in v0.0.54

func (p *StaticClientProvider) GetClientsForUser(_ context.Context, _ *UserInfo) (kubernetes.Interface, dynamic.Interface, *rest.Config, error)

GetClientsForUser returns the static clients regardless of user. This implementation ignores the user parameter - all users get the same clients.

type TLSError ¶ added in v0.0.60

type TLSError struct {
	// ClusterName is the target cluster where TLS failed.
	ClusterName string

	// Host is the API server endpoint.
	Host string

	// Reason describes what went wrong in the TLS handshake.
	Reason string

	// Err is the underlying TLS error.
	Err error
}

TLSError provides detailed context about a TLS/certificate failure. This error indicates that the TLS handshake failed, which can happen due to:

Certificate signed by unknown authority
Expired certificate
Certificate hostname mismatch
TLS protocol version mismatch

Security Note ¶

TLS errors should NOT be bypassed by disabling certificate verification. Instead, ensure the CA certificate is properly configured in the kubeconfig.

Troubleshooting ¶

When encountering this error:

Verify the kubeconfig contains the correct CA certificate
Check if the cluster certificate has expired
Ensure the certificate SANs include the endpoint hostname/IP

func (*TLSError) Error ¶ added in v0.0.60

func (e *TLSError) Error() string

Error implements the error interface.

func (*TLSError) Is ¶ added in v0.0.60

func (e *TLSError) Is(target error) bool

Is implements custom error matching for errors.Is().

func (*TLSError) Unwrap ¶ added in v0.0.60

func (e *TLSError) Unwrap() error

Unwrap returns the underlying error.

func (*TLSError) UserFacingError ¶ added in v0.0.60

func (e *TLSError) UserFacingError() string

UserFacingError returns a message suitable for displaying to end users. This provides actionable guidance while maintaining security (not suggesting to bypass certificate verification).

type TokenExtractor ¶ added in v0.0.68

type TokenExtractor func(ctx context.Context) (string, bool)

TokenExtractor is a function type for extracting OAuth tokens from context. This allows for dependency injection of token extraction logic.
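A function with this signature typically reads a value that middleware stored in the request context earlier. The sketch below is hypothetical: tokenKey, withToken, fromContext, and roundTrip are invented names, and the package's own overview uses an extractor from a separate oauth package instead.

```go
package main

import (
	"context"
	"fmt"
)

// TokenExtractor mirrors the documented function type.
type TokenExtractor func(ctx context.Context) (string, bool)

// tokenKey is an unexported key type, so other packages cannot collide with it.
type tokenKey struct{}

// withToken stores a token in the context, as authentication middleware might.
func withToken(ctx context.Context, token string) context.Context {
	return context.WithValue(ctx, tokenKey{}, token)
}

// fromContext reads the token back; an empty or missing token counts as absent.
func fromContext(ctx context.Context) (string, bool) {
	tok, ok := ctx.Value(tokenKey{}).(string)
	return tok, ok && tok != ""
}

// Compile-time check that fromContext has the TokenExtractor signature.
var _ TokenExtractor = fromContext

// roundTrip stores a token and immediately extracts it again (demo helper).
func roundTrip(token string) (string, bool) {
	return fromContext(withToken(context.Background(), token))
}

func main() {
	tok, ok := roundTrip("abc123")
	fmt.Println(tok, ok)

	_, ok = fromContext(context.Background()) // no token stored
	fmt.Println(ok)
}
```

Keeping extraction behind a function type means tests can inject a fixed token without building a full OAuth flow.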

type UserInfo ¶

type UserInfo struct {
	// Email is the user's email address from the OAuth token's email claim.
	// This is used as the Impersonate-User header value.
	Email string

	// Groups contains the user's group memberships from OAuth claims.
	// These are passed via Impersonate-Group headers for RBAC evaluation.
	Groups []string

	// Extra contains additional claims from the OAuth token that should be
	// propagated to the Kubernetes API server via Impersonate-Extra headers.
	// Common examples include organization IDs, tenant identifiers, or custom claims.
	Extra map[string][]string
}

UserInfo contains the authenticated user's identity information extracted from the OAuth token. This information is used to configure Kubernetes user impersonation headers.
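The mapping from UserInfo fields to Kubernetes impersonation headers can be sketched as below. userInfo is a local mirror of the documented struct and impersonationHeaders is a hypothetical helper. The sketch assumes Extra keys contain only characters that are legal in header names.

```go
package main

import (
	"fmt"
	"net/http"
)

// userInfo mirrors the documented UserInfo fields.
type userInfo struct {
	Email  string
	Groups []string
	Extra  map[string][]string
}

// impersonationHeaders builds the Kubernetes impersonation headers for a user:
// one Impersonate-User, one Impersonate-Group per group, and one
// Impersonate-Extra-<key> per extra value.
func impersonationHeaders(u userInfo) http.Header {
	h := http.Header{}
	h.Set("Impersonate-User", u.Email)
	for _, g := range u.Groups {
		h.Add("Impersonate-Group", g)
	}
	for key, values := range u.Extra {
		for _, v := range values {
			h.Add("Impersonate-Extra-"+key, v)
		}
	}
	return h
}

func main() {
	h := impersonationHeaders(userInfo{
		Email:  "jane@example.com",
		Groups: []string{"platform", "oncall"},
		Extra:  map[string][]string{"tenant": {"acme"}},
	})
	fmt.Println(h.Get("Impersonate-User"))
	fmt.Println(h["Impersonate-Group"])
	fmt.Println(h.Get("Impersonate-Extra-tenant"))
}
```

With client-go, the same information is normally supplied through the Impersonate field of rest.Config (a rest.ImpersonationConfig) rather than by setting headers by hand.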

type ValidationError ¶

type ValidationError struct {
	Field  string
	Value  string // Sanitized value (may be truncated or anonymized)
	Reason string
	Err    error
}

ValidationError provides detailed context about a validation failure.

func (*ValidationError) Error ¶

func (e *ValidationError) Error() string

Error implements the error interface.

func (*ValidationError) Unwrap ¶

func (e *ValidationError) Unwrap() error

Unwrap returns the underlying error for use with errors.Is() and errors.As().

func (*ValidationError) UserFacingError ¶

func (e *ValidationError) UserFacingError() string

UserFacingError returns a sanitized error message safe for end users.
