Documentation ¶
Index ¶
Constants ¶
const (
	// DefaultMinReplicas is the minimal amount of replicas for a service.
	DefaultMinReplicas = 1

	// DefaultMaxReplicas is the amount of replicas a service will auto-scale up to.
	DefaultMaxReplicas = 5

	DefaultZeroDuration = 3 * time.Minute

	// DefaultScalingFactor is the defining proportion for the scaling increments.
	DefaultScalingFactor = 10

	ScaleTypeRPS      ScaleType = "rps"
	ScaleTypeCapacity ScaleType = "capacity"

	// MinScaleLabel label indicating min scale for an Inference
	MinScaleLabel = "ai.tensorchord.scale.min"

	// MaxScaleLabel label indicating max scale for an Inference
	MaxScaleLabel = "ai.tensorchord.scale.max"

	// ScalingFactorLabel label indicates the scaling factor for an Inference
	ScalingFactorLabel = "ai.tensorchord.scale.factor"

	// TargetLoadLabel label indicates the target load for an Inference
	TargetLoadLabel = "ai.tensorchord.scale.target"

	// ZeroDurationLabel label indicates the zero duration for an Inference
	ZeroDurationLabel = "ai.tensorchord.scale.zero-duration"

	// ScaleTypeLabel label indicates the scale type for an Inference
	ScaleTypeLabel = "ai.tensorchord.scale.type"

	FrameworkLabel = "ai.tensorchord.framework"
)
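The label and default constants above suggest that scaling settings are read from a label map, with the defaults used when a label is absent. The following is a minimal sketch of that idea, not the package's own code: `scaleSettings`, `uintLabel`, and `parseScaleSettings` are hypothetical names, and the value formats (base-10 integers, Go duration strings such as `90s`) are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Values copied from the constants listed in this package.
const (
	DefaultMinReplicas  = 1
	DefaultMaxReplicas  = 5
	DefaultZeroDuration = 3 * time.Minute

	MinScaleLabel     = "ai.tensorchord.scale.min"
	MaxScaleLabel     = "ai.tensorchord.scale.max"
	ZeroDurationLabel = "ai.tensorchord.scale.zero-duration"
)

// scaleSettings is a hypothetical container; it is not a type of the package.
type scaleSettings struct {
	Min, Max     uint64
	ZeroDuration time.Duration
}

// uintLabel returns labels[key] parsed as a base-10 unsigned integer,
// or def when the label is missing or malformed (assumed behaviour).
func uintLabel(labels map[string]string, key string, def uint64) uint64 {
	v, ok := labels[key]
	if !ok {
		return def
	}
	n, err := strconv.ParseUint(v, 10, 64)
	if err != nil {
		return def
	}
	return n
}

// parseScaleSettings applies the defaults, then overrides them from labels.
func parseScaleSettings(labels map[string]string) scaleSettings {
	s := scaleSettings{
		Min:          uintLabel(labels, MinScaleLabel, DefaultMinReplicas),
		Max:          uintLabel(labels, MaxScaleLabel, DefaultMaxReplicas),
		ZeroDuration: DefaultZeroDuration,
	}
	if v, ok := labels[ZeroDurationLabel]; ok {
		if d, err := time.ParseDuration(v); err == nil {
			s.ZeroDuration = d
		}
	}
	return s
}

func main() {
	s := parseScaleSettings(map[string]string{MaxScaleLabel: "8"})
	fmt.Println(s.Min, s.Max, s.ZeroDuration) // prints: 1 8 3m0s
}
```

Falling back silently on malformed values keeps the sketch short; a real implementation might prefer to return an error instead.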
Variables ¶
This section is empty.
Functions ¶
Types ¶
type FunctionScaleResult ¶
FunctionScaleResult holds the result of scaling from zero
type InferenceScaler ¶
type InferenceScaler struct {
// contains filtered or unexported fields
}
InferenceScaler scales from zero
func NewInferenceScaler ¶
NewInferenceScaler creates a new scaler with the specified ScalingConfig
func (*InferenceScaler) Scale ¶
func (s *InferenceScaler) Scale(ctx context.Context, namespace, inferenceName string) FunctionScaleResult
Scale scales a function from zero replicas up to 1, or to the value set in the minimum replicas metadata
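The rule described for Scale can be illustrated in isolation. This is a sketch under assumptions, not the method's actual body: `scaleUpTarget` is a hypothetical helper, and it assumes that a minimum of zero means "use 1" and that a service already running is left unchanged.

```go
package main

import "fmt"

// scaleUpTarget is a hypothetical helper. Given the current replica count and
// the configured minimum, it returns the replica count to request and whether
// a change is needed.
func scaleUpTarget(current, minReplicas uint64) (target uint64, changed bool) {
	if current > 0 {
		// Already running: nothing to do (assumption).
		return current, false
	}
	if minReplicas > 1 {
		return minReplicas, true
	}
	// No minimum configured, or a minimum of 0 or 1: start a single replica.
	return 1, true
}

func main() {
	fmt.Println(scaleUpTarget(0, 0)) // prints: 1 true
	fmt.Println(scaleUpTarget(0, 3)) // prints: 3 true
	fmt.Println(scaleUpTarget(2, 3)) // prints: 2 false
}
```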
type ServiceQuery ¶
type ServiceQuery interface {
	GetReplicas(service, namespace string) (response ServiceQueryResponse, err error)
	SetReplicas(service, namespace string, count uint64) error
}
ServiceQuery provides an interface for querying and setting replica counts
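Because ServiceQuery is an interface, it can be satisfied by a test double. The sketch below shows one possible in-memory implementation; `memoryQuery` and `key` are hypothetical names, the response struct is trimmed to the fields the sketch uses, and the "namespace/service" key format is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// ServiceQueryResponse is trimmed here to the fields this sketch needs.
type ServiceQueryResponse struct {
	Replicas    uint64
	MinReplicas uint64
	MaxReplicas uint64
}

// ServiceQuery is copied from the package listing.
type ServiceQuery interface {
	GetReplicas(service, namespace string) (response ServiceQueryResponse, err error)
	SetReplicas(service, namespace string, count uint64) error
}

// memoryQuery is a hypothetical in-memory ServiceQuery, useful in tests.
type memoryQuery struct {
	mu       sync.Mutex
	services map[string]ServiceQueryResponse
}

// Compile-time check that memoryQuery satisfies the interface.
var _ ServiceQuery = (*memoryQuery)(nil)

// key builds the map key; the "namespace/service" layout is an assumption.
func key(service, namespace string) string { return namespace + "/" + service }

func (m *memoryQuery) GetReplicas(service, namespace string) (ServiceQueryResponse, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	resp, ok := m.services[key(service, namespace)]
	if !ok {
		return ServiceQueryResponse{}, fmt.Errorf("service %s/%s not found", namespace, service)
	}
	return resp, nil
}

func (m *memoryQuery) SetReplicas(service, namespace string, count uint64) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := key(service, namespace)
	resp, ok := m.services[k]
	if !ok {
		return fmt.Errorf("service %s/%s not found", namespace, service)
	}
	resp.Replicas = count
	m.services[k] = resp
	return nil
}

func main() {
	var q ServiceQuery = &memoryQuery{services: map[string]ServiceQueryResponse{
		"default/echo": {MinReplicas: 1, MaxReplicas: 5},
	}}
	if err := q.SetReplicas("echo", "default", 2); err != nil {
		fmt.Println("set failed:", err)
		return
	}
	resp, err := q.GetReplicas("echo", "default")
	fmt.Println(resp.Replicas, err) // prints: 2 <nil>
}
```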
type ServiceQueryResponse ¶
type ServiceQueryResponse struct {
	Framework         string
	TargetLoad        uint64
	ZeroDuration      time.Duration
	Replicas          uint64
	MaxReplicas       uint64
	MinReplicas       uint64
	ScalingFactor     uint64
	AvailableReplicas uint64
	Annotations       map[string]string
}
ServiceQueryResponse is the response from querying a function's status
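To make the roles of TargetLoad, MinReplicas, and MaxReplicas concrete, here is one way a caller might turn a response into a desired replica count. The formula is an assumption for illustration only (ceiling of load divided by TargetLoad, clamped to the configured bounds); the page does not document how the package itself computes this, and `desiredReplicas` is a hypothetical function.

```go
package main

import "fmt"

// ServiceQueryResponse is trimmed here to the fields this sketch needs.
type ServiceQueryResponse struct {
	TargetLoad  uint64
	MinReplicas uint64
	MaxReplicas uint64
}

// desiredReplicas is hypothetical. It returns ceil(load / TargetLoad),
// clamped to [MinReplicas, MaxReplicas]. When TargetLoad is zero the
// minimum is used, since no per-replica target is known.
func desiredReplicas(r ServiceQueryResponse, load uint64) uint64 {
	want := r.MinReplicas
	if r.TargetLoad > 0 {
		want = (load + r.TargetLoad - 1) / r.TargetLoad // ceiling division
	}
	if want < r.MinReplicas {
		want = r.MinReplicas
	}
	if want > r.MaxReplicas {
		want = r.MaxReplicas
	}
	return want
}

func main() {
	r := ServiceQueryResponse{TargetLoad: 10, MinReplicas: 1, MaxReplicas: 5}
	fmt.Println(desiredReplicas(r, 25))   // prints: 3
	fmt.Println(desiredReplicas(r, 0))    // prints: 1
	fmt.Println(desiredReplicas(r, 1000)) // prints: 5
}
```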
func AsServerQueryResponse ¶
func AsServerQueryResponse(inf *types.InferenceDeployment) (*ServiceQueryResponse, error)