Documentation ¶
Overview ¶
Package regression hosts the REG_* operators that fit a regression model against the filtered record set. Phase 0 lands the API surface only — every registered operator returns PROCESSING_REGRESSION_NOT_IMPLEMENTED via Fit. Phases 1–5 fill in closed-form OLS, regularization, GLM (IRLS), Bayesian linear, and the Resample / Selection modifier wrappers.
The package owns its own registry; the parent processing package consumes the Fit entry point through processing.FitRegressions without an import of regression-engine internals. To avoid an import cycle with processing.Record, this subpackage stays independent of concrete Record types — Phase 1 will introduce a narrow Record interface the same way processing/feature does.
Index ¶
- Constants
- func Fit(specs []*types.RegressionSpec, schema *encoding.Schema) ([]*types.RegressionResult, error)
- func FitBuffered(specs []*types.RegressionSpec, schema *encoding.Schema, records []Record) ([]*types.RegressionResult, error)
- func LeverageForRow(fit *AttributeFit, x []float64) float64
- func RegisteredTypes() []types.RegressionType
- func ValidateRegression(spec *types.RegressionSpec) error
- type AttributeFit
- type BufferedEngine
- type Engine
- type Factory
- type Record
- type StreamingEngine
Constants ¶
const InterceptKey = "(intercept)"
InterceptKey is the map key used for the synthesized intercept entry in RegressionResult.Coefficients / StdErrors / PValues. Picked to be human-readable and impossible to collide with a real field name (no schema field carries parentheses).
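As a minimal sketch of how a consumer might use the key, the helper below separates the intercept from the slope entries. The `map[string]float64` shape and the `splitIntercept` helper are assumptions made for illustration; neither is confirmed by the package.

```go
package main

import (
	"fmt"
	"sort"
)

// InterceptKey mirrors the constant documented above.
const InterceptKey = "(intercept)"

// splitIntercept separates the intercept from the slope entries of a
// coefficient map. The map[string]float64 shape is an assumption of this
// sketch, not the confirmed type of RegressionResult.Coefficients.
func splitIntercept(coefs map[string]float64) (float64, map[string]float64) {
	var intercept float64
	slopes := make(map[string]float64, len(coefs))
	for name, v := range coefs {
		if name == InterceptKey {
			intercept = v
			continue
		}
		slopes[name] = v
	}
	return intercept, slopes
}

func main() {
	// Hypothetical field names and values.
	coefs := map[string]float64{InterceptKey: 1.5, "age": 0.25, "income": -0.1}
	b0, slopes := splitIntercept(coefs)

	// Sort the names so the printed order does not depend on map iteration.
	names := make([]string, 0, len(slopes))
	for name := range slopes {
		names = append(names, name)
	}
	sort.Strings(names)

	fmt.Println("intercept:", b0)
	for _, name := range names {
		fmt.Printf("%s: %g\n", name, slopes[name])
	}
}
```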
Variables ¶
This section is empty.
Functions ¶
func Fit ¶
func Fit(specs []*types.RegressionSpec, schema *encoding.Schema) ([]*types.RegressionResult, error)
Fit drives Build then invokes Fit on every engine in order. Returns (results, nil) on success; the first per-spec error short-circuits the loop and propagates to the caller.
Phase 0 contract: every spec returns PROCESSING_REGRESSION_NOT_IMPLEMENTED from Fit. From Phase 1 on, the buffered orchestrator no longer takes this path: callers use FitBuffered to pass in the filtered record slice. Fit is retained for callers that have no records (legacy code and unit tests); engines that need rows return PROCESSING_REGRESSION_INSUFFICIENT_DATA from Fit because their accumulator is empty.
func FitBuffered ¶
func FitBuffered(specs []*types.RegressionSpec, schema *encoding.Schema, records []Record) ([]*types.RegressionResult, error)
FitBuffered is the Phase 1 buffered-orchestrator entry point. It builds the engines, then hands the filtered record slice to each engine's FitBuffered method. Engines that implement BufferedEngine consume the records; legacy not-implemented engines ignore the slice and surface PROCESSING_REGRESSION_NOT_IMPLEMENTED via Fit().
The records slice is shared across every spec — every engine sees the same filtered set, so per-spec listwise null filtering happens inside the engine (callers don't pre-filter per spec).
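The dispatch described above can be sketched with a type assertion. Everything here is a local stand-in: `Result`, `stubEngine`, `countEngine`, `fitAll`, and the plain error values are hypothetical, and returning a nil result slice on the first error is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package's types; only the shapes matter here.
type Result struct{ NObs int }

type Record interface {
	NumericValue(name string) (float64, bool)
}

type Engine interface {
	Fit() (*Result, error)
}

type BufferedEngine interface {
	Engine
	FitBuffered(records []Record) (*Result, error)
}

var (
	errNotImplemented   = errors.New("PROCESSING_REGRESSION_NOT_IMPLEMENTED")
	errInsufficientData = errors.New("PROCESSING_REGRESSION_INSUFFICIENT_DATA")
)

// stubEngine models a Phase 0 engine with no buffered path.
type stubEngine struct{}

func (stubEngine) Fit() (*Result, error) { return nil, errNotImplemented }

// countEngine models a buffered engine; it only counts the rows it is given.
type countEngine struct{}

func (countEngine) Fit() (*Result, error) { return nil, errInsufficientData }

func (countEngine) FitBuffered(records []Record) (*Result, error) {
	return &Result{NObs: len(records)}, nil
}

// fitAll hands the same shared slice to every engine, in order, and stops
// at the first error.
func fitAll(engines []Engine, records []Record) ([]*Result, error) {
	results := make([]*Result, 0, len(engines))
	for _, e := range engines {
		var res *Result
		var err error
		if be, ok := e.(BufferedEngine); ok {
			res, err = be.FitBuffered(records)
		} else {
			res, err = e.Fit()
		}
		if err != nil {
			return nil, err
		}
		results = append(results, res)
	}
	return results, nil
}

func main() {
	records := make([]Record, 3) // contents are irrelevant to this sketch

	res, err := fitAll([]Engine{countEngine{}, countEngine{}}, records)
	fmt.Println(len(res), res[0].NObs, err)

	_, err = fitAll([]Engine{countEngine{}, stubEngine{}}, records)
	fmt.Println(err)
}
```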
func LeverageForRow ¶
func LeverageForRow(fit *AttributeFit, x []float64) float64
LeverageForRow computes hᵢᵢ for a single row given a centered-Gram inverse and the streaming predictor means. The closed-form identity for OLS with an intercept and centered predictors is
hᵢᵢ = 1/n + (xᵢ − μ_x)ᵀ · M2_xx⁻¹ · (xᵢ − μ_x)
Returns 0 when fit is nil or fit.GramInverse is nil — callers should have validated upstream. Rows with any null predictor produce a zero leverage; the attribute layer decides whether to emit zero or flag missingness via the schema.
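A worked instance of the identity may help. The sketch below uses a plain `[][]float64` in place of the gonum `SymDense` so that it runs with the standard library alone; the `leverage` helper and its `n == 0` guard are assumptions of the sketch, not the package's implementation.

```go
package main

import "fmt"

// leverage evaluates h = 1/n + (x - mean)ᵀ · gInv · (x - mean), where gInv
// is the inverse of the centered predictor Gram matrix. It returns 0 when
// gInv is nil, mirroring the documented nil behavior; the n == 0 guard is
// an extra assumption.
func leverage(n int, mean []float64, gInv [][]float64, x []float64) float64 {
	if n == 0 || gInv == nil {
		return 0
	}
	p := len(mean)
	d := make([]float64, p)
	for j := range d {
		d[j] = x[j] - mean[j]
	}
	h := 1 / float64(n)
	for i := 0; i < p; i++ {
		for j := 0; j < p; j++ {
			h += d[i] * gInv[i][j] * d[j]
		}
	}
	return h
}

func main() {
	// One predictor with values 1, 2, 3: the mean is 2 and the centered sum
	// of squares is 2, so the 1x1 Gram inverse is 0.5.
	mean := []float64{2}
	gInv := [][]float64{{0.5}}

	total := 0.0
	for _, x := range []float64{1, 2, 3} {
		h := leverage(3, mean, gInv, []float64{x})
		total += h
		fmt.Printf("x=%g leverage=%.4f\n", x, h)
	}
	// The leverages sum to the number of fitted parameters
	// (one slope plus the intercept).
	fmt.Printf("sum=%.4f\n", total)
}
```

The two outer points each get leverage 1/3 + 1/2, the central point gets 1/3, and the total is 2, which matches the trace of the hat matrix for a model with one slope and an intercept.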
func RegisteredTypes ¶
func RegisteredTypes() []types.RegressionType
RegisteredTypes returns every registered regression type. Order is map-iteration order; callers that need determinism sort the result.
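A caller that needs a stable order could sort a copy of the result, as sketched below. `RegressionType` is redeclared locally with a string underlying type, which is an assumption, and the operator names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// RegressionType stands in for types.RegressionType; a string underlying
// type is an assumption of this sketch.
type RegressionType string

// sortedTypes returns a sorted copy so the caller's slice is left untouched.
func sortedTypes(in []RegressionType) []RegressionType {
	out := append([]RegressionType(nil), in...)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Hypothetical operator names in an arbitrary order, as map iteration
	// might produce them.
	got := []RegressionType{"REG_RIDGE", "REG_OLS", "REG_LASSO"}
	fmt.Println(sortedTypes(got))
}
```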
func ValidateRegression ¶
func ValidateRegression(spec *types.RegressionSpec) error
ValidateRegression performs semantic spec validation that the engine itself needs but that the lighter descriptor.validateRegressions (header-only, no execution dependencies) cannot reach. Today this is the Penalty / Alpha / L1Ratio coupling introduced in Phase 2.
Returns nil on a well-formed spec; otherwise a PROCESSING_CONFIG CodedError naming the offending field. The engine factory calls this after the schema check and before constructing the accumulator so invalid combos never reach the streaming path.
Validation rules:
- Penalty=="l1" or "l2": require Alpha > 0 and L1Ratio == 0 (callers who want elastic-net should set Penalty="elasticnet").
- Penalty=="elasticnet": require Alpha > 0 and 0 < L1Ratio < 1. For L1Ratio == 0 use l2 instead; for L1Ratio == 1 use l1.
- Penalty=="" (unpenalized): Alpha and L1Ratio must both be zero. A spec such as Penalty="" with Alpha=0.5 is rejected as a likely typo, because its intent is ambiguous.
- Unknown Penalty values are rejected. The descriptor's enum check normally catches these, but ValidateRegression is also reachable from hand-built specs in tests.
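The rules above can be sketched as a single switch. The `spec` struct and `validatePenalty` function are hypothetical stand-ins that carry only the three coupled fields, and plain errors are returned where the real function is documented to return a PROCESSING_CONFIG CodedError.

```go
package main

import (
	"errors"
	"fmt"
)

// spec carries only the three coupled fields; it stands in for
// types.RegressionSpec, whose full shape is not shown in this document.
type spec struct {
	Penalty string
	Alpha   float64
	L1Ratio float64
}

// validatePenalty applies the four rules listed above. The negated
// comparisons also reject NaN, which is a choice made by this sketch.
func validatePenalty(s spec) error {
	switch s.Penalty {
	case "":
		if s.Alpha != 0 || s.L1Ratio != 0 {
			return errors.New("Alpha and L1Ratio must be zero when Penalty is empty")
		}
	case "l1", "l2":
		if !(s.Alpha > 0) {
			return fmt.Errorf("Alpha must be > 0 for Penalty=%q", s.Penalty)
		}
		if s.L1Ratio != 0 {
			return fmt.Errorf("L1Ratio must be 0 for Penalty=%q", s.Penalty)
		}
	case "elasticnet":
		if !(s.Alpha > 0) {
			return errors.New("Alpha must be > 0 for elasticnet")
		}
		if !(s.L1Ratio > 0 && s.L1Ratio < 1) {
			return errors.New("L1Ratio must lie strictly between 0 and 1 for elasticnet")
		}
	default:
		return fmt.Errorf("unknown Penalty %q", s.Penalty)
	}
	return nil
}

func main() {
	cases := []spec{
		{},                              // plain OLS
		{Penalty: "l2", Alpha: 0.1},     // ridge
		{Alpha: 0.5},                    // ambiguous: Alpha without Penalty
		{Penalty: "elasticnet", Alpha: 1, L1Ratio: 1}, // should be l1
		{Penalty: "huber", Alpha: 1},    // unknown penalty
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, validatePenalty(c))
	}
}
```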
Types ¶
type AttributeFit ¶
type AttributeFit struct {
// Coefficients are the fitted slopes in the same order as
// PredictorOrder. Length equals len(PredictorOrder).
Coefficients []float64
// Intercept is the fitted intercept term β₀.
Intercept float64
// PredictorOrder echoes the spec's Predictors slice. Provided so the
// attribute's pass-2 loop reads predictors in the same order the
// solver used. Owned by the caller.
PredictorOrder []string
// MeanX is the streaming mean of each predictor over the rows that
// contributed to the fit (listwise-deleted predictor/target nulls).
// Length matches PredictorOrder. Populated for every fit.
MeanX []float64
// NObs is the number of rows that contributed to the fit (n in the
// hat-matrix identity 1/n + …).
NObs int
// GramInverse is M2_xx⁻¹ — the inverse of the centered predictor
// Gram matrix used to compute leverage. Populated only when the
// caller requests it (ATTR_REG_LEVERAGE); nil otherwise so the
// regularized solvers can keep their post-fit cost free of an
// additional inverse. Stored as a p×p gonum SymDense.
GramInverse *mat.SymDense
}
AttributeFit is the minimal package of post-fit quantities the per-row regression attributes (ATTR_REG_FITTED / ATTR_REG_RESIDUAL / ATTR_REG_LEVERAGE) need to emit their second-pass per-row values.
The full RegressionResult / olsSolveResult bundle is intentionally not returned: attribute callers do not surface regression result blocks; they only need the coefficient vector (for ŷᵢ / residuals) and, in the leverage case, the centered-Gram inverse plus the streaming predictor means (for the standard hat-matrix identity
hᵢᵢ = 1/n + (xᵢ − μ_x)ᵀ · M2_xx⁻¹ · (xᵢ − μ_x)).
The struct is per-attribute state captured at finalize time; the attribute then iterates records a second time and computes the per-row scalar.
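A second pass for fitted values and residuals might look like the sketch below. The `fit` struct copies only the fields that pass needs, `row` is a hypothetical map-backed record, and skipping rows with a null predictor is an assumption about how an attribute could treat them.

```go
package main

import "fmt"

// fit holds the subset of AttributeFit needed for fitted values; MeanX,
// NObs, and GramInverse are omitted because only leverage needs them.
type fit struct {
	Coefficients   []float64
	Intercept      float64
	PredictorOrder []string
}

// row is a hypothetical record: a map from field name to value, where a
// missing key models a null.
type row map[string]float64

// predict returns the fitted value for one row, reading predictors in
// PredictorOrder. It reports false if any predictor is null.
func predict(f fit, r row) (float64, bool) {
	yhat := f.Intercept
	for j, name := range f.PredictorOrder {
		v, ok := r[name]
		if !ok {
			return 0, false
		}
		yhat += f.Coefficients[j] * v
	}
	return yhat, true
}

func main() {
	f := fit{
		Coefficients:   []float64{2, -1},
		Intercept:      0.5,
		PredictorOrder: []string{"a", "b"},
	}
	rows := []row{
		{"a": 1, "b": 3, "y": 0},
		{"a": 2, "y": 4}, // predictor "b" is null
	}
	for i, r := range rows {
		yhat, ok := predict(f, r)
		if !ok {
			fmt.Printf("row %d: null predictor\n", i)
			continue
		}
		// The residual is the observed target minus the fitted value.
		fmt.Printf("row %d: fitted=%g residual=%g\n", i, yhat, r["y"]-yhat)
	}
}
```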
func FitForAttribute ¶
func FitForAttribute(spec *types.RegressionSpec, records []Record, needLeverage bool) (*AttributeFit, error)
FitForAttribute runs the OLS streaming accumulator over the supplied records, then dispatches the same per-Penalty solver the engine uses to produce coefficients + intercept. When needLeverage is true the helper additionally produces the centered-Gram inverse from the (unpenalized) Cholesky factorisation.
The helper enforces the per-attribute spec rules upstream of the solver: ATTR_REG_LEVERAGE accepts only unpenalized OLS (Penalty==""). Callers gate on that constraint before invoking with needLeverage=true; this function returns PROCESSING_CONFIG if the combination ever escapes upstream validation.
Listwise deletion mirrors olsEngine.UpdateRow: rows whose target or any predictor is null contribute nothing. The returned NObs reflects the rows that actually folded into the accumulator.
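Listwise deletion combined with a streaming mean can be sketched as follows. `mapRecord` and `accumulate` are hypothetical; the sketch tracks only predictor means and the row count, not the full accumulator that the solver would need.

```go
package main

import "fmt"

// Record matches the interface documented in this package.
type Record interface {
	NumericValue(name string) (float64, bool)
}

// mapRecord is a hypothetical Record backed by a map; absent keys are null.
type mapRecord map[string]float64

func (m mapRecord) NumericValue(name string) (float64, bool) {
	v, ok := m[name]
	return v, ok
}

// accumulate applies listwise deletion and keeps a running mean for each
// predictor. It returns the means and the number of contributing rows.
func accumulate(records []Record, target string, predictors []string) ([]float64, int) {
	means := make([]float64, len(predictors))
	x := make([]float64, len(predictors))
	n := 0
rows:
	for _, rec := range records {
		if _, ok := rec.NumericValue(target); !ok {
			continue // null target: the row contributes nothing
		}
		// Read the whole row before touching any state, so a null found
		// late in the predictor list cannot leave a partial update behind.
		for j, name := range predictors {
			v, ok := rec.NumericValue(name)
			if !ok {
				continue rows
			}
			x[j] = v
		}
		n++
		for j := range means {
			means[j] += (x[j] - means[j]) / float64(n)
		}
	}
	return means, n
}

func main() {
	records := []Record{
		mapRecord{"y": 1, "a": 2, "b": 10},
		mapRecord{"a": 4, "b": 20},         // null target
		mapRecord{"y": 3, "a": 6},          // null predictor "b"
		mapRecord{"y": 5, "a": 4, "b": 30},
	}
	means, n := accumulate(records, "y", []string{"a", "b"})
	fmt.Println(means, n)
}
```

Only the first and last rows contribute here, so the count is 2 and the means are 3 and 20.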
type BufferedEngine ¶
type BufferedEngine interface {
Engine
FitBuffered(records []Record) (*types.RegressionResult, error)
}
BufferedEngine is the Phase 1 entry point for buffered-orchestrator regression fits. Implementations consume an in-memory record slice and emit the populated result. Phase 0 stub engines ignore the slice and return PROCESSING_REGRESSION_NOT_IMPLEMENTED; the unpenalized olsEngine folds every record through its Welford accumulator.
type Engine ¶
type Engine interface {
// Fit returns the per-spec result. Phase 0 stubs always return a
// PROCESSING_REGRESSION_NOT_IMPLEMENTED CodedError; later phases
// populate a RegressionResult on success. A non-nil error indicates
// the orchestrator must not partially include this slot in
// Response.Regressions.
Fit() (*types.RegressionResult, error)
}
Engine is the per-spec regression fit contract. Phase 1+ engines implement Fit to consume the filtered record set and emit a populated RegressionResult. Phase 0 stubs return PROCESSING_REGRESSION_NOT_IMPLEMENTED.
func Build ¶
Build returns an Engine for each spec. Unknown types surface PROCESSING_CONFIG; per-spec factory errors propagate untouched. This is the orchestrator's entry point — both streaming and buffered paths invoke Build first, then route the returned engines through the appropriate execution interface.
type Factory ¶
Factory constructs an Engine for a concrete RegressionSpec against a schema. Per-operator init() functions register their factory in the shared registry; Lookup returns the factory for a given type.
type Record ¶
type Record interface {
// NumericValue returns the value and true if present and non-null.
// Returns (0, false) when the named field is null or absent on the
// record.
NumericValue(name string) (float64, bool)
}
Record is the minimal contract the regression engines need from the parent processing.Record. Defining it locally keeps this subpackage free of the processing import and mirrors the pattern used by processing/feature.
processing.Record satisfies this interface; the orchestrator passes records through unchanged.
type StreamingEngine ¶
type StreamingEngine interface {
UpdateRow(rec Record) error
Finalize() (*types.RegressionResult, error)
}
StreamingEngine is the Phase 1 entry point for single-pass orchestrator regression fits. Implementations consume records one at a time via UpdateRow, then close the fit via Finalize. The orchestrator wraps the same record stream that drives streamable aggregators.
Only the unpenalized OLS engine implements this interface today. Phase 2 (regularized OLS), Phase 4 (Bayes), and Phases 5+ (modifiers) will expand the set.
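To illustrate the UpdateRow / Finalize lifecycle, the sketch below drives a toy one-predictor engine that keeps Welford-style running moments. `simpleOLS`, `mapRecord`, `run`, and the local `Result` type are hypothetical; the package's own OLS engine handles multiple predictors and is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

const InterceptKey = "(intercept)"

type Record interface {
	NumericValue(name string) (float64, bool)
}

// Result is a local stand-in for types.RegressionResult.
type Result struct {
	Coefficients map[string]float64
	NObs         int
}

type StreamingEngine interface {
	UpdateRow(rec Record) error
	Finalize() (*Result, error)
}

// mapRecord is a hypothetical Record; absent keys are null.
type mapRecord map[string]float64

func (m mapRecord) NumericValue(name string) (float64, bool) {
	v, ok := m[name]
	return v, ok
}

// simpleOLS fits y = b0 + b1*x from running means and centered moments.
type simpleOLS struct {
	target, predictor string
	n                 int
	meanX, meanY      float64
	m2xx, m2xy        float64
}

func (e *simpleOLS) UpdateRow(rec Record) error {
	y, okY := rec.NumericValue(e.target)
	x, okX := rec.NumericValue(e.predictor)
	if !okY || !okX {
		return nil // listwise deletion
	}
	e.n++
	dx := x - e.meanX // deviation from the previous mean
	e.meanX += dx / float64(e.n)
	e.meanY += (y - e.meanY) / float64(e.n)
	e.m2xx += dx * (x - e.meanX)
	e.m2xy += dx * (y - e.meanY)
	return nil
}

func (e *simpleOLS) Finalize() (*Result, error) {
	if e.n < 2 || e.m2xx == 0 {
		return nil, errors.New("insufficient data for a one-predictor fit")
	}
	slope := e.m2xy / e.m2xx
	return &Result{
		Coefficients: map[string]float64{
			InterceptKey: e.meanY - slope*e.meanX,
			e.predictor:  slope,
		},
		NObs: e.n,
	}, nil
}

// run feeds records one at a time, then closes the fit.
func run(e StreamingEngine, records []Record) (*Result, error) {
	for _, rec := range records {
		if err := e.UpdateRow(rec); err != nil {
			return nil, err
		}
	}
	return e.Finalize()
}

func main() {
	// Points on y = 1 + 2x, plus one row with a null target.
	records := []Record{
		mapRecord{"x": 0, "y": 1}, mapRecord{"x": 1, "y": 3},
		mapRecord{"x": 9}, mapRecord{"x": 2, "y": 5}, mapRecord{"x": 3, "y": 7},
	}
	res, err := run(&simpleOLS{target: "y", predictor: "x"}, records)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Coefficients[InterceptKey], res.Coefficients["x"], res.NObs)
}
```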
func BuildStreaming ¶
func BuildStreaming(specs []*types.RegressionSpec, schema *encoding.Schema) ([]StreamingEngine, error)
BuildStreaming constructs StreamingEngine handles for the single-pass orchestrator. Specs whose engines do not yet implement StreamingEngine are rejected — callers are expected to gate on CanStreamRequest / RegressionSpec.Streamable() before reaching this entry point.
Returns nil, nil for an empty spec slice so the orchestrator can call it unconditionally on streaming requests.