feature

package
v0.10.2 Latest
Warning

This package is not in the latest version of its module.

Published: May 21, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package feature implements the FEAT_* operators, which run before filtering to add derived columns to a record stream. Operators come in three flavors: pure per-row functions (LOG, SQRT), single-output transforms (BUCKETIZE), and global-pass operators that need a statistics sweep over all records before per-row values can be written (FREQUENCY_ENCODE, TARGET_ENCODE). Multi-output operators (ONE_HOT, DATE_FEATURES) emit several columns from one feature spec.

The package owns its own registry; each operator's init() function registers its factory. The parent processing package calls Apply at the pre-filter pipeline stage. To avoid an import cycle, Apply takes a minimal Record interface that processing.Record satisfies.
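The registration pattern described above can be sketched as follows. This is a minimal illustration, not the package's source: `register`, `lookup`, and `logOp` are hypothetical names, `FeatureType` is assumed to be string-based, and `Factory` is simplified to take no arguments.

```go
package main

import "fmt"

// FeatureType stands in for types.FeatureType; the string underlying type is an assumption.
type FeatureType string

// Computer and Factory are simplified stand-ins for the documented types.
type Computer interface{}
type Factory func() (Computer, error)

// registry is initialized before any init() function runs.
var registry = map[FeatureType]Factory{}

// register is a hypothetical helper; it panics on duplicates so that
// conflicting operators are caught at program start.
func register(t FeatureType, f Factory) {
	if _, dup := registry[t]; dup {
		panic(fmt.Sprintf("feature type %q registered twice", t))
	}
	registry[t] = f
}

// lookup mirrors the shape of the documented Lookup function.
func lookup(t FeatureType) (Factory, bool) {
	f, ok := registry[t]
	return f, ok
}

// logOp is a placeholder operator.
type logOp struct{}

// Each operator file would carry an init() like this one.
func init() {
	register("LOG", func() (Computer, error) { return logOp{}, nil })
}

func main() {
	_, ok := lookup("LOG")
	fmt.Println("LOG registered:", ok)
	_, ok = lookup("UNKNOWN")
	fmt.Println("UNKNOWN registered:", ok)
}
```

Registering from init() means an operator becomes available as soon as its file is compiled into the package, with no central list to maintain.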

Constants

const (
	SplitTrain float64 = 0
	SplitVal   float64 = 1
	SplitTest  float64 = 2
)

SplitTrain, SplitVal, and SplitTest are the encoded values for the output column: 0 for train, 1 for validation, and 2 for test. The skill documents the mapping; downstream filters and aggregations reference these constants by their numeric form.

const MaxPolyDegree = 10

MaxPolyDegree caps the degree of FEAT_POLY expansion. Naive x^d overflows quickly when x is large or untransformed; callers needing higher degrees should standardize or center their inputs first and, for serious work, use an orthogonal basis (not in scope for v1).

Functions

func Apply

func Apply(records []Record, features []*types.Feature, schema *encoding.Schema) error

Apply runs every feature in features against the record set, mutating records in place to add derived columns. Equivalent to ApplyWithExt(records, features, schema, nil).

func ApplyWithExt added in v0.7.0

func ApplyWithExt(records []Record, features []*types.Feature, schema *encoding.Schema, extFactories map[types.FeatureType]Factory) error

ApplyWithExt runs every feature in features against the record set, honouring an optional embedder overlay. Failures return coded errors: PROCESSING_CONFIG for unknown types or factory errors, PROCESSING_RUNTIME for compute failures.

ApplyWithExt trusts that descriptor.Predict has validated the request shape upstream; it does not re-check field existence, only operator dispatch and per-operator runtime errors.

func IsStreamable

func IsStreamable(features []*types.Feature, schema *encoding.Schema) bool

IsStreamable reports whether every feature's Computer implements StreamingComputer. Equivalent to IsStreamableWithExt(features, schema, nil).

func IsStreamableWithExt added in v0.7.0

func IsStreamableWithExt(features []*types.Feature, schema *encoding.Schema, extFactories map[types.FeatureType]Factory) bool

IsStreamableWithExt reports whether every feature's Computer implements StreamingComputer, honouring an optional embedder overlay. Returns false on any unknown type or factory error so the buffered path can surface the canonical error.

func RegisteredTypes

func RegisteredTypes() []types.FeatureType

RegisteredTypes returns all registered feature types. Order is map iteration order; callers that need determinism must sort the result.
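A caller that needs a stable order might sort a copy of the result, roughly as below. `FeatureType` is a local stand-in for types.FeatureType and is assumed to be string-based; `sortedTypes` is a hypothetical helper, and the input slice stands in for the return value of RegisteredTypes.

```go
package main

import (
	"fmt"
	"sort"
)

// FeatureType stands in for types.FeatureType; the string underlying type is an assumption.
type FeatureType string

// sortedTypes returns a sorted copy and leaves the input untouched.
func sortedTypes(in []FeatureType) []FeatureType {
	out := append([]FeatureType(nil), in...)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Stand-in for RegisteredTypes(); the names are illustrative.
	got := []FeatureType{"SQRT", "LOG", "ONE_HOT"}
	fmt.Println(sortedTypes(got)) // [LOG ONE_HOT SQRT]
}
```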

Types

type Computer

type Computer interface {
	Compute(records []Record, field string) (map[string]Output, error)
}

Computer produces one or more derived columns from a record set. The returned map is keyed by output column name; each Output's Values slice has length equal to len(records) (Apply enforces this).
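As a rough sketch of that contract, a per-row operator could look like the following. The types are trimmed local copies of the documented declarations; `mapRecord`, `logComputer`, the `<field>_log` output name, and the choice to emit null for missing or non-positive inputs are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Record is trimmed to the one method this sketch needs.
type Record interface {
	NumericValue(name string) (float64, bool)
}

// Output is a local copy of the documented struct.
type Output struct {
	Values []float64
	Nulls  []bool
}

// mapRecord is a hypothetical map-backed record.
type mapRecord map[string]float64

func (r mapRecord) NumericValue(name string) (float64, bool) {
	v, ok := r[name]
	return v, ok
}

// logComputer is an illustrative per-row operator, not the package's LOG.
type logComputer struct{}

// Compute returns one column whose Values and Nulls both have len(records) entries.
func (logComputer) Compute(records []Record, field string) (map[string]Output, error) {
	out := Output{
		Values: make([]float64, len(records)),
		Nulls:  make([]bool, len(records)),
	}
	for i, rec := range records {
		v, ok := rec.NumericValue(field)
		if !ok || v <= 0 {
			// Assumption: missing or non-positive inputs become nulls.
			out.Nulls[i] = true
			continue
		}
		out.Values[i] = math.Log(v)
	}
	return map[string]Output{field + "_log": out}, nil
}

func main() {
	records := []Record{mapRecord{"price": 1}, mapRecord{"price": -3}, mapRecord{}}
	cols, err := logComputer{}.Compute(records, "price")
	if err != nil {
		panic(err)
	}
	fmt.Println(cols["price_log"].Values, cols["price_log"].Nulls)
}
```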

type Factory

type Factory func(feat *types.Feature, schema *encoding.Schema) (Computer, error)

Factory constructs a Computer from a feature specification. The schema is the cohort schema before any feature output is added; Computers consult it to validate field types and resolve dictionaries.

func Lookup

func Lookup(t types.FeatureType) (Factory, bool)

Lookup returns the factory for a feature type, if registered.

type Options

type Options struct{}

Options reserves room for the orchestrator to pass cross-feature context (e.g., prior split mask) once leakage detection ships.

type Output

type Output struct {
	Values []float64
	Nulls  []bool
}

Output is one column's worth of derived values plus an aligned null mask. Values has one entry per input record; Nulls (when non-nil) has the same length and Nulls[i]==true means Values[i] is undefined for record i (the orchestrator writes a null marker instead of the numeric value).
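The write-back rule can be sketched as below. This is a guess at the shape of the orchestrator's loop, not its actual code: `writeColumn`, `writer`, and `cell` are hypothetical, and only the Values/Nulls semantics come from the description above.

```go
package main

import "fmt"

// Output is a local copy of the documented struct.
type Output struct {
	Values []float64
	Nulls  []bool
}

// writer is the write half of the documented Record interface.
type writer interface {
	Set(name string, value float64)
	SetNull(name string)
}

// cell is a hypothetical single-column record used to observe writes.
type cell struct {
	value float64
	null  bool
}

func (c *cell) Set(_ string, v float64) { c.value, c.null = v, false }
func (c *cell) SetNull(_ string)        { c.value, c.null = 0, true }

// writeColumn checks the length invariants, then writes either the value
// or a null marker for each record.
func writeColumn(records []writer, name string, out Output) error {
	if len(out.Values) != len(records) {
		return fmt.Errorf("column %q: %d values for %d records", name, len(out.Values), len(records))
	}
	if out.Nulls != nil && len(out.Nulls) != len(records) {
		return fmt.Errorf("column %q: %d null flags for %d records", name, len(out.Nulls), len(records))
	}
	for i, rec := range records {
		if out.Nulls != nil && out.Nulls[i] {
			rec.SetNull(name)
			continue
		}
		rec.Set(name, out.Values[i])
	}
	return nil
}

func main() {
	a, b := &cell{}, &cell{}
	out := Output{Values: []float64{2, 0}, Nulls: []bool{false, true}}
	if err := writeColumn([]writer{a, b}, "x_sqrt", out); err != nil {
		panic(err)
	}
	fmt.Println(*a, *b)
}
```

A nil Nulls slice is treated as "no nulls", which lets operators that never produce nulls skip the allocation.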

type Record

type Record interface {
	// NumericValue returns the value and true if present and non-null.
	NumericValue(name string) (float64, bool)

	// StringValue returns the resolved string value for categorical fields.
	StringValue(name string) (string, bool)

	// Set writes a derived non-null value. Implementations clear any prior
	// null marker and invalidate any cached views.
	Set(name string, value float64)

	// SetNull marks the named field null on this record.
	SetNull(name string)
}

Record is the minimal contract feature operators need from the parent processing.Record. Defining it locally lets this subpackage stay free of the processing import.
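For tests or experiments, a small in-memory type can satisfy the interface. The sketch below copies the documented Record declaration locally; `memRecord` and its map-based storage are hypothetical and say nothing about how processing.Record is implemented.

```go
package main

import "fmt"

// Record is a local copy of the documented interface.
type Record interface {
	NumericValue(name string) (float64, bool)
	StringValue(name string) (string, bool)
	Set(name string, value float64)
	SetNull(name string)
}

// memRecord is a hypothetical in-memory implementation.
type memRecord struct {
	nums  map[string]float64
	strs  map[string]string
	nulls map[string]bool
}

func newMemRecord() *memRecord {
	return &memRecord{
		nums:  map[string]float64{},
		strs:  map[string]string{},
		nulls: map[string]bool{},
	}
}

// NumericValue reports false for absent and for null fields.
func (r *memRecord) NumericValue(name string) (float64, bool) {
	if r.nulls[name] {
		return 0, false
	}
	v, ok := r.nums[name]
	return v, ok
}

// StringValue reports false for absent and for null fields.
func (r *memRecord) StringValue(name string) (string, bool) {
	if r.nulls[name] {
		return "", false
	}
	v, ok := r.strs[name]
	return v, ok
}

// Set clears any prior null marker before storing the value.
func (r *memRecord) Set(name string, value float64) {
	delete(r.nulls, name)
	r.nums[name] = value
}

// SetNull drops any stored number and marks the field null.
func (r *memRecord) SetNull(name string) {
	delete(r.nums, name)
	r.nulls[name] = true
}

// Compile-time check that *memRecord satisfies Record.
var _ Record = (*memRecord)(nil)

func main() {
	rec := newMemRecord()
	rec.SetNull("age_log")
	_, ok := rec.NumericValue("age_log")
	fmt.Println(ok) // false

	rec.Set("age_log", 3.5)
	v, ok := rec.NumericValue("age_log")
	fmt.Println(v, ok) // 3.5 true
}
```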

type StreamingComputer

type StreamingComputer interface {
	PrePass(record Record, field string) error
	Finalize() error
	EmitRow(record Record, field string) (map[string]Output, error)
}

StreamingComputer is the streaming sibling of Computer. Operators that implement it can run on the streaming execution path; operators that do not force a buffered fallback even on otherwise stream-eligible requests.

Lifecycle: PrePass is called once per record on pass 1. Finalize closes the precompute sweep; per-row outputs become deterministic from this point. EmitRow is called once per record on pass 2 (in iteration order) and returns the derived column values for that record. Each Output's Values slice has length 1; if the row's output is null, Output.Nulls is length 1 with a true entry and the orchestrator writes a null marker instead of the value.

Stateless per-row operators (LOG, SQRT, ONE_HOT, BUCKETIZE-explicit, DATE_FEATURES) implement PrePass and Finalize as no-ops. Global-pass operators accumulate state in PrePass, materialize derived state in Finalize, and look it up in EmitRow.
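The lifecycle can be illustrated with a toy global-pass operator. Everything beyond the three method signatures is an assumption: `freqEncoder` encodes each category as its relative frequency among non-null rows, the output column is named `<field>_freq`, `runTwoPass` is a hypothetical driver, and Record is trimmed to the single method used here.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is trimmed to the one method this sketch needs.
type Record interface {
	StringValue(name string) (string, bool)
}

// Output is a local copy of the documented struct.
type Output struct {
	Values []float64
	Nulls  []bool
}

// StreamingComputer is a local copy of the documented interface.
type StreamingComputer interface {
	PrePass(record Record, field string) error
	Finalize() error
	EmitRow(record Record, field string) (map[string]Output, error)
}

// strRecord is a hypothetical map-backed record.
type strRecord map[string]string

func (r strRecord) StringValue(name string) (string, bool) {
	v, ok := r[name]
	return v, ok
}

// freqEncoder is an illustrative global-pass operator.
type freqEncoder struct {
	counts map[string]int
	total  int
	freq   map[string]float64
}

// PrePass accumulates category counts on pass 1.
func (f *freqEncoder) PrePass(rec Record, field string) error {
	if v, ok := rec.StringValue(field); ok {
		f.counts[v]++
		f.total++
	}
	return nil
}

// Finalize turns the counts into relative frequencies.
func (f *freqEncoder) Finalize() error {
	f.freq = make(map[string]float64, len(f.counts))
	for k, c := range f.counts {
		f.freq[k] = float64(c) / float64(f.total)
	}
	return nil
}

// EmitRow looks up the finalized frequency; every Output has length 1.
func (f *freqEncoder) EmitRow(rec Record, field string) (map[string]Output, error) {
	if f.freq == nil {
		return nil, errors.New("EmitRow called before Finalize")
	}
	v, ok := rec.StringValue(field)
	if !ok {
		return map[string]Output{field + "_freq": {Values: []float64{0}, Nulls: []bool{true}}}, nil
	}
	return map[string]Output{field + "_freq": {Values: []float64{f.freq[v]}}}, nil
}

// runTwoPass drives PrePass over all records, Finalize once, then EmitRow in order.
func runTwoPass(sc StreamingComputer, records []Record, field string) ([]map[string]Output, error) {
	for _, rec := range records {
		if err := sc.PrePass(rec, field); err != nil {
			return nil, err
		}
	}
	if err := sc.Finalize(); err != nil {
		return nil, err
	}
	rows := make([]map[string]Output, 0, len(records))
	for _, rec := range records {
		row, err := sc.EmitRow(rec, field)
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	records := []Record{strRecord{"city": "a"}, strRecord{"city": "a"}, strRecord{"city": "b"}, strRecord{}}
	rows, err := runTwoPass(&freqEncoder{counts: map[string]int{}}, records, "city")
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		fmt.Println(row["city_freq"])
	}
}
```

A stateless operator would fit the same driver with PrePass and Finalize returning nil immediately.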

type StreamingHandle

type StreamingHandle struct {
	Feature  *types.Feature
	Computer StreamingComputer
}

StreamingHandle pairs a feature spec with its constructed StreamingComputer. processStreaming holds a slice of these for the duration of a request; PrePass / Finalize / EmitRow drive each handle.

func BuildStreaming

func BuildStreaming(features []*types.Feature, schema *encoding.Schema) ([]StreamingHandle, error)

BuildStreaming is the no-overlay variant of BuildStreamingWithExt.

func BuildStreamingWithExt added in v0.7.0

func BuildStreamingWithExt(features []*types.Feature, schema *encoding.Schema, extFactories map[types.FeatureType]Factory) ([]StreamingHandle, error)

BuildStreamingWithExt constructs a StreamingComputer for each feature, in order, honouring an optional embedder overlay. Callers should verify streamability with IsStreamable or IsStreamableWithExt first. It returns PROCESSING_INTERNAL when an operator lacks streaming support and PROCESSING_CONFIG for unknown types or factory failures.
