Documentation ¶
Overview ¶
Package feature implements the FEAT_* operators that run pre-filter to add derived columns to a record stream. Operators come in three flavors: per-row pure (LOG, SQRT), single-output transforms (BUCKETIZE), and global-pass operators that need a stats sweep before per-row write (FREQUENCY_ENCODE, TARGET_ENCODE). Multi-output operators (ONE_HOT, DATE_FEATURES) emit several columns from one feature spec.
The package owns its own registry; per-operator init() functions register their factory. The parent processing package consumes Apply() at the pre-filter pipeline stage. To avoid an import cycle, Apply takes a minimal Record interface that processing.Record satisfies.
Index ¶
- Constants
- func Apply(records []Record, features []*types.Feature, schema *encoding.Schema) error
- func IsStreamable(features []*types.Feature, schema *encoding.Schema) bool
- func RegisteredTypes() []types.FeatureType
- type Computer
- type Factory
- type Options
- type Output
- type Record
- type StreamingComputer
- type StreamingHandle
Constants ¶
const (
	SplitTrain float64 = 0
	SplitVal   float64 = 1
	SplitTest  float64 = 2
)
SplitTrain, SplitVal, SplitTest are the encoded values for the output column. The skill documents the mapping; downstream filters and aggregations reference these constants by their numeric form.
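As a rough illustration of referencing the constants by their numeric form, the sketch below counts rows per split and maps an encoded value back to a label. The helpers `splitLabel` and `countSplit` are hypothetical and not part of the package; only the three constant values come from the declaration above.

```go
package main

import "fmt"

// Values copied from the package's const block.
const (
	SplitTrain float64 = 0
	SplitVal   float64 = 1
	SplitTest  float64 = 2
)

// splitLabel is a hypothetical helper that maps an encoded value to a label.
// It reports false for any value that is not one of the three constants.
func splitLabel(v float64) (string, bool) {
	switch v {
	case SplitTrain:
		return "train", true
	case SplitVal:
		return "val", true
	case SplitTest:
		return "test", true
	}
	return "", false
}

// countSplit is a hypothetical helper that counts the entries equal to want.
func countSplit(col []float64, want float64) int {
	n := 0
	for _, v := range col {
		if v == want {
			n++
		}
	}
	return n
}

func main() {
	col := []float64{0, 0, 1, 2, 0} // an encoded split column
	fmt.Println(countSplit(col, SplitTrain))
	label, _ := splitLabel(col[2])
	fmt.Println(label)
}
```

Comparing against the named constants rather than bare literals keeps a downstream filter readable even though the stored column is numeric.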
Variables ¶
This section is empty.
Functions ¶
func Apply ¶
func Apply(records []Record, features []*types.Feature, schema *encoding.Schema) error
Apply runs every feature in features against the record set, mutating records in place to add derived columns. Failures return coded errors: PROCESSING_CONFIG for unknown types or factory errors, PROCESSING_RUNTIME for compute failures.
Apply trusts that descriptor.Predict has validated the request shape upstream; it does not re-check field existence, only operator dispatch and per-operator runtime errors.
func IsStreamable ¶
func IsStreamable(features []*types.Feature, schema *encoding.Schema) bool
IsStreamable reports whether every feature's Computer implements StreamingComputer. It returns false on any unknown type or factory error so that the buffered path can surface the canonical error.
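The check described here amounts to a type assertion per constructed Computer. The sketch below shows that shape under heavy simplification: `Computer` and `StreamingComputer` are reduced to one hypothetical method each, the registry is keyed by plain strings, and the `LEGACY` and `BROKEN` operator names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Computer and StreamingComputer are reduced, hypothetical stand-ins.
// The real interfaces differ; one method each is enough to show the check.
type Computer interface{ Compute() error }

type StreamingComputer interface{ Finalize() error }

// buffered implements only Computer.
type buffered struct{}

func (buffered) Compute() error { return nil }

// streaming implements both interfaces.
type streaming struct{}

func (streaming) Compute() error  { return nil }
func (streaming) Finalize() error { return nil }

// factories is a hypothetical registry keyed by operator name.
var factories = map[string]func() (Computer, error){
	"LOG":    func() (Computer, error) { return streaming{}, nil },
	"LEGACY": func() (Computer, error) { return buffered{}, nil },
	"BROKEN": func() (Computer, error) { return nil, errors.New("bad spec") },
}

// isStreamable returns true only if every kind resolves to a Computer
// that also satisfies StreamingComputer.
func isStreamable(kinds []string) bool {
	for _, k := range kinds {
		factory, ok := factories[k]
		if !ok {
			return false // unknown type: leave error reporting to the buffered path
		}
		c, err := factory()
		if err != nil {
			return false // factory error: same fallback
		}
		if _, ok := c.(StreamingComputer); !ok {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isStreamable([]string{"LOG"}))           // true
	fmt.Println(isStreamable([]string{"LOG", "LEGACY"})) // false
}
```

Returning a plain false instead of an error keeps the predicate cheap to call; the error itself is produced once, by whichever path actually runs the request.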
func RegisteredTypes ¶
func RegisteredTypes() []types.FeatureType
RegisteredTypes returns all registered feature types. Order is map iteration order; callers that need determinism must sort the result.
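A caller that needs a stable order can sort the returned slice. The sketch below assumes `types.FeatureType` is a string-based type and uses a small hypothetical registry in place of the package's own; only the sort step is the point.

```go
package main

import (
	"fmt"
	"sort"
)

// FeatureType stands in for types.FeatureType; a string kind is assumed.
type FeatureType string

// registry is a hypothetical stand-in for the package's internal registry.
var registry = map[FeatureType]struct{}{
	"LOG": {}, "SQRT": {}, "BUCKETIZE": {}, "ONE_HOT": {},
}

// registeredTypes mimics RegisteredTypes: keys come back in map iteration order.
func registeredTypes() []FeatureType {
	out := make([]FeatureType, 0, len(registry))
	for t := range registry {
		out = append(out, t)
	}
	return out
}

// sortedTypes returns the registered types in lexicographic order.
func sortedTypes() []FeatureType {
	ts := registeredTypes()
	sort.Slice(ts, func(i, j int) bool { return ts[i] < ts[j] })
	return ts
}

func main() {
	fmt.Println(sortedTypes()) // [BUCKETIZE LOG ONE_HOT SQRT]
}
```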
Types ¶
type Computer ¶
Computer produces one or more derived columns from a record set. The returned map is keyed by output column name; each Output's Values slice has length equal to len(records) (Apply enforces this).
type Factory ¶
Factory constructs a Computer from a feature specification. The schema is the cohort schema before any feature output is added; Computers consult it to validate field types and resolve dictionaries.
type Options ¶
type Options struct{}
Options reserves room for the orchestrator to pass cross-feature context (e.g., prior split mask) once leakage detection ships.
type Output ¶
Output is one column's worth of derived values plus an aligned null mask. Values has one entry per input record; Nulls (when non-nil) has the same length and Nulls[i]==true means Values[i] is undefined for record i (the orchestrator writes a null marker instead of the numeric value).
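To make the alignment rules concrete, here is a minimal sketch of how an orchestrator might write one Output back to its records. The field types of `Output` (`[]float64` and `[]bool`) are assumed from the description, and `writer`, `row`, and `writeColumn` are hypothetical names; the real Apply may structure this differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Output mirrors the documented shape; the field types are an assumption.
type Output struct {
	Values []float64
	Nulls  []bool
}

// writer is the subset of Record this sketch needs.
type writer interface {
	Set(name string, value float64)
	SetNull(name string)
}

// row is a hypothetical record; a nil pointer represents a null marker.
type row map[string]*float64

func (r row) Set(name string, v float64) { r[name] = &v }
func (r row) SetNull(name string)        { r[name] = nil }

// writeColumn checks that out is aligned with rows, then writes each entry
// as either a value or a null marker.
func writeColumn(rows []writer, name string, out Output) error {
	if len(out.Values) != len(rows) {
		return errors.New("output length does not match record count")
	}
	if out.Nulls != nil && len(out.Nulls) != len(rows) {
		return errors.New("null mask length does not match record count")
	}
	for i, r := range rows {
		if out.Nulls != nil && out.Nulls[i] {
			r.SetNull(name)
			continue
		}
		r.Set(name, out.Values[i])
	}
	return nil
}

func main() {
	a, b := row{}, row{}
	out := Output{Values: []float64{0, 1.5}, Nulls: []bool{true, false}}
	err := writeColumn([]writer{a, b}, "amount_log", out)
	fmt.Println(err, a["amount_log"] == nil, *b["amount_log"])
}
```

A nil Nulls slice is treated as "no nulls", which lets operators that never produce nulls skip allocating the mask.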
type Record ¶
type Record interface {
// NumericValue returns the value and true if present and non-null.
NumericValue(name string) (float64, bool)
// StringValue returns the resolved string value for categorical fields.
StringValue(name string) (string, bool)
// Set writes a derived non-null value. Implementations clear any prior
// null marker and invalidate any cached views.
Set(name string, value float64)
// SetNull marks the named field null on this record.
SetNull(name string)
}
Record is the minimal contract feature operators need from the parent processing.Record. Defining it locally lets this subpackage stay free of the processing import.
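Because Go interfaces are satisfied structurally, any type with these four methods can be passed to the feature operators without this package importing it. The sketch below uses a hypothetical map-backed `mapRecord` as a stand-in for processing.Record, whose real layout is not shown here.

```go
package main

import "fmt"

// Record is copied from the interface definition above.
type Record interface {
	NumericValue(name string) (float64, bool)
	StringValue(name string) (string, bool)
	Set(name string, value float64)
	SetNull(name string)
}

// mapRecord is a hypothetical in-memory record, not the real processing.Record.
type mapRecord struct {
	nums  map[string]float64
	strs  map[string]string
	nulls map[string]bool
}

func newMapRecord() *mapRecord {
	return &mapRecord{
		nums:  map[string]float64{},
		strs:  map[string]string{},
		nulls: map[string]bool{},
	}
}

// NumericValue reports false when the field is absent or marked null.
func (r *mapRecord) NumericValue(name string) (float64, bool) {
	if r.nulls[name] {
		return 0, false
	}
	v, ok := r.nums[name]
	return v, ok
}

// StringValue reports false when the field is absent or marked null.
func (r *mapRecord) StringValue(name string) (string, bool) {
	if r.nulls[name] {
		return "", false
	}
	v, ok := r.strs[name]
	return v, ok
}

// Set stores the value and clears any prior null marker.
func (r *mapRecord) Set(name string, value float64) {
	delete(r.nulls, name)
	r.nums[name] = value
}

// SetNull drops any stored value and marks the field null.
func (r *mapRecord) SetNull(name string) {
	delete(r.nums, name)
	r.nulls[name] = true
}

// Compile-time check that *mapRecord satisfies Record.
var _ Record = (*mapRecord)(nil)

func main() {
	r := newMapRecord()
	r.Set("amount_log", 2.5)
	r.SetNull("amount_log")
	_, ok := r.NumericValue("amount_log")
	fmt.Println(ok) // false
}
```

The blank-identifier assignment is a common way to catch interface drift at compile time on the implementing side.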
type StreamingComputer ¶
type StreamingComputer interface {
PrePass(record Record, field string) error
Finalize() error
EmitRow(record Record, field string) (map[string]Output, error)
}
StreamingComputer is the streaming sibling of Computer. Operators that implement it can run on the streaming execution path; operators that do not force a buffered fallback even on otherwise stream-eligible requests.
Lifecycle: PrePass is called once per record on pass 1. Finalize closes the precompute sweep; per-row outputs become deterministic from this point. EmitRow is called once per record on pass 2 (in iteration order) and returns the derived column values for that record. Each Output's Values slice has length 1; if the row's output is null, Output.Nulls is length 1 with a true entry and the orchestrator writes a null marker instead of the value.
Stateless per-row operators (LOG, SQRT, ONE_HOT, BUCKETIZE-explicit, DATE_FEATURES) implement PrePass and Finalize as no-ops. Global-pass operators accumulate state in PrePass, materialize derived state in Finalize, and look it up in EmitRow.
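The lifecycle can be sketched with a toy global-pass operator. In the example below, `freqEncoder`, `rec`, and `run` are hypothetical, the `_freq` output column name is invented, and `Output`'s field types are assumed; it illustrates the PrePass / Finalize / EmitRow ordering and is not the package's FREQUENCY_ENCODE implementation.

```go
package main

import "fmt"

// Record and StreamingComputer are copied from the docs; Output's fields are assumed.
type (
	Record interface {
		NumericValue(name string) (float64, bool)
		StringValue(name string) (string, bool)
		Set(name string, value float64)
		SetNull(name string)
	}
	StreamingComputer interface {
		PrePass(record Record, field string) error
		Finalize() error
		EmitRow(record Record, field string) (map[string]Output, error)
	}
	Output struct {
		Values []float64
		Nulls  []bool
	}
)

// freqEncoder is a hypothetical operator: category -> share of non-null values.
type freqEncoder struct {
	counts map[string]int
	total  int
	freq   map[string]float64
}

func (f *freqEncoder) PrePass(r Record, field string) error {
	if v, ok := r.StringValue(field); ok {
		f.counts[v]++
		f.total++
	}
	return nil
}

func (f *freqEncoder) Finalize() error {
	f.freq = make(map[string]float64, len(f.counts))
	for k, n := range f.counts {
		f.freq[k] = float64(n) / float64(f.total)
	}
	return nil
}

func (f *freqEncoder) EmitRow(r Record, field string) (map[string]Output, error) {
	name := field + "_freq" // output column naming is an assumption
	v, ok := r.StringValue(field)
	if !ok {
		return map[string]Output{name: {Values: []float64{0}, Nulls: []bool{true}}}, nil
	}
	return map[string]Output{name: {Values: []float64{f.freq[v]}}}, nil
}

// rec is a hypothetical in-memory Record used only for this sketch.
type rec map[string]any

func (r rec) NumericValue(n string) (float64, bool) { v, ok := r[n].(float64); return v, ok }
func (r rec) StringValue(n string) (string, bool)   { v, ok := r[n].(string); return v, ok }
func (r rec) Set(n string, v float64)               { r[n] = v }
func (r rec) SetNull(n string)                      { delete(r, n) }

// run drives one computer through pass 1, Finalize, and pass 2.
func run(c StreamingComputer, recs []Record, field string) error {
	for _, r := range recs {
		if err := c.PrePass(r, field); err != nil {
			return err
		}
	}
	if err := c.Finalize(); err != nil {
		return err
	}
	for _, r := range recs {
		cols, err := c.EmitRow(r, field)
		if err != nil {
			return err
		}
		for name, o := range cols {
			if o.Nulls != nil && o.Nulls[0] {
				r.SetNull(name)
				continue
			}
			r.Set(name, o.Values[0])
		}
	}
	return nil
}

func main() {
	var recs []Record
	for _, c := range []string{"a", "a", "a", "b"} {
		recs = append(recs, rec{"color": c})
	}
	err := run(&freqEncoder{counts: map[string]int{}}, recs, "color")
	v, _ := recs[0].NumericValue("color_freq")
	fmt.Println(v, err) // 0.75 <nil>
}
```

A stateless operator would fit the same driver by returning nil from PrePass and Finalize and computing its value directly in EmitRow.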
type StreamingHandle ¶
type StreamingHandle struct {
Feature *types.Feature
Computer StreamingComputer
}
StreamingHandle pairs a feature spec with its constructed StreamingComputer. processStreaming holds a slice of these for the duration of a request; PrePass / Finalize / EmitRow drive each handle.
func BuildStreaming ¶
BuildStreaming constructs a StreamingComputer for each feature, in order. Callers should verify streamability via IsStreamable first. BuildStreaming returns PROCESSING_INTERNAL when an operator lacks streaming support and PROCESSING_CONFIG for unknown types or factory failures.