Documentation ¶
Overview ¶
Package example defines two types: "Batch", a batch of examples, and "Features", the specification of the input features of a model.
Index ¶
- Constants
- func GetColumn(name string, dataspec *dataspec_pb.DataSpecification) *dataspec_pb.Column
- func NewFeatures(dataspec *dataspec_pb.DataSpecification, header *model_pb.AbstractModel, ...) (*Features, *FeatureConstructionMap, error)
- type Batch
- func (batch *Batch) Clear()
- func (batch *Batch) CopyFrom(src *Batch, beginIdx int, endIdx int)
- func (batch *Batch) FillMissing()
- func (batch *Batch) NumAllocatedExamples() int
- func (batch *Batch) SetCategorical(exampleIdx int, feature CategoricalFeatureID, value uint32)
- func (batch *Batch) SetCategoricalFromString(exampleIdx int, feature CategoricalFeatureID, rawValue string) error
- func (batch *Batch) SetFromFields(exampleIdx int, header []string, values []string) error
- func (batch *Batch) SetMissingCategorical(exampleIdx int, feature CategoricalFeatureID)
- func (batch *Batch) SetMissingNumerical(exampleIdx int, feature NumericalFeatureID)
- func (batch *Batch) SetNumerical(exampleIdx int, feature NumericalFeatureID, value float32)
- func (batch *Batch) ToStringDebug() string
- type CategoricalFeatureID
- type CategoricalSpec
- type CompatibilityType
- type FeatureConstructionMap
- type Features
- type NumericalFeatureID
Constants ¶
const OutOfVabulary = uint32(0)
OutOfVabulary (OOV) is the special value that represents unknown or too-rare categorical values.
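As an illustration of how an out-of-vocabulary index is typically used, the sketch below maps raw strings to integer indices and falls back to index 0 for unknown values. The dictionary, the function name lookupCategorical, and the local constant are hypothetical stand-ins and are not part of this package.

```go
package main

import "fmt"

// outOfVocabulary mirrors the documented constant value (0). The name is
// local to this sketch and is not the package's identifier.
const outOfVocabulary = uint32(0)

// lookupCategorical returns the integer index of rawValue in a hypothetical
// dictionary, falling back to the out-of-vocabulary index for unknown values.
func lookupCategorical(dictionary map[string]uint32, rawValue string) uint32 {
	if idx, ok := dictionary[rawValue]; ok {
		return idx
	}
	return outOfVocabulary
}

func main() {
	// Hypothetical dictionary; index 0 is left free for out-of-vocabulary.
	countries := map[string]uint32{"UK": 1, "FR": 2}
	fmt.Println(lookupCategorical(countries, "UK"))
	fmt.Println(lookupCategorical(countries, "ZZ"))
}
```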
Variables ¶
This section is empty.
Functions ¶
func GetColumn ¶
func GetColumn(name string, dataspec *dataspec_pb.DataSpecification) *dataspec_pb.Column
GetColumn gets the column spec from its name.
func NewFeatures ¶
func NewFeatures(dataspec *dataspec_pb.DataSpecification, header *model_pb.AbstractModel, compatibility CompatibilityType) (*Features, *FeatureConstructionMap, error)
NewFeatures converts a dataspec into a feature definition used by an engine.
Types ¶
type Batch ¶
type Batch struct {
// {Example major, feature minor} values for the unary feature values.
NumericalValues []float32
CategoricalValues []uint32
// contains filtered or unexported fields
}
Batch is a set of examples.
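The struct comment describes the value slices as example major, feature minor. One reading of that layout, sketched below under the assumption that each example occupies a contiguous run of numFeatures entries, is the usual row-major index formula. The helper flatIndex is hypothetical and only illustrates the arithmetic.

```go
package main

import "fmt"

// flatIndex returns the position of (exampleIdx, featureIdx) in a slice laid
// out example-major, feature-minor: all features of example 0 come first,
// then all features of example 1, and so on.
func flatIndex(exampleIdx, featureIdx, numFeatures int) int {
	return exampleIdx*numFeatures + featureIdx
}

func main() {
	const numExamples, numFeatures = 2, 3
	values := make([]float32, numExamples*numFeatures)
	// Set feature 2 of example 1.
	values[flatIndex(1, 2, numFeatures)] = 0.5
	fmt.Println(values)
}
```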
func NewBatch ¶
NewBatch creates a batch of examples. The example values are in an undefined state: before being used, the feature values should be set either with "FillMissing" or "Set*".
func (*Batch) Clear ¶
func (batch *Batch) Clear()
Clear clears the content of a batch. After a call to Clear, the feature values are in an undefined state, i.e. the same state as after "NewBatch".
func (*Batch) CopyFrom ¶
CopyFrom copies the content of a batch from another batch. Assumes that both batches have exactly the same features (e.g. they were created by the same engine).
func (*Batch) FillMissing ¶
func (batch *Batch) FillMissing()
FillMissing sets all the feature values of all the examples as missing.
This method is equivalent to, but more efficient than, calling the "SetMissing*" methods for all the features and all the examples.
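The equivalence described above can be sketched with a small stand-in type. The type miniBatch, its fields, and its methods are hypothetical and handle numerical features only. The per-feature placeholder values (-1 and -2) are arbitrary choices for the illustration.

```go
package main

import "fmt"

// miniBatch is a hypothetical stand-in for Batch with numerical features only.
type miniBatch struct {
	numFeatures int
	missing     []float32 // per-feature placeholder for a missing value
	values      []float32 // example-major, feature-minor
}

func newMiniBatch(numExamples int, missing []float32) *miniBatch {
	return &miniBatch{
		numFeatures: len(missing),
		missing:     missing,
		values:      make([]float32, numExamples*len(missing)),
	}
}

// setMissing marks one feature of one example as missing.
func (b *miniBatch) setMissing(exampleIdx, feature int) {
	b.values[exampleIdx*b.numFeatures+feature] = b.missing[feature]
}

// fillMissing marks every feature of every example as missing by copying the
// placeholder row once per example.
func (b *miniBatch) fillMissing() {
	for start := 0; start < len(b.values); start += b.numFeatures {
		copy(b.values[start:start+b.numFeatures], b.missing)
	}
}

func main() {
	placeholders := []float32{-1, -2}

	bulk := newMiniBatch(3, placeholders)
	bulk.fillMissing()

	oneByOne := newMiniBatch(3, placeholders)
	for e := 0; e < 3; e++ {
		for f := 0; f < len(placeholders); f++ {
			oneByOne.setMissing(e, f)
		}
	}

	// Both batches hold the same values.
	fmt.Println(bulk.values)
	fmt.Println(oneByOne.values)
}
```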
func (*Batch) NumAllocatedExamples ¶
NumAllocatedExamples is the number of allocated examples.
func (*Batch) SetCategorical ¶
func (batch *Batch) SetCategorical(exampleIdx int, feature CategoricalFeatureID, value uint32)
SetCategorical sets the value of a categorical feature as an integer.
func (*Batch) SetCategoricalFromString ¶
func (batch *Batch) SetCategoricalFromString(exampleIdx int, feature CategoricalFeatureID, rawValue string) error
SetCategoricalFromString sets the value of a categorical feature.
func (*Batch) SetFromFields ¶
SetFromFields sets all the fields of an example from CSV-like header and value fields. This method is slow and should not be used in speed-sensitive code.
Empty fields and fields with the value "NA" are treated as missing values.
Example:
err := examples.SetFromFields(0, []string{"a", "b", "c"}, []string{"0.5", "UK", "NA"})
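To make the missing-value rule concrete, here is a minimal sketch of parsing a header and a row of string fields. It is not this package's implementation: the functions isMissing and parseNumericalFields are hypothetical, and the sketch covers numerical fields only.

```go
package main

import (
	"fmt"
	"strconv"
)

// isMissing reports whether a raw field denotes a missing value: an empty
// field or the literal "NA".
func isMissing(raw string) bool {
	return raw == "" || raw == "NA"
}

// parseNumericalFields pairs each header name with its raw value. It returns
// the parsed numerical values and the names of the fields that are missing.
func parseNumericalFields(header, values []string) (map[string]float32, []string, error) {
	if len(header) != len(values) {
		return nil, nil, fmt.Errorf("got %d header names but %d values", len(header), len(values))
	}
	parsed := map[string]float32{}
	var missing []string
	for i, name := range header {
		if isMissing(values[i]) {
			missing = append(missing, name)
			continue
		}
		v, err := strconv.ParseFloat(values[i], 32)
		if err != nil {
			return nil, nil, fmt.Errorf("field %q: %w", name, err)
		}
		parsed[name] = float32(v)
	}
	return parsed, missing, nil
}

func main() {
	parsed, missing, err := parseNumericalFields(
		[]string{"a", "c"}, []string{"0.5", "NA"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parsed, missing)
}
```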
func (*Batch) SetMissingCategorical ¶
func (batch *Batch) SetMissingCategorical(exampleIdx int, feature CategoricalFeatureID)
SetMissingCategorical sets a categorical feature value as missing.
func (*Batch) SetMissingNumerical ¶
func (batch *Batch) SetMissingNumerical(exampleIdx int, feature NumericalFeatureID)
SetMissingNumerical sets a numerical feature value as missing.
func (*Batch) SetNumerical ¶
func (batch *Batch) SetNumerical(exampleIdx int, feature NumericalFeatureID, value float32)
SetNumerical sets the value of a numerical feature.
func (*Batch) ToStringDebug ¶
ToStringDebug exports the content of the set of examples as a text representation intended for debugging.
type CategoricalFeatureID ¶
type CategoricalFeatureID int
CategoricalFeatureID is the unique identifier of a categorical feature.
type CategoricalSpec ¶
type CategoricalSpec struct {
// NumUniqueValues of this feature. The feature value should be in [0, NumUniqueValues).
NumUniqueValues uint32
// contains filtered or unexported fields
}
CategoricalSpec is the meta-data about a categorical feature.
type CompatibilityType ¶
type CompatibilityType int32
CompatibilityType indicates how the model was trained, and it affects how features are consumed.
const (
// CompatibilityYggdrasil is the native way to consume examples and models with Yggdrasil
// Decision Forests.
CompatibilityYggdrasil CompatibilityType = 0
// CompatibilityTensorFlowDecisionForests consumes models trained with TensorFlow Decision
// Forests.
//
// Compatibility impact: Categorical and categorical-set columns fed as integers are offset by
// 1. See "CATEGORICAL_INTEGER_OFFSET" in TensorFlow Decision Forests.
CompatibilityTensorFlowDecisionForests CompatibilityType = 1
// CompatibilityAutoTFX consumes models trained with TensorFlow Decision
// Forests.
//
// Compatibility impact: Categorical and categorical-set columns fed as integers are offset by
// 1. See "CATEGORICAL_INTEGER_OFFSET" in TensorFlow Decision Forests. Missing numerical and
// categorical string values are replaced respectively by -1 and "" (empty string).
CompatibilityAutoTFX CompatibilityType = 2
// CompatibilityAutomatic automatically detects the compatibility of the model.
CompatibilityAutomatic = 3
)
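The compatibility impacts listed in these comments can be sketched as two small helpers. All names below are hypothetical, and one detail is an assumption rather than something stated here: the sketch adds the offset of 1 to the integer value supplied by the caller. The replacement of missing numerical values by -1 under the AutoTFX mode follows the comment above.

```go
package main

import "fmt"

// compatibility is a local stand-in for CompatibilityType.
type compatibility int32

const (
	compatYggdrasil compatibility = 0
	compatTFDF      compatibility = 1
	compatAutoTFX   compatibility = 2
)

// storedCategorical returns the integer categorical value after applying the
// compatibility offset. Assumption: the offset of 1 is added to the value
// supplied by the caller.
func storedCategorical(c compatibility, value uint32) uint32 {
	switch c {
	case compatTFDF, compatAutoTFX:
		return value + 1
	default:
		return value
	}
}

// missingNumericalReplacement returns the value that replaces a missing
// numerical value, and whether such a replacement applies in this mode.
func missingNumericalReplacement(c compatibility) (float32, bool) {
	if c == compatAutoTFX {
		return -1, true
	}
	return 0, false
}

func main() {
	fmt.Println(storedCategorical(compatYggdrasil, 3))
	fmt.Println(storedCategorical(compatTFDF, 3))
	fmt.Println(missingNumericalReplacement(compatAutoTFX))
}
```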
type FeatureConstructionMap ¶
type FeatureConstructionMap struct {
// Mapping between a column index (i.e. the index of the column in the
// dataspec) and a NumericalFeatureID.
NumericalFeatures map[int]NumericalFeatureID
// Mapping between a column index (in the dataspec) and a
// CategoricalFeatureID.
CategoricalFeatures map[int]CategoricalFeatureID
}
FeatureConstructionMap contains the mapping between the column index and the feature id. FeatureConstructionMap is only used during the model-to-engine compilation, and it is then discarded.
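A minimal sketch of how such a mapping could be built is shown below. It assumes that feature ids are dense and assigned per type in the order the input columns are visited. That ordering, and every name in the block (columnKind, constructionMap, buildConstructionMap), are assumptions made for illustration.

```go
package main

import "fmt"

// columnKind is a hypothetical stand-in for the column type in a dataspec.
type columnKind int

const (
	numerical columnKind = iota
	categorical
)

// constructionMap mirrors the shape of FeatureConstructionMap with plain ints.
type constructionMap struct {
	numerical   map[int]int // column index -> numerical feature id
	categorical map[int]int // column index -> categorical feature id
}

// buildConstructionMap assigns a dense id, per kind, to each input column.
func buildConstructionMap(kinds []columnKind, inputColumns []int) constructionMap {
	m := constructionMap{numerical: map[int]int{}, categorical: map[int]int{}}
	for _, col := range inputColumns {
		switch kinds[col] {
		case numerical:
			next := len(m.numerical)
			m.numerical[col] = next
		case categorical:
			next := len(m.categorical)
			m.categorical[col] = next
		}
	}
	return m
}

func main() {
	// Columns 0 and 2 are numerical, columns 1 and 3 are categorical.
	kinds := []columnKind{numerical, categorical, numerical, categorical}
	// The model uses columns 0, 1 and 3 as input features.
	m := buildConstructionMap(kinds, []int{0, 1, 3})
	fmt.Println(m.numerical, m.categorical)
}
```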
type Features ¶
type Features struct {
// NumericalFeatures is the mapping between numerical feature names and numerical feature ids.
// Indexed by "NumericalFeatureID".
NumericalFeatures map[string]NumericalFeatureID
// CategoricalFeatures is the mapping between categorical feature names and categorical feature
// ids. Indexed by "CategoricalFeatureID".
CategoricalFeatures map[string]CategoricalFeatureID
// MissingNumericalValues is the representation of a "missing value" for each of the numerical
// features.
// Note: Currently, serving only supports global imputation of missing values
// during inference.
MissingNumericalValues []float32
// MissingCategoricalValues is the representation of a "missing value" for each of the categorical
// features.
MissingCategoricalValues []uint32
// CategoricalSpec is the meta-data about the categorical features. Indexed by
// "CategoricalFeatureID".
CategoricalSpec []CategoricalSpec
// Compatibility indicates how the model is served.
Compatibility CompatibilityType
}
Features contains the definition of the input features of a model.
func (*Features) NumFeatures ¶
NumFeatures is the number of features.
func (*Features) OverrideMissingValuePlaceholders ¶
OverrideMissingValuePlaceholders specifies the values that will replace the missing numerical and categorical values when calling SetMissing* during inference.
Models are natively able to handle missing values. Overriding the missing values is a form of data pre-processing that should only be applied if such pre-processing is also applied during training.
type NumericalFeatureID ¶
type NumericalFeatureID int
NumericalFeatureID is the unique identifier of a numerical feature.