feature_engineering

package
Version: v0.0.0-...-aafbcf9 (this package is not in the latest version of its module)
Published: Oct 25, 2023 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Constants

Methods accepted by FillOpt.Method:
const (
	FillMissingValuesByAverage        = "average"
	FillMissingValuesByMedian         = "median"
	FillMissingValuesBySpecifiedValue = "specified"
)
Comparison methods accepted by FilterOpt.FilterMethod:
const (
	FilterByGreater        = "g"
	FilterByGreaterOrEqual = "ge"
	FilterByEqual          = "e"
	FilterByNotEqual       = "ne"
	FilterByLess           = "l"
	FilterByLessOrEqual    = "le"
)

Functions

func EqualFrequencyBinning

func EqualFrequencyBinning(fileRows [][]string, binOpts map[string]BinOpt) ([][]string, error)

EqualFrequencyBinning performs equal-frequency binning on the selected features. fileRows is the sample content; its first row contains only the feature names. binOpts holds the binning options as a map from feature name to BinOpt.

func EqualFrequencyBinningForOne

func EqualFrequencyBinningForOne(fileRows [][]string, feature string, binNum int) ([][]string, error)

EqualFrequencyBinningForOne performs equal-frequency binning on one selected feature. fileRows is the sample content; its first row contains only the feature names. feature is the feature to process and binNum is the target number of bins. It returns new samples in which the value of the selected feature is a discrete bin index in [0, 1, ..., binNum-1].

func EqualWidthBinning

func EqualWidthBinning(fileRows [][]string, binOpts map[string]BinOpt) ([][]string, error)

EqualWidthBinning performs equal-width binning on the selected features. fileRows is the sample content; its first row contains only the feature names. binOpts holds the binning options as a map from feature name to BinOpt.

func EqualWidthBinningForOne

func EqualWidthBinningForOne(fileRows [][]string, feature string, binNum int) ([][]string, error)

EqualWidthBinningForOne performs equal-width binning on one selected feature. fileRows is the sample content; its first row contains only the feature names. feature is the feature to process and binNum is the target number of bins. It returns new samples in which the value of the selected feature is a discrete bin index in [0, 1, ..., binNum-1].

func FeatureSelect

func FeatureSelect(fileRows [][]string, features []string) [][]string

FeatureSelect filters the samples down to the selected features. fileRows is the sample content; its first row contains only the feature names. features is the list of features to select.
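Column selection of this kind can be sketched as below. The helper selectColumns is hypothetical, and its handling of edge cases (unknown feature names are skipped, output columns follow the order of the features argument, short rows are padded with empty strings) is an assumption rather than documented package behavior.

```go
package main

import "fmt"

// selectColumns is a hypothetical helper, not part of the package.
// It keeps only the columns whose header (first row) appears in features.
func selectColumns(rows [][]string, features []string) [][]string {
	if len(rows) == 0 {
		return nil
	}
	// Resolve each requested feature name to its column index.
	var idx []int
	for _, name := range features {
		for j, header := range rows[0] {
			if header == name {
				idx = append(idx, j)
				break
			}
		}
	}
	out := make([][]string, 0, len(rows))
	for _, row := range rows {
		picked := make([]string, 0, len(idx))
		for _, j := range idx {
			if j < len(row) {
				picked = append(picked, row[j])
			} else {
				picked = append(picked, "") // pad rows shorter than the header
			}
		}
		out = append(out, picked)
	}
	return out
}

func main() {
	rows := [][]string{
		{"id", "age", "city"},
		{"1", "30", "Oslo"},
		{"2", "41", "Lima"},
	}
	fmt.Println(selectColumns(rows, []string{"city", "id"}))
	// [[city id] [Oslo 1] [Lima 2]]
}
```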

func FillByAverage

func FillByAverage(fileRows [][]string, feature string) ([][]string, error)

FillByAverage fills missing values of the target feature with the average. fileRows is the sample content; its first row contains only the feature names.

func FillByMedian

func FillByMedian(fileRows [][]string, feature string) ([][]string, error)

FillByMedian fills missing values of the target feature with the median. fileRows is the sample content; its first row contains only the feature names.

func FillBySpecified

func FillBySpecified(fileRows [][]string, feature, value string) ([][]string, error)

FillBySpecified fills missing values of the target feature with the specified value. fileRows is the sample content; its first row contains only the feature names.

func FillMissingValues

func FillMissingValues(fileRows [][]string, fillOpts map[string]FillOpt) ([][]string, error)

FillMissingValues fills missing values of the target features with the average, the median, or a specified value. It supports filling several features at once and returns the processed samples. fileRows is the sample content; its first row contains only the feature names. fillOpts holds the filling options for the selected features.
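The three filling strategies can be illustrated on a single column. This is a sketch under assumptions: fillColumn is a hypothetical helper, and treating the empty string as the missing-value marker is a guess, since the documentation does not say how missing values are encoded. Only the method names "average", "median" and "specified" come from the constants above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// fillColumn is a hypothetical helper, not part of the package. It treats the
// empty string as a missing value (an assumption) and replaces it according to
// method: "average", "median" or "specified".
func fillColumn(values []string, method, specified string) ([]string, error) {
	fill := specified
	switch method {
	case "specified":
		if specified == "" {
			return nil, fmt.Errorf("a value is required when method is %q", method)
		}
	case "average", "median":
		// Collect the values that are present; they determine the fill value.
		var nums []float64
		for _, v := range values {
			if v == "" {
				continue
			}
			f, err := strconv.ParseFloat(v, 64)
			if err != nil {
				return nil, fmt.Errorf("value %q is not numeric: %w", v, err)
			}
			nums = append(nums, f)
		}
		if len(nums) == 0 {
			return nil, fmt.Errorf("no values present to compute the %s", method)
		}
		var x float64
		if method == "average" {
			for _, f := range nums {
				x += f
			}
			x /= float64(len(nums))
		} else {
			sort.Float64s(nums)
			mid := len(nums) / 2
			x = nums[mid]
			if len(nums)%2 == 0 { // even count: mean of the two middle values
				x = (nums[mid-1] + nums[mid]) / 2
			}
		}
		fill = strconv.FormatFloat(x, 'f', -1, 64)
	default:
		return nil, fmt.Errorf("unknown fill method %q", method)
	}
	out := make([]string, len(values))
	for i, v := range values {
		if v == "" {
			out[i] = fill
		} else {
			out[i] = v
		}
	}
	return out, nil
}

func main() {
	col := []string{"1", "", "3", "8"}
	for _, method := range []string{"average", "median", "specified"} {
		filled, err := fillColumn(col, method, "0")
		if err != nil {
			panic(err)
		}
		fmt.Println(method, filled)
	}
	// average [1 4 3 8]
	// median [1 3 3 8]
	// specified [1 0 3 8]
}
```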

func FilterByThreshold

func FilterByThreshold(fileRows [][]string, filOpt map[string]FilterOpt) ([][]string, error)

FilterByThreshold filters samples by the given thresholds and methods. It supports filtering on several features at once and returns the processed samples. fileRows is the sample content; its first row contains only the feature names. filOpt holds the filter options for the selected features.

func FilterByThresholdForOne

func FilterByThresholdForOne(fileRows [][]string, feature string, opt FilterOpt) ([][]string, error)

FilterByThresholdForOne filters samples by threshold for one feature.
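A threshold filter over one feature might look like the sketch below. The helpers matches and keepRows are hypothetical. The method codes are the ones listed in the constants above, but the semantics are assumed: here a row is kept when its value satisfies the comparison, whereas the package could equally well drop such rows.

```go
package main

import (
	"fmt"
	"strconv"
)

// matches is a hypothetical helper: it reports whether value satisfies the
// comparison named by method ("g", "ge", "e", "ne", "l" or "le").
func matches(value, threshold float64, method string) (bool, error) {
	switch method {
	case "g":
		return value > threshold, nil
	case "ge":
		return value >= threshold, nil
	case "e":
		return value == threshold, nil
	case "ne":
		return value != threshold, nil
	case "l":
		return value < threshold, nil
	case "le":
		return value <= threshold, nil
	}
	return false, fmt.Errorf("unknown filter method %q", method)
}

// keepRows is a hypothetical helper, not part of the package. It keeps the
// header and every row whose value for feature satisfies the comparison
// (assumed semantics).
func keepRows(rows [][]string, feature string, threshold float64, method string) ([][]string, error) {
	if len(rows) == 0 {
		return nil, fmt.Errorf("missing header row")
	}
	col := -1
	for j, header := range rows[0] {
		if header == feature {
			col = j
			break
		}
	}
	if col < 0 {
		return nil, fmt.Errorf("feature %q not found", feature)
	}
	out := [][]string{rows[0]}
	for _, row := range rows[1:] {
		if col >= len(row) {
			return nil, fmt.Errorf("row %v has no value for %q", row, feature)
		}
		v, err := strconv.ParseFloat(row[col], 64)
		if err != nil {
			return nil, fmt.Errorf("value %q is not numeric: %w", row[col], err)
		}
		ok, err := matches(v, threshold, method)
		if err != nil {
			return nil, err
		}
		if ok {
			out = append(out, row)
		}
	}
	return out, nil
}

func main() {
	rows := [][]string{{"id", "score"}, {"1", "0.2"}, {"2", "0.8"}, {"3", "0.5"}}
	kept, err := keepRows(rows, "score", 0.5, "ge")
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // [[id score] [2 0.8] [3 0.5]]
}
```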

func PrintFileRows

func PrintFileRows(fileRows [][]string)

PrintFileRows prints the sample content in a human-readable format.

func ReplaceIntervalsBySpecifiedValues

func ReplaceIntervalsBySpecifiedValues(fileRows [][]string, repOpt map[string]ReplaceOpt) ([][]string, error)

ReplaceIntervalsBySpecifiedValues replaces values that fall within given intervals with specified values. It supports replacement on several features at once and returns the processed samples. fileRows is the sample content; its first row contains only the feature names. repOpt holds the replacement options for the selected features.

func ReplaceIntervalsForOne

func ReplaceIntervalsForOne(fileRows [][]string, feature string, opt ReplaceOpt) ([][]string, error)

ReplaceIntervalsForOne replaces values that fall within given intervals with specified values for one feature.

Types

type BinOpt

type BinOpt struct {
	BinNum int // target number of bins
}

type FillOpt

type FillOpt struct {
	Method string // "average", "median" or "specified"
	Value  string // cannot be empty when Method is "specified"
}

type FilterOpt

type FilterOpt struct {
	Threshold    float64
	FilterMethod string // "g", "ge", "e", "ne", "l" or "le"
}

FilterOpt is a filter condition: a threshold and the comparison method applied against it.

type IntervalValue

type IntervalValue struct {
	Left  float64
	Right float64
	Value string
}

IntervalValue replaces sample values in the half-open interval [Left, Right) with Value.

type ReplaceOpt

type ReplaceOpt struct {
	Intervals []IntervalValue
}

ReplaceOpt specifies one or more intervals to replace.

type StatisticsInfo

type StatisticsInfo struct {
	Maximum  float64  // largest value
	Minimum  float64  // smallest value
	Mean     float64  // average value
	Mode     []string // most frequent values; a set may have several modes, or none if all values appear with the same frequency
	Median   float64  // median value
	StandDev float64  // standard deviation
}

func FeatureStatistics

func FeatureStatistics(fileRows [][]string, feature string, valueIsStr bool) (StatisticsInfo, error)

FeatureStatistics computes statistics for the given feature. fileRows is the sample content; its first row contains only the feature names. valueIsStr is true if the sample values of the feature are of type 'string'.
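The fields of StatisticsInfo can be illustrated for a numeric column with the sketch below. The type stats and the helper describe are hypothetical, and using the population standard deviation (dividing by n) is an assumption: the documentation does not say whether the package uses the population or the sample formula. The mode follows the rule stated in the struct comment, with values compared as strings.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strconv"
)

// stats mirrors the fields of the package's StatisticsInfo type.
type stats struct {
	Maximum, Minimum, Mean, Median, StandDev float64
	Mode                                     []string
}

// describe is a hypothetical helper, not part of the package.
func describe(values []string) (stats, error) {
	var s stats
	if len(values) == 0 {
		return s, fmt.Errorf("no values")
	}
	nums := make([]float64, len(values))
	counts := make(map[string]int)
	var sum float64
	for i, v := range values {
		f, err := strconv.ParseFloat(v, 64)
		if err != nil {
			return s, fmt.Errorf("value %q is not numeric: %w", v, err)
		}
		nums[i] = f
		sum += f
		counts[v]++
	}
	sort.Float64s(nums)
	n := len(nums)
	s.Minimum, s.Maximum = nums[0], nums[n-1]
	s.Mean = sum / float64(n)
	s.Median = nums[n/2]
	if n%2 == 0 { // even count: mean of the two middle values
		s.Median = (nums[n/2-1] + nums[n/2]) / 2
	}
	var sq float64
	for _, f := range nums {
		d := f - s.Mean
		sq += d * d
	}
	s.StandDev = math.Sqrt(sq / float64(n)) // population formula (assumption)

	// Mode: the most frequent values, or none if all frequencies are equal.
	maxCount, minCount := 0, n
	for _, c := range counts {
		if c > maxCount {
			maxCount = c
		}
		if c < minCount {
			minCount = c
		}
	}
	if maxCount > minCount {
		for v, c := range counts {
			if c == maxCount {
				s.Mode = append(s.Mode, v)
			}
		}
		sort.Strings(s.Mode)
	}
	return s, nil
}

func main() {
	s, err := describe([]string{"2", "4", "4", "4", "5", "5", "7", "9"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Minimum, s.Maximum, s.Mean, s.Median, s.StandDev, s.Mode)
	// 2 9 5 4.5 2 [4]
}
```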
