Documentation ¶
Overview ¶
Package stats provides standard statistical computations that operate on the `tensor.Tensor` standard data representation.
Index ¶
- Variables
- func Binarize(in, threshold tensor.Tensor) tensor.Values
- func BinarizeOut(in, threshold tensor.Tensor, out tensor.Values) error
- func Clamp(in, minv, maxv tensor.Tensor) tensor.Values
- func ClampOut(in, minv, maxv tensor.Tensor, out tensor.Values) error
- func Count(in tensor.Tensor) tensor.Values
- func CountOut(in tensor.Tensor, out tensor.Values) error
- func CountOut64(in tensor.Tensor, out tensor.Values) *tensor.Float64
- func Describe(dir *tensorfs.Node, tsrs ...tensor.Tensor)
- func DescribeTable(dir *tensorfs.Node, dt *table.Table, columns ...string)
- func DescribeTableAll(dir *tensorfs.Node, dt *table.Table)
- func Final(in tensor.Tensor) tensor.Values
- func FinalOut(in tensor.Tensor, out tensor.Values) error
- func First(in tensor.Tensor) tensor.Values
- func FirstOut(in tensor.Tensor, out tensor.Values) error
- func GroupAll(dir *tensorfs.Node, tsrs ...tensor.Tensor) error
- func GroupDescribe(dir *tensorfs.Node, tsrs ...tensor.Tensor) error
- func GroupStats(dir *tensorfs.Node, stat Stats, tsrs ...tensor.Tensor) error
- func GroupStatsAsTable(dir *tensorfs.Node) *table.Table
- func GroupStatsAsTableNoStatName(dir *tensorfs.Node) *table.Table
- func Groups(dir *tensorfs.Node, tsrs ...tensor.Tensor) error
- func L1Norm(in tensor.Tensor) tensor.Values
- func L1NormOut(in tensor.Tensor, out tensor.Values) error
- func L2Norm(in tensor.Tensor) tensor.Values
- func L2NormOut(in tensor.Tensor, out tensor.Values) error
- func L2NormOut64(in tensor.Tensor, out tensor.Values) *tensor.Float64
- func Max(in tensor.Tensor) tensor.Values
- func MaxAbs(in tensor.Tensor) tensor.Values
- func MaxAbsOut(in tensor.Tensor, out tensor.Values) error
- func MaxOut(in tensor.Tensor, out tensor.Values) error
- func Mean(in tensor.Tensor) tensor.Values
- func MeanOut(in tensor.Tensor, out tensor.Values) error
- func MeanOut64(in tensor.Tensor, out tensor.Values) (mean64, count64 *tensor.Float64)
- func MeanTables(dts []*table.Table) *table.Table
- func Median(in tensor.Tensor) tensor.Values
- func MedianOut(in tensor.Tensor, out tensor.Values) error
- func Min(in tensor.Tensor) tensor.Values
- func MinAbs(in tensor.Tensor) tensor.Values
- func MinAbsOut(in tensor.Tensor, out tensor.Values) error
- func MinOut(in tensor.Tensor, out tensor.Values) error
- func Prod(in tensor.Tensor) tensor.Values
- func ProdOut(in tensor.Tensor, out tensor.Values) error
- func Q1(in tensor.Tensor) tensor.Values
- func Q1Out(in tensor.Tensor, out tensor.Values) error
- func Q3(in tensor.Tensor) tensor.Values
- func Q3Out(in tensor.Tensor, out tensor.Values) error
- func Quantiles(in, qs tensor.Tensor) tensor.Values
- func QuantilesOut(in, qs tensor.Tensor, out tensor.Values) error
- func Sem(in tensor.Tensor) tensor.Values
- func SemOut(in tensor.Tensor, out tensor.Values) error
- func SemPop(in tensor.Tensor) tensor.Values
- func SemPopOut(in tensor.Tensor, out tensor.Values) error
- func Std(in tensor.Tensor) tensor.Values
- func StdOut(in tensor.Tensor, out tensor.Values) error
- func StdOut64(in tensor.Tensor, out tensor.Values) (std64, mean64, count64 *tensor.Float64)
- func StdPop(in tensor.Tensor) tensor.Values
- func StdPopOut(in tensor.Tensor, out tensor.Values) error
- func StripPackage(name string) string
- func Sum(in tensor.Tensor) tensor.Values
- func SumOut(in tensor.Tensor, out tensor.Values) error
- func SumOut64(in tensor.Tensor, out tensor.Values) *tensor.Float64
- func SumSq(in tensor.Tensor) tensor.Values
- func SumSqDevOut64(in tensor.Tensor, out tensor.Values) (ssd64, mean64, count64 *tensor.Float64)
- func SumSqOut(in tensor.Tensor, out tensor.Values) error
- func SumSqOut64(in tensor.Tensor, out tensor.Values) *tensor.Float64
- func SumSqScaleOut64(in tensor.Tensor) (scale64, ss64 *tensor.Float64)
- func TableGroupDescribe(dir *tensorfs.Node, dt *table.Table, columns ...string) error
- func TableGroupStats(dir *tensorfs.Node, stat Stats, dt *table.Table, columns ...string) error
- func TableGroups(dir *tensorfs.Node, dt *table.Table, columns ...string) error
- func UnitNorm(a tensor.Tensor) tensor.Values
- func UnitNormOut(a tensor.Tensor, out tensor.Values) error
- func Var(in tensor.Tensor) tensor.Values
- func VarOut(in tensor.Tensor, out tensor.Values) error
- func VarOut64(in tensor.Tensor, out tensor.Values) (var64, mean64, count64 *tensor.Float64)
- func VarPop(in tensor.Tensor) tensor.Values
- func VarPopOut(in tensor.Tensor, out tensor.Values) error
- func VarPopOut64(in tensor.Tensor, out tensor.Values) (var64, mean64, count64 *tensor.Float64)
- func Vectorize2Out64(in tensor.Tensor, iniX, iniY float64, ...) (ox64, oy64 *tensor.Float64)
- func VectorizeOut64(in tensor.Tensor, out tensor.Values, ini float64, ...) *tensor.Float64
- func VectorizePreOut64(in tensor.Tensor, out tensor.Values, ini float64, pre *tensor.Float64, ...) *tensor.Float64
- func ZScore(a tensor.Tensor) tensor.Values
- func ZScoreOut(a tensor.Tensor, out tensor.Values) error
- type Stats
- func (s Stats) Call(in tensor.Tensor) tensor.Values
- func (i Stats) Desc() string
- func (s Stats) Func() StatsFunc
- func (s Stats) FuncName() string
- func (i Stats) Int64() int64
- func (i Stats) MarshalText() ([]byte, error)
- func (i *Stats) SetInt64(in int64)
- func (i *Stats) SetString(s string) error
- func (i Stats) String() string
- func (i *Stats) UnmarshalText(text []byte) error
- func (i Stats) Values() []enums.Enum
- type StatsFunc
- type StatsOutFunc
Constants ¶
This section is empty.
Variables ¶
var DescriptiveStats = []Stats{StatCount, StatMean, StatStd, StatSem, StatMin, StatQ1, StatMedian, StatQ3, StatMax}
DescriptiveStats are the standard descriptive stats used in the Describe function. The three sort-based stats (Q1, Median, Q3) cannot be applied to higher-dimensional data.
Functions ¶
func Binarize ¶
Binarize results in a binary-valued output by setting values >= the threshold to 1, else 0. threshold is treated as a scalar (first value used).
func BinarizeOut ¶
BinarizeOut results in a binary-valued output by setting values >= the threshold to 1, else 0. threshold is treated as a scalar (first value used).
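The thresholding rule described above can be sketched on a plain slice. This is only an illustration of the documented rule (values >= threshold become 1, all others 0); `binarize` is a hypothetical helper that works on `[]float64`, not the package's tensor-based implementation.

```go
package main

import "fmt"

// binarize is a hypothetical helper (not part of the stats package):
// it maps each value >= threshold to 1 and every other value to 0.
func binarize(in []float64, threshold float64) []float64 {
	out := make([]float64, len(in)) // zero-initialized, so only the 1s need setting
	for i, v := range in {
		if v >= threshold {
			out[i] = 1
		}
	}
	return out
}

func main() {
	fmt.Println(binarize([]float64{0.2, 0.5, 0.9}, 0.5)) // prints [0 1 1]
}
```

Note that a value exactly equal to the threshold maps to 1, matching the ">=" wording of the documentation.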
func Clamp ¶
Clamp ensures that all values are within min, max limits, clamping values to those bounds if they exceed them. min and max args are treated as scalars (first value used).
func ClampOut ¶
ClampOut ensures that all values are within min, max limits, clamping values to those bounds if they exceed them. min and max args are treated as scalars (first value used).
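As a rough sketch of the clamping rule, the following operates on a plain slice with scalar bounds. The helper name `clamp` and the slice-based signature are assumptions for illustration; the package functions take tensors and use the first value of the bound arguments.

```go
package main

import "fmt"

// clamp is a hypothetical helper (not part of the stats package):
// values below minv become minv, values above maxv become maxv,
// and values already within the bounds are copied unchanged.
func clamp(in []float64, minv, maxv float64) []float64 {
	out := make([]float64, len(in))
	for i, v := range in {
		switch {
		case v < minv:
			out[i] = minv
		case v > maxv:
			out[i] = maxv
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	fmt.Println(clamp([]float64{-1, 0.5, 2}, 0, 1)) // prints [0 0.5 1]
}
```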
func Count ¶
Count computes the count of non-NaN tensor values. See StatsFunc for general information.
func CountOut ¶
CountOut computes the count of non-NaN tensor values. See StatsOutFunc for general information.
func CountOut64 ¶
CountOut64 computes the count of non-NaN tensor values, and returns the Float64 output values for subsequent use.
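Counting only non-NaN values means that NaN acts as a missing-value marker. A minimal sketch of that behavior on a plain slice, where `countValid` is a hypothetical helper and not the package's implementation:

```go
package main

import (
	"fmt"
	"math"
)

// countValid is a hypothetical helper (not part of the stats package):
// it returns the number of values that are not NaN, as a float64 so that
// it can be used directly as a divisor in later computations.
func countValid(in []float64) float64 {
	n := 0.0
	for _, v := range in {
		if !math.IsNaN(v) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countValid([]float64{1, math.NaN(), 3})) // prints 2
}
```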
func Describe ¶
Describe adds standard descriptive statistics for the given tensors to the given tensorfs directory, adding a directory for each tensor and a result tensor for each statistic. This is an easy way to provide a comprehensive description of data. The DescriptiveStats list is: Count, Mean, Std, Sem, Min, Q1, Median, Q3, Max.
func DescribeTable ¶
DescribeTable runs Describe on given columns in table.
func DescribeTableAll ¶
DescribeTableAll runs Describe on all numeric columns in given table.
func Final ¶
Final returns the final tensor value(s), as a stats function, for the ending point in a naturally-ordered set of data. See StatsFunc for general information.
func FinalOut ¶
FinalOut returns the final tensor value(s), as a stats function, for the ending point in a naturally-ordered set of data. See StatsOutFunc for general information.
func First ¶
First returns the first tensor value(s), as a stats function, for the starting point in a naturally-ordered set of data. See StatsFunc for general information.
func FirstOut ¶
FirstOut returns the first tensor value(s), as a stats function, for the starting point in a naturally-ordered set of data. See StatsOutFunc for general information.
func GroupAll ¶
GroupAll copies all indexes from the first given tensor, into an "All/All" tensor in the given tensorfs, which can then be used with GroupStats to generate summary statistics across all the data. See Groups for more general documentation.
func GroupDescribe ¶
GroupDescribe runs standard descriptive statistics on given tensor data using GroupStats function, with DescriptiveStats list of stats.
func GroupStats ¶
GroupStats computes the given stats function on the unique grouped indexes produced by the Groups function, in the given tensorfs directory, applied to each of the tensors passed here. It creates a "Stats" subdirectory in the given directory, with subdirectories named after each value tensor (created if they do not yet exist), and then creates a subdirectory within that for the statistic name. Within that statistic directory, it creates a String tensor with the unique values of each source Groups tensor, and an aligned Float64 tensor with the statistics results for each such unique group value. See the README.md file for a diagram of the results.
func GroupStatsAsTable ¶
GroupStatsAsTable returns the results from GroupStats in given directory as a table.Table, using tensorfs.DirTable function.
func GroupStatsAsTableNoStatName ¶
GroupStatsAsTableNoStatName returns the results from GroupStats in given directory as a table.Table, using tensorfs.DirTable function. Column names are updated to not include the stat name, if there is only one statistic such that the resulting name will still be unique. Otherwise, column names are Value/Stat.
func Groups ¶
Groups generates indexes for each unique value in each of the given tensors. One can then use the resulting indexes for the tensor.Rows indexes to perform computations restricted to grouped subsets of data, as in the GroupStats function. See [GroupCombined] for a function that makes a "Combined" Group that has a unique group for each _combination_ of the separate, independent groups created by this function. It creates subdirectories in a "Groups" directory within the given tensorfs, one for each tensor passed in here, using the metadata Name property for names (index if empty). Within each subdirectory there are int tensors for each unique 1D row-wise value of elements in the input tensor, named as the string representation of the value, where the int tensor contains a list of row-wise indexes corresponding to the source rows having that value. Note that these indexes are directly in terms of the underlying [Tensor] data rows, indirected through any existing indexes on the inputs, so that the results can be used directly as Indexes into the corresponding tensor data. Uses a stable sort on columns, so ordering of other dimensions is preserved.
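The grouping idea can be illustrated without tensorfs: collect, for each unique value, the list of row indexes holding that value, then compute a statistic over each list. The sketch below uses a plain map and slices; `groupRows` and `groupMean` are hypothetical helpers, and the real functions store their results as tensors in tensorfs directories instead.

```go
package main

import "fmt"

// groupRows is a hypothetical helper (not part of the stats package):
// for each unique value it returns the row indexes holding that value,
// in their original order.
func groupRows(vals []string) map[string][]int {
	groups := make(map[string][]int)
	for i, v := range vals {
		groups[v] = append(groups[v], i)
	}
	return groups
}

// groupMean is a hypothetical helper that averages data over the given rows,
// which is the kind of per-group statistic GroupStats is described as computing.
func groupMean(data []float64, rows []int) float64 {
	sum := 0.0
	for _, r := range rows {
		sum += data[r]
	}
	return sum / float64(len(rows))
}

func main() {
	cond := []string{"a", "b", "a", "b"}
	score := []float64{1, 2, 3, 6}
	groups := groupRows(cond)
	fmt.Println(groups["a"], groupMean(score, groups["a"])) // prints [0 2] 2
	fmt.Println(groups["b"], groupMean(score, groups["b"])) // prints [1 3] 4
}
```

Because the group lists hold row indexes rather than copies of the data, the same grouping can be reused for any number of value columns.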
func L1Norm ¶
L1Norm computes the sum of absolute-value-of tensor values. See StatsFunc for general information.
func L1NormOut ¶
L1NormOut computes the sum of absolute-value-of tensor values. See StatsOutFunc for general information.
func L2Norm ¶
L2Norm computes the square root of the sum of squares of tensor values, known as the L2 norm. See StatsFunc for general information.
func L2NormOut ¶
L2NormOut computes the square root of the sum of squares of tensor values, known as the L2 norm. See StatsOutFunc for general information.
func L2NormOut64 ¶
L2NormOut64 computes the square root of the sum of squares of tensor values, known as the L2 norm, and returns the Float64 output values for use in subsequent computations.
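For reference, the two norms defined above can be written directly from their definitions. This is a sketch on plain slices with hypothetical helper names, and it uses the naive sum of squares rather than any numerically stabilized form the package may use.

```go
package main

import (
	"fmt"
	"math"
)

// l1Norm is a hypothetical helper: the sum of absolute values, skipping NaN.
func l1Norm(in []float64) float64 {
	sum := 0.0
	for _, v := range in {
		if math.IsNaN(v) {
			continue
		}
		sum += math.Abs(v)
	}
	return sum
}

// l2Norm is a hypothetical helper: the square root of the sum of squares,
// skipping NaN.
func l2Norm(in []float64) float64 {
	ss := 0.0
	for _, v := range in {
		if math.IsNaN(v) {
			continue
		}
		ss += v * v
	}
	return math.Sqrt(ss)
}

func main() {
	v := []float64{3, -4}
	fmt.Println(l1Norm(v), l2Norm(v)) // prints 7 5
}
```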
func MaxAbs ¶
MaxAbs computes the max of absolute-value-of tensor values. See StatsFunc for general information.
func MaxAbsOut ¶
MaxAbsOut computes the max of absolute-value-of tensor values. See StatsOutFunc for general information.
func MaxOut ¶
MaxOut computes the max of tensor values. See StatsOutFunc for general information.
func MeanOut ¶
MeanOut computes the mean of tensor values. See StatsOutFunc for general information.
func MeanOut64 ¶
MeanOut64 computes the mean of tensor values, and returns the Float64 output values for subsequent use.
func MeanTables ¶ added in v0.1.2
MeanTables returns a table.Table with the mean values across all float columns of the input tables, which must have the same columns but not necessarily the same number of rows.
func Median ¶
Median computes the median (50% quantile) of tensor values. See StatsFunc for general information.
func MedianOut ¶
MedianOut computes the median (50% quantile) of tensor values. See StatsOutFunc for general information.
func MinAbs ¶
MinAbs computes the min of absolute-value-of tensor values. See StatsFunc for general information.
func MinAbsOut ¶
MinAbsOut computes the min of absolute-value-of tensor values. See StatsOutFunc for general information.
func MinOut ¶
MinOut computes the min of tensor values. See StatsOutFunc for general information.
func ProdOut ¶
ProdOut computes the product of tensor values. See StatsOutFunc for general information.
func Q1 ¶
Q1 computes the first quartile (25% quantile) of tensor values. See StatsFunc for general information.
func Q1Out ¶
Q1Out computes the first quartile (25% quantile) of tensor values. See StatsOutFunc for general information.
func Q3 ¶
Q3 computes the third quartile (75% quantile) of tensor values. See StatsFunc for general information.
func Q3Out ¶
Q3Out computes the third quartile (75% quantile) of tensor values. See StatsOutFunc for general information.
func Quantiles ¶
Quantiles returns the given quantile(s) of non-NaN elements in given 1D tensor. Because sorting uses indexes, this only works for 1D case. If needed for a sub-space of values, that can be extracted through slicing and then used. Logs an error if not 1D. qs are 0-1 values, 0 = min, 1 = max, .5 = median, etc. Uses linear interpolation. Because this requires a sort, it is more efficient to get as many quantiles as needed in one pass.
func QuantilesOut ¶
QuantilesOut returns the given quantile(s) of non-NaN elements in given 1D tensor. Because sorting uses indexes, this only works for 1D case. If needed for a sub-space of values, that can be extracted through slicing and then used. Returns and logs an error if not 1D. qs are 0-1 values, 0 = min, 1 = max, .5 = median, etc. Uses linear interpolation. Because this requires a sort, it is more efficient to get as many quantiles as needed in one pass.
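A sketch of quantiles with linear interpolation on a plain slice follows. The exact interpolation convention used by the package is not stated in this documentation, so the position formula `q * (n-1)` used here is an assumption, and `quantiles` is a hypothetical helper. The sketch sorts once and then reads off every requested quantile, which is why asking for several quantiles in one call is cheaper than repeated calls.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantiles is a hypothetical helper (not part of the stats package).
// It drops NaN values, sorts a copy of the rest, and linearly interpolates
// between the two nearest sorted values at position q*(n-1) (an assumed convention).
func quantiles(in, qs []float64) []float64 {
	vals := make([]float64, 0, len(in))
	for _, v := range in {
		if !math.IsNaN(v) {
			vals = append(vals, v)
		}
	}
	sort.Float64s(vals) // single sort shared by all requested quantiles
	out := make([]float64, len(qs))
	for i, q := range qs {
		if len(vals) == 0 {
			out[i] = math.NaN()
			continue
		}
		q = math.Min(math.Max(q, 0), 1) // keep q within [0, 1]
		pos := q * float64(len(vals)-1)
		lo := int(math.Floor(pos))
		hi := int(math.Ceil(pos))
		frac := pos - float64(lo)
		out[i] = vals[lo] + frac*(vals[hi]-vals[lo])
	}
	return out
}

func main() {
	data := []float64{4, 1, 3, 2, 5}
	fmt.Println(quantiles(data, []float64{0, 0.25, 0.5, 0.75, 1})) // prints [1 2 3 4 5]
}
```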
func Sem ¶
Sem computes the sample standard error of the mean of tensor values: the sample standard deviation (Std) divided by sqrt(n). See also SemPop. See StatsFunc for general information.
func SemOut ¶
SemOut computes the sample standard error of the mean of tensor values: the sample standard deviation (Std) divided by sqrt(n). See also SemPopOut. See StatsOutFunc for general information.
func SemPop ¶
SemPop computes the population standard error of the mean of tensor values: the population standard deviation (StdPop) divided by sqrt(n). See also Sem. See StatsFunc for general information.
func SemPopOut ¶
SemPopOut computes the population standard error of the mean of tensor values: the population standard deviation (StdPop) divided by sqrt(n). See also SemOut. See StatsOutFunc for general information.
func Std ¶
Std computes the sample standard deviation of tensor values: the square root of the sample variance (Var). See also StdPop. See StatsFunc for general information.
func StdOut ¶
StdOut computes the sample standard deviation of tensor values: the square root of the sample variance (Var). See also StdPopOut. See StatsOutFunc for general information.
func StdOut64 ¶
StdOut64 computes the sample standard deviation of tensor values, and returns the Float64 output values for subsequent use.
func StdPop ¶
StdPop computes the population standard deviation of tensor values: the square root of the population variance (VarPop). See also Std. See StatsFunc for general information.
func StdPopOut ¶
StdPopOut computes the population standard deviation of tensor values: the square root of the population variance (VarPop). See also StdOut. See StatsOutFunc for general information.
func StripPackage ¶
StripPackage removes any package name from given string, used for naming based on FuncName() which could be custom or have a package prefix.
func SumOut ¶
SumOut computes the sum of tensor values. See StatsOutFunc for general information.
func SumOut64 ¶
SumOut64 computes the sum of tensor values, and returns the Float64 output values for subsequent use.
func SumSq ¶
SumSq computes the sum of squares of tensor values. See StatsFunc for general information.
func SumSqDevOut64 ¶
SumSqDevOut64 computes the sum of squared mean deviates of tensor values, and returns the Float64 output values for subsequent use.
func SumSqOut ¶
SumSqOut computes the sum of squares of tensor values. See StatsOutFunc for general information.
func SumSqOut64 ¶
SumSqOut64 computes the sum of squares of tensor values, and returns the Float64 output values for subsequent use.
func SumSqScaleOut64 ¶
SumSqScaleOut64 is a helper for sum-of-squares, returning scale and ss factors aggregated separately for better numerical stability, per BLAS. Returns the Float64 output values for subsequent use.
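The scale and ss pair mentioned here matches the running-scale technique used by the classic reference BLAS nrm2 routine, where the sum of squares is kept relative to the largest magnitude seen so far to avoid overflow and underflow. The sketch below shows that textbook update on a plain slice. It is an assumption that the package follows the same update, and `sumSqScale` and `l2FromScale` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
)

// sumSqScale is a hypothetical helper showing the BLAS-style scaled update.
// It maintains the invariant: sum of squares so far == scale*scale*ss.
func sumSqScale(in []float64) (scale, ss float64) {
	scale, ss = 0, 1
	for _, v := range in {
		if math.IsNaN(v) || v == 0 {
			continue // NaN is treated as missing; zeros contribute nothing
		}
		a := math.Abs(v)
		if scale < a {
			// New largest magnitude: rescale the accumulated sum to it.
			r := scale / a
			ss = 1 + ss*r*r
			scale = a
		} else {
			r := a / scale
			ss += r * r
		}
	}
	return scale, ss
}

// l2FromScale recovers the L2 norm from the scaled representation.
func l2FromScale(scale, ss float64) float64 {
	return scale * math.Sqrt(ss)
}

func main() {
	scale, ss := sumSqScale([]float64{3, 4})
	fmt.Println(scale, ss, l2FromScale(scale, ss))
}
```

The plain sum of squares is `scale*scale*ss`, and the L2 norm is `scale*sqrt(ss)`, so both can be derived from the same pair.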
func TableGroupDescribe ¶
TableGroupDescribe runs GroupDescribe on the given columns from given table.Table.
func TableGroupStats ¶
TableGroupStats runs GroupStats using standard Stats on the given columns from given table.Table.
func TableGroups ¶
TableGroups runs Groups on the given columns from given table.Table.
func UnitNorm ¶
UnitNorm returns unit normalized values in a new output tensor, subtracting the Min value and dividing by the Max of the remaining numbers.
func UnitNormOut ¶
UnitNormOut computes unit normalized values into given output tensor, subtracting the Min value and dividing by the Max of the remaining numbers.
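Read literally, unit normalization maps the minimum to 0 and the maximum to 1. A sketch on a plain slice follows; `unitNorm` is a hypothetical helper, and returning 0 when all values are equal is an assumption, since the documentation does not say how a zero range is handled.

```go
package main

import (
	"fmt"
	"math"
)

// unitNorm is a hypothetical helper (not part of the stats package):
// it subtracts the minimum, then divides by the maximum of the shifted values,
// so results lie in [0, 1]. NaN inputs stay NaN in the output.
func unitNorm(in []float64) []float64 {
	out := make([]float64, len(in))
	minv, maxv := math.Inf(1), math.Inf(-1)
	for _, v := range in {
		if math.IsNaN(v) {
			continue
		}
		minv = math.Min(minv, v)
		maxv = math.Max(maxv, v)
	}
	rng := maxv - minv // max of the values after subtracting the min
	for i, v := range in {
		if rng == 0 {
			out[i] = 0 // assumption: constant input maps to 0
			continue
		}
		out[i] = (v - minv) / rng
	}
	return out
}

func main() {
	fmt.Println(unitNorm([]float64{2, 4, 6})) // prints [0 0.5 1]
}
```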
func Var ¶
Var computes the sample variance of tensor values: the sum of squared deviations from the mean, divided by n-1. See also VarPop. See StatsFunc for general information.
func VarOut ¶
VarOut computes the sample variance of tensor values: the sum of squared deviations from the mean, divided by n-1. See also VarPopOut. See StatsOutFunc for general information.
func VarOut64 ¶
VarOut64 computes the sample variance of tensor values, and returns the Float64 output values for subsequent use.
func VarPop ¶
VarPop computes the population variance of tensor values: the sum of squared deviations from the mean, divided by n. See also Var. See StatsFunc for general information.
func VarPopOut ¶
VarPopOut computes the population variance of tensor values: the sum of squared deviations from the mean, divided by n. See also VarOut. See StatsOutFunc for general information.
func VarPopOut64 ¶
VarPopOut64 computes the population variance of tensor values, and returns the Float64 output values for subsequent use.
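The sample and population variants differ only in the divisor (n-1 versus n), and the standard deviation and standard error follow from the variance. The sketch below makes that relationship concrete on a plain slice. All helper names are hypothetical, and the two-pass computation is a simple choice for illustration, not necessarily what the package does.

```go
package main

import (
	"fmt"
	"math"
)

// sumSqDev is a hypothetical helper: it returns the sum of squared deviations
// from the mean, the mean, and the count of non-NaN values.
func sumSqDev(in []float64) (ssd, mean, n float64) {
	sum := 0.0
	for _, v := range in {
		if math.IsNaN(v) {
			continue
		}
		sum += v
		n++
	}
	if n == 0 {
		return 0, math.NaN(), 0
	}
	mean = sum / n
	for _, v := range in {
		if math.IsNaN(v) {
			continue
		}
		d := v - mean
		ssd += d * d
	}
	return ssd, mean, n
}

// varSample divides by n-1 (NaN or Inf results when n < 2).
func varSample(in []float64) float64 {
	ssd, _, n := sumSqDev(in)
	return ssd / (n - 1)
}

// varPop divides by n.
func varPop(in []float64) float64 {
	ssd, _, n := sumSqDev(in)
	return ssd / n
}

// semSample is the sample standard deviation divided by sqrt(n).
func semSample(in []float64) float64 {
	_, _, n := sumSqDev(in)
	return math.Sqrt(varSample(in)) / math.Sqrt(n)
}

func main() {
	data := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Println(varPop(data), math.Sqrt(varPop(data))) // prints 4 2
	fmt.Println(varSample(data), semSample(data))
}
```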
func Vectorize2Out64 ¶
func Vectorize2Out64(in tensor.Tensor, iniX, iniY float64, fun func(val, ox, oy float64) (float64, float64)) (ox64, oy64 *tensor.Float64)
Vectorize2Out64 is a version of VectorizeOut64 that separately aggregates two output values, x and y as tensor.Float64.
func VectorizeOut64 ¶
func VectorizeOut64(in tensor.Tensor, out tensor.Values, ini float64, fun func(val, agg float64) float64) *tensor.Float64
VectorizeOut64 is the general compute function for stats. This version makes a Float64 output tensor for aggregating and computing values, and then copies the results back to the original output. This allows stats functions to operate directly on integer valued inputs and produce sensible results. It returns the Float64 output tensor for further processing as needed.
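The pattern described here, an initial value plus a function that folds each input value into a float64 aggregate, can be sketched with a small generic helper. This is only an illustration of the idea on plain slices: `aggregate` and the `number` constraint are hypothetical, and the real function works on tensors and copies the result back to the given output.

```go
package main

import (
	"fmt"
	"math"
)

// number is a hypothetical constraint covering a few numeric element types.
type number interface {
	~int | ~int32 | ~int64 | ~float32 | ~float64
}

// aggregate is a hypothetical helper: it converts each input value to float64,
// skips NaN, and folds the value into the running aggregate using fun.
// Accumulating in float64 keeps results sensible for integer inputs.
func aggregate[T number](in []T, ini float64, fun func(val, agg float64) float64) float64 {
	agg := ini
	for _, v := range in {
		f := float64(v)
		if math.IsNaN(f) {
			continue
		}
		agg = fun(f, agg)
	}
	return agg
}

func main() {
	ints := []int{1, 2}
	sum := aggregate(ints, 0, func(val, agg float64) float64 { return agg + val })
	maxv := aggregate(ints, -math.MaxFloat64, func(val, agg float64) float64 { return math.Max(val, agg) })
	fmt.Println(sum, sum/float64(len(ints)), maxv) // prints 3 1.5 2
}
```

The mean of the integer inputs comes out as 1.5 rather than being truncated to 1, which is the benefit of aggregating in float64 that the description points to.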
func VectorizePreOut64 ¶
func VectorizePreOut64(in tensor.Tensor, out tensor.Values, ini float64, pre *tensor.Float64, fun func(val, pre, agg float64) float64) *tensor.Float64
VectorizePreOut64 is a version of VectorizeOut64 that takes an additional tensor.Float64 input of pre-computed values, e.g., the means of each output cell.
Types ¶
type Stats ¶
type Stats int32 //enums:enum -trim-prefix Stat
Stats is a list of standard aggregation functions, which can be used to choose an aggregation function.
const (
	// count of number of elements.
	StatCount Stats = iota
	// sum of elements.
	StatSum
	// L1 Norm: sum of absolute values of elements.
	StatL1Norm
	// product of elements.
	StatProd
	// minimum value.
	StatMin
	// maximum value.
	StatMax
	// minimum of absolute values.
	StatMinAbs
	// maximum of absolute values.
	StatMaxAbs
	// mean value = sum / count.
	StatMean
	// sample variance (squared deviations from mean, divided by n-1).
	StatVar
	// sample standard deviation (sqrt of Var).
	StatStd
	// sample standard error of the mean (Std divided by sqrt(n)).
	StatSem
	// sum of squared values.
	StatSumSq
	// L2 Norm: square-root of sum-of-squares.
	StatL2Norm
	// population variance (squared diffs from mean, divided by n).
	StatVarPop
	// population standard deviation (sqrt of VarPop).
	StatStdPop
	// population standard error of the mean (StdPop divided by sqrt(n)).
	StatSemPop
	// middle value in sorted ordering.
	StatMedian
	// Q1 first quartile = 25%ile value = .25 quantile value.
	StatQ1
	// Q3 third quartile = 75%ile value = .75 quantile value.
	StatQ3
	// first item in the set of data: for data with a natural ordering.
	StatFirst
	// final item in the set of data: for data with a natural ordering.
	StatFinal
)
const StatsN Stats = 22
StatsN is the highest valid value for type Stats, plus one.
func StatsValues ¶
func StatsValues() []Stats
StatsValues returns all possible values for the type Stats.
func (Stats) Call ¶
Call calls this statistic function on the given tensor, returning the output as a newly created tensor.
func (Stats) FuncName ¶
FuncName returns the package-qualified function name to use in tensor.Call to call this function.
func (Stats) MarshalText ¶
MarshalText implements the encoding.TextMarshaler interface.
func (*Stats) SetString ¶
SetString sets the Stats value from its string representation, and returns an error if the string is invalid.
func (*Stats) UnmarshalText ¶
UnmarshalText implements the encoding.TextUnmarshaler interface.
type StatsFunc ¶
StatsFunc is the function signature for a stats function that returns a new output vector. This can be less efficient for repeated computations where the output can be re-used: see StatsOutFunc. But this version can be directly chained with other function calls. The function is computed over the outermost row dimension, and the output has the shape of the remaining inner cells (a scalar for 1D inputs). Use tensor.As1D, tensor.NewRowCellsView, tensor.Cells1D, etc. to reshape and reslice the data as needed. All stats functions skip over NaN values, treating them as missing. Stats functions cannot be computed in parallel, e.g., using VectorizeThreaded or GPU, due to shared writing to the same output values. Special implementations are required if that is needed.
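The shape rule (aggregate over the outermost row dimension, output shaped like the inner cells) can be illustrated with a 2D slice of rows. In the sketch below, `meanRows` is a hypothetical helper that returns one mean per cell position and skips NaN separately for each cell; it is not the package's tensor-based implementation.

```go
package main

import (
	"fmt"
	"math"
)

// meanRows is a hypothetical helper: rows is indexed as [row][cell], and the
// result holds one mean per cell, computed across all rows. NaN values are
// skipped per cell, so each cell may average a different number of rows.
func meanRows(rows [][]float64) []float64 {
	if len(rows) == 0 {
		return nil
	}
	ncells := len(rows[0])
	sum := make([]float64, ncells)
	n := make([]float64, ncells)
	for _, row := range rows {
		for c := 0; c < ncells && c < len(row); c++ {
			if math.IsNaN(row[c]) {
				continue
			}
			sum[c] += row[c]
			n[c]++
		}
	}
	out := make([]float64, ncells)
	for c := range out {
		out[c] = sum[c] / n[c] // NaN if no valid values were seen for this cell
	}
	return out
}

func main() {
	rows := [][]float64{
		{1, 10},
		{3, math.NaN()},
		{5, 30},
	}
	fmt.Println(meanRows(rows)) // prints [3 20]
}
```

Every row writes into the same per-cell accumulators, which is the shared-output pattern that the note on parallel computation refers to.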