df

package module
v0.0.3

Note: this package is not in the latest version of its module.
Published: May 6, 2025 License: Apache-2.0 Imports: 15 Imported by: 0

README

df - A Dataframe Package

Overview

What Makes df Different?

Dataframes are widely used to hold and manipulate data for analysis. Conceptually, a dataframe consists of a set of columns. Columns, in turn, are arrays of values of a common type and length. Implementations such as those in R and the pandas Python package were originally designed to hold the data in memory, though they have since been extended to big-data cases.

How is df different? The package df specifies interfaces for dataframes and Columns. The package is agnostic as to the mechanisms handling the underlying data.

With this approach, the user can pull a sample of a large table, experiment with the data, do EDA, etc., in a fast, efficient manner. When desired, the same Go code can be run over the entire table.

The df package consists of a main package, df, and two sub-packages, df/mem and df/sql. The main package:

  • defines the DF and Column interfaces.
  • implements core aspects of those interfaces.
  • provides a parser to evaluate expressions.
  • handles file and DB I/O.

Packages df/mem and df/sql implement the full DF and Column interfaces for in-memory data and SQL databases, respectively. The distinction between df/mem and df/sql is not the source of the data. A df/mem dataframe can be read from and saved to a database, for example. The distinction is where the calculations and manipulations are performed: the df/mem package does this work in memory, while the df/sql package performs it in the database.

Functionality

What do you need to be able to do with a dataframe? Well, you'll want to

  • Create and save them.
  • Manipulate the columns such as creating new columns based on the existing ones.
  • Subset, sort, summarize and join the data.

To this end,

  • With df you can read/write files (such as CSV) and SQL tables.

  • df has a parser for evaluating expressions to create new columns. The parser allows flexible specification of expressions that return a column result. The parser will work on any type that satisfies the DF interface*.

    Parse(df, "y := exp(yhat) / (1.0 + exp(yhat))")
    Parse(df, "r := if(k==1,a,b)")
    Parse(df, "xNorm := x / global(sum(x))")
    Parse(df, "zCat := cat(z)")
    
  • The interface specification includes methods such as Sort(), By() and Join().

*Note: the specific implementation must also provide the parser with implementations of functions, such as "sum". The df/mem and df/sql packages offer identical function sets. See the Parse section for a list of supported functions.
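
To make the semantics of two of the expressions above concrete, here is a rough sketch in plain Go of what they compute on an in-memory column. It does not call the df parser; the helper names logistic and normalize are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// logistic mirrors "y := exp(yhat) / (1.0 + exp(yhat))" applied to each element.
func logistic(yhat []float64) []float64 {
	y := make([]float64, len(yhat))
	for i, v := range yhat {
		e := math.Exp(v)
		y[i] = e / (1.0 + e)
	}
	return y
}

// normalize mirrors "xNorm := x / global(sum(x))": the sum is computed once
// over the whole column and then used for every row.
func normalize(x []float64) []float64 {
	total := 0.0
	for _, v := range x {
		total += v
	}
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v / total
	}
	return out
}

func main() {
	fmt.Println(logistic([]float64{0}))
	fmt.Println(normalize([]float64{1, 1, 2}))
}
```

With df itself, the same calculations would be requested through Parse, and a df/sql implementation would run them in the database rather than in Go loops.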

Extensible

The package may be extended in several directions. One can

  • Add new functions to the parser.
  • Add database types to the sql package. Currently, ClickHouse and Postgres are supported. Supporting a new DB type requires modifying the Dialect struct; the sql package itself would not need to be modified.
  • Build a completely new implementation of the DF and Column interfaces.

Package Details

df

The df package defines the DF and Column interfaces in two steps: a core (DC, CC, respectively) and full interface (DF, Column). The core interface defines those methods which are independent of the details of the data architecture (e.g. DropColumns() from DF, Name() method for a Column). The df package provides structs that implement the core DF and Column interfaces (DFcore, ColCore).

The package also provides a parser, procedures for accessing files and tables, and additional structures required by the package.

df/mem

The df/mem package implements the DF and Column interfaces for in-memory objects.

df/sql

The package df/sql implements the DF and Column interfaces for SQL databases. It relies on the methods of Dialect to handle the specifics of any particular database type.

A basic design philosophy of this package is that the storage mechanism of the data doesn't matter. A complication is that, though two databases may both use SQL, the details are likely to differ. The Dialect struct and its methods abstract these differences away, so each supported DB must be handled specifically there. Currently, ClickHouse and Postgres are supported. Dialect uses the standard Go sql package connector. All communication with databases occurs through Dialect.

Data Types

Four data types are supported for column elements:

  • float
  • int
  • string
  • date

There is one additional type, "categorical", which is a mapping of the values of a source column (of type int, string or date) into int.
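
As an illustration of the categorical idea, the sketch below maps the distinct values of a string column to ints. It is not the df implementation: the function name toCategories is hypothetical, and assigning levels in order of first appearance is an assumption made for the example.

```go
package main

import "fmt"

// toCategories assigns each distinct source value an int level in order of
// first appearance (an assumed ordering) and returns the map together with
// the coded column.
func toCategories(src []string) (map[string]int, []int) {
	levels := make(map[string]int)
	coded := make([]int, len(src))
	for i, v := range src {
		lvl, ok := levels[v]
		if !ok {
			lvl = len(levels)
			levels[v] = lvl
		}
		coded[i] = lvl
	}
	return levels, coded
}

func main() {
	levels, coded := toCategories([]string{"a", "b", "a", "c"})
	fmt.Println(len(levels), coded)
}
```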

Note that the df/mem and df/sql packages strongly type data. One cannot add a float and an int, for example.

Docs

For details, see the docs.

Documentation

Overview

The package df is an implementation of dataframes. The central idea here is that the dataframes are defined as an interface which is independent of the implementation of the data-handling details.

The package df:

  • Defines the dataframe and column interfaces (DF, Column).
  • Implements core aspects of these.
  • Provides a parser to handle Column-valued expressions.
  • Provides for file and database IO.

Along with df, there are two sub-packages implementing DF and Column:

  • df/mem. In-memory dataframes.
  • df/sql. SQL-database dataframes. The current implementation covers ClickHouse and Postgres databases.

See the documentation at https://invertedv.github.io/df for details.

Index

Constants

This section is empty.

Variables

var DateFormats = []string{"20060102", "1/2/2006", "01/02/2006", "Jan 2, 2006", "January 2, 2006",
	"Jan 2 2006", "January 2 2006", "2006-01-02", "01/02/06", "1/2/06"}

DateFormats is a list of available formats for dates.
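
The entries are layouts for the standard library's time.Parse. The sketch below shows one way such a list can be used: try each layout in turn and keep the first that parses. Whether df applies the layouts this way is an assumption; parseDate and the lowercase copy dateFormats are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// dateFormats copies the layouts listed in DateFormats above.
var dateFormats = []string{"20060102", "1/2/2006", "01/02/2006", "Jan 2, 2006", "January 2, 2006",
	"Jan 2 2006", "January 2 2006", "2006-01-02", "01/02/06", "1/2/06"}

// parseDate returns the first successful parse of s against dateFormats.
func parseDate(s string) (time.Time, error) {
	for _, layout := range dateFormats {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("no layout matches %q", s)
}

func main() {
	for _, s := range []string{"20250506", "May 6, 2025", "2025-05-06"} {
		t, err := parseDate(s)
		fmt.Println(t.Format("2006-01-02"), err)
	}
}
```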

Functions

func Has

func Has[C comparable](needle C, haystack []C) bool

func Parse

func Parse(df DF, expr string) error

Parse parses the expression expr and appends the result to df. Expressions have the form:

<result> := <expression>.

A list of functions available is in the documentation.

func Position

func Position[C comparable](needle C, haystack []C) int
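
Neither Has nor Position is documented here. Judging only from the signatures, they look like membership and index lookups on a slice; the sketch below shows that reading using the standard slices package. The lowercase names are hypothetical stand-ins, and returning -1 for a missing needle is an assumption.

```go
package main

import (
	"fmt"
	"slices"
)

// has reports whether needle occurs in haystack.
func has[C comparable](needle C, haystack []C) bool {
	return slices.Contains(haystack, needle)
}

// position returns the index of the first occurrence of needle, or -1 if it
// is absent (the -1 convention is an assumption).
func position[C comparable](needle C, haystack []C) int {
	return slices.Index(haystack, needle)
}

func main() {
	cols := []string{"x", "y", "z"}
	fmt.Println(has("y", cols), position("z", cols), position("w", cols))
}
```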

func PrettyPrint

func PrettyPrint(header []string, cols ...any) string

PrettyPrint returns a string in which the elements of cols are aligned under the header. Each element of cols is expected to be a slice of float64, int, string or time.Time.

func RandomLetters

func RandomLetters(length int) string

RandomLetters generates a string of length "length" by randomly choosing from a-z

func StringSlice

func StringSlice(header string, inVal any) []string

StringSlice converts inVal to a slice of strings whose first element is the header. inVal is expected to be a slice of float64, int, string or time.Time.

func ToDataType

func ToDataType(x any, dt DataTypes) (any, bool)

ToDataType converts x to the dt data type.

Types

type CC

type CC interface {
	Core() *ColCore              // Core returns itself.
	CategoryMap() CategoryMap    // CategoryMap returns a map of original value to category value.  Not nil only for dt=DTcategorical.
	DataType() DataTypes         // DataType returns the type of the column.
	Dependencies() []string      // Dependencies returns a list of columns required to calculate this column, if this is a calculated column.
	Dialect() *Dialect           // Dialect returns the Dialect object. A Dialect object is required if there is DB interaction.
	Name() string                // Name returns the column's name.
	Parent() DF                  // Parent returns the DF to which the column belongs.
	Rename(newName string) error // Rename renames the column.
}

The CC interface defines the methods of ColCore. These methods are invariant to the data that underlies the column.

type CategoryMap

type CategoryMap map[any]int

CategoryMap maps the raw value of a categorical column to the category level

func (CategoryMap) Max

func (cm CategoryMap) Max() int

func (CategoryMap) Min

func (cm CategoryMap) Min() int

func (CategoryMap) String

func (cm CategoryMap) String() string

type ColCore

type ColCore struct {
	// contains filtered or unexported fields
}

ColCore implements the CC interface.

func NewColCore

func NewColCore(opts ...ColOpt) (*ColCore, error)

func (*ColCore) CategoryMap

func (c *ColCore) CategoryMap() CategoryMap

func (*ColCore) Copy

func (c *ColCore) Copy() *ColCore

func (*ColCore) Core

func (c *ColCore) Core() *ColCore

Core returns itself. We need a method to return itself since the DFcore struct will need these methods.

func (*ColCore) DataType

func (c *ColCore) DataType() DataTypes

func (*ColCore) Dependencies

func (c *ColCore) Dependencies() []string

func (*ColCore) Dialect

func (c *ColCore) Dialect() *Dialect

func (*ColCore) Name

func (c *ColCore) Name() string

func (*ColCore) Parent

func (c *ColCore) Parent() DF

func (*ColCore) RawType

func (c *ColCore) RawType() DataTypes

func (*ColCore) Rename

func (c *ColCore) Rename(newName string) error

type ColOpt

type ColOpt func(c CC) error

ColOpt functions are used to set ColCore options
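
ColOpt follows the functional-options pattern: the constructor takes a variadic list of functions, each of which sets one field and may return an error. The sketch below shows the pattern with hypothetical stand-in types (column, option, withName, withType); it is not the df code.

```go
package main

import (
	"errors"
	"fmt"
)

// column is a hypothetical stand-in for ColCore.
type column struct {
	name string
	dt   string
}

// option mirrors the shape of ColOpt: a function that configures the target
// and may reject an invalid setting.
type option func(c *column) error

func withName(name string) option {
	return func(c *column) error {
		if name == "" {
			return errors.New("empty column name")
		}
		c.name = name
		return nil
	}
}

func withType(dt string) option {
	return func(c *column) error {
		c.dt = dt
		return nil
	}
}

// newColumn applies each option in order and stops at the first error.
func newColumn(opts ...option) (*column, error) {
	c := &column{dt: "unknown"}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newColumn(withName("x"), withType("float"))
	fmt.Println(c.name, c.dt, err)
}
```

By analogy, a call such as NewColCore(ColName("x"), ColDataType(DTfloat)) would build a ColCore from the option functions listed below.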

func ColCatMap

func ColCatMap(cm CategoryMap) ColOpt

func ColDataType

func ColDataType(dt DataTypes) ColOpt

func ColDialect

func ColDialect(dlct *Dialect) ColOpt

func ColName

func ColName(name string) ColOpt

func ColParent

func ColParent(df DF) ColOpt

func ColRawType

func ColRawType(raw DataTypes) ColOpt

type Column

type Column interface {

	// Core Methods
	CC

	// AllRows iterates through the rows of the column.  It returns the row # and the value of the column at that row.
	// The row value return is a slice, []any, of length 1.  This was done to be consistent with
	// the AllRows() function of DF which also returns []any.
	AllRows() iter.Seq2[int, []any]

	// Copy returns a copy of the column.
	Copy() Column

	// Data returns the contents of the column.  Column implementations that are not stored in memory (e.g. as in a database)
	//  will have to fetch the data when this method is called.
	Data() *Vector

	// Len is the length of the column.
	Len() int

	// Stringer.  This is expected to be a summary of the column.
	String() string
}

The Column interface defines the methods that columns must have.

func RunDFfn

func RunDFfn(fn Fn, df DF, inputs []Column) (Column, error)

RunDFfn runs a parser function

fn     - function to run
df     - data frame providing data
inputs - inputs to fn. If the inputs belong to a DF.

type DC

type DC interface {
	// AllColumns returns an iterator across the columns.
	AllColumns() iter.Seq[Column]

	// AppendColumn appends col to the DF.
	AppendColumn(col Column, replace bool) error

	// Column returns the column colName.  Returns nil if the column doesn't exist.
	Column(colName string) Column

	// ColumnNames returns the names of all the columns.
	ColumnNames() []string

	// ColumnTypes returns the types of columns.  If cols is nil, returns the types for all columns.
	ColumnTypes(cols ...string) ([]DataTypes, error)

	// Core returns itself.
	Core() *DFcore

	// Dialect returns the Dialect object for DB access.
	Dialect() *Dialect

	// DropColumns drops colNames from the DF.
	DropColumns(colNames ...string) error

	// Fns returns a slice of functions that operate on columns.
	Fns() Fns

	// HasColumns returns true if the DF has all cols.
	HasColumns(cols ...string) bool

	// KeepColumns subsets DF to colsToKeep
	KeepColumns(colsToKeep ...string) error

	// SourceDF returns the source DF for this DF if this DF is a derivative (e.g. a Table).
	SourceDF() *DFcore
}

type DF

type DF interface {
	// Core methods
	DC

	// AllRows iterates through the rows of the DF.  It returns the row # and the values of the DF at that row.
	AllRows() iter.Seq2[int, []any]

	// AppendDF appends df
	AppendDF(df DF) (DF, error)

	// By creates a new DF that groups the source DF by the columns listed in groupBy and calculates fns on the groups.
	By(groupBy string, fns ...string) (DF, error)

	// Categorical creates a categorical column
	//	colName    - name of the source column
	//	catMap     - optionally supply a category map of source value -> category level
	//	fuzz       - if a source column value has counts < fuzz, then it is put in the 'other' category.
	//	defaultVal - optional source column value for the 'other' category.
	//	levels     - slice of source values to make categories from
	Categorical(colName string, catMap CategoryMap, fuzz int, defaultVal any, levels []any) (Column, error)

	Copy() DF

	// Interp interpolates the columns (xIfield,yfield) at xsField points.
	//   iDF      - input iterator (e.g. Column or DF) that yields the points to interpolate at
	//   xSfield  - column name of x values in source DF
	//   xIfield  - name of x values in iDF
	//   yfield   - column name of y values in source DF
	//   outField - column name of interpolated y's in return DF
	//
	// The output DF has two columns: xIfield, outField.
	Interp(iDF HasIter, xSfield, xIfield, yfield, outField string) (DF, error)

	// Join inner joins the df to the source DF on the joinOn fields
	//   df       - DF to join
	//   joinOn   - comma-separated list of fields to join on.
	Join(df HasIter, joinOn string) (DF, error)

	// RowCount returns # of rows in df
	RowCount() int

	// SetParent sets the Parent field of all the columns in the source DF
	SetParent() error

	// Sort sorts the source DF on sortCols
	//   ascending - if true, sorts ascending
	//   sortCols      - sortCols is a comma-separated list of fields on which to sort.
	Sort(ascending bool, sortCols string) error

	// String is expected to produce a summary of the source DF.
	String() string

	// Table returns a table based on cols.
	//   cols - comma-separated list of column names for the table.
	// The return is expected to include the columns "count" and "rate"
	Table(cols string) (DF, error)

	// Where returns a DF subset according to condition.
	Where(condition string) (DF, error)
}

type DFcore

type DFcore struct {
	// contains filtered or unexported fields
}

DFcore implements DC.

func NewDFcore

func NewDFcore(cols []Column, opts ...DFopt) (df *DFcore, err error)

func (*DFcore) AllColumns

func (df *DFcore) AllColumns() iter.Seq[Column]

func (*DFcore) AppendColumn

func (df *DFcore) AppendColumn(col Column, replace bool) error

func (*DFcore) Column

func (df *DFcore) Column(colName string) Column

func (*DFcore) ColumnNames

func (df *DFcore) ColumnNames() []string

func (*DFcore) ColumnTypes

func (df *DFcore) ColumnTypes(colNames ...string) ([]DataTypes, error)

func (*DFcore) Copy

func (df *DFcore) Copy() *DFcore

func (*DFcore) Core

func (df *DFcore) Core() *DFcore

func (*DFcore) Dialect

func (df *DFcore) Dialect() *Dialect

func (*DFcore) DropColumns

func (df *DFcore) DropColumns(colNames ...string) error

func (*DFcore) Fns

func (df *DFcore) Fns() Fns

func (*DFcore) HasColumns

func (df *DFcore) HasColumns(cols ...string) bool

func (*DFcore) KeepColumns

func (df *DFcore) KeepColumns(colNames ...string) error

func (*DFcore) SourceDF

func (df *DFcore) SourceDF() *DFcore

type DFopt

type DFopt func(df DC) error

DFopt functions are used to set DFcore options

func DFappendFn

func DFappendFn(f Fn) DFopt

func DFdialect

func DFdialect(d *Dialect) DFopt

func DFsetFns

func DFsetFns(f Fns) DFopt

func DFsetSourceDF

func DFsetSourceDF(source DC) DFopt

type DataTypes

type DataTypes uint8

DataTypes are the types of data that the package supports for Column elements

const (
	DTfloat DataTypes = 0 + iota
	DTint
	DTstring
	DTdate
	DTcategorical
	DTunknown // keep as last entry, OK to put new entries before
)

Values of DataTypes

func DTFromString

func DTFromString(nm string) DataTypes

DTFromString returns the DataTypes value named by nm; e.g., input "DTdate" gives output 3. On failure it returns DTunknown.

func GetKind

func GetKind(fn reflect.Type) DataTypes

GetKind maps a reflect.Type to a DataTypes value.

func WhatAmI

func WhatAmI(val any) DataTypes

WhatAmI returns the type of val.

func (DataTypes) String

func (i DataTypes) String() string

type Dialect

type Dialect struct {
	// contains filtered or unexported fields
}

Dialect manages interactions with DB's.

func NewDialect

func NewDialect(dialect string, db *sql.DB, opts ...DialectOpt) (*Dialect, error)

NewDialect creates a *Dialect to manage DB access.

func (*Dialect) BufSize

func (d *Dialect) BufSize() int

BufSize returns the buffer size for Insert Values queries.

func (*Dialect) Case

func (d *Dialect) Case(whens, vals []string) (string, error)

Case creates a CASE statement.

whens - slice of conditions
vals  - slice of the value to set the result to if condition is true
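
As a rough idea of what building a CASE statement from parallel slices involves, here is a minimal sketch. The function buildCase is hypothetical, and the exact SQL text that Dialect.Case produces may differ from this layout.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildCase assembles a SQL CASE expression from parallel slices.
// The layout of the generated text is an assumption.
func buildCase(whens, vals []string) (string, error) {
	if len(whens) == 0 || len(whens) != len(vals) {
		return "", errors.New("whens and vals must be non-empty and the same length")
	}
	var sb strings.Builder
	sb.WriteString("CASE")
	for i, w := range whens {
		sb.WriteString(" WHEN " + w + " THEN " + vals[i])
	}
	sb.WriteString(" END")
	return sb.String(), nil
}

func main() {
	s, err := buildCase([]string{"k = 1", "k = 2"}, []string{"a", "b"})
	fmt.Println(s, err)
}
```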

func (*Dialect) CastField

func (d *Dialect) CastField(fieldName string, toDT DataTypes) (sqlStr string, err error)

CastField casts fieldName to type toDT.

func (*Dialect) CastFloat

func (d *Dialect) CastFloat() bool

CastFloat says whether floats need to be cast as such. Postgres will return "NUMERIC" for calculated fields, which the connector loads as strings.

func (*Dialect) Close

func (d *Dialect) Close() error

func (*Dialect) Convert

func (d *Dialect) Convert(val any) any

Convert converts val to the corresponding datatype used by df.

func (*Dialect) Create

func (d *Dialect) Create(tableName, orderBy string, fields []string, types []DataTypes, overwrite, temporary bool, options ...string) error

Create creates a table.

tableName  - name of the table to create
orderBy    - comma-separated list of fields to form the key (order)
fields     - field names
types      - field types
overwrite  - if true, overwrite existing table
temporary  - create a temp table
options    - are in key:value format and are meant to replace placeholders in create.txt

func (*Dialect) DB

func (d *Dialect) DB() *sql.DB

func (*Dialect) DialectName

func (d *Dialect) DialectName() string

func (*Dialect) DropTable

func (d *Dialect) DropTable(tableName string) error

func (*Dialect) Exists

func (d *Dialect) Exists(tableName string) bool

Exists returns true if tableName exists on the db.

func (*Dialect) Functions

func (d *Dialect) Functions() Fmap

Functions returns a map of functions for the parser.

func (*Dialect) Global

func (d *Dialect) Global(sourceSQL, colSQL string) string

Global takes SQL that normally is a scalar return (e.g. count(*), avg(x)) and surrounds it with SQL to return that value for every row of a query.

func (*Dialect) Insert

func (d *Dialect) Insert(tableName, makeQuery, fields string) error

Insert executes an insert query

func (*Dialect) InsertValues

func (d *Dialect) InsertValues(tableName string, values []byte) error

InsertValues inserts values into tableName

func (*Dialect) Interp

func (d *Dialect) Interp(sourceSQL, interpSQL, xSfield, xIfield, yField, outField string) string

Interp executes a query to interpolate values

func (*Dialect) IterSave

func (d *Dialect) IterSave(tableName string, df HasIter) error

IterSave saves the data represented by df into tableName

func (*Dialect) Join

func (d *Dialect) Join(leftSQL, rightSQL string, leftFields, rightFields, joinFields []string) string

Join creates an inner JOIN query.

leftSQL - SQL for left side of join
rightSQL - SQL for right side of join
leftFields - fields to keep from leftSQL
rightFields - fields to keep from rightSQL
joinFields - fields to join on

func (*Dialect) Load

func (d *Dialect) Load(qry string) (memData []*Vector, fieldNames []string, fieldTypes []DataTypes, e error)

Load loads qry from a DB into a slice of *Vector.

memData    - returned data
fieldNames - field names of columns
fieldTypes - field types

func (*Dialect) Quantile

func (d *Dialect) Quantile(col string, q float64) string

func (*Dialect) Quote

func (d *Dialect) Quote() string

func (*Dialect) RowCount

func (d *Dialect) RowCount(qry string) (int, error)

func (*Dialect) Rows

func (d *Dialect) Rows(qry string) (rows *sql.Rows, row2Read []any, fieldNames []string, e error)

Rows returns a row reader for qry.

rows       - row reader
row2Read   - a slice with the appropriate types to read the rows.
fieldNames - names of the columns

func (*Dialect) Save

func (d *Dialect) Save(tableName, orderBy string, overwrite, temp bool, toSave HasIter, options ...string) error

Save saves an Iter object to a database.

tableName - name of table to create.
orderBy   - comma-separated list of fields to use as key (order).
overwrite - if true, replace any existing table.
temp      - if true, create a temp table.
toSave    - data to save.
options   - options for CREATE.

func (*Dialect) Seq

func (d *Dialect) Seq(n int) string

Seq returns a query that creates a table with column "seq" whose int values run from 0 to n-1.

func (*Dialect) ToName

func (d *Dialect) ToName(fieldName string) string

ToName converts the raw field name to what's needed for interaction with the database. Specifically, Postgres requires quotes around field names that have uppercase letters.
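
The Postgres rule exists because Postgres folds unquoted identifiers to lowercase. A minimal sketch of that quoting rule follows; toName is a hypothetical stand-in that ignores the dialect and always applies the Postgres behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// toName wraps fieldName in double quotes when it contains an uppercase
// letter, so that the database preserves its case.
func toName(fieldName string) string {
	if fieldName != strings.ToLower(fieldName) {
		return `"` + fieldName + `"`
	}
	return fieldName
}

func main() {
	fmt.Println(toName("yhat"), toName("yHat"))
}
```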

func (*Dialect) ToString

func (d *Dialect) ToString(val any) string

ToString returns a string version of val that can be placed into SQL

func (*Dialect) Types

func (d *Dialect) Types(qry string) (fieldNames []string, fieldTypes []DataTypes, row2read []any, err error)

Types returns info needed to read the data generated by qry.

fieldNames - names of columns qry returns.
fieldTypes - column types returned by qry.
row2Read   - correctly typed row to read for Scan.

func (*Dialect) Union

func (d *Dialect) Union(table1, table2 string, colNames ...string) (string, error)

Union returns a union query between two tables (queries).

func (*Dialect) WithName

func (d *Dialect) WithName() string

WithName returns a random name for use as WITH names, etc.

type DialectOpt

type DialectOpt func(d *Dialect) error

DialectOpt functions are used to set Dialect options

func DialectBuffSize

func DialectBuffSize(bufMB int) DialectOpt

DialectBuffSize sets the buffer size (in MB) for accumulating inserts. Default is 1GB.

func DialectDefaultDate

func DialectDefaultDate(year, mon, day int) DialectOpt

DialectDefaultDate sets the default date to use if a date is null. Default is 1/1/1960.

func DialectDefaultFloat

func DialectDefaultFloat(deflt float64) DialectOpt

DialectDefaultFloat sets the default float to use if a float is null. Default is MaxFloat64.

func DialectDefaultInt

func DialectDefaultInt(deflt int) DialectOpt

DialectDefaultInt sets the default int to use if an int is null. Default is MaxInt.

func DialectDefaultString

func DialectDefaultString(deflt string) DialectOpt

DialectDefaultString sets the default string to use if a string is null. Default is "".

type FileOpt

type FileOpt func(f *Files) error

FileOpt functions are used to set Files options

func FileDateFormat

func FileDateFormat(format string) FileOpt

FileDateFormat sets the format for dates in the file. Default is 20060102.

func FileDefaultDate

func FileDefaultDate(year, mon, day int) FileOpt

FileDefaultDate sets the value to use for fields that fail to convert to date if strict=false. Default is 1/1/1960.

func FileDefaultFloat

func FileDefaultFloat(deflt float64) FileOpt

FileDefaultFloat sets the value to use for fields that fail to convert to float if strict=false. Default is MaxFloat64.

func FileDefaultInt

func FileDefaultInt(deflt int) FileOpt

FileDefaultInt sets the value to use for fields that fail to convert to integer if strict=false. Default is MaxInt.

func FileDefaultString

func FileDefaultString(deflt string) FileOpt

FileDefaultString sets the value to use for fields that fail to convert to string if strict=false. Default is "".

func FileEOL

func FileEOL(eol byte) FileOpt

FileEOL sets the end-of-line character. The default is \n.

func FileFieldNames

func FileFieldNames(fieldNames []string) FileOpt

FileFieldNames sets the field names for the file -- needed if the file has no header.

func FileFieldTypes

func FileFieldTypes(fieldTypes []DataTypes) FileOpt

FileFieldTypes sets the field types for the file--can be used instead of peeking at the file & guessing.

func FileFieldWidths

func FileFieldWidths(fieldWidths []int) FileOpt

FileFieldWidths sets field widths for flat files

func FileFloatFormat

func FileFloatFormat(format string) FileOpt

FileFloatFormat sets the format for writing floats. Default is %.2f.

func FileHeader

func FileHeader(hasHeader bool) FileOpt

FileHeader sets true if file has a header. Default is true.

func FilePeek

func FilePeek(linesToPeek int) FileOpt

FilePeek sets the # of lines to examine to determine data types. Default value of 0 will examine the entire file.

func FileSep

func FileSep(sep byte) FileOpt

FileSep sets the field separator. Default is a comma.

func FileStrict

func FileStrict(strict bool) FileOpt

FileStrict sets the action when a field fails to convert to its expected type.

If true, then an error results.
If false, the default value is substituted.

Default: false
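
The difference between the two settings can be sketched for an int field as follows. The function toInt is hypothetical and uses strconv directly; it only illustrates the strict versus default-substitution behavior described above.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// toInt converts field to an int. With strict=true a bad field is an error;
// otherwise deflt is substituted.
func toInt(field string, strict bool, deflt int) (int, error) {
	v, err := strconv.Atoi(field)
	if err == nil {
		return v, nil
	}
	if strict {
		return 0, fmt.Errorf("cannot convert %q to int: %w", field, err)
	}
	return deflt, nil
}

func main() {
	fmt.Println(toInt("42", true, math.MaxInt))
	fmt.Println(toInt("n/a", false, math.MaxInt))
	fmt.Println(toInt("n/a", true, math.MaxInt))
}
```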

func FileStringDelim

func FileStringDelim(delim byte) FileOpt

FileStringDelim sets the string delimiter. The default is ".

type Files

type Files struct {
	// contains filtered or unexported fields
}

Files manages interactions with files.

func NewFiles

func NewFiles(opts ...FileOpt) (*Files, error)

NewFiles creates a *Files struct for reading/writing files.

func (*Files) Close

func (f *Files) Close() error

func (*Files) Create

func (f *Files) Create(fileName string) error

Create creates fileName on the file system.

func (*Files) FieldNames

func (f *Files) FieldNames() []string

func (*Files) FieldTypes

func (f *Files) FieldTypes() []DataTypes

func (*Files) FieldWidths

func (f *Files) FieldWidths() []int

func (*Files) Load

func (f *Files) Load() ([]*Vector, error)

Load loads the data into a slice of *Vector.

func (*Files) Open

func (f *Files) Open(fileName string) error

Open opens fileName for reading/writing. It examines the file for consistency with the parameters (e.g has header). If needed, it determines and sets field names and types.

func (*Files) Save

func (f *Files) Save(fileName string, df HasIter) error

Save saves df out to fileName. The file must be created first.

type Fmap

type Fmap map[string]*FnSpec

Fmap maps the function name to its spec

func LoadFunctions

func LoadFunctions(fns string) Fmap

LoadFunctions loads functions from a string which is an embedded file. LoadFunctions expects functions to be separated by "\n". Within each line there are 6 fields separated by colons. The fields are:

function name
function spec
inputs
outputs
return type (C = column, S = scalar)
varying inputs (Y = yes).

Inputs are sets of types within braces, separated by commas.

{int,int},{float,float}

specifies the function takes two parameters which can be either {int,int} or {float,float}.

Corresponding to each set of inputs is an output type. In the above example, if the function always returns a float, the output would be:

float,float.

Legal types are float, int, string and date. Categorical inputs are ints.

If there is no input parameter, leave the field empty as in:

::
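
To illustrate the line format, the sketch below splits one definition line into its six fields. It is not the LoadFunctions code: fnLine and parseLine are hypothetical, the sample line (including the "N" marker for non-varying) is made up, and the parser assumes the function spec contains no colon.

```go
package main

import (
	"fmt"
	"strings"
)

// fnLine holds the six fields of one function definition line.
type fnLine struct {
	name, spec string
	inputs     [][]string
	outputs    []string
	isScalar   bool
	varying    bool
}

// parseLine splits a definition on colons. It assumes the spec field itself
// contains no colon, which may not hold for every real definition.
func parseLine(line string) (*fnLine, error) {
	f := strings.Split(line, ":")
	if len(f) != 6 {
		return nil, fmt.Errorf("expected 6 fields, got %d", len(f))
	}
	out := &fnLine{name: f[0], spec: f[1], isScalar: f[4] == "S", varying: f[5] == "Y"}
	if f[2] != "" {
		// "{int,int},{float,float}" becomes [[int int] [float float]].
		trimmed := strings.TrimSuffix(strings.TrimPrefix(f[2], "{"), "}")
		for _, set := range strings.Split(trimmed, "},{") {
			out.inputs = append(out.inputs, strings.Split(set, ","))
		}
	}
	if f[3] != "" {
		out.outputs = strings.Split(f[3], ",")
	}
	return out, nil
}

func main() {
	// "pow" and its spec are made-up values for illustration.
	fn, err := parseLine("pow:pow:{int,int},{float,float}:float,float:C:N")
	fmt.Println(fn.name, fn.inputs, fn.outputs, fn.isScalar, fn.varying, err)
}
```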

type Fn

type Fn func(info bool, df DF, inputs ...Column) *FnReturn

Fn is the function signature for functions called by the parser.

info   - if info == true, then the function is not run but returns *FnReturn with info fields filled in (Name, Output, Inputs, Varying, IsScalar).
df     - DF providing data for function (required only if info=false).
inputs - inputs to the function (required only if info=false).

type FnReturn

type FnReturn struct {
	Value Column // return value of function

	Name string // name of function
	// An element of Inputs is a slice of data types that the function takes as inputs.  For instance,
	//   {DTfloat,DTint}
	// means that the function takes 2 inputs - the first float, the second int.  And
	//   {DTfloat,DTint},{DTfloat,DTfloat}
	// means that the function takes 2 inputs - either float,int or float,float.
	Inputs [][]DataTypes

	// Output types corresponding to the input slices.
	Output []DataTypes

	Varying bool // if true, the number of inputs varies.

	IsScalar bool // if true, the function reduces a column to a scalar (e.g. sum, mean)

	Err error
}

FnReturn is the return type for parser functions
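
The pairing of Inputs and Output can be read as a lookup table: find the input signature that matches the argument types and take the output type at the same position. The sketch below shows that lookup with hypothetical names (dataType, outputFor); how the df parser actually performs the match is not documented here.

```go
package main

import (
	"fmt"
	"slices"
)

// dataType is a hypothetical stand-in for DataTypes.
type dataType uint8

const (
	dtFloat dataType = iota
	dtInt
)

// outputFor returns the output type paired with the first signature in
// inputs that equals args, or false if none matches.
func outputFor(inputs [][]dataType, outputs []dataType, args []dataType) (dataType, bool) {
	for i, sig := range inputs {
		if i < len(outputs) && slices.Equal(sig, args) {
			return outputs[i], true
		}
	}
	return 0, false
}

func main() {
	inputs := [][]dataType{{dtInt, dtInt}, {dtFloat, dtFloat}}
	outputs := []dataType{dtInt, dtFloat}
	fmt.Println(outputFor(inputs, outputs, []dataType{dtFloat, dtFloat}))
	fmt.Println(outputFor(inputs, outputs, []dataType{dtInt, dtFloat}))
}
```

A mixed signature such as (int, float) finds no match in this example, which is consistent with the strong typing described in the README.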

type FnSpec

type FnSpec struct {
	// Name is the name of the function that the parser will recognize in user statements.
	Name string

	// FnDetail gives the specifics of the function.
	// For df/sql, this is the SQL that is run.
	// For df/mem, this is the name of the Go function to call.
	FnDetail string

	// Inputs is a slice that lists all valid combinations of inputs.
	Inputs [][]DataTypes

	// Outputs is a slice that lists the outputs corresponding to each element of Inputs.
	Outputs []DataTypes

	// IsScalar is true if the function reduces a column to a scalar (e.g. mean, sum)
	IsScalar bool

	// Varying is true if the number of inputs can vary.
	Varying bool

	// This is a slice of Go functions to call, corresponding to the elements of inputs/outputs.
	// Not used for df/sql.
	Fns []any
}

FnSpec specifies a function that the parser will have access to.

type Fns

type Fns []Fn

func (Fns) Get

func (fs Fns) Get(fnName string) Fn

type HasIter

type HasIter interface {
	AllRows() iter.Seq2[int, []any]
}

The HasIter interface restricts to types that have an iterator through the rows of the data. Save only requires an iterator to move through the rows

type HasMQdlct

type HasMQdlct interface {
	MakeQuery(colNames ...string) string
	Dialect() *Dialect
}

HasMQdlct restricts to types that can access a DB.

type Scalar

type Scalar struct {
	*ColCore
	// contains filtered or unexported fields
}

Scalar implements Column for scalars.

func NewScalar

func NewScalar(val any, opts ...ColOpt) (*Scalar, error)

func (*Scalar) AllRows

func (s *Scalar) AllRows() iter.Seq2[int, []any]

func (*Scalar) AppendRows

func (s *Scalar) AppendRows(col Column) (Column, error)

func (*Scalar) Copy

func (s *Scalar) Copy() Column

func (*Scalar) Core

func (s *Scalar) Core() *ColCore

func (*Scalar) Data

func (s *Scalar) Data() *Vector

func (*Scalar) Len

func (s *Scalar) Len() int

func (*Scalar) Rename

func (s *Scalar) Rename(newName string) error

func (*Scalar) Replace

func (s *Scalar) Replace(ind, repl Column) (Column, error)

func (*Scalar) String

func (s *Scalar) String() string

type Vector

type Vector struct {
	// contains filtered or unexported fields
}

Vector is the return type for Column data.

func MakeVector

func MakeVector(dt DataTypes, n int) *Vector

MakeVector returns a *Vector with data of type dt and length n.

func NewVector

func NewVector(data any, dt DataTypes) (*Vector, error)

NewVector creates a new *Vector from data, checking/converting that it is of type dt.

func (*Vector) AllRows

func (v *Vector) AllRows() iter.Seq2[int, []any]

AllRows returns an iterator that moves through the data. It returns a slice rather than a row so that it's compatible with the DF iterator.

func (*Vector) Append

func (v *Vector) Append(data ...any) error

Append appends data (as a slice) to the vector.

func (*Vector) AppendVector

func (v *Vector) AppendVector(vAdd *Vector) error

AppendVector appends a vector.

func (*Vector) AsAny

func (v *Vector) AsAny() any

AsAny returns the data as an any variable.

func (*Vector) AsDate

func (v *Vector) AsDate() ([]time.Time, error)

AsDate returns the data as a time.Time slice. It converts to date, if needed & possible.

func (*Vector) AsFloat

func (v *Vector) AsFloat() ([]float64, error)

AsFloat returns the data as a []float64 slice. It converts to float64, if needed & possible.

func (*Vector) AsInt

func (v *Vector) AsInt() ([]int, error)

AsInt returns the data as a []int slice. It converts to int, if needed & possible.

func (*Vector) AsString

func (v *Vector) AsString() ([]string, error)

AsString returns the data as a []string slice, converting if needed.

func (*Vector) Copy

func (v *Vector) Copy() *Vector

Copy copies the *Vector

func (*Vector) Element

func (v *Vector) Element(indx int) any

Element returns the indx'th element of Vector. If v.Len() > 1, it returns nil when indx is out of bounds. If v.Len() = 1, it returns the 0th element regardless of indx. This is needed by the parser for an op like "x/2", where we don't want to append a vector of 2's.
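
The length-1 rule amounts to broadcasting a constant across rows. The sketch below shows the idea on plain float64 slices; element and divide are hypothetical helpers, not Vector methods.

```go
package main

import "fmt"

// element returns data[indx], except that a length-1 slice answers every
// index with its single value. Out-of-range indexes return nil.
func element(data []float64, indx int) any {
	if len(data) == 1 {
		return data[0]
	}
	if indx < 0 || indx >= len(data) {
		return nil
	}
	return data[indx]
}

// divide computes x[i]/y[i], letting y be a constant of length 1.
func divide(x, y []float64) ([]float64, error) {
	out := make([]float64, len(x))
	for i := range x {
		d, ok := element(y, i).(float64)
		if !ok {
			return nil, fmt.Errorf("no divisor for row %d", i)
		}
		out[i] = x[i] / d
	}
	return out, nil
}

func main() {
	fmt.Println(divide([]float64{2, 4, 6}, []float64{2}))
}
```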

func (*Vector) ElementDate

func (v *Vector) ElementDate(indx int) (*time.Time, error)

ElementDate returns the indx'th element as a date, converting the value, if needed & possible.

func (*Vector) ElementFloat

func (v *Vector) ElementFloat(indx int) (*float64, error)

ElementFloat returns the indx'th element as a float64, converting the value, if needed & possible.

func (*Vector) ElementInt

func (v *Vector) ElementInt(indx int) (*int, error)

ElementInt returns the indx'th element as an int, converting the value, if needed & possible.

func (*Vector) ElementString

func (v *Vector) ElementString(indx int) (*string, error)

ElementString returns the indx'th element as a string.

func (*Vector) Len

func (v *Vector) Len() int

Len is the length of the *Vector

func (*Vector) Less

func (v *Vector) Less(i, j int) bool

Less returns true if element i < element j

func (*Vector) SetAny

func (v *Vector) SetAny(val any, indx int)

SetAny sets the indx'th element to val. Does no error checking.

func (*Vector) SetDate

func (v *Vector) SetDate(val time.Time, indx int) error

SetDate sets the indx'th element to val. Does not attempt conversion.

func (*Vector) SetFloat

func (v *Vector) SetFloat(val float64, indx int) error

SetFloat sets the indx'th element to val. Does not attempt conversion.

func (*Vector) SetInt

func (v *Vector) SetInt(val, indx int) error

SetInt sets the indx'th element to val. Does not attempt conversion.

func (*Vector) SetString

func (v *Vector) SetString(val string, indx int) error

SetString sets the indx'th element to val. Does not attempt conversion.

func (*Vector) String

func (v *Vector) String() string

func (*Vector) Swap

func (v *Vector) Swap(i, j int)

Swap swaps the ith and jth element of *Vector

func (*Vector) VectorType

func (v *Vector) VectorType() DataTypes

func (*Vector) Where

func (v *Vector) Where(indic *Vector) *Vector

Where creates a new *Vector with elements from the original *Vector in which indic is greater than 0. indic must be type DTint.
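
The selection rule can be sketched on plain slices as follows. The function where is a hypothetical stand-in for the method, and the length check is an assumption about how mismatched inputs would be handled.

```go
package main

import (
	"errors"
	"fmt"
)

// where keeps data[i] for each i where indic[i] > 0.
func where(data []string, indic []int) ([]string, error) {
	if len(data) != len(indic) {
		return nil, errors.New("data and indic must have the same length")
	}
	var out []string
	for i, keep := range indic {
		if keep > 0 {
			out = append(out, data[i])
		}
	}
	return out, nil
}

func main() {
	fmt.Println(where([]string{"a", "b", "c"}, []int{1, 0, 2}))
}
```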
