Documentation
Overview ¶
The package df is an implementation of dataframes. The central idea here is that the dataframes are defined as an interface which is independent of the implementation of the data-handling details.
The package df:
- Defines the dataframe and column interfaces (DF, Column).
- Implements core aspects of these.
- Provides a parser to handle Column-valued expressions.
- Provides file and database IO.
Along with df, there are two sub-packages implementing DF and Column:
- df/mem. In-memory dataframes,
- df/sql. SQL-database dataframes. The current implementation covers ClickHouse and Postgres databases.
See the documentation at https://invertedv.github.io/df for details.
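As a rough illustration of the design described in the overview, the sketch below shows code written against an interface that does not care where the data lives. The names column, memColumn and total are hypothetical and are not part of df; the real interfaces are DF and Column, documented below.

```go
package main

import "fmt"

// column is a hypothetical, cut-down stand-in for an interface such as df's Column.
type column interface {
	Name() string
	Len() int
	Floats() []float64
}

// memColumn is a hypothetical in-memory implementation of column.
type memColumn struct {
	name string
	data []float64
}

func (m *memColumn) Name() string      { return m.name }
func (m *memColumn) Len() int          { return len(m.data) }
func (m *memColumn) Floats() []float64 { return m.data }

// total depends only on the interface, so it would work equally well with
// an implementation that fetches its values from a database.
func total(c column) float64 {
	sum := 0.0
	for _, v := range c.Floats() {
		sum += v
	}
	return sum
}

func main() {
	c := &memColumn{name: "x", data: []float64{1, 2, 3.5}}
	fmt.Println(c.Name(), c.Len(), total(c))
}
```

A database-backed type would only need to satisfy the same three methods for total to accept it.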
Index ¶
- Variables
- func Has[C comparable](needle C, haystack []C) bool
- func Parse(df DF, expr string) error
- func Position[C comparable](needle C, haystack []C) int
- func PrettyPrint(header []string, cols ...any) string
- func RandomLetters(length int) string
- func StringSlice(header string, inVal any) []string
- func ToDataType(x any, dt DataTypes) (any, bool)
- type CC
- type CategoryMap
- type ColCore
- func (c *ColCore) CategoryMap() CategoryMap
- func (c *ColCore) Copy() *ColCore
- func (c *ColCore) Core() *ColCore
- func (c *ColCore) DataType() DataTypes
- func (c *ColCore) Dependencies() []string
- func (c *ColCore) Dialect() *Dialect
- func (c *ColCore) Name() string
- func (c *ColCore) Parent() DF
- func (c *ColCore) RawType() DataTypes
- func (c *ColCore) Rename(newName string) error
- type ColOpt
- type Column
- type DC
- type DF
- type DFcore
- func (df *DFcore) AllColumns() iter.Seq[Column]
- func (df *DFcore) AppendColumn(col Column, replace bool) error
- func (df *DFcore) Column(colName string) Column
- func (df *DFcore) ColumnNames() []string
- func (df *DFcore) ColumnTypes(colNames ...string) ([]DataTypes, error)
- func (df *DFcore) Copy() *DFcore
- func (df *DFcore) Core() *DFcore
- func (df *DFcore) Dialect() *Dialect
- func (df *DFcore) DropColumns(colNames ...string) error
- func (df *DFcore) Fns() Fns
- func (df *DFcore) HasColumns(cols ...string) bool
- func (df *DFcore) KeepColumns(colNames ...string) error
- func (df *DFcore) SourceDF() *DFcore
- type DFopt
- type DataTypes
- type Dialect
- func (d *Dialect) BufSize() int
- func (d *Dialect) Case(whens, vals []string) (string, error)
- func (d *Dialect) CastField(fieldName string, toDT DataTypes) (sqlStr string, err error)
- func (d *Dialect) CastFloat() bool
- func (d *Dialect) Close() error
- func (d *Dialect) Convert(val any) any
- func (d *Dialect) Create(tableName, orderBy string, fields []string, types []DataTypes, ...) error
- func (d *Dialect) DB() *sql.DB
- func (d *Dialect) DialectName() string
- func (d *Dialect) DropTable(tableName string) error
- func (d *Dialect) Exists(tableName string) bool
- func (d *Dialect) Functions() Fmap
- func (d *Dialect) Global(sourceSQL, colSQL string) string
- func (d *Dialect) Insert(tableName, makeQuery, fields string) error
- func (d *Dialect) InsertValues(tableName string, values []byte) error
- func (d *Dialect) Interp(sourceSQL, interpSQL, xSfield, xIfield, yField, outField string) string
- func (d *Dialect) IterSave(tableName string, df HasIter) error
- func (d *Dialect) Join(leftSQL, rightSQL string, leftFields, rightFields, joinFields []string) string
- func (d *Dialect) Load(qry string) (memData []*Vector, fieldNames []string, fieldTypes []DataTypes, e error)
- func (d *Dialect) Quantile(col string, q float64) string
- func (d *Dialect) Quote() string
- func (d *Dialect) RowCount(qry string) (int, error)
- func (d *Dialect) Rows(qry string) (rows *sql.Rows, row2Read []any, fieldNames []string, e error)
- func (d *Dialect) Save(tableName, orderBy string, overwrite, temp bool, toSave HasIter, ...) error
- func (d *Dialect) Seq(n int) string
- func (d *Dialect) ToName(fieldName string) string
- func (d *Dialect) ToString(val any) string
- func (d *Dialect) Types(qry string) (fieldNames []string, fieldTypes []DataTypes, row2read []any, err error)
- func (d *Dialect) Union(table1, table2 string, colNames ...string) (string, error)
- func (d *Dialect) WithName() string
- type DialectOpt
- type FileOpt
- func FileDateFormat(format string) FileOpt
- func FileDefaultDate(year, mon, day int) FileOpt
- func FileDefaultFloat(deflt float64) FileOpt
- func FileDefaultInt(deflt int) FileOpt
- func FileDefaultString(deflt string) FileOpt
- func FileEOL(eol byte) FileOpt
- func FileFieldNames(fieldNames []string) FileOpt
- func FileFieldTypes(fieldTypes []DataTypes) FileOpt
- func FileFieldWidths(fieldWidths []int) FileOpt
- func FileFloatFormat(format string) FileOpt
- func FileHeader(hasHeader bool) FileOpt
- func FilePeek(linesToPeek int) FileOpt
- func FileSep(sep byte) FileOpt
- func FileStrict(strict bool) FileOpt
- func FileStringDelim(delim byte) FileOpt
- type Files
- func (f *Files) Close() error
- func (f *Files) Create(fileName string) error
- func (f *Files) FieldNames() []string
- func (f *Files) FieldTypes() []DataTypes
- func (f *Files) FieldWidths() []int
- func (f *Files) Load() ([]*Vector, error)
- func (f *Files) Open(fileName string) error
- func (f *Files) Save(fileName string, df HasIter) error
- type Fmap
- type Fn
- type FnReturn
- type FnSpec
- type Fns
- type HasIter
- type HasMQdlct
- type Scalar
- func (s *Scalar) AllRows() iter.Seq2[int, []any]
- func (s *Scalar) AppendRows(col Column) (Column, error)
- func (s *Scalar) Copy() Column
- func (s *Scalar) Core() *ColCore
- func (s *Scalar) Data() *Vector
- func (s *Scalar) Len() int
- func (s *Scalar) Rename(newName string) error
- func (s *Scalar) Replace(ind, repl Column) (Column, error)
- func (s *Scalar) String() string
- type Vector
- func (v *Vector) AllRows() iter.Seq2[int, []any]
- func (v *Vector) Append(data ...any) error
- func (v *Vector) AppendVector(vAdd *Vector) error
- func (v *Vector) AsAny() any
- func (v *Vector) AsDate() ([]time.Time, error)
- func (v *Vector) AsFloat() ([]float64, error)
- func (v *Vector) AsInt() ([]int, error)
- func (v *Vector) AsString() ([]string, error)
- func (v *Vector) Copy() *Vector
- func (v *Vector) Element(indx int) any
- func (v *Vector) ElementDate(indx int) (*time.Time, error)
- func (v *Vector) ElementFloat(indx int) (*float64, error)
- func (v *Vector) ElementInt(indx int) (*int, error)
- func (v *Vector) ElementString(indx int) (*string, error)
- func (v *Vector) Len() int
- func (v *Vector) Less(i, j int) bool
- func (v *Vector) SetAny(val any, indx int)
- func (v *Vector) SetDate(val time.Time, indx int) error
- func (v *Vector) SetFloat(val float64, indx int) error
- func (v *Vector) SetInt(val, indx int) error
- func (v *Vector) SetString(val string, indx int) error
- func (v *Vector) String() string
- func (v *Vector) Swap(i, j int)
- func (v *Vector) VectorType() DataTypes
- func (v *Vector) Where(indic *Vector) *Vector
Variables ¶
var DateFormats = []string{"20060102", "1/2/2006", "01/02/2006", "Jan 2, 2006", "January 2, 2006",
"Jan 2 2006", "January 2 2006", "2006-01-02", "01/02/06", "1/2/06"}
DateFormats is the list of available formats for dates.
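The layouts above are standard Go time layouts, so a string can be tested against each one with time.Parse. The sketch below copies the list and returns the first successful parse. The helper parseDate is hypothetical, and trying the layouts in list order is an assumption of this sketch, not documented behavior of df.

```go
package main

import (
	"fmt"
	"time"
)

// dateFormats copies the layouts listed in DateFormats above.
var dateFormats = []string{"20060102", "1/2/2006", "01/02/2006", "Jan 2, 2006", "January 2, 2006",
	"Jan 2 2006", "January 2 2006", "2006-01-02", "01/02/06", "1/2/06"}

// parseDate is a hypothetical helper: it returns the result of the first layout that parses s.
func parseDate(s string) (time.Time, error) {
	for _, layout := range dateFormats {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("no layout matches %q", s)
}

func main() {
	for _, s := range []string{"20240315", "3/15/2024", "March 15, 2024", "2024-03-15"} {
		t, err := parseDate(s)
		fmt.Println(s, "->", t.Format("2006-01-02"), err)
	}
}
```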
Functions ¶
func Has ¶
func Has[C comparable](needle C, haystack []C) bool
func Parse ¶
Parse parses the expression expr and appends the result to df. Expressions have the form:
<result> := <expression>.
A list of functions available is in the documentation.
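To make the statement form concrete, here is a small sketch that separates the two sides of a statement of the form <result> := <expression>. The helper splitAssignment is hypothetical and only illustrates the syntax; it does not evaluate anything and is not how Parse is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAssignment is a hypothetical helper that separates "<result> := <expression>"
// into its two sides at the first ":=". It does not evaluate the expression.
func splitAssignment(stmt string) (result, expr string, err error) {
	left, right, found := strings.Cut(stmt, ":=")
	result, expr = strings.TrimSpace(left), strings.TrimSpace(right)
	if !found || result == "" || expr == "" {
		return "", "", fmt.Errorf("expected <result> := <expression>, got %q", stmt)
	}
	return result, expr, nil
}

func main() {
	result, expr, err := splitAssignment("y := x/2")
	fmt.Println(result, expr, err)
}
```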
func Position ¶
func Position[C comparable](needle C, haystack []C) int
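Has and Position are listed without descriptions. Judging by the signatures alone, they look like generic slice-search helpers; the sketch below shows one plausible reading. The lowercase names are deliberately different from the package's, and returning -1 for a missing needle is an assumption.

```go
package main

import "fmt"

// position returns the index of the first element of haystack equal to needle.
// Returning -1 when needle is absent is an assumption of this sketch.
func position[C comparable](needle C, haystack []C) int {
	for i, h := range haystack {
		if h == needle {
			return i
		}
	}
	return -1
}

// has reports whether needle occurs in haystack.
func has[C comparable](needle C, haystack []C) bool {
	return position(needle, haystack) >= 0
}

func main() {
	cols := []string{"x", "y", "z"}
	fmt.Println(position("y", cols), position("w", cols), has("z", cols))
}
```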
func PrettyPrint ¶
PrettyPrint returns a string where the elements of cols are aligned under the header. Each element of cols is expected to be a slice of float64, int, string or time.Time.
func RandomLetters ¶
RandomLetters generates a string of length "length" by randomly choosing from a-z
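A function with this behavior can be sketched in a few lines. The version below uses math/rand, which is an assumption; the package's own source of randomness is not stated here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomLetters returns a string of the given length whose characters are
// drawn uniformly from a-z. length is assumed to be non-negative.
func randomLetters(length int) string {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	out := make([]byte, length)
	for i := range out {
		out[i] = letters[rand.Intn(len(letters))]
	}
	return string(out)
}

func main() {
	fmt.Println(randomLetters(8))
}
```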
func StringSlice ¶
StringSlice converts inVal to a slice of strings whose first element is the header. inVal is expected to be a slice of float64, int, string or time.Time.
Types ¶
type CC ¶
type CC interface {
Core() *ColCore // Core returns itself.
CategoryMap() CategoryMap // CategoryMap returns a map of original value to category value. Not nil only for dt=DTcategorical.
DataType() DataTypes // DataType returns the type of the column.
Dependencies() []string // Dependencies returns a list of columns required to calculate this column, if this is a calculated column.
Dialect() *Dialect // Dialect returns the Dialect object. A Dialect object is required if there is DB interaction.
Name() string // Name returns the column's name.
Parent() DF // Parent returns the DF to which the column belongs.
Rename(newName string) error // Rename renames the column.
}
The CC interface defines the methods of ColCore. These methods are invariant to the data that underlies the column.
type CategoryMap ¶
CategoryMap maps the raw value of a categorical column to the category level
func (CategoryMap) Max ¶
func (cm CategoryMap) Max() int
func (CategoryMap) Min ¶
func (cm CategoryMap) Min() int
func (CategoryMap) String ¶
func (cm CategoryMap) String() string
type ColCore ¶
type ColCore struct {
// contains filtered or unexported fields
}
ColCore implements the CC interface.
func NewColCore ¶
func (*ColCore) CategoryMap ¶
func (c *ColCore) CategoryMap() CategoryMap
func (*ColCore) Core ¶
Core returns itself. We need a method to return itself since the DFcore struct will need these methods.
func (*ColCore) Dependencies ¶
type ColOpt ¶
ColOpt functions are used to set ColCore options
func ColCatMap ¶
func ColCatMap(cm CategoryMap) ColOpt
func ColDataType ¶
func ColDialect ¶
func ColRawType ¶
type Column ¶
type Column interface {
// Core Methods
CC
// AllRows iterates through the rows of the column. It returns the row # and the value of the column at that row.
// The row value return is a slice, []any, of length 1. This was done to be consistent with
// the AllRows() function of DF which also returns []any.
AllRows() iter.Seq2[int, []any]
// Copy returns a copy of the column.
Copy() Column
// Data returns the contents of the column. Column implementations that are not stored in memory (e.g. as in a database)
// will have to fetch the data when this method is called.
Data() *Vector
// Len is the length of the column.
Len() int
// Stringer. This is expected to be a summary of the column.
String() string
}
The Column interface defines the methods that columns must have.
type DC ¶
type DC interface {
// AllColumns returns an iterator across the columns.
AllColumns() iter.Seq[Column]
// AppendColumn appends col to the DF.
AppendColumn(col Column, replace bool) error
// Column returns the column colName. Returns nil if the column doesn't exist.
Column(colName string) Column
// ColumnNames returns the names of all the columns.
ColumnNames() []string
// ColumnTypes returns the types of columns. If cols is nil, returns the types for all columns.
ColumnTypes(cols ...string) ([]DataTypes, error)
// Core returns itself.
Core() *DFcore
// Dialect returns the Dialect object for DB access.
Dialect() *Dialect
// DropColumns drops colNames from the DF.
DropColumns(colNames ...string) error
// Fns returns a slice of functions that operate on columns.
Fns() Fns
// HasColumns returns true if the DF has all cols.
HasColumns(cols ...string) bool
// KeepColumns subsets DF to colsToKeep
KeepColumns(colsToKeep ...string) error
// SourceDF returns the source DF for this DF if this DF is a derivative (e.g. a Table).
SourceDF() *DFcore
}
type DF ¶
type DF interface {
// Core methods
DC
// AllRows iterates through the rows of the DF. It returns the row # and the values of the DF at that row.
AllRows() iter.Seq2[int, []any]
// AppendDF appends df
AppendDF(df DF) (DF, error)
// By creates a new DF that groups the source DF by the columns listed in groupBy and calculates fns on the groups.
By(groupBy string, fns ...string) (DF, error)
// Categorical creates a categorical column
// colName - name of the source column
// catMap - optionally supply a category map of source value -> category level
// fuzz - if a source column value has counts < fuzz, then it is put in the 'other' category.
// defaultVal - optional source column value for the 'other' category.
// levels - slice of source values to make categories from
Categorical(colName string, catMap CategoryMap, fuzz int, defaultVal any, levels []any) (Column, error)
Copy() DF
// Interp interpolates the columns (xSfield,yfield) at the xIfield points.
// iDF - input iterator (e.g. Column or DF) that yields the points to interpolate at
// xSfield - column name of x values in source DF
// xIfield - name of x values in iDF
// yfield - column name of y values in source DF
// outField - column name of interpolated y's in return DF
//
// The output DF has two columns: xIfield, outField.
Interp(iDF HasIter, xSfield, xIfield, yfield, outField string) (DF, error)
// Join inner joins the df to the source DF on the joinOn fields
// df - DF to join
// joinOn - comma-separated list of fields to join on.
Join(df HasIter, joinOn string) (DF, error)
// RowCount returns # of rows in df
RowCount() int
// SetParent sets the Parent field of all the columns in the source DF
SetParent() error
// Sort sorts the source DF on sortCols
// ascending - if true, sorts ascending
// sortCols - sortCols is a comma-separated list of fields on which to sort.
Sort(ascending bool, sortCols string) error
// String is expected to produce a summary of the source DF.
String() string
// Table returns a table based on cols.
// cols - comma-separated list of column names for the table.
// The return is expected to include the columns "count" and "rate"
Table(cols string) (DF, error)
// Where returns a DF subset according to condition.
Where(condition string) (DF, error)
}
type DFcore ¶
type DFcore struct {
// contains filtered or unexported fields
}
DFcore implements DC.
func (*DFcore) ColumnNames ¶
func (*DFcore) DropColumns ¶
func (*DFcore) HasColumns ¶
func (*DFcore) KeepColumns ¶
type DataTypes ¶
type DataTypes uint8
DataTypes are the types of data that the package supports for Column elements
const (
DTfloat DataTypes = 0 + iota
DTint
DTstring
DTdate
DTcategorical
DTunknown // keep as last entry, OK to put new entries before
)
Values of DataTypes
func DTFromString ¶
DTFromString returns the DataTypes value named by nm, e.g. input "DTdate" gives output 3. If nm is not recognized, it returns DTunknown.
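The numbering follows directly from the iota declaration above, which is why "DTdate" corresponds to 3. The sketch below reproduces the constants and a lookup in the spirit of DTFromString. The function dtFromString and the names table are hypothetical; accepted spellings other than the constant names are not covered.

```go
package main

import "fmt"

// DataTypes and its constants are copied from the declaration above.
type DataTypes uint8

const (
	DTfloat DataTypes = 0 + iota
	DTint
	DTstring
	DTdate
	DTcategorical
	DTunknown
)

// names is indexed by DataTypes value; the spellings follow the constant names.
var names = []string{"DTfloat", "DTint", "DTstring", "DTdate", "DTcategorical", "DTunknown"}

// dtFromString is a hypothetical lookup: unrecognized names map to DTunknown.
func dtFromString(nm string) DataTypes {
	for i, n := range names {
		if n == nm {
			return DataTypes(i)
		}
	}
	return DTunknown
}

func main() {
	fmt.Println(dtFromString("DTdate"), dtFromString("DTcategorical"), dtFromString("bogus"))
}
```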
type Dialect ¶
type Dialect struct {
// contains filtered or unexported fields
}
Dialect manages interactions with DB's.
func NewDialect ¶
NewDialect creates a *Dialect to manage DB access.
func (*Dialect) Case ¶
Case creates a CASE statement.
whens - slice of conditions
vals - slice of the value to set the result to if condition is true
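For orientation, pairing conditions with values in a SQL CASE expression might look like the sketch below. The function caseSQL is hypothetical, and the exact SQL text that Dialect.Case emits (for example, whether it adds an ELSE branch) is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// caseSQL is a hypothetical builder that pairs each condition in whens with
// the value at the same position in vals.
func caseSQL(whens, vals []string) (string, error) {
	if len(whens) == 0 || len(whens) != len(vals) {
		return "", fmt.Errorf("whens and vals must be non-empty and of equal length")
	}
	var sb strings.Builder
	sb.WriteString("CASE")
	for i := range whens {
		fmt.Fprintf(&sb, " WHEN %s THEN %s", whens[i], vals[i])
	}
	sb.WriteString(" END")
	return sb.String(), nil
}

func main() {
	sql, err := caseSQL([]string{"x > 0", "x <= 0"}, []string{"'pos'", "'nonpos'"})
	fmt.Println(sql, err)
}
```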
func (*Dialect) CastFloat ¶
CastFloat says whether floats need to be cast as such. Postgres will return "NUMERIC" for calculated fields, which the connector loads as strings.
func (*Dialect) Convert ¶
Convert converts val to the corresponding datatype used by df.
func (*Dialect) Create ¶
func (d *Dialect) Create(tableName, orderBy string, fields []string, types []DataTypes, overwrite, temporary bool, options ...string) error
Create creates a table.
tableName - name of the table to create
orderBy - comma-separated list of fields to form the key (order)
fields - field names
types - field types
overwrite - if true, overwrite existing table
temporary - create a temp table
options - are in key:value format and are meant to replace placeholders in create.txt
func (*Dialect) DialectName ¶
func (*Dialect) Global ¶
Global takes SQL that normally is a scalar return (e.g. count(*), avg(x)) and surrounds it with SQL to return that value for every row of a query.
func (*Dialect) InsertValues ¶
InsertValues inserts values into tableName
func (*Dialect) Join ¶
func (d *Dialect) Join(leftSQL, rightSQL string, leftFields, rightFields, joinFields []string) string
Join creates an inner JOIN query.
leftSQL - SQL for left side of join
rightSQL - SQL for right side of join
leftFields - fields to keep from leftSQL
rightFields - fields to keep from rightSQL
joinFields - fields to join on
func (*Dialect) Load ¶
func (d *Dialect) Load(qry string) (memData []*Vector, fieldNames []string, fieldTypes []DataTypes, e error)
Load loads qry from a DB into a slice of *Vector.
memData - returned data
fieldNames - field names of columns
fieldTypes - field types
func (*Dialect) Rows ¶
Rows returns a row reader for qry.
rows - row reader
row2Read - a slice with the appropriate types to read the rows.
fieldNames - names of the columns
func (*Dialect) Save ¶
func (d *Dialect) Save(tableName, orderBy string, overwrite, temp bool, toSave HasIter, options ...string) error
Save saves an Iter object to a database.
tableName - name of table to create.
orderBy - comma-separated list of fields to use as key (order).
overwrite - if true, replace any existing table.
temp - if true, create a temp table.
toSave - data to save.
options - options for CREATE.
func (*Dialect) Seq ¶
Seq returns a query that creates a table with column "seq" whose int values run from 0 to n-1.
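How such a query looks depends on the database. The sketch below shows one possible form for each of the two supported databases, using ClickHouse's numbers table function and Postgres's generate_series. The function seqSQL and the dialect strings are hypothetical, and the SQL that Dialect.Seq actually returns may differ.

```go
package main

import "fmt"

// seqSQL is a hypothetical builder for a query whose "seq" column runs from 0 to n-1.
func seqSQL(dialect string, n int) (string, error) {
	switch dialect {
	case "clickhouse":
		// numbers(n) yields the column "number" with values 0..n-1.
		return fmt.Sprintf("SELECT toInt32(number) AS seq FROM numbers(%d)", n), nil
	case "postgres":
		// generate_series is inclusive at both ends, hence n-1.
		return fmt.Sprintf("SELECT generate_series(0, %d) AS seq", n-1), nil
	}
	return "", fmt.Errorf("unsupported dialect %q", dialect)
}

func main() {
	for _, d := range []string{"clickhouse", "postgres"} {
		sql, err := seqSQL(d, 5)
		fmt.Println(sql, err)
	}
}
```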
func (*Dialect) ToName ¶
ToName converts the raw field name to what's needed for interaction with the database. Specifically, Postgres requires quotes around field names that have uppercase letters.
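Postgres folds unquoted identifiers to lower case, which is why a mixed-case field name has to be quoted. A minimal sketch of that rule follows. The function toName is hypothetical and ignores other cases that need quoting, such as reserved words or embedded quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// toName is a hypothetical sketch: it double-quotes fieldName only when the
// name contains an uppercase letter.
func toName(fieldName string) string {
	if fieldName != strings.ToLower(fieldName) {
		return `"` + fieldName + `"`
	}
	return fieldName
}

func main() {
	fmt.Println(toName("rate"), toName("loanAmt"))
}
```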
func (*Dialect) Types ¶
func (d *Dialect) Types(qry string) (fieldNames []string, fieldTypes []DataTypes, row2read []any, err error)
Types returns info needed to read the data generated by qry.
fieldNames - names of columns qry returns.
fieldTypes - column types returned by qry.
row2read - correctly typed row to read for Scan.
type DialectOpt ¶
DialectOpt functions are used to set Dialect options
func DialectBuffSize ¶
func DialectBuffSize(bufMB int) DialectOpt
DialectBuffSize sets the buffer size (in MB) for accumulating inserts. Default is 1GB.
func DialectDefaultDate ¶
func DialectDefaultDate(year, mon, day int) DialectOpt
DialectDefaultDate sets the default date to use if a date is null. Default is 1/1/1960.
func DialectDefaultFloat ¶
func DialectDefaultFloat(deflt float64) DialectOpt
DialectDefaultFloat sets the default float to use if a float is null. Default is MaxFloat64.
func DialectDefaultInt ¶
func DialectDefaultInt(deflt int) DialectOpt
DialectDefaultInt sets the default int to use if an int is null. Default is MaxInt.
func DialectDefaultString ¶
func DialectDefaultString(deflt string) DialectOpt
DialectDefaultString sets the default string to use if a string is null. Default is "".
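Functions such as these follow the functional-options pattern: each returns a value that adjusts one setting, and the constructor applies them over its defaults. The sketch below shows the pattern with the defaults listed above. The types settings and option and the with... functions are hypothetical; the underlying definition of DialectOpt is not shown on this page.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// settings is a hypothetical holder for null-replacement defaults.
type settings struct {
	defaultInt    int
	defaultFloat  float64
	defaultString string
	defaultDate   time.Time
}

// option is a hypothetical analogue of DialectOpt.
type option func(*settings)

func withDefaultInt(v int) option { return func(s *settings) { s.defaultInt = v } }

func withDefaultDate(year, mon, day int) option {
	return func(s *settings) {
		s.defaultDate = time.Date(year, time.Month(mon), day, 0, 0, 0, 0, time.UTC)
	}
}

// newSettings starts from the defaults documented above, then applies opts in order.
func newSettings(opts ...option) *settings {
	s := &settings{
		defaultInt:    math.MaxInt,
		defaultFloat:  math.MaxFloat64,
		defaultString: "",
		defaultDate:   time.Date(1960, time.January, 1, 0, 0, 0, 0, time.UTC),
	}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := newSettings(withDefaultInt(-1), withDefaultDate(1970, 1, 1))
	fmt.Println(s.defaultInt, s.defaultDate.Format("2006-01-02"), s.defaultFloat == math.MaxFloat64)
}
```

The appeal of the pattern is that callers name only the settings they want to change, and new options can be added without altering the constructor's signature.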
type FileOpt ¶
FileOpt functions are used to set Files options
func FileDateFormat ¶
FileDateFormat sets the format for dates in the file. Default is 20060102.
func FileDefaultDate ¶
FileDefaultDate sets the value to use for fields that fail to convert to date if strict=false. Default is 1/1/1960.
func FileDefaultFloat ¶
FileDefaultFloat sets the value to use for fields that fail to convert to float if strict=false. Default is MaxFloat64.
func FileDefaultInt ¶
FileDefaultInt sets the value to use for fields that fail to convert to integer if strict=false. Default is MaxInt.
func FileDefaultString ¶
FileDefaultString sets the value to use for fields that fail to convert to string if strict=false. Default is "".
func FileFieldNames ¶
FileFieldNames sets the field names for the file -- needed if the file has no header.
func FileFieldTypes ¶
FileFieldTypes sets the field types for the file--can be used instead of peeking at the file & guessing.
func FileFieldWidths ¶
FileFieldWidths sets field widths for flat files
func FileFloatFormat ¶
FileFloatFormat sets the format for writing floats. Default is %.2f.
func FileHeader ¶
FileHeader sets true if file has a header. Default is true.
func FilePeek ¶
FilePeek sets the # of lines to examine to determine data types. Default value of 0 will examine the entire file.
func FileStrict ¶
FileStrict sets the action when a field fails to convert to its expected type.
If true, then an error results. If false, the default value is substituted.
Default: false
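The strict and non-strict behaviors can be contrasted with a small converter for integer fields. The function toInt is a hypothetical illustration of the rule described above, not the package's conversion code.

```go
package main

import (
	"fmt"
	"strconv"
)

// toInt is a hypothetical converter for one field of a file. When conversion
// fails it returns an error if strict is true and deflt otherwise.
func toInt(field string, strict bool, deflt int) (int, error) {
	v, err := strconv.Atoi(field)
	if err == nil {
		return v, nil
	}
	if strict {
		return 0, fmt.Errorf("cannot convert %q to int: %w", field, err)
	}
	return deflt, nil
}

func main() {
	fmt.Println(toInt("42", true, -1))
	fmt.Println(toInt("n/a", false, -1))
	_, err := toInt("n/a", true, -1)
	fmt.Println(err != nil)
}
```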
func FileStringDelim ¶
FileStringDelim sets the string delimiter. The default is ".
type Files ¶
type Files struct {
// contains filtered or unexported fields
}
Files manages interactions with files.
func (*Files) FieldNames ¶
func (*Files) FieldTypes ¶
func (*Files) FieldWidths ¶
type Fmap ¶
Fmap maps the function name to its spec
func LoadFunctions ¶
LoadFunctions loads functions from a string which is an embedded file. LoadFunctions expects functions to be separated by "\n". Within each line there are 6 fields separated by colons. The fields are:
- function name
- function spec
- inputs
- outputs
- return type (C = column, S = scalar)
- varying inputs (Y = yes)
Inputs are sets of types within braces, separated by commas.
{int,int},{float,float}
specifies the function takes two parameters which can be either {int,int} or {float,float}.
Corresponding to each set of inputs is an output type. In the above example, if the function always returns a float, the output would be:
float,float.
Legal types are float, int, string and date. Categorical inputs are ints.
If there is no input parameter, leave the field empty as in:
::
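The line format can be made concrete with a small parser. The sketch below assumes that no field contains a colon, and the sample line for a function named add is invented for illustration. The type spec and the function parseLine are hypothetical and do not reproduce LoadFunctions.

```go
package main

import (
	"fmt"
	"strings"
)

// spec is a hypothetical holder for the six fields of one line.
type spec struct {
	name, detail string
	inputs       [][]string // one slice of type names per valid input set
	outputs      []string   // output type for each input set
	scalar       bool       // return type field is "S"
	varying      bool       // varying inputs field is "Y"
}

// parseLine splits one line into its six colon-separated fields.
// It assumes no field itself contains a colon.
func parseLine(line string) (*spec, error) {
	f := strings.Split(line, ":")
	if len(f) != 6 {
		return nil, fmt.Errorf("expected 6 colon-separated fields, got %d", len(f))
	}
	s := &spec{name: f[0], detail: f[1], scalar: f[4] == "S", varying: f[5] == "Y"}
	if f[2] != "" {
		// "{int,int},{float,float}" -> ["int,int", "float,float"]
		body := strings.TrimSuffix(strings.TrimPrefix(f[2], "{"), "}")
		for _, set := range strings.Split(body, "},{") {
			s.inputs = append(s.inputs, strings.Split(set, ","))
		}
	}
	if f[3] != "" {
		s.outputs = strings.Split(f[3], ",")
	}
	if len(s.inputs) > 0 && len(s.inputs) != len(s.outputs) {
		return nil, fmt.Errorf("%s: %d input sets but %d outputs", s.name, len(s.inputs), len(s.outputs))
	}
	return s, nil
}

func main() {
	// An invented sample line, not taken from the package's function file.
	s, err := parseLine("add:add:{int,int},{float,float}:int,float:C:N")
	fmt.Println(s.name, s.inputs, s.outputs, s.scalar, s.varying, err)
}
```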
type Fn ¶
Fn is the function signature for functions called by the parser.
info - if info == true, then the function is not run but returns *FnReturn with info fields filled in (Name, Output, Inputs, Varying, IsScalar)
df - DF providing data for function (required only if info=false).
inputs - inputs to the function (required only if info=false).
type FnReturn ¶
type FnReturn struct {
Value Column // return value of function
Name string // name of function
// An element of Inputs is a slice of data types that the function takes as inputs. For instance,
// {DTfloat,DTint}
// means that the function takes 2 inputs - the first float, the second int. And
// {DTfloat,DTint},{DTfloat,DTfloat}
// means that the function takes 2 inputs - either float,int or float,float.
Inputs [][]DataTypes
// Output types corresponding to the input slices.
Output []DataTypes
Varying bool // if true, the number of inputs varies.
IsScalar bool // if true, the function reduces a column to a scalar (e.g. sum, mean)
Err error
}
FnReturn is the return type for parser functions
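The relationship between Inputs and Output can be shown with a lookup that finds the input set matching the argument types and returns the output type at the same position. The function outputFor is hypothetical and ignores the Varying flag; it is only a sketch of how the two slices line up.

```go
package main

import (
	"fmt"
	"slices"
)

// DataTypes and its constants mirror the declaration earlier on this page.
type DataTypes uint8

const (
	DTfloat DataTypes = iota
	DTint
	DTstring
	DTdate
	DTcategorical
	DTunknown
)

// outputFor returns the output type paired with the first input set equal to args.
// The boolean is false when no input set matches.
func outputFor(inputs [][]DataTypes, outputs []DataTypes, args ...DataTypes) (DataTypes, bool) {
	for i, sig := range inputs {
		if i < len(outputs) && slices.Equal(sig, args) {
			return outputs[i], true
		}
	}
	return DTunknown, false
}

func main() {
	inputs := [][]DataTypes{{DTfloat, DTint}, {DTfloat, DTfloat}}
	outputs := []DataTypes{DTfloat, DTfloat}
	fmt.Println(outputFor(inputs, outputs, DTfloat, DTfloat))
	fmt.Println(outputFor(inputs, outputs, DTstring, DTint))
}
```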
type FnSpec ¶
type FnSpec struct {
// Name is the name of the function that the parser will recognize in user statements.
Name string
// FnDetail gives the specifics of the function.
// For df/sql, this is the SQL that is run.
// For df/mem, this is the name of the Go function to call.
FnDetail string
// Inputs is a slice that lists all valid combinations of inputs.
Inputs [][]DataTypes
// Outputs is a slice that lists the outputs corresponding to each element of Inputs.
Outputs []DataTypes
// IsScalar is true if the function reduces a column to a scalar (e.g. mean, sum)
IsScalar bool
// Varying is true if the number of inputs can vary.
Varying bool
// This is a slice of Go functions to call, corresponding to the elements of inputs/outputs.
// Not used for df/sql.
Fns []any
}
FnSpec specifies a function that the parser will have access to.
type HasIter ¶
The HasIter interface restricts to types that have an iterator through the rows of the data. Save only requires an iterator to move through the rows
type Scalar ¶
type Scalar struct {
*ColCore
// contains filtered or unexported fields
}
Scalar implements Column for scalars.
type Vector ¶
type Vector struct {
// contains filtered or unexported fields
}
Vector is the return type for Column data.
func MakeVector ¶
MakeVector returns a *Vector with data of type dt and length n.
func NewVector ¶
NewVector creates a new *Vector from data, checking/converting that it is of type dt.
func (*Vector) AllRows ¶
AllRows returns an iterator that moves through the data. It returns a slice rather than a row so that it's compatible with the DF iterator.
func (*Vector) AppendVector ¶
AppendVector appends a vector.
func (*Vector) AsDate ¶
AsDate returns the data as a time.Time slice. It converts to date, if needed & possible.
func (*Vector) AsFloat ¶
AsFloat returns the data as a []float64 slice. It converts to float64, if needed & possible.
func (*Vector) AsInt ¶
AsInt returns the data as a []int slice. It converts to int, if needed & possible.
func (*Vector) Element ¶
Element returns the indx'th element of Vector. If v.Len() = 1, it returns the 0th element for any indx; the parser needs this for an op like "x/2", where we don't want to append a vector of 2's. If v.Len() > 1, it returns nil if indx is out of bounds.
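The length-1 rule lets a constant behave like a full column in element-wise operations. The sketch below shows the idea on plain slices. The functions element and divide are hypothetical stand-ins, not the Vector implementation.

```go
package main

import "fmt"

// element returns data[indx], except that a length-1 slice acts as a constant:
// any index returns its only element. Other out-of-range indexes return nil.
func element(data []any, indx int) any {
	if len(data) == 1 {
		return data[0]
	}
	if indx < 0 || indx >= len(data) {
		return nil
	}
	return data[indx]
}

// divide computes x[i]/y[i] for each row of x, where y may have length 1.
// All values are assumed to be float64.
func divide(x, y []any) []float64 {
	out := make([]float64, len(x))
	for i := range x {
		out[i] = element(x, i).(float64) / element(y, i).(float64)
	}
	return out
}

func main() {
	x := []any{2.0, 4.0, 9.0}
	two := []any{2.0} // the constant in "x/2"
	fmt.Println(divide(x, two))
}
```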
func (*Vector) ElementDate ¶
ElementDate returns the indx'th element as a date, converting the value, if needed & possible.
func (*Vector) ElementFloat ¶
ElementFloat returns the indx'th element as a float64, converting the value, if needed & possible.
func (*Vector) ElementInt ¶
ElementInt returns the indx'th element as an int, converting the value, if needed & possible.
func (*Vector) ElementString ¶
ElementString returns the indx'th element as a string.