df


v0.0.3, published May 6, 2025. License: Apache-2.0. Imports: 15. Imported by: 0.

Repository

github.com/invertedv/df

README ¶

df - A Dataframe Package

Overview

What Makes df Different?

Dataframes are commonly used to hold and manipulate data for analysis. Conceptually, a dataframe consists of a set of columns. Columns, in turn, are arrays of values of a common type, and all columns have the same length. Originally, dataframe implementations such as those in R and the pandas Python package were designed to hold the data in memory, though these have since been extended to big-data cases.

How is df different? The df package specifies interfaces for dataframes and columns and is agnostic about the mechanisms that handle the underlying data.

With this approach, the user can pull a sample of a large table and experiment with it, do exploratory data analysis (EDA), etc., quickly and efficiently. When desired, the same Go code can then be run over the entire table.
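
Running the same Go code over a sample and over the full table works because analysis code is written against an interface rather than a concrete type. The sketch below is a toy illustration of that idea only: `Frame`, `memFrame`, `genFrame` and `meanOf` are hypothetical names, not part of the df API, and `genFrame` merely stands in for data that lives outside the Go process.

```go
package main

import (
	"errors"
	"fmt"
)

// Frame is a hypothetical, minimal dataframe interface (not df.DF).
// Analysis code depends only on this interface.
type Frame interface {
	Len() int
	Value(col string, row int) (float64, error)
}

// memFrame keeps every column in memory as a slice.
type memFrame struct {
	cols map[string][]float64
	n    int
}

func (m memFrame) Len() int { return m.n }

func (m memFrame) Value(col string, row int) (float64, error) {
	c, ok := m.cols[col]
	if !ok {
		return 0, errors.New("no such column: " + col)
	}
	return c[row], nil
}

// genFrame stores nothing and computes each value on request, standing in
// (hypothetically) for data held elsewhere, such as in a database.
type genFrame struct {
	n   int
	gen func(col string, row int) (float64, error)
}

func (g genFrame) Len() int { return g.n }

func (g genFrame) Value(col string, row int) (float64, error) { return g.gen(col, row) }

// meanOf runs unchanged on any Frame implementation.
func meanOf(f Frame, col string) (float64, error) {
	if f.Len() == 0 {
		return 0, errors.New("empty frame")
	}
	total := 0.0
	for i := 0; i < f.Len(); i++ {
		v, err := f.Value(col, i)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total / float64(f.Len()), nil
}

func main() {
	sample := memFrame{cols: map[string][]float64{"x": {1, 2, 3}}, n: 3}
	full := genFrame{n: 100, gen: func(col string, row int) (float64, error) {
		return float64(row), nil // x = 0, 1, ..., 99
	}}
	m1, _ := meanOf(sample, "x")
	m2, _ := meanOf(full, "x")
	fmt.Println(m1, m2) // 2 49.5
}
```

The point of the design is that `meanOf` never learns which storage it is reading from.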

The df package consists of a main package, df, and two sub-packages, df/mem and df/sql. The main package:

defines the DF and Column interfaces.
implements core aspects of those interfaces.
provides a parser to evaluate expressions.
handles file and DB I/O.

Packages df/mem and df/sql implement the full DF and Column interfaces for in-memory data and SQL databases, respectively. The distinction between them is not the source of the data: df/mem dataframes can be read from and saved to a database, for example. The distinction is where the calculations and manipulations are performed. The df/mem package does this work in memory, while df/sql performs it in the database.

Functionality

What do you need to be able to do with a dataframe? Well, you'll want to

Create and save them.
Manipulate columns, for example by creating new columns from existing ones.
Subset, sort, summarize and join the data.

To this end,

With df you can read/write files (such as CSV) and SQL tables.
df has a parser for evaluating expressions to create new columns. The parser allows flexible specification of expressions that return a column result. The parser will work on any type that satisfies the DF interface*.
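```
// y is the logistic transform of yhat
Parse(df, "y := exp(yhat) / (1.0 + exp(yhat))")
// r takes the value of a where k == 1, and b elsewhere
Parse(df, "r := if(k==1,a,b)")
// xNorm is x divided by the sum of x over all rows
Parse(df, "xNorm := x / global(sum(x))")
// zCat is a categorical version of z
Parse(df, "zCat := cat(z)")
```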
The interface specification includes methods such as Sort(), By() and Join().

*Note: the specific implementation must also provide the parser with implementations of functions such as "sum". The df/mem and df/sql packages offer identical function sets. See the Parse section for a list of supported functions.

Extensible

The package may be extended in several directions. One can

Add new functions to the parser.
Add support for additional database types. Currently, ClickHouse and Postgres are supported. Supporting a new database type requires modifying the Dialect struct; the df/sql package itself would not need to be modified.
Build a completely new implementation of the DF and Column interfaces.
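
Adding a function to the parser amounts, conceptually, to registering a named implementation that the parser can look up when it meets that name in an expression. A minimal sketch of that registry idea with a plain Go map; `colFn`, `registry` and `register` are hypothetical and do not reproduce the package's `Fn`/`Fns` types, whose definitions are not shown here.

```go
package main

import (
	"fmt"
	"math"
)

// colFn is a hypothetical signature: one float column in, one column out.
type colFn func(x []float64) []float64

// registry maps the name used in an expression to its implementation.
var registry = map[string]colFn{
	"exp": func(x []float64) []float64 {
		out := make([]float64, len(x))
		for i, v := range x {
			out[i] = math.Exp(v)
		}
		return out
	},
}

// register adds a new function and refuses to overwrite an existing name.
func register(name string, fn colFn) error {
	if _, ok := registry[name]; ok {
		return fmt.Errorf("function %q already registered", name)
	}
	registry[name] = fn
	return nil
}

func main() {
	// Add a new function, "double", without touching the lookup code.
	_ = register("double", func(x []float64) []float64 {
		out := make([]float64, len(x))
		for i, v := range x {
			out[i] = 2 * v
		}
		return out
	})
	fmt.Println(registry["double"]([]float64{1, 2.5})) // [2 5]
}
```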

Package Details

df

The df package defines the DF and Column interfaces in two layers: a core interface (DC and CC, respectively) and a full interface (DF and Column). The core interfaces define the methods that are independent of how the data is stored (e.g. DropColumns() for a DF, Name() for a Column). The df package provides structs that implement the core interfaces (DFcore and ColCore).
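
The two-layer design lets an implementation reuse the storage-independent methods by embedding a core struct and supplying only the storage-specific ones. A minimal sketch of that pattern; `Core`, `Full`, `colCore` and `memCol` are hypothetical stand-ins for CC, Column, ColCore and a df/mem column, not the package's code.

```go
package main

import "fmt"

// Core holds methods that do not depend on how the data is stored.
type Core interface {
	Name() string
	Rename(newName string) error
}

// Full adds the methods that do depend on storage.
type Full interface {
	Core
	Len() int
}

// colCore implements Core once, for every implementation to reuse.
type colCore struct{ name string }

func (c *colCore) Name() string { return c.name }

func (c *colCore) Rename(newName string) error {
	if newName == "" {
		return fmt.Errorf("empty column name")
	}
	c.name = newName
	return nil
}

// memCol embeds *colCore and adds only the storage-specific Len.
type memCol struct {
	*colCore
	data []float64
}

func (m *memCol) Len() int { return len(m.data) }

// Compile-time check that *memCol satisfies the full interface.
var _ Full = (*memCol)(nil)

func main() {
	var c Full = &memCol{colCore: &colCore{name: "x"}, data: []float64{1, 2, 3}}
	_ = c.Rename("y")
	fmt.Println(c.Name(), c.Len()) // y 3
}
```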

The package also provides a parser, procedures for accessing files and tables, and additional structures required by the package.

df/mem

The df/mem package implements the DF and Column interfaces for in-memory objects.

df/sql

The package df/sql implements the DF and Column interfaces for SQL databases. It relies on the methods of Dialect to handle the specifics of any particular database type.

A basic design philosophy of this package is that the storage mechanism of the data doesn't matter. A complication is that, although different databases may all use SQL, the details of their dialects are likely to differ. The Dialect struct and its methods abstract these differences away, so each supported database must be handled explicitly there. Currently, ClickHouse and Postgres are supported. Dialect uses a connection from Go's standard database/sql package, and all communication with databases occurs through Dialect.
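
As a concrete instance of the kind of difference Dialect has to absorb, consider Seq, documented further down, which returns a query producing a column "seq" with int values 0 to n-1. ClickHouse and Postgres spell such a query differently. The sketch below is an assumption about how a per-database switch could look, not the package's actual SQL; `seqSQL` and the dialect strings are hypothetical. It uses ClickHouse's `numbers` table function and Postgres's `generate_series`.

```go
package main

import "fmt"

// seqSQL (hypothetical) returns a query yielding a column "seq" with values
// 0..n-1. The dialect names and SQL text are illustrative assumptions.
func seqSQL(dialect string, n int) (string, error) {
	if n < 0 {
		return "", fmt.Errorf("n must be non-negative, got %d", n)
	}
	switch dialect {
	case "clickhouse":
		// numbers(n) produces a column "number" running 0..n-1.
		return fmt.Sprintf("SELECT toInt64(number) AS seq FROM numbers(%d)", n), nil
	case "postgres":
		// generate_series includes both endpoints, so stop at n-1.
		return fmt.Sprintf("SELECT generate_series(0, %d) AS seq", n-1), nil
	default:
		return "", fmt.Errorf("unsupported dialect %q", dialect)
	}
}

func main() {
	for _, d := range []string{"clickhouse", "postgres"} {
		q, err := seqSQL(d, 5)
		if err != nil {
			panic(err)
		}
		fmt.Println(q)
	}
}
```

Callers ask for "a sequence of n" and never see which spelling was used, which is the role the text assigns to Dialect.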

Data Types

Four data types are supported for column elements:

float
int
string
date

There is one additional type, "categorical", which is a mapping of the values of a source column (of type int, string or date) into int.

Note that the df/mem and df/sql packages strongly type data. One cannot add a float and an int, for example.

Docs

For details, see the docs.

Documentation ¶

Overview ¶

The package df is an implementation of dataframes. The central idea is that dataframes are defined as an interface that is independent of how the data is handled.

The package df:

Defines the dataframe and column interfaces (DF, Column).
Implements core aspects of these.
Provides a parser to handle Column-valued expressions.
Provides for file and database I/O.

Along with df, there are two sub-packages implementing DF and Column:

df/mem: in-memory dataframes.
df/sql: SQL-database dataframes. The current implementation covers ClickHouse and Postgres databases.

See the documentation at https://invertedv.github.io/df for details.

Index ¶

Variables
func Has[C comparable](needle C, haystack []C) bool
func Parse(df DF, expr string) error
func Position[C comparable](needle C, haystack []C) int
func PrettyPrint(header []string, cols ...any) string
func RandomLetters(length int) string
func StringSlice(header string, inVal any) []string
func ToDataType(x any, dt DataTypes) (any, bool)
type CC
type CategoryMap
- func (cm CategoryMap) Max() int
- func (cm CategoryMap) Min() int
- func (cm CategoryMap) String() string
type ColCore
- func NewColCore(opts ...ColOpt) (*ColCore, error)
- func (c *ColCore) CategoryMap() CategoryMap
- func (c *ColCore) Copy() *ColCore
- func (c *ColCore) Core() *ColCore
- func (c *ColCore) DataType() DataTypes
- func (c *ColCore) Dependencies() []string
- func (c *ColCore) Dialect() *Dialect
- func (c *ColCore) Name() string
- func (c *ColCore) Parent() DF
- func (c *ColCore) RawType() DataTypes
- func (c *ColCore) Rename(newName string) error
type ColOpt
- func ColCatMap(cm CategoryMap) ColOpt
- func ColDataType(dt DataTypes) ColOpt
- func ColDialect(dlct *Dialect) ColOpt
- func ColName(name string) ColOpt
- func ColParent(df DF) ColOpt
- func ColRawType(raw DataTypes) ColOpt
type Column
- func RunDFfn(fn Fn, df DF, inputs []Column) (Column, error)
type DC
type DF
type DFcore
- func NewDFcore(cols []Column, opts ...DFopt) (df *DFcore, err error)
- func (df *DFcore) AllColumns() iter.Seq[Column]
- func (df *DFcore) AppendColumn(col Column, replace bool) error
- func (df *DFcore) Column(colName string) Column
- func (df *DFcore) ColumnNames() []string
- func (df *DFcore) ColumnTypes(colNames ...string) ([]DataTypes, error)
- func (df *DFcore) Copy() *DFcore
- func (df *DFcore) Core() *DFcore
- func (df *DFcore) Dialect() *Dialect
- func (df *DFcore) DropColumns(colNames ...string) error
- func (df *DFcore) Fns() Fns
- func (df *DFcore) HasColumns(cols ...string) bool
- func (df *DFcore) KeepColumns(colNames ...string) error
- func (df *DFcore) SourceDF() *DFcore
type DFopt
- func DFappendFn(f Fn) DFopt
- func DFdialect(d *Dialect) DFopt
- func DFsetFns(f Fns) DFopt
- func DFsetSourceDF(source DC) DFopt
type DataTypes
- func DTFromString(nm string) DataTypes
- func GetKind(fn reflect.Type) DataTypes
- func WhatAmI(val any) DataTypes
- func (i DataTypes) String() string
type Dialect
- func NewDialect(dialect string, db *sql.DB, opts ...DialectOpt) (*Dialect, error)
- func (d *Dialect) BufSize() int
- func (d *Dialect) Case(whens, vals []string) (string, error)
- func (d *Dialect) CastField(fieldName string, toDT DataTypes) (sqlStr string, err error)
- func (d *Dialect) CastFloat() bool
- func (d *Dialect) Close() error
- func (d *Dialect) Convert(val any) any
- func (d *Dialect) Create(tableName, orderBy string, fields []string, types []DataTypes, ...) error
- func (d *Dialect) DB() *sql.DB
- func (d *Dialect) DialectName() string
- func (d *Dialect) DropTable(tableName string) error
- func (d *Dialect) Exists(tableName string) bool
- func (d *Dialect) Functions() Fmap
- func (d *Dialect) Global(sourceSQL, colSQL string) string
- func (d *Dialect) Insert(tableName, makeQuery, fields string) error
- func (d *Dialect) InsertValues(tableName string, values []byte) error
- func (d *Dialect) Interp(sourceSQL, interpSQL, xSfield, xIfield, yField, outField string) string
- func (d *Dialect) IterSave(tableName string, df HasIter) error
- func (d *Dialect) Join(leftSQL, rightSQL string, leftFields, rightFields, joinFields []string) string
- func (d *Dialect) Load(qry string) (memData []*Vector, fieldNames []string, fieldTypes []DataTypes, e error)
- func (d *Dialect) Quantile(col string, q float64) string
- func (d *Dialect) Quote() string
- func (d *Dialect) RowCount(qry string) (int, error)
- func (d *Dialect) Rows(qry string) (rows *sql.Rows, row2Read []any, fieldNames []string, e error)
- func (d *Dialect) Save(tableName, orderBy string, overwrite, temp bool, toSave HasIter, ...) error
- func (d *Dialect) Seq(n int) string
- func (d *Dialect) ToName(fieldName string) string
- func (d *Dialect) ToString(val any) string
- func (d *Dialect) Types(qry string) (fieldNames []string, fieldTypes []DataTypes, row2read []any, err error)
- func (d *Dialect) Union(table1, table2 string, colNames ...string) (string, error)
- func (d *Dialect) WithName() string
type DialectOpt
- func DialectBuffSize(bufMB int) DialectOpt
- func DialectDefaultDate(year, mon, day int) DialectOpt
- func DialectDefaultFloat(deflt float64) DialectOpt
- func DialectDefaultInt(deflt int) DialectOpt
- func DialectDefaultString(deflt string) DialectOpt
type FileOpt
- func FileDateFormat(format string) FileOpt
- func FileDefaultDate(year, mon, day int) FileOpt
- func FileDefaultFloat(deflt float64) FileOpt
- func FileDefaultInt(deflt int) FileOpt
- func FileDefaultString(deflt string) FileOpt
- func FileEOL(eol byte) FileOpt
- func FileFieldNames(fieldNames []string) FileOpt
- func FileFieldTypes(fieldTypes []DataTypes) FileOpt
- func FileFieldWidths(fieldWidths []int) FileOpt
- func FileFloatFormat(format string) FileOpt
- func FileHeader(hasHeader bool) FileOpt
- func FilePeek(linesToPeek int) FileOpt
- func FileSep(sep byte) FileOpt
- func FileStrict(strict bool) FileOpt
- func FileStringDelim(delim byte) FileOpt
type Files
- func NewFiles(opts ...FileOpt) (*Files, error)
- func (f *Files) Close() error
- func (f *Files) Create(fileName string) error
- func (f *Files) FieldNames() []string
- func (f *Files) FieldTypes() []DataTypes
- func (f *Files) FieldWidths() []int
- func (f *Files) Load() ([]*Vector, error)
- func (f *Files) Open(fileName string) error
- func (f *Files) Save(fileName string, df HasIter) error
type Fmap
- func LoadFunctions(fns string) Fmap
type Fn
type FnReturn
type FnSpec
type Fns
- func (fs Fns) Get(fnName string) Fn
type HasIter
type HasMQdlct
type Scalar
- func NewScalar(val any, opts ...ColOpt) (*Scalar, error)
- func (s *Scalar) AllRows() iter.Seq2[int, []any]
- func (s *Scalar) AppendRows(col Column) (Column, error)
- func (s *Scalar) Copy() Column
- func (s *Scalar) Core() *ColCore
- func (s *Scalar) Data() *Vector
- func (s *Scalar) Len() int
- func (s *Scalar) Rename(newName string) error
- func (s *Scalar) Replace(ind, repl Column) (Column, error)
- func (s *Scalar) String() string
type Vector
- func MakeVector(dt DataTypes, n int) *Vector
- func NewVector(data any, dt DataTypes) (*Vector, error)
- func (v *Vector) AllRows() iter.Seq2[int, []any]
- func (v *Vector) Append(data ...any) error
- func (v *Vector) AppendVector(vAdd *Vector) error
- func (v *Vector) AsAny() any
- func (v *Vector) AsDate() ([]time.Time, error)
- func (v *Vector) AsFloat() ([]float64, error)
- func (v *Vector) AsInt() ([]int, error)
- func (v *Vector) AsString() ([]string, error)
- func (v *Vector) Copy() *Vector
- func (v *Vector) Element(indx int) any
- func (v *Vector) ElementDate(indx int) (*time.Time, error)
- func (v *Vector) ElementFloat(indx int) (*float64, error)
- func (v *Vector) ElementInt(indx int) (*int, error)
- func (v *Vector) ElementString(indx int) (*string, error)
- func (v *Vector) Len() int
- func (v *Vector) Less(i, j int) bool
- func (v *Vector) SetAny(val any, indx int)
- func (v *Vector) SetDate(val time.Time, indx int) error
- func (v *Vector) SetFloat(val float64, indx int) error
- func (v *Vector) SetInt(val, indx int) error
- func (v *Vector) SetString(val string, indx int) error
- func (v *Vector) String() string
- func (v *Vector) Swap(i, j int)
- func (v *Vector) VectorType() DataTypes
- func (v *Vector) Where(indic *Vector) *Vector

Constants ¶

This section is empty.

Variables ¶

var DateFormats = []string{"20060102", "1/2/2006", "01/02/2006", "Jan 2, 2006", "January 2, 2006",
	"Jan 2 2006", "January 2 2006", "2006-01-02", "01/02/06", "1/2/06"}

DateFormats is the list of available formats for dates.
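
A list like DateFormats suggests a simple strategy for reading dates whose layout is not known in advance: try each Go reference layout in turn and keep the first that parses. The sketch below copies the list above; `parseDate` is a hypothetical helper, and whether the package itself tries the layouts in this order is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// dateFormats mirrors the DateFormats variable documented above.
var dateFormats = []string{"20060102", "1/2/2006", "01/02/2006", "Jan 2, 2006", "January 2, 2006",
	"Jan 2 2006", "January 2 2006", "2006-01-02", "01/02/06", "1/2/06"}

// parseDate (hypothetical) tries each layout in order and returns the first
// successful parse.
func parseDate(s string) (time.Time, error) {
	for _, layout := range dateFormats {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("no known date format matches %q", s)
}

func main() {
	for _, s := range []string{"20250506", "5/6/2025", "May 6, 2025", "2025-05-06"} {
		t, err := parseDate(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(t.Format("2006-01-02")) // 2025-05-06 each time
	}
}
```

Order matters: an ambiguous string is read with whichever matching layout comes first in the list.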

Functions ¶

func Has ¶

func Has[C comparable](needle C, haystack []C) bool

func Parse ¶

func Parse(df DF, expr string) error

Parse parses the expression expr and appends the result to df. Expressions have the form:

<result> := <expression>

A list of available functions is in the documentation.

func Position ¶

func Position[C comparable](needle C, haystack []C) int
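
Has and Position carry no doc comments here. Judging by their names and signatures they are linear searches over a slice; the sketch below assumes Has reports membership and Position returns the index of the first match, or -1 when absent (the -1 convention is an assumption). `has` and `position` are lower-case stand-ins, not the package's functions. Since Go 1.21 the standard `slices.Contains` and `slices.Index` provide the same behaviour.

```go
package main

import (
	"fmt"
	"slices"
)

// position returns the index of the first element equal to needle, or -1
// (assumed convention).
func position[C comparable](needle C, haystack []C) int {
	for i, v := range haystack {
		if v == needle {
			return i
		}
	}
	return -1
}

// has reports whether needle occurs in haystack.
func has[C comparable](needle C, haystack []C) bool {
	return position(needle, haystack) >= 0
}

func main() {
	cols := []string{"x", "y", "z"}
	fmt.Println(has("y", cols), position("y", cols), position("w", cols)) // true 1 -1
	// The standard library equivalents agree:
	fmt.Println(slices.Contains(cols, "y"), slices.Index(cols, "w")) // true -1
}
```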

func PrettyPrint ¶

func PrettyPrint(header []string, cols ...any) string

PrettyPrint returns a string in which the elements of cols are aligned under the header. Each element of cols is expected to be a slice of float64, int, string or time.Time.

func RandomLetters ¶

func RandomLetters(length int) string

RandomLetters generates a string of length "length" by choosing letters at random from a-z.

func StringSlice ¶

func StringSlice(header string, inVal any) []string

StringSlice converts inVal to a slice of strings whose first element is the header. inVal is expected to be a slice of float64, int, string or time.Time.

func ToDataType ¶

func ToDataType(x any, dt DataTypes) (any, bool)

ToDataType converts x to the dt data type.

Types ¶

type CC ¶

type CC interface {
	Core() *ColCore              // Core returns itself.
	CategoryMap() CategoryMap    // CategoryMap returns a map of original value to category value.  Not nil only for dt=DTcategorical.
	DataType() DataTypes         // DataType returns the type of the column.
	Dependencies() []string      // Dependencies returns a list of columns required to calculate this column, if this is a calculated column.
	Dialect() *Dialect           // Dialect returns the Dialect object. A Dialect object is required if there is DB interaction.
	Name() string                // Name returns the column's name.
	Parent() DF                  // Parent returns the DF to which the column belongs.
	Rename(newName string) error // Rename renames the column.
}

The CC interface defines the methods of ColCore. These methods are invariant to the data that underlies the column.

type CategoryMap ¶

type CategoryMap map[any]int

CategoryMap maps the raw value of a categorical column to its category level.

func (CategoryMap) Max ¶

func (cm CategoryMap) Max() int

func (CategoryMap) Min ¶

func (cm CategoryMap) Min() int

func (CategoryMap) String ¶

func (cm CategoryMap) String() string
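
CategoryMap is a plain `map[any]int` from raw value to category level. Min and Max are undocumented; the sketch below assumes they return the smallest and largest level in the map, and it defines its own hypothetical `categoryMap` type rather than using the package's. Returning 0 for an empty map is also an assumption.

```go
package main

import "fmt"

// categoryMap mirrors df.CategoryMap: raw value -> category level.
type categoryMap map[any]int

// Min returns the smallest level; 0 for an empty map (assumed convention).
func (cm categoryMap) Min() int {
	first, m := true, 0
	for _, lvl := range cm {
		if first || lvl < m {
			m, first = lvl, false
		}
	}
	return m
}

// Max returns the largest level; 0 for an empty map (assumed convention).
func (cm categoryMap) Max() int {
	first, m := true, 0
	for _, lvl := range cm {
		if first || lvl > m {
			m, first = lvl, false
		}
	}
	return m
}

func main() {
	// Keys of different types can coexist because the key type is any.
	cm := categoryMap{"a": 0, "b": 1, "c": 2, -1: 3}
	fmt.Println(cm.Min(), cm.Max()) // 0 3
}
```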

type ColCore ¶

type ColCore struct {
	// contains filtered or unexported fields
}

ColCore implements the CC interface.

func NewColCore ¶

func NewColCore(opts ...ColOpt) (*ColCore, error)

func (*ColCore) CategoryMap ¶

func (c *ColCore) CategoryMap() CategoryMap

func (*ColCore) Copy ¶

func (c *ColCore) Copy() *ColCore

func (*ColCore) Core ¶

func (c *ColCore) Core() *ColCore

Core returns itself. A method that returns itself is needed because the DFcore struct will need these methods.

func (*ColCore) DataType ¶

func (c *ColCore) DataType() DataTypes

func (*ColCore) Dependencies ¶

func (c *ColCore) Dependencies() []string

func (*ColCore) Dialect ¶

func (c *ColCore) Dialect() *Dialect

func (*ColCore) Name ¶

func (c *ColCore) Name() string

func (*ColCore) Parent ¶

func (c *ColCore) Parent() DF

func (*ColCore) RawType ¶

func (c *ColCore) RawType() DataTypes

func (*ColCore) Rename ¶

func (c *ColCore) Rename(newName string) error

type ColOpt ¶

type ColOpt func(c CC) error

ColOpt functions are used to set ColCore options

func ColCatMap ¶

func ColCatMap(cm CategoryMap) ColOpt

func ColDataType ¶

func ColDataType(dt DataTypes) ColOpt

func ColDialect ¶

func ColDialect(dlct *Dialect) ColOpt

func ColName ¶

func ColName(name string) ColOpt

func ColParent ¶

func ColParent(df DF) ColOpt

func ColRawType ¶

func ColRawType(raw DataTypes) ColOpt

type Column ¶

type Column interface {

	// Core Methods
	CC

	// AllRows iterates through the rows of the column.  It returns the row # and the value of the column at that row.
	// The row value returned is a slice, []any, of length 1.  This was done to be consistent with
	// the AllRows() function of DF which also returns []any.
	AllRows() iter.Seq2[int, []any]

	// Copy returns a copy of the column.
	Copy() Column

	// Data returns the contents of the column.  Column implementations that are not stored in memory (e.g. as in a database)
	//  will have to fetch the data when this method is called.
	Data() *Vector

	// Len is the length of the column.
	Len() int

	// Stringer.  This is expected to be a summary of the column.
	String() string
}

The Column interface defines the methods that columns must have.

func RunDFfn ¶

func RunDFfn(fn Fn, df DF, inputs []Column) (Column, error)

RunDFfn runs a parser function

fn     - function to run
df     - data frame providing data
inputs - inputs to fn. If the inputs belong to a DF.

type DC ¶

type DC interface {
	// AllColumns returns an iterator across the columns.
	AllColumns() iter.Seq[Column]

	// AppendColumn appends col to the DF.
	AppendColumn(col Column, replace bool) error

	// Column returns the column colName.  Returns nil if the column doesn't exist.
	Column(colName string) Column

	// ColumnNames returns the names of all the columns.
	ColumnNames() []string

	// ColumnTypes returns the types of columns.  If cols is nil, returns the types for all columns.
	ColumnTypes(cols ...string) ([]DataTypes, error)

	// Core returns itself.
	Core() *DFcore

	// Dialect returns the Dialect object for DB access.
	Dialect() *Dialect

	// DropColumns drops colNames from the DF.
	DropColumns(colNames ...string) error

	// Fns returns a slice of functions that operate on columns.
	Fns() Fns

	// HasColumns returns true if the DF has all cols.
	HasColumns(cols ...string) bool

	// KeepColumns subsets DF to colsToKeep
	KeepColumns(colsToKeep ...string) error

	// SourceDF returns the source DF for this DF if this DF is a derivative (e.g. a Table).
	SourceDF() *DFcore
}

type DF ¶

type DF interface {
	// Core methods
	DC

	// AllRows iterates through the rows of the DF.  It returns the row # and the values of the DF at that row.
	AllRows() iter.Seq2[int, []any]

	// AppendDF appends df
	AppendDF(df DF) (DF, error)

	// By creates a new DF that groups the source DF by the columns listed in groupBy and calculates fns on the groups.
	By(groupBy string, fns ...string) (DF, error)

	// Categorical creates a categorical column
	//	colName    - name of the source column
	//	catMap     - optionally supply a category map of source value -> category level
	//	fuzz       - if a source column value has counts < fuzz, then it is put in the 'other' category.
	//	defaultVal - optional source column value for the 'other' category.
	//	levels     - slice of source values to make categories from
	Categorical(colName string, catMap CategoryMap, fuzz int, defaultVal any, levels []any) (Column, error)

	// Copy returns a copy of the DF.
	Copy() DF

	// Interp interpolates the columns (xSfield, yfield) of the source DF at the xIfield points of iDF.
	//   iDF      - input iterator (e.g. Column or DF) that yields the points to interpolate at
	//   xSfield  - column name of x values in source DF
	//   xIfield  - name of x values in iDF
	//   yfield   - column name of y values in source DF
	//   outField - column name of interpolated y's in return DF
	//
	// The output DF has two columns: xIfield, outField.
	Interp(iDF HasIter, xSfield, xIfield, yfield, outField string) (DF, error)

	// Join inner joins the df to the source DF on the joinOn fields
	//   df       - DF to join
	//   joinOn   - comma-separated list of fields to join on.
	Join(df HasIter, joinOn string) (DF, error)

	// RowCount returns # of rows in df
	RowCount() int

	// SetParent sets the Parent field of all the columns in the source DF
	SetParent() error

	// Sort sorts the source DF on sortCols
	//   ascending - if true, sorts ascending
	//   sortCols      - sortCols is a comma-separated list of fields on which to sort.
	Sort(ascending bool, sortCols string) error

	// String is expected to produce a summary of the source DF.
	String() string

	// Table returns a table based on cols.
	//   cols - comma-separated list of column names for the table.
	// The return is expected to include the columns "count" and "rate"
	Table(cols string) (DF, error)

	// Where returns a DF subset according to condition.
	Where(condition string) (DF, error)
}
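
The Categorical comment above describes a fuzz threshold: source values seen fewer than fuzz times go into an "other" category. The sketch below shows one way to build such a mapping. `buildCategories` and `categorize` are hypothetical, and sorting the retained values before numbering them, and reserving level 0 for "other", are assumptions rather than the package's documented behaviour.

```go
package main

import (
	"fmt"
	"sort"
)

// buildCategories (hypothetical) assigns a level to each distinct value seen
// at least fuzz times. Rarer values are left out and fall into level 0.
func buildCategories(vals []string, fuzz int) map[string]int {
	counts := make(map[string]int)
	for _, v := range vals {
		counts[v]++
	}
	var kept []string
	for v, n := range counts {
		if n >= fuzz {
			kept = append(kept, v)
		}
	}
	sort.Strings(kept) // deterministic level numbering (assumed)
	levels := make(map[string]int, len(kept))
	for i, v := range kept {
		levels[v] = i + 1 // level 0 is reserved for "other" (assumed)
	}
	return levels
}

// categorize maps each value to its level; missing keys yield 0, "other".
func categorize(vals []string, levels map[string]int) []int {
	out := make([]int, len(vals))
	for i, v := range vals {
		out[i] = levels[v]
	}
	return out
}

func main() {
	vals := []string{"NY", "CA", "NY", "TX", "CA", "NY", "VT"}
	levels := buildCategories(vals, 2)
	fmt.Println(levels)                   // map[CA:1 NY:2]
	fmt.Println(categorize(vals, levels)) // [2 1 2 0 1 2 0]
}
```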

type DFcore ¶

type DFcore struct {
	// contains filtered or unexported fields
}

DFcore implements DC.

func NewDFcore ¶

func NewDFcore(cols []Column, opts ...DFopt) (df *DFcore, err error)

func (*DFcore) AllColumns ¶

func (df *DFcore) AllColumns() iter.Seq[Column]

func (*DFcore) AppendColumn ¶

func (df *DFcore) AppendColumn(col Column, replace bool) error

func (*DFcore) Column ¶

func (df *DFcore) Column(colName string) Column

func (*DFcore) ColumnNames ¶

func (df *DFcore) ColumnNames() []string

func (*DFcore) ColumnTypes ¶

func (df *DFcore) ColumnTypes(colNames ...string) ([]DataTypes, error)

func (*DFcore) Copy ¶

func (df *DFcore) Copy() *DFcore

func (*DFcore) Core ¶

func (df *DFcore) Core() *DFcore

func (*DFcore) Dialect ¶

func (df *DFcore) Dialect() *Dialect

func (*DFcore) DropColumns ¶

func (df *DFcore) DropColumns(colNames ...string) error

func (*DFcore) Fns ¶

func (df *DFcore) Fns() Fns

func (*DFcore) HasColumns ¶

func (df *DFcore) HasColumns(cols ...string) bool

func (*DFcore) KeepColumns ¶

func (df *DFcore) KeepColumns(colNames ...string) error

func (*DFcore) SourceDF ¶

func (df *DFcore) SourceDF() *DFcore

type DFopt ¶

type DFopt func(df DC) error

DFopt functions are used to set DFcore options

func DFappendFn ¶

func DFappendFn(f Fn) DFopt

func DFdialect ¶

func DFdialect(d *Dialect) DFopt

func DFsetFns ¶

func DFsetFns(f Fns) DFopt

func DFsetSourceDF ¶

func DFsetSourceDF(source DC) DFopt

type DataTypes ¶

type DataTypes uint8

DataTypes are the types of data that the package supports for Column elements

const (
	DTfloat DataTypes = 0 + iota
	DTint
	DTstring
	DTdate
	DTcategorical
	DTunknown // keep as last entry, OK to put new entries before
)

Values of DataTypes

func DTFromString ¶

func DTFromString(nm string) DataTypes

DTFromString returns the DataTypes value named by nm; e.g. input "DTdate" returns 3 (DTdate). If nm is not recognized, it returns DTunknown.

func GetKind ¶

func GetKind(fn reflect.Type) DataTypes

GetKind maps a reflect.Type to the corresponding DataTypes value.

func WhatAmI ¶

func WhatAmI(val any) DataTypes

WhatAmI returns the type of val.

func (DataTypes) String ¶

func (i DataTypes) String() string

type Dialect ¶

type Dialect struct {
	// contains filtered or unexported fields
}

Dialect manages interactions with databases.

func NewDialect ¶

func NewDialect(dialect string, db *sql.DB, opts ...DialectOpt) (*Dialect, error)

NewDialect creates a *Dialect to manage DB access.

func (*Dialect) BufSize ¶

func (d *Dialect) BufSize() int

BufSize returns the buffer size for Insert Values queries.

func (*Dialect) Case ¶

func (d *Dialect) Case(whens, vals []string) (string, error)

Case creates a CASE statement.

whens - slice of conditions
vals  - slice of values; vals[i] is the result when whens[i] is true
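
Case pairs each condition in whens with the value at the same index in vals. A minimal sketch of assembling the SQL text, assuming the two slices must have equal length and that no ELSE branch is added; `caseSQL` is hypothetical and the package's exact output may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// caseSQL (hypothetical) builds "CASE WHEN w1 THEN v1 WHEN w2 THEN v2 ... END".
func caseSQL(whens, vals []string) (string, error) {
	if len(whens) == 0 || len(whens) != len(vals) {
		return "", errors.New("whens and vals must be non-empty and the same length")
	}
	var b strings.Builder
	b.WriteString("CASE")
	for i, w := range whens {
		fmt.Fprintf(&b, " WHEN %s THEN %s", w, vals[i])
	}
	b.WriteString(" END")
	return b.String(), nil
}

func main() {
	s, err := caseSQL([]string{"k = 1", "k = 2"}, []string{"a", "b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // CASE WHEN k = 1 THEN a WHEN k = 2 THEN b END
}
```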

func (*Dialect) CastField ¶

func (d *Dialect) CastField(fieldName string, toDT DataTypes) (sqlStr string, err error)

CastField casts fieldName to type toDT.

func (*Dialect) CastFloat ¶

func (d *Dialect) CastFloat() bool

CastFloat reports whether floats need to be cast explicitly. Postgres returns "NUMERIC" for calculated fields, which the connector loads as strings.

func (*Dialect) Close ¶

func (d *Dialect) Close() error

func (*Dialect) Convert ¶

func (d *Dialect) Convert(val any) any

Convert converts val to the corresponding data type used by df.

func (*Dialect) Create ¶

func (d *Dialect) Create(tableName, orderBy string, fields []string, types []DataTypes, overwrite, temporary bool, options ...string) error

Create creates a table.

tableName  - name of the table to create
orderBy    - comma-separated list of fields to form the key (order)
fields     - field names
types      - field types
overwrite  - if true, overwrite existing table
temporary  - create a temp table
options    - are in key:value format and are meant to replace placeholders in create.txt

func (*Dialect) DB ¶

func (d *Dialect) DB() *sql.DB

func (*Dialect) DialectName ¶

func (d *Dialect) DialectName() string

func (*Dialect) DropTable ¶

func (d *Dialect) DropTable(tableName string) error

func (*Dialect) Exists ¶

func (d *Dialect) Exists(tableName string) bool

Exists returns true if tableName exists on the db.

func (*Dialect) Functions ¶

func (d *Dialect) Functions() Fmap

Functions returns a map of functions for the parser.

func (*Dialect) Global ¶

func (d *Dialect) Global(sourceSQL, colSQL string) string

Global takes SQL that normally is a scalar return (e.g. count(*), avg(x)) and surrounds it with SQL to return that value for every row of a query.

func (*Dialect) Insert ¶

func (d *Dialect) Insert(tableName, makeQuery, fields string) error

Insert executes an insert query

func (*Dialect) InsertValues ¶

func (d *Dialect) InsertValues(tableName string, values []byte) error

InsertValues inserts values into tableName

func (*Dialect) Interp ¶

func (d *Dialect) Interp(sourceSQL, interpSQL, xSfield, xIfield, yField, outField string) string

Interp executes a query to interpolate values
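
Neither Interp description states the interpolation method. Assuming linear interpolation between neighbouring source points, the computation for one requested point looks like the sketch below. `interp` is hypothetical, expects the source x values sorted ascending, and returns an error outside their range rather than extrapolating (also an assumption).

```go
package main

import (
	"fmt"
	"sort"
)

// interp (hypothetical) linearly interpolates (xs, ys) at x.
// xs must be sorted ascending.
func interp(xs, ys []float64, x float64) (float64, error) {
	if len(xs) < 2 || len(xs) != len(ys) {
		return 0, fmt.Errorf("need at least two points and len(xs) == len(ys)")
	}
	if x < xs[0] || x > xs[len(xs)-1] {
		return 0, fmt.Errorf("x = %v is outside [%v, %v]", x, xs[0], xs[len(xs)-1])
	}
	// j is the first index with xs[j] >= x, so xs[j-1] < x when j > 0.
	j := sort.SearchFloat64s(xs, x)
	if xs[j] == x {
		return ys[j], nil
	}
	i := j - 1
	w := (x - xs[i]) / (xs[j] - xs[i])
	return ys[i] + w*(ys[j]-ys[i]), nil
}

func main() {
	xs := []float64{0, 1, 2}
	ys := []float64{0, 10, 40}
	for _, x := range []float64{0.5, 1, 1.5} {
		y, err := interp(xs, ys, x)
		if err != nil {
			panic(err)
		}
		fmt.Println(x, y) // 0.5 5, then 1 10, then 1.5 25
	}
}
```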

func (*Dialect) IterSave ¶

func (d *Dialect) IterSave(tableName string, df HasIter) error

IterSave saves the data represented by df into tableName

func (*Dialect) Join ¶

func (d *Dialect) Join(leftSQL, rightSQL string, leftFields, rightFields, joinFields []string) string

Join creates an inner JOIN query.

leftSQL - SQL for left side of join
rightSQL - SQL for right side of join
leftFields - fields to keep from leftSQL
rightFields - fields to keep from rightSQL
joinFields - fields to join on

func (*Dialect) Load ¶

func (d *Dialect) Load(qry string) (memData []*Vector, fieldNames []string, fieldTypes []DataTypes, e error)

Load loads qry from a DB into a slice of *Vector.

memData    - returned data
fieldNames - field names of columns
fieldTypes - field types

func (*Dialect) Quantile ¶

func (d *Dialect) Quantile(col string, q float64) string

func (*Dialect) Quote ¶

func (d *Dialect) Quote() string

func (*Dialect) RowCount ¶

func (d *Dialect) RowCount(qry string) (int, error)

func (*Dialect) Rows ¶

func (d *Dialect) Rows(qry string) (rows *sql.Rows, row2Read []any, fieldNames []string, e error)

Rows returns a row reader for qry.

rows       - row reader
row2Read   - a slice with the appropriate types to read the rows.
fieldNames - names of the columns

func (*Dialect) Save ¶

func (d *Dialect) Save(tableName, orderBy string, overwrite, temp bool, toSave HasIter, options ...string) error

Save saves a HasIter object to a database.

tableName - name of table to create.
orderBy   - comma-separated list of fields to use as key (order).
overwrite - if true, replace any existing table.
temp      - if true, create a temp table.
toSave    - data to save.
options   - options for CREATE.

func (*Dialect) Seq ¶

func (d *Dialect) Seq(n int) string

Seq returns a query that creates a table with column "seq" whose int values run from 0 to n-1.

func (*Dialect) ToName ¶

func (d *Dialect) ToName(fieldName string) string

ToName converts the raw field name to the form needed for interaction with the database. Specifically, Postgres requires quotes around field names that contain uppercase letters.
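
The Postgres rule follows from identifier folding: Postgres folds unquoted identifiers to lower case, so a column named `xNorm` must be written `"xNorm"` to keep its case. The sketch below shows that rule only; `toName` and the dialect strings are hypothetical, and doubling embedded double quotes follows Postgres's quoting rules rather than anything stated above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toName (hypothetical) quotes a field name for Postgres when it contains
// upper-case letters, since Postgres folds unquoted identifiers to lower case.
func toName(dialect, field string) string {
	if dialect != "postgres" {
		return field
	}
	if strings.IndexFunc(field, unicode.IsUpper) < 0 {
		return field
	}
	// Double any embedded quotes, then wrap the name in quotes.
	return `"` + strings.ReplaceAll(field, `"`, `""`) + `"`
}

func main() {
	fmt.Println(toName("postgres", "xNorm"))   // "xNorm"
	fmt.Println(toName("postgres", "yhat"))    // yhat
	fmt.Println(toName("clickhouse", "xNorm")) // xNorm
}
```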

func (*Dialect) ToString ¶

func (d *Dialect) ToString(val any) string

ToString returns a string version of val that can be placed into SQL

func (*Dialect) Types ¶

func (d *Dialect) Types(qry string) (fieldNames []string, fieldTypes []DataTypes, row2read []any, err error)

Types returns info needed to read the data generated by qry.

fieldNames - names of columns qry returns.
fieldTypes - column types returned by qry.
row2Read   - correctly typed row to read for Scan.

func (*Dialect) Union ¶

func (d *Dialect) Union(table1, table2 string, colNames ...string) (string, error)

Union returns a union query between two tables (queries).

func (*Dialect) WithName ¶

func (d *Dialect) WithName() string

WithName returns a random name for use as WITH names, etc.

type DialectOpt ¶

type DialectOpt func(d *Dialect) error

DialectOpt functions are used to set Dialect options

func DialectBuffSize ¶

func DialectBuffSize(bufMB int) DialectOpt

DialectBuffSize sets the buffer size (in MB) for accumulating inserts. Default is 1GB.

func DialectDefaultDate ¶

func DialectDefaultDate(year, mon, day int) DialectOpt

DialectDefaultDate sets the default date to use if a date is null. Default is 1/1/1960.

func DialectDefaultFloat ¶

func DialectDefaultFloat(deflt float64) DialectOpt

DialectDefaultFloat sets the default float to use if a float is null. Default is MaxFloat64.

func DialectDefaultInt ¶

func DialectDefaultInt(deflt int) DialectOpt

DialectDefaultInt sets the default int to use if an int is null. Default is MaxInt.

func DialectDefaultString ¶

func DialectDefaultString(deflt string) DialectOpt

DialectDefaultString sets the default string to use if a string is null. Default is "".

type FileOpt ¶

type FileOpt func(f *Files) error

FileOpt functions are used to set Files options

func FileDateFormat ¶

func FileDateFormat(format string) FileOpt

FileDateFormat sets the format for dates in the file. Default is 20060102.

func FileDefaultDate ¶

func FileDefaultDate(year, mon, day int) FileOpt

FileDefaultDate sets the value to use for fields that fail to convert to date if strict=false. Default is 1/1/1960.

func FileDefaultFloat ¶

func FileDefaultFloat(deflt float64) FileOpt

FileDefaultFloat sets the value to use for fields that fail to convert to float if strict=false. Default is MaxFloat64.

func FileDefaultInt ¶

func FileDefaultInt(deflt int) FileOpt

FileDefaultInt sets the value to use for fields that fail to convert to integer if strict=false. Default is MaxInt.

func FileDefaultString ¶

func FileDefaultString(deflt string) FileOpt

FileDefaultString sets the value to use for fields that fail to convert to string if strict=false. Default is "".

func FileEOL ¶

func FileEOL(eol byte) FileOpt

FileEOL sets the end-of-line character. The default is \n.

func FileFieldNames ¶

func FileFieldNames(fieldNames []string) FileOpt

FileFieldNames sets the field names for the file -- needed if the file has no header.

func FileFieldTypes ¶

func FileFieldTypes(fieldTypes []DataTypes) FileOpt

FileFieldTypes sets the field types for the file; it can be used instead of peeking at the file and guessing.

func FileFieldWidths ¶

func FileFieldWidths(fieldWidths []int) FileOpt

FileFieldWidths sets the field widths for flat (fixed-width) files.
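In a fixed-width file each field occupies a set number of characters, so a widths slice is enough to locate every field on a line. The sketch below shows how widths like those passed to FileFieldWidths could split one line; splitFixed is a hypothetical helper, not part of df, and it counts widths in bytes.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFixed cuts line into consecutive fields of the given byte widths,
// trimming surrounding spaces. It returns an error if the line is too short.
func splitFixed(line string, widths []int) ([]string, error) {
	fields := make([]string, 0, len(widths))
	start := 0
	for _, w := range widths {
		end := start + w
		if end > len(line) {
			return nil, fmt.Errorf("line has %d bytes, need at least %d", len(line), end)
		}
		fields = append(fields, strings.TrimSpace(line[start:end]))
		start = end
	}
	return fields, nil
}

func main() {
	// Widths 3, 8, 5: an id, a yyyymmdd date and a value.
	fields, err := splitFixed("A1 202505061.25 ", []int{3, 8, 5})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", fields) // ["A1" "20250506" "1.25"]
}
```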

func FileFloatFormat ¶

func FileFloatFormat(format string) FileOpt

FileFloatFormat sets the format for writing floats. Default is %.2f.

func FileHeader ¶

func FileHeader(hasHeader bool) FileOpt

FileHeader indicates whether the file has a header. Default is true.

func FilePeek ¶

func FilePeek(linesToPeek int) FileOpt

FilePeek sets the number of lines to examine to determine data types. The default value of 0 examines the entire file.

func FileSep ¶

func FileSep(sep byte) FileOpt

FileSep sets the field separator. Default is a comma.

func FileStrict ¶

func FileStrict(strict bool) FileOpt

FileStrict sets the action when a field fails to convert to its expected type.

If true, then an error results.
If false, the default value is substituted.

Default: false

func FileStringDelim ¶

func FileStringDelim(delim byte) FileOpt

FileStringDelim sets the string delimiter. The default is the double quote (").

type Files ¶

type Files struct {
	// contains filtered or unexported fields
}

Files manages interactions with files.

func NewFiles ¶

func NewFiles(opts ...FileOpt) (*Files, error)

NewFiles creates a *Files struct for reading/writing files.
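FileOpt follows Go's functional-options pattern: each option is a function that modifies the struct and may return an error. The self-contained sketch below reproduces that pattern with a simplified, hypothetical files struct and options (withSep, withHeader, withPeek). The negative-peek check is an illustrative assumption, not documented df behavior. The starting values are the documented defaults (comma separator, header present, peek 0).

```go
package main

import (
	"errors"
	"fmt"
)

// files is a simplified stand-in for df.Files; its fields are hypothetical.
type files struct {
	sep       byte
	hasHeader bool
	peek      int
}

// fileOpt has the same shape as df.FileOpt.
type fileOpt func(f *files) error

func withSep(sep byte) fileOpt {
	return func(f *files) error { f.sep = sep; return nil }
}

func withHeader(has bool) fileOpt {
	return func(f *files) error { f.hasHeader = has; return nil }
}

// withPeek validates its argument; this check is illustrative only.
func withPeek(n int) fileOpt {
	return func(f *files) error {
		if n < 0 {
			return errors.New("peek must be >= 0")
		}
		f.peek = n
		return nil
	}
}

// newFiles starts from the defaults and applies each option in order,
// stopping at the first error.
func newFiles(opts ...fileOpt) (*files, error) {
	f := &files{sep: ',', hasHeader: true, peek: 0}
	for _, opt := range opts {
		if err := opt(f); err != nil {
			return nil, err
		}
	}
	return f, nil
}

func main() {
	f, err := newFiles(withSep('|'), withHeader(false))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("sep=%c header=%v peek=%d\n", f.sep, f.hasHeader, f.peek)
	// sep=| header=false peek=0
}
```

With the real package, the corresponding call would use the options documented above, e.g. NewFiles(FileSep('|'), FileHeader(false)).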

func (*Files) Close ¶

func (f *Files) Close() error

func (*Files) Create ¶

func (f *Files) Create(fileName string) error

Create creates fileName on the file system.

func (*Files) FieldNames ¶

func (f *Files) FieldNames() []string

func (*Files) FieldTypes ¶

func (f *Files) FieldTypes() []DataTypes

func (*Files) FieldWidths ¶

func (f *Files) FieldWidths() []int

func (*Files) Load ¶

func (f *Files) Load() ([]*Vector, error)

Load loads the data into a slice of *Vector.

func (*Files) Open ¶

func (f *Files) Open(fileName string) error

Open opens fileName for reading/writing. It checks the file for consistency with the parameters (e.g., whether it has a header) and, if needed, determines and sets the field names and types.

func (*Files) Save ¶

func (f *Files) Save(fileName string, df HasIter) error

Save saves df out to fileName. The file must be created first.

type Fmap ¶

type Fmap map[string]*FnSpec

Fmap maps each function name to its spec.

func LoadFunctions ¶

func LoadFunctions(fns string) Fmap

LoadFunctions loads functions from a string holding the contents of an embedded file. It expects functions to be separated by "\n". Each line has 6 fields separated by colons. The fields are:

function name
function spec
inputs
outputs
return type (C = column, S = scalar)
varying inputs (Y = yes).

Inputs are sets of types enclosed in braces and separated by commas. For example,

{int,int},{float,float}

specifies that the function takes two parameters, which can be either {int,int} or {float,float}.

Each set of inputs has a corresponding output type. In the above example, if the function always returns a float, the outputs field would be:

float,float

Legal types are float, int, string and date. Categorical inputs are ints.

If there is no input parameter, leave the field empty as in:

::
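To make the six-field layout concrete, the sketch below parses definition lines into a struct. The fnLine type, the parser, and the sample lines ("plus", "rowNum" and their specs) are all hypothetical; this is not LoadFunctions itself. It assumes a plain colon split, so it would break on a spec that itself contains colons, and it treats an empty varying field as "no".

```go
package main

import (
	"fmt"
	"strings"
)

// fnLine holds the six colon-separated fields of one definition.
// Hypothetical illustration, not df's FnSpec.
type fnLine struct {
	Name, Spec string
	Inputs     [][]string
	Outputs    []string
	IsScalar   bool // return type field == "S"
	Varying    bool // varying field == "Y"
}

// splitList splits a comma-separated field; an empty field yields nil.
func splitList(s string) []string {
	if s == "" {
		return nil
	}
	return strings.Split(s, ",")
}

// parseInputs turns "{int,int},{float,float}" into [[int int] [float float]].
// An empty field (no input parameters) yields nil.
func parseInputs(s string) [][]string {
	if s == "" {
		return nil
	}
	var sets [][]string
	for _, set := range strings.Split(s, "},{") {
		sets = append(sets, strings.Split(strings.Trim(set, "{}"), ","))
	}
	return sets
}

func parseFnLine(line string) (fnLine, error) {
	f := strings.Split(line, ":")
	if len(f) != 6 {
		return fnLine{}, fmt.Errorf("want 6 fields, got %d", len(f))
	}
	return fnLine{
		Name:     f[0],
		Spec:     f[1],
		Inputs:   parseInputs(f[2]),
		Outputs:  splitList(f[3]),
		IsScalar: f[4] == "S",
		Varying:  f[5] == "Y",
	}, nil
}

func main() {
	src := "plus:plusFn:{int,int},{float,float}:int,float:C:\nrowNum:rowNumFn::int:C:"
	for _, line := range strings.Split(src, "\n") {
		if line == "" {
			continue
		}
		fn, err := parseFnLine(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s inputs=%v outputs=%v scalar=%v varying=%v\n",
			fn.Name, fn.Inputs, fn.Outputs, fn.IsScalar, fn.Varying)
	}
}
```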

type Fn ¶

type Fn func(info bool, df DF, inputs ...Column) *FnReturn

Fn is the function signature for functions called by the parser.

info   - if info == true, the function is not run; it returns a *FnReturn with only the info fields filled in (Name, Output, Inputs, Varying, IsScalar).
df     - DF providing data for the function (required only if info == false).
inputs - inputs to the function (required only if info == false).

type FnReturn ¶

type FnReturn struct {
	Value Column // return value of function

	Name string // name of function
	// An element of Inputs is a slice of data types that the function takes as inputs.  For instance,
	//   {DTfloat,DTint}
	// means that the function takes 2 inputs - the first float, the second int.  And
	//   {DTfloat,DTint},{DTfloat,DTfloat}
	// means that the function takes 2 inputs - either float,int or float,float.
	Inputs [][]DataTypes

	// Output types corresponding to the input slices.
	Output []DataTypes

	Varying bool // if true, the number of inputs varies.

	IsScalar bool // if true, the function reduces a column to a scalar (e.g. sum, mean)

	Err error
}

FnReturn is the return type for parser functions.

type FnSpec ¶

type FnSpec struct {
	// Name is the name of the function that the parser will recognize in user statements.
	Name string

	// FnDetail gives the specifics of the function.
	// For df/sql, this is the SQL that is run.
	// For df/mem, this is the name of the Go function to call.
	FnDetail string

	// Inputs is a slice that lists all valid combinations of inputs.
	Inputs [][]DataTypes

	// Outputs is a slice that lists the outputs corresponding to each element of Inputs.
	Outputs []DataTypes

	// IsScalar is true if the function reduces a column to a scalar (e.g. mean, sum)
	IsScalar bool

	// Varying is true if the number of inputs can vary.
	Varying bool

	// This is a slice of Go functions to call, corresponding to the elements of inputs/outputs.
	// Not used for df/sql.
	Fns []any
}

FnSpec specifies a function that the parser will have access to.

type Fns ¶

type Fns []Fn

func (Fns) Get ¶

func (fs Fns) Get(fnName string) Fn

type HasIter ¶

type HasIter interface {
	AllRows() iter.Seq2[int, []any]
}

The HasIter interface restricts to types that provide an iterator over the rows of the data. Save needs only such an iterator to move through the rows.

type HasMQdlct ¶

type HasMQdlct interface {
	MakeQuery(colNames ...string) string
	Dialect() *Dialect
}

HasMQdlct restricts to types that can access a DB: they build queries with MakeQuery and expose their *Dialect.

type Scalar ¶

type Scalar struct {
	*ColCore
	// contains filtered or unexported fields
}

Scalar implements Column for scalars.

func NewScalar ¶

func NewScalar(val any, opts ...ColOpt) (*Scalar, error)

func (*Scalar) AllRows ¶

func (s *Scalar) AllRows() iter.Seq2[int, []any]

func (*Scalar) AppendRows ¶

func (s *Scalar) AppendRows(col Column) (Column, error)

func (*Scalar) Copy ¶

func (s *Scalar) Copy() Column

func (*Scalar) Core ¶

func (s *Scalar) Core() *ColCore

func (*Scalar) Data ¶

func (s *Scalar) Data() *Vector

func (*Scalar) Len ¶

func (s *Scalar) Len() int

func (*Scalar) Rename ¶

func (s *Scalar) Rename(newName string) error

func (*Scalar) Replace ¶

func (s *Scalar) Replace(ind, repl Column) (Column, error)

func (*Scalar) String ¶

func (s *Scalar) String() string

type Vector ¶

type Vector struct {
	// contains filtered or unexported fields
}

Vector is the return type for Column data.

func MakeVector ¶

func MakeVector(dt DataTypes, n int) *Vector

MakeVector returns a *Vector with data of type dt and length n.

func NewVector ¶

func NewVector(data any, dt DataTypes) (*Vector, error)

NewVector creates a new *Vector from data, checking that it is of type dt and converting it if needed.

func (*Vector) AllRows ¶

func (v *Vector) AllRows() iter.Seq2[int, []any]

AllRows returns an iterator that moves through the data. It yields each row as a []any slice rather than a bare element so that it is compatible with the DF iterator.

func (*Vector) Append ¶

func (v *Vector) Append(data ...any) error

Append appends data (as a slice) to the vector.

func (*Vector) AppendVector ¶

func (v *Vector) AppendVector(vAdd *Vector) error

AppendVector appends a vector.

func (*Vector) AsAny ¶

func (v *Vector) AsAny() any

AsAny returns the data as an any variable.

func (*Vector) AsDate ¶

func (v *Vector) AsDate() ([]time.Time, error)

AsDate returns the data as a []time.Time slice. It converts to date, if needed & possible.

func (*Vector) AsFloat ¶

func (v *Vector) AsFloat() ([]float64, error)

AsFloat returns the data as a []float64 slice. It converts to float64, if needed & possible.

func (*Vector) AsInt ¶

func (v *Vector) AsInt() ([]int, error)

AsInt returns the data as a []int slice. It converts to int, if needed & possible.

func (*Vector) AsString ¶

func (v *Vector) AsString() ([]string, error)

AsString returns the data as a []string slice, converting if needed.

func (*Vector) Copy ¶

func (v *Vector) Copy() *Vector

Copy copies the *Vector

func (*Vector) Element ¶

func (v *Vector) Element(indx int) any

Element returns the indx'th element of the Vector. If v.Len() > 1, it returns nil when indx is out of bounds. If v.Len() == 1, it returns the 0th element for any indx. The parser relies on this for operations like "x/2", so that the scalar 2 does not have to be expanded into a vector of 2's.
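The length-1 rule above is a simple form of broadcasting. The sketch below models it on plain float slices; element and divide are hypothetical stand-ins, not df code. divide assumes y has length 1 or the same length as x and returns an error otherwise.

```go
package main

import "fmt"

// element models the documented *Vector.Element rule: a length-1 slice
// returns its only value for any index; otherwise an out-of-bounds
// index yields nil.
func element(data []float64, indx int) any {
	if len(data) == 1 {
		return data[0]
	}
	if indx < 0 || indx >= len(data) {
		return nil
	}
	return data[indx]
}

// divide computes x/y element by element, broadcasting a length-1 y.
func divide(x, y []float64) ([]float64, error) {
	if len(y) != 1 && len(y) != len(x) {
		return nil, fmt.Errorf("length mismatch: %d vs %d", len(x), len(y))
	}
	out := make([]float64, len(x))
	for i := range x {
		out[i] = element(x, i).(float64) / element(y, i).(float64)
	}
	return out, nil
}

func main() {
	// "x/2": the scalar 2 is never expanded into a vector.
	got, err := divide([]float64{2, 4, 6}, []float64{2})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got) // [1 2 3]
}
```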

func (*Vector) ElementDate ¶

func (v *Vector) ElementDate(indx int) (*time.Time, error)

ElementDate returns the indx'th element as a date, converting the value, if needed & possible.

func (*Vector) ElementFloat ¶

func (v *Vector) ElementFloat(indx int) (*float64, error)

ElementFloat returns the indx'th element as a float64, converting the value, if needed & possible.

func (*Vector) ElementInt ¶

func (v *Vector) ElementInt(indx int) (*int, error)

ElementInt returns the indx'th element as an int, converting the value, if needed & possible.

func (*Vector) ElementString ¶

func (v *Vector) ElementString(indx int) (*string, error)

ElementString returns the indx'th element as a string.

func (*Vector) Len ¶

func (v *Vector) Len() int

Len returns the length of the *Vector.

func (*Vector) Less ¶

func (v *Vector) Less(i, j int) bool

Less returns true if element i < element j

func (*Vector) SetAny ¶

func (v *Vector) SetAny(val any, indx int)

SetAny sets the indx'th element to val. Does no error checking.

func (*Vector) SetDate ¶

func (v *Vector) SetDate(val time.Time, indx int) error

SetDate sets the indx'th element to val. Does not attempt conversion.

func (*Vector) SetFloat ¶

func (v *Vector) SetFloat(val float64, indx int) error

SetFloat sets the indx'th element to val. Does not attempt conversion.

func (*Vector) SetInt ¶

func (v *Vector) SetInt(val, indx int) error

SetInt sets the indx'th element to val. Does not attempt conversion.

func (*Vector) SetString ¶

func (v *Vector) SetString(val string, indx int) error

SetString sets the indx'th element to val. Does not attempt conversion.

func (*Vector) String ¶

func (v *Vector) String() string

func (*Vector) Swap ¶

func (v *Vector) Swap(i, j int)

Swap swaps the ith and jth element of *Vector

func (*Vector) VectorType ¶

func (v *Vector) VectorType() DataTypes

func (*Vector) Where ¶

func (v *Vector) Where(indic *Vector) *Vector

Where creates a new *Vector containing the elements of the original *Vector for which the corresponding element of indic is greater than 0. indic must be of type DTint.
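The filtering rule can be modelled on plain slices as below. The generic where helper is hypothetical; requiring data and indic to have equal lengths is an assumption of this sketch, since the docs do not say how df handles a mismatch.

```go
package main

import "fmt"

// where keeps data[i] whenever indic[i] > 0, modelling *Vector.Where.
// Assumes equal lengths and reports an error otherwise.
func where[T any](data []T, indic []int) ([]T, error) {
	if len(data) != len(indic) {
		return nil, fmt.Errorf("length mismatch: data %d, indic %d", len(data), len(indic))
	}
	out := make([]T, 0, len(data))
	for i, flag := range indic {
		if flag > 0 {
			out = append(out, data[i])
		}
	}
	return out, nil
}

func main() {
	// Zero and negative indicators both drop the element.
	got, err := where([]string{"a", "b", "c", "d"}, []int{1, 0, 2, -1})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got) // [a c]
}
```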


Directories ¶

Path	Synopsis
mem
sql
testing
