gambas

package module
v0.2.2
Published: Aug 9, 2022 License: BSD-3-Clause Imports: 18 Imported by: 0

README

gambas


gambas is a data analysis package for Go that provides an intuitive way to manipulate tabular data. The project is inspired by the famous Python library pandas.

Installation


$ go get -u github.com/jpoly1219/gambas

Documentation


The documentation can be found on our docs page. It is also available on the pkg.go.dev page.

Project Goals


  • Provide basic features from the pandas tutorial.
    • Providing Series and DataFrame data types
    • Reading and writing tabular data
      • Reading CSV files
      • Writing to CSV files
      • Reading Excel files
      • Writing to Excel files
      • Reading JSON files
      • Writing to JSON files
    • Selecting a subset of data
      • At, IAt
      • Loc, ILoc
      • Easier filtering (close to that of SQL)
    • Plotting
      • Set
      • Using
      • Trendline (fit)
      • Statistics
      • Categorical count
    • Creating new columns derived from existing columns
      • Creating new columns
      • Applying operations to the new column
      • Renaming columns
    • Calculating summary statistics
      • Mean, median, standard deviation
      • Min, max, quartiles
      • Count, describe
    • Reshaping the layout of tables
      • Sorting by index
      • Sorting by values
      • Sorting by given index
      • Groupby
      • Pivot (long to wide format)
      • PivotTable (long to wide format)
      • Melt (wide to long format)
    • Combining data from multiple tables
      • Concatenate
      • Merge
    • Handling time series data
      • Timestamp type
      • Timestamp type methods
      • ToDatetime
    • Manipulating textual data
    • Multiindex
  • pkg.go.dev page
  • Documentation
  • Project website
  • Project logo

Philosophy


gambas was created to serve the needs of Go developers who wanted a robust data analysis package. pandas is an amazing tool, and is considered the industry standard when it comes to data organization and manipulation.

We didn't have a solid alternative in the Go realm. According to the Go Developer Survey 2021 Results, missing critical libraries was one of the most common barriers to using Go. You may have used Go for some time now and still miss some of the libraries you relied on in Python. gambas aims to scratch that itch. You will be able to tap into the superpowers of pandas while using your favorite language, Go.

Go is a very attractive language with a very loyal userbase. It provides a pleasant developer experience with its simple syntax and strong typing. However, Go currently tends to be skewed towards developing services. 49% of projects written in Go are API/RPC services, and another 10% are for web services. The ultimate goal for gambas is to allow the Go programming language to be a major player in the data analysis field.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Fit added in v0.2.1

func Fit(ff string, pd PlotData, viaOpts ...GnuplotOpt) error

Fit fits a user-defined function ff to data given in PlotData pd, and prints out the results.

Pass options such as `using` in pd, but `via` in viaOpts.

func Plot added in v0.2.1

func Plot(pd PlotData, setOpts ...GnuplotOpt) error

Plot plots a set of data given by the PlotData object `pd`.

Pass in any `set` options you need. Refer to the gnuplot documentation for `set` options.

func PlotN added in v0.2.1

func PlotN(plotdata []PlotData, setOpts ...GnuplotOpt) error

PlotN plots several PlotData objects, given in `plotdata`, in one graph. Use PlotN when you want to compare two different datasets, or a dataset with a line of best fit.

Refer to the gnuplot documentation for `set` options.

func WriteCsv

func WriteCsv(df DataFrame, pathToFile string, skipColumnLabel bool) (os.FileInfo, error)

WriteCsv writes a DataFrame object to a CSV file. It is recommended to generate pathToFile using `filepath.Join`.
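
As a small illustration of the `filepath.Join` recommendation, the sketch below builds a path with the separator of the current operating system, using only the standard library. The directory and file names are placeholders, and `csvPath` is a hypothetical helper, not part of gambas.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// csvPath joins a directory and a file name using the path separator of
// the current operating system. It is a hypothetical helper for this sketch.
func csvPath(dir, name string) string {
	return filepath.Join(dir, name)
}

func main() {
	// The resulting string is the kind of value you would pass as
	// pathToFile to WriteCsv or ReadCsv.
	fmt.Println(csvPath("data", "out.csv"))
}
```

Building the path this way avoids hard-coding "/" or "\" in pathToFile.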

func WriteExcel added in v0.1.0

func WriteExcel(df DataFrame, pathToFile string) (os.FileInfo, error)

WriteExcel writes a DataFrame object into an Excel file.

func WriteJson

func WriteJson(df DataFrame, pathToFile string) (os.FileInfo, error)

WriteJson writes a DataFrame object to a JSON file.

Types

type DataFrame

type DataFrame struct {
	// contains filtered or unexported fields
}

The DataFrame type represents a 2D tabular dataset. A DataFrame object is composed of multiple Series objects.

func NewDataFrame

func NewDataFrame(data [][]interface{}, columns []string, indexCols []string) (DataFrame, error)

NewDataFrame creates a new DataFrame object from given parameters. Generally, NewDataFrameFromFile will be used more often.

func ReadCsv

func ReadCsv(pathToFile string, indexCols []string) (DataFrame, error)

ReadCsv reads a CSV file and returns a new DataFrame object. It is recommended to generate pathToFile using `filepath.Join`.

func ReadExcel added in v0.1.0

func ReadExcel(pathToFile, sheetName string, axis int) (DataFrame, error)

ReadExcel reads an Excel file and converts it to a DataFrame object. The axis depends on the layout of the data: use axis=0 for row-based data, where each group represents a row, and axis=1 for column-based data, where each group represents a column.

func ReadJsonByColumns

func ReadJsonByColumns(pathToFile string, indexCols []string) (DataFrame, error)

ReadJsonByColumns reads a JSON file and returns a new DataFrame object. It is recommended to generate pathToFile using `filepath.Join`. The JSON file should be in this format: {"col1":[val1, val2, ...], "col2":[val1, val2, ...], ...}

You can either set a column to be the index, or pass nil for indexCols. If nil, a new RangeIndex will be created. Your index column should not have any missing values. The order of columns is not guaranteed, but the index column will always come first.
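
The column-oriented layout can be illustrated with the standard library alone. The sketch below is not the gambas implementation; `decodeColumns` and `sortedNames` are hypothetical helpers. It decodes the format into a Go map, which has no defined iteration order. That is one plausible reason why column order is not guaranteed, though the source does not state the cause.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// decodeColumns parses column-oriented JSON of the form
// {"col1":[...], "col2":[...]} into a map from column name to values.
func decodeColumns(raw []byte) (map[string][]interface{}, error) {
	cols := make(map[string][]interface{})
	if err := json.Unmarshal(raw, &cols); err != nil {
		return nil, err
	}
	return cols, nil
}

// sortedNames returns the column names in alphabetical order, because
// iterating over a Go map yields keys in an unspecified order.
func sortedNames(cols map[string][]interface{}) []string {
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	raw := []byte(`{"name":["Alice","Bob"],"age":[30,25]}`)
	cols, err := decodeColumns(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, name := range sortedNames(cols) {
		fmt.Println(name, cols[name])
	}
}
```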

func ReadJsonStream

func ReadJsonStream(pathToFile string, indexCols []string) (DataFrame, error)

ReadJsonStream reads a JSON stream and returns a new DataFrame object. The JSON file should be in this format: {"col1":val1, "col2":val2, ...}{"col1":val1, "col2":val2, ...}
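
A JSON stream in this sense is a sequence of objects written back to back. The standard-library sketch below shows one way to consume such input with `json.Decoder`. It is an illustration of the format, not the gambas implementation, and `decodeStream` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeStream reads concatenated JSON objects from stream and returns
// one map per object, in the order the objects appear.
func decodeStream(stream string) ([]map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(stream))
	var rows []map[string]interface{}
	for {
		var row map[string]interface{}
		err := dec.Decode(&row)
		if err == io.EOF {
			// A clean end of input: every object has been read.
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
}

func main() {
	stream := `{"name":"Alice","age":30}{"name":"Bob","age":25}`
	rows, err := decodeStream(stream)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row["name"], row["age"])
	}
}
```

Each decoded object corresponds to one row, in contrast to the column-oriented format expected by ReadJsonByColumns.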

func (*DataFrame) ColAdd

func (df *DataFrame) ColAdd(colname string, value float64) (DataFrame, error)

ColAdd adds the given value to each element in the specified column.

func (*DataFrame) ColDiv

func (df *DataFrame) ColDiv(colname string, value float64) (DataFrame, error)

ColDiv divides each element in the specified column by the given value.

func (*DataFrame) ColEq

func (df *DataFrame) ColEq(colname string, value float64) (DataFrame, error)

ColEq checks if each element in the specified column is equal to the given value.

func (*DataFrame) ColGt

func (df *DataFrame) ColGt(colname string, value float64) (DataFrame, error)

ColGt checks if each element in the specified column is greater than the given value.

func (*DataFrame) ColLt

func (df *DataFrame) ColLt(colname string, value float64) (DataFrame, error)

ColLt checks if each element in the specified column is less than the given value.

func (*DataFrame) ColMod

func (df *DataFrame) ColMod(colname string, value float64) (DataFrame, error)

ColMod applies modulus calculations on each element in the specified column, returning the remainder.

func (*DataFrame) ColMul

func (df *DataFrame) ColMul(colname string, value float64) (DataFrame, error)

ColMul multiplies each element in the specified column by the given value.

func (*DataFrame) ColSub

func (df *DataFrame) ColSub(colname string, value float64) (DataFrame, error)

ColSub subtracts the given value from each element in the specified column.
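
The Col* arithmetic methods all follow the same element-wise pattern. The sketch below shows that pattern on a plain []float64. It assumes nothing about how gambas stores columns; `applyToColumn` and `modBy` are hypothetical helpers, and using `math.Mod` for the remainder is an assumption (the source does not say which remainder operation ColMod uses).

```go
package main

import (
	"fmt"
	"math"
)

// applyToColumn returns a new slice with op applied to every element,
// leaving the input column unchanged.
func applyToColumn(col []float64, op func(float64) float64) []float64 {
	out := make([]float64, len(col))
	for i, v := range col {
		out[i] = op(v)
	}
	return out
}

// modBy returns an operation that computes the floating-point remainder
// of v / value. With math.Mod the result takes the sign of v.
func modBy(value float64) func(float64) float64 {
	return func(v float64) float64 { return math.Mod(v, value) }
}

func main() {
	col := []float64{7, 8.5, -7}
	added := applyToColumn(col, func(v float64) float64 { return v + 2 })
	remainders := applyToColumn(col, modBy(3))
	fmt.Println(added)
	fmt.Println(remainders)
}
```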

func (DataFrame) Columns added in v0.2.2

func (df DataFrame) Columns() []string

func (*DataFrame) DropNaN

func (df *DataFrame) DropNaN(axis int) (DataFrame, error)

DropNaN drops rows or columns with NaN values. Specify axis to choose whether to remove rows with NaN or columns with NaN. axis=0 is row, axis=1 is column.
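
To make the axis convention concrete, the following standard-library sketch drops rows (axis=0) or columns (axis=1) that contain NaN from a rectangular [][]float64. It illustrates the semantics described above and is not taken from the gambas source; both function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// dropNaNRows keeps only the rows that contain no NaN (axis=0).
func dropNaNRows(table [][]float64) [][]float64 {
	var out [][]float64
	for _, row := range table {
		hasNaN := false
		for _, v := range row {
			if math.IsNaN(v) {
				hasNaN = true
				break
			}
		}
		if !hasNaN {
			out = append(out, row)
		}
	}
	return out
}

// dropNaNCols keeps only the columns that contain no NaN (axis=1).
// It assumes every row has the same length.
func dropNaNCols(table [][]float64) [][]float64 {
	if len(table) == 0 {
		return nil
	}
	keep := make([]bool, len(table[0]))
	for c := range keep {
		keep[c] = true
		for _, row := range table {
			if math.IsNaN(row[c]) {
				keep[c] = false
				break
			}
		}
	}
	out := make([][]float64, len(table))
	for r, row := range table {
		for c, v := range row {
			if keep[c] {
				out[r] = append(out[r], v)
			}
		}
	}
	return out
}

func main() {
	table := [][]float64{{1, 2}, {math.NaN(), 4}, {5, 6}}
	fmt.Println(dropNaNRows(table))
	fmt.Println(dropNaNCols(table))
}
```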

func (*DataFrame) GroupBy

func (df *DataFrame) GroupBy(by ...string) (GroupBy, error)

GroupBy groups selected columns in a DataFrame object and returns a GroupBy object.

func (*DataFrame) Head

func (df *DataFrame) Head(howMany int)

Head prints the first howMany items in a DataFrame object.

func (DataFrame) Index added in v0.2.2

func (df DataFrame) Index() IndexData

func (*DataFrame) Loc

func (df *DataFrame) Loc(cols []string, rows ...[]interface{}) (DataFrame, error)

Loc indexes the DataFrame object given a slice of row and column labels, and returns the result as a new DataFrame object. You are only allowed to pass in indices of the DataFrame as rows.

func (*DataFrame) LocCol added in v0.2.2

func (df *DataFrame) LocCol(col string) (Series, error)

LocCol returns a column as a new Series object.

func (*DataFrame) LocCols

func (df *DataFrame) LocCols(cols ...string) (DataFrame, error)

LocCols returns a set of columns as a new DataFrame object, given a list of labels.

func (*DataFrame) LocColsItems

func (df *DataFrame) LocColsItems(cols ...string) ([][]interface{}, error)

LocColsItems will return a slice of columns. Use this over LocCols if you want to extract the items directly instead of getting a DataFrame object.

func (*DataFrame) LocRows

func (df *DataFrame) LocRows(rows ...[]interface{}) (DataFrame, error)

LocRows returns a set of rows as a new DataFrame object, given a list of labels. You are only allowed to pass in the indices of the DataFrame as rows.

func (*DataFrame) LocRowsItems

func (df *DataFrame) LocRowsItems(rows ...[]interface{}) ([][]interface{}, error)

LocRowsItems will return a slice of rows. You are only allowed to pass in indices of the DataFrame as rows. Use this over LocRows if you want to extract the items directly instead of getting a DataFrame object.

func (*DataFrame) MarshalJSON

func (df *DataFrame) MarshalJSON() ([]byte, error)

MarshalJSON is used to implement the json.Marshaler interface.

func (*DataFrame) Melt

func (df *DataFrame) Melt(colName, valueName string) (DataFrame, error)

Melt converts the table from wide to long format. Use Melt to revert to pre-Pivot format.
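
The wide-to-long transformation can be sketched on plain slices as follows. This is a generic illustration of melting, not the gambas implementation; the `longRow` type and `melt` function are hypothetical, and the row order gambas produces may differ.

```go
package main

import "fmt"

// longRow is one row of the long format: an index label, the name of the
// original column, and the value found at that position.
type longRow struct {
	Index    string
	Variable string
	Value    float64
}

// melt converts a wide table (one column per variable) into long format
// (one row per index/variable pair). wide[r][c] belongs to index[r] and
// columns[c].
func melt(index []string, columns []string, wide [][]float64) []longRow {
	var out []longRow
	for r, idx := range index {
		for c, col := range columns {
			out = append(out, longRow{Index: idx, Variable: col, Value: wide[r][c]})
		}
	}
	return out
}

func main() {
	index := []string{"Alice", "Bob"}
	columns := []string{"math", "art"}
	wide := [][]float64{{90, 80}, {70, 60}}
	for _, row := range melt(index, columns, wide) {
		fmt.Println(row.Index, row.Variable, row.Value)
	}
}
```

In terms of the Melt signature, colName and valueName would name the columns that hold what this sketch calls Variable and Value.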

func (*DataFrame) MergeDfsHorizontally added in v0.1.0

func (df *DataFrame) MergeDfsHorizontally(target DataFrame) (DataFrame, error)

MergeDfsHorizontally merges two DataFrame objects side by side. The target DataFrame will always be appended to the right of the source DataFrame. Index will reset and become a RangeIndex.

func (*DataFrame) MergeDfsVertically added in v0.1.0

func (df *DataFrame) MergeDfsVertically(target DataFrame) (DataFrame, error)

MergeDfsVertically stacks two DataFrame objects vertically.

func (*DataFrame) NewCol

func (df *DataFrame) NewCol(colname string, data []interface{}) (DataFrame, error)

NewCol creates a new column with the given data and column name. To create a blank column, pass in nil.

func (*DataFrame) NewDerivedCol

func (df *DataFrame) NewDerivedCol(colname, srcCol string) (DataFrame, error)

NewDerivedCol creates a new column derived from an existing column. It copies over the data from srcCol into a new column.

func (*DataFrame) Pivot

func (df *DataFrame) Pivot(column, value string) (DataFrame, error)

Pivot returns an organized DataFrame that has values corresponding to the index and the given column.

func (*DataFrame) PivotTable

func (df *DataFrame) PivotTable(index, column, value string, aggFunc StatsFunc) (DataFrame, error)

PivotTable rearranges the data by a given index and column. Each value will be aggregated via an aggregation function. Pick three columns from the DataFrame, each to serve as the index, column, and value. PivotTable ignores NaN values.
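
The idea behind a pivot table (group by an index/column pair, aggregate the values in each cell, skip NaN) can be sketched with the standard library. The example hard-codes the mean as the aggregation and uses hypothetical names (`record`, `pivotMean`); it is not the gambas implementation.

```go
package main

import (
	"fmt"
	"math"
)

// record is one row of the long-format input: the value of Value belongs
// in the cell addressed by Index (row label) and Column (column label).
type record struct {
	Index  string
	Column string
	Value  float64
}

// pivotMean averages Value per (Index, Column) cell and skips NaN values.
// The result maps row label -> column label -> mean.
func pivotMean(records []record) map[string]map[string]float64 {
	sums := make(map[[2]string]float64)
	counts := make(map[[2]string]int)
	for _, rec := range records {
		if math.IsNaN(rec.Value) {
			continue
		}
		key := [2]string{rec.Index, rec.Column}
		sums[key] += rec.Value
		counts[key]++
	}
	out := make(map[string]map[string]float64)
	for key, sum := range sums {
		if out[key[0]] == nil {
			out[key[0]] = make(map[string]float64)
		}
		out[key[0]][key[1]] = sum / float64(counts[key])
	}
	return out
}

func main() {
	records := []record{
		{"2021", "apples", 10},
		{"2021", "apples", 20},
		{"2021", "pears", math.NaN()},
		{"2022", "pears", 7},
	}
	fmt.Println(pivotMean(records))
}
```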

func (*DataFrame) Print

func (df *DataFrame) Print()

Print prints all data in a DataFrame object.

func (*DataFrame) PrintRange

func (df *DataFrame) PrintRange(start, end int)

PrintRange prints data in a DataFrame object at a given range. Index starts at 0.

func (*DataFrame) RenameCol

func (df *DataFrame) RenameCol(colnames map[string]string) error

RenameCol renames columns in a DataFrame.

func (DataFrame) Series added in v0.2.2

func (df DataFrame) Series() []Series

func (*DataFrame) SortByColumns

func (df *DataFrame) SortByColumns()

SortByColumns sorts the columns of the DataFrame object.

func (*DataFrame) SortByIndex

func (df *DataFrame) SortByIndex(ascending bool) error

SortByIndex sorts the items by index.

func (*DataFrame) SortByValues

func (df *DataFrame) SortByValues(by string, ascending bool) error

SortByValues sorts the items by values in a selected Series.

func (*DataFrame) SortIndexColFirst

func (df *DataFrame) SortIndexColFirst()

SortIndexColFirst puts the index column at the front.

func (*DataFrame) Tail

func (df *DataFrame) Tail(howMany int)

Tail prints the last howMany items in a DataFrame object.

type GnuplotOpt added in v0.2.0

type GnuplotOpt interface {
	// contains filtered or unexported methods
}

A GnuplotOpt represents an option in gnuplot.

func Setangles added in v0.2.2

func Setangles(value string) GnuplotOpt

func Setarrow added in v0.2.0

func Setarrow(value string) GnuplotOpt

func Setautoscale added in v0.2.0

func Setautoscale(value string) GnuplotOpt

func Setbmargin added in v0.2.0

func Setbmargin(value string) GnuplotOpt

func Setborder added in v0.2.0

func Setborder(value string) GnuplotOpt

func Setboxdepth added in v0.2.0

func Setboxdepth(value string) GnuplotOpt

func Setboxwidth added in v0.2.0

func Setboxwidth(value string) GnuplotOpt

func Setcbdata added in v0.2.0

func Setcbdata(value string) GnuplotOpt

func Setcbdtics added in v0.2.0

func Setcbdtics(value string) GnuplotOpt

func Setcblabel added in v0.2.0

func Setcblabel(value string) GnuplotOpt

func Setcbmtics added in v0.2.0

func Setcbmtics(value string) GnuplotOpt

func Setcbrange added in v0.2.0

func Setcbrange(value string) GnuplotOpt

func Setcbtics added in v0.2.0

func Setcbtics(value string) GnuplotOpt

func Setclip added in v0.2.0

func Setclip(value string) GnuplotOpt

func Setcntrlabel added in v0.2.0

func Setcntrlabel(value string) GnuplotOpt

func Setcntrparam added in v0.2.0

func Setcntrparam(value string) GnuplotOpt

func Setcolor added in v0.2.0

func Setcolor() GnuplotOpt

func Setcolorbox added in v0.2.0

func Setcolorbox(value string) GnuplotOpt

func Setcolormap added in v0.2.0

func Setcolormap(value string) GnuplotOpt

func Setcolorsequence added in v0.2.0

func Setcolorsequence(value string) GnuplotOpt

func Setcontour added in v0.2.0

func Setcontour(value string) GnuplotOpt

func Setdashtype added in v0.2.0

func Setdashtype(value string) GnuplotOpt

func Setdatafile added in v0.2.0

func Setdatafile(value string) GnuplotOpt

func Setdecimalsign added in v0.2.0

func Setdecimalsign(value string) GnuplotOpt

func Setdgrid3d added in v0.2.0

func Setdgrid3d(value string) GnuplotOpt

func Setdummy added in v0.2.0

func Setdummy(value string) GnuplotOpt

func Setencoding added in v0.2.0

func Setencoding(value string) GnuplotOpt

func Seterrorbars added in v0.2.0

func Seterrorbars(value string) GnuplotOpt

func Setfit added in v0.2.0

func Setfit(value string) GnuplotOpt

func Setfontpath added in v0.2.0

func Setfontpath(value string) GnuplotOpt

func Setformat added in v0.2.0

func Setformat(value string) GnuplotOpt

func Setgrid added in v0.2.0

func Setgrid(value string) GnuplotOpt

func Sethidden3d added in v0.2.0

func Sethidden3d(value string) GnuplotOpt

func Sethistory added in v0.2.0

func Sethistory(value string) GnuplotOpt

func Sethistorysize added in v0.2.0

func Sethistorysize(value string) GnuplotOpt

func Setisosamples added in v0.2.0

func Setisosamples(value string) GnuplotOpt

func Setisosurface added in v0.2.0

func Setisosurface(value string) GnuplotOpt

func Setisotropic added in v0.2.0

func Setisotropic() GnuplotOpt

func Setjitter added in v0.2.0

func Setjitter(value string) GnuplotOpt

func Setkey added in v0.2.0

func Setkey(value string) GnuplotOpt

func Setlabel added in v0.2.0

func Setlabel(value string) GnuplotOpt

func Setlinetype added in v0.2.1

func Setlinetype(value string) GnuplotOpt

func Setlink

func Setlink(value string) GnuplotOpt

func Setlmargin added in v0.2.0

func Setlmargin(value string) GnuplotOpt

func Setloadpath added in v0.2.0

func Setloadpath(value string) GnuplotOpt

func Setlocale added in v0.2.0

func Setlocale(value string) GnuplotOpt

func Setlogscale added in v0.2.0

func Setlogscale(value string) GnuplotOpt

func Setmapping added in v0.2.0

func Setmapping(value string) GnuplotOpt

func Setmicro added in v0.2.0

func Setmicro(value string) GnuplotOpt

func Setminussign added in v0.2.0

func Setminussign(value string) GnuplotOpt

func Setmonochrome added in v0.2.0

func Setmonochrome(value string) GnuplotOpt

func Setmouse added in v0.2.0

func Setmouse(value string) GnuplotOpt

func Setmttics added in v0.2.0

func Setmttics(value string) GnuplotOpt

func Setmultiplot added in v0.2.0

func Setmultiplot(value string) GnuplotOpt

func Setmx2tics added in v0.2.0

func Setmx2tics(value string) GnuplotOpt

func Setmy2tics added in v0.2.0

func Setmy2tics(value string) GnuplotOpt

func Setmytics added in v0.2.0

func Setmytics(value string) GnuplotOpt

func Setmztics added in v0.2.0

func Setmztics(value string) GnuplotOpt

func Setnonlinear added in v0.2.0

func Setnonlinear(value string) GnuplotOpt

func Setobject added in v0.2.0

func Setobject(value string) GnuplotOpt

func Setoffsets added in v0.2.0

func Setoffsets(value string) GnuplotOpt

func Setorigin added in v0.2.0

func Setorigin(value string) GnuplotOpt

func Setoutput added in v0.2.0

func Setoutput(value string) GnuplotOpt

func Setoverflow added in v0.2.0

func Setoverflow(value string) GnuplotOpt

func Setpalette added in v0.2.0

func Setpalette(value string) GnuplotOpt

func Setparametric added in v0.2.0

func Setparametric(value string) GnuplotOpt

func Setpaxis added in v0.2.0

func Setpaxis(value string) GnuplotOpt

func Setpixmap added in v0.2.0

func Setpixmap(value string) GnuplotOpt

func Setpm3d added in v0.2.0

func Setpm3d(value string) GnuplotOpt

func Setpointintervalbox added in v0.2.0

func Setpointintervalbox() GnuplotOpt

func Setpointsize added in v0.2.0

func Setpointsize(value string) GnuplotOpt

func Setpolar added in v0.2.0

func Setpolar() GnuplotOpt

func Setprint added in v0.2.0

func Setprint(value string) GnuplotOpt

func Setpsdir added in v0.2.0

func Setpsdir(value string) GnuplotOpt

func Setraxis added in v0.2.0

func Setraxis() GnuplotOpt

func Setrgbmax added in v0.2.0

func Setrgbmax(value string) GnuplotOpt

func Setrlabel added in v0.2.0

func Setrlabel(value string) GnuplotOpt

func Setrmargin added in v0.2.0

func Setrmargin(value string) GnuplotOpt

func Setrrange added in v0.2.0

func Setrrange(value string) GnuplotOpt

func Setrtics added in v0.2.0

func Setrtics(value string) GnuplotOpt

func Setsamples added in v0.2.0

func Setsamples(value string) GnuplotOpt

func Setsize added in v0.2.0

func Setsize(value string) GnuplotOpt

func Setspiderplot added in v0.2.0

func Setspiderplot() GnuplotOpt

func Setstyle added in v0.2.0

func Setstyle(value string) GnuplotOpt

func Setsurface added in v0.2.0

func Setsurface(value string) GnuplotOpt

func Settable added in v0.2.0

func Settable(value string) GnuplotOpt

func Setterminal added in v0.2.0

func Setterminal(value string) GnuplotOpt

func Settermoption added in v0.2.0

func Settermoption(value string) GnuplotOpt

func Settheta added in v0.2.0

func Settheta(value string) GnuplotOpt

func Settics added in v0.2.0

func Settics(value string) GnuplotOpt

func Settimefmt added in v0.2.1

func Settimefmt(value string) GnuplotOpt

func Settimestamp added in v0.2.0

func Settimestamp(value string) GnuplotOpt

func Settitle added in v0.2.0

func Settitle(value string) GnuplotOpt

func Settmargin added in v0.2.0

func Settmargin(value string) GnuplotOpt

func Settrange added in v0.2.0

func Settrange(value string) GnuplotOpt

func Setttics added in v0.2.0

func Setttics(value string) GnuplotOpt

func Seturange added in v0.2.0

func Seturange(value string) GnuplotOpt

func Setvgrid added in v0.2.0

func Setvgrid(value string) GnuplotOpt

func Setview added in v0.2.0

func Setview(value string) GnuplotOpt

func Setvrange added in v0.2.0

func Setvrange(value string) GnuplotOpt

func Setvxrange added in v0.2.0

func Setvxrange(value string) GnuplotOpt

func Setvyrange added in v0.2.0

func Setvyrange(value string) GnuplotOpt

func Setvzrange added in v0.2.0

func Setvzrange(value string) GnuplotOpt

func Setwalls added in v0.2.0

func Setwalls(value string) GnuplotOpt

func Setx2ata added in v0.2.0

func Setx2ata(value string) GnuplotOpt

func Setx2dtics added in v0.2.0

func Setx2dtics(value string) GnuplotOpt

func Setx2label added in v0.2.0

func Setx2label(value string) GnuplotOpt

func Setx2mtics added in v0.2.0

func Setx2mtics(value string) GnuplotOpt

func Setx2range added in v0.2.0

func Setx2range(value string) GnuplotOpt

func Setx2tics added in v0.2.0

func Setx2tics(value string) GnuplotOpt

func Setx2zeroaxis added in v0.2.0

func Setx2zeroaxis(value string) GnuplotOpt

func Setxdata added in v0.2.1

func Setxdata(value string) GnuplotOpt

func Setxdtics added in v0.2.0

func Setxdtics(value string) GnuplotOpt

func Setxlabel added in v0.2.0

func Setxlabel(value string) GnuplotOpt

func Setxmtics added in v0.2.0

func Setxmtics(value string) GnuplotOpt

func Setxrange added in v0.2.0

func Setxrange(value string) GnuplotOpt

func Setxtics added in v0.2.0

func Setxtics(value string) GnuplotOpt

func Setxyplane added in v0.2.0

func Setxyplane(value string) GnuplotOpt

func Setxzeroaxis added in v0.2.0

func Setxzeroaxis(value string) GnuplotOpt

func Sety2data added in v0.2.0

func Sety2data(value string) GnuplotOpt

func Sety2dtics added in v0.2.0

func Sety2dtics(value string) GnuplotOpt

func Sety2label added in v0.2.0

func Sety2label(value string) GnuplotOpt

func Sety2mtics added in v0.2.0

func Sety2mtics(value string) GnuplotOpt

func Sety2range added in v0.2.0

func Sety2range(value string) GnuplotOpt

func Sety2tics added in v0.2.0

func Sety2tics(value string) GnuplotOpt

func Sety2zeroaxis added in v0.2.0

func Sety2zeroaxis(value string) GnuplotOpt

func Setydata added in v0.2.1

func Setydata(value string) GnuplotOpt

func Setydtics added in v0.2.0

func Setydtics(value string) GnuplotOpt

func Setylabel added in v0.2.0

func Setylabel(value string) GnuplotOpt

func Setymtics added in v0.2.0

func Setymtics(value string) GnuplotOpt

func Setyrange added in v0.2.0

func Setyrange(value string) GnuplotOpt

func Setytics added in v0.2.0

func Setytics(value string) GnuplotOpt

func Setyzeroaxis added in v0.2.0

func Setyzeroaxis(value string) GnuplotOpt

func Setzdata added in v0.2.0

func Setzdata(value string) GnuplotOpt

func Setzdtics added in v0.2.0

func Setzdtics(value string) GnuplotOpt

func Setzero added in v0.2.0

func Setzero(value string) GnuplotOpt

func Setzlabel added in v0.2.0

func Setzlabel(value string) GnuplotOpt

func Setzmtics added in v0.2.0

func Setzmtics(value string) GnuplotOpt

func Setzrange added in v0.2.0

func Setzrange(value string) GnuplotOpt

func Setztics added in v0.2.0

func Setztics(value string) GnuplotOpt

func Setzzeroaxis added in v0.2.0

func Setzzeroaxis(value string) GnuplotOpt

func Unsetcornerpoles added in v0.2.0

func Unsetcornerpoles() GnuplotOpt

func Using added in v0.2.0

func Using(value string) GnuplotOpt

func Via added in v0.2.1

func Via(value string) GnuplotOpt

func With added in v0.2.1

func With(value string) GnuplotOpt

type GroupBy

type GroupBy struct {
	// contains filtered or unexported fields
}

The GroupBy type is an intermediary struct that is created after running DataFrame.GroupBy(). It holds the necessary data for applying operations such as GroupBy.Agg().

func (*GroupBy) Agg

func (gb *GroupBy) Agg(targetCol []string, aggFunc StatsFunc) (DataFrame, error)

Agg aggregates data in the GroupBy object using the given aggFunc.
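
The group-then-aggregate pattern behind GroupBy and Agg can be sketched on plain slices. The code below is a generic illustration under the assumption that keys and values have the same length; `groupAgg` and `sum` are hypothetical helpers and do not reflect how gambas stores groups internally.

```go
package main

import "fmt"

// groupAgg collects values by their key and applies agg to each group.
// keys[i] is the group label of values[i]; both slices must have the
// same length.
func groupAgg(keys []string, values []float64, agg func([]float64) float64) map[string]float64 {
	groups := make(map[string][]float64)
	for i, k := range keys {
		groups[k] = append(groups[k], values[i])
	}
	out := make(map[string]float64, len(groups))
	for k, vs := range groups {
		out[k] = agg(vs)
	}
	return out
}

// sum adds up all values in a group.
func sum(vs []float64) float64 {
	total := 0.0
	for _, v := range vs {
		total += v
	}
	return total
}

func main() {
	keys := []string{"a", "b", "a"}
	values := []float64{1, 2, 3}
	fmt.Println(groupAgg(keys, values, sum))
}
```

With gambas, the aggregation would be a StatsFunc such as Mean or Max passed as aggFunc.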

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index stores the index values of a Series or DataFrame. The 0th element must be the ID of the index. For example, if your data includes a column of names that you have set to be the index, the index may look like this: Index{0, "Alice"}, Index{1, "Bob"}, Index{2, "Charlie"}. An Index with more than one value (not including the ID) is considered a multi-index.

func (Index) Id added in v0.2.2

func (i Index) Id() int

func (Index) Value added in v0.2.2

func (i Index) Value() []interface{}

type IndexData

type IndexData struct {
	// contains filtered or unexported fields
}

IndexData type is used to hold index information of a Series or a DataFrame.

func CreateRangeIndex

func CreateRangeIndex(length int) IndexData

CreateRangeIndex takes the length of an Index and creates a RangeIndex. RangeIndex is an index that spans from 0 to the length of the index.

func NewIndexData added in v0.2.2

func NewIndexData(index [][]interface{}, names []string) (IndexData, error)

NewIndexData creates a new IndexData object.

func (IndexData) Index added in v0.2.2

func (id IndexData) Index() []Index

func (IndexData) Len

func (id IndexData) Len() int

Len is used to implement sort.Interface.

func (IndexData) Less

func (id IndexData) Less(i, j int) bool

Less is used to implement sort.Interface.

func (IndexData) Names added in v0.2.2

func (id IndexData) Names() []string

func (IndexData) Swap

func (id IndexData) Swap(i, j int)

Swap is used to implement sort.Interface.
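
Len, Less, and Swap together satisfy the standard library's sort.Interface, which is what lets sort.Sort reorder a custom type. The sketch below shows the same pattern on a hypothetical `labeled` type with value receivers. It does not reproduce how IndexData compares its entries.

```go
package main

import (
	"fmt"
	"sort"
)

// labeled pairs each label with a value. Sorting by label must move the
// matching value along with it.
type labeled struct {
	labels []string
	values []float64
}

// Len, Less, and Swap implement sort.Interface. Value receivers work
// here because the slices share their backing arrays with the caller.
func (l labeled) Len() int           { return len(l.labels) }
func (l labeled) Less(i, j int) bool { return l.labels[i] < l.labels[j] }
func (l labeled) Swap(i, j int) {
	l.labels[i], l.labels[j] = l.labels[j], l.labels[i]
	l.values[i], l.values[j] = l.values[j], l.values[i]
}

// sortLabeled sorts by label, ascending or descending.
func sortLabeled(l labeled, ascending bool) {
	if ascending {
		sort.Sort(l)
		return
	}
	sort.Sort(sort.Reverse(l))
}

func main() {
	l := labeled{
		labels: []string{"Bob", "Alice", "Charlie"},
		values: []float64{2, 1, 3},
	}
	sortLabeled(l, true)
	fmt.Println(l.labels, l.values)
}
```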

type PlotData added in v0.2.1

type PlotData struct {
	// Df is the DataFrame object you would like to plot.
	Df *DataFrame

	// Columns are the columns in Df that you want to plot. Usually, it's a pair of columns [xcol, ycol].
	// If you want to create a bar graph or a histogram, you can add more columns.
	Columns []string

	// Function is an arbitrary function such as sin(x) or an equation of the line of best fit.
	Function string

	// Opts are options such as `using` or `with`. `set` is passed in as an argument for other plotting functions.
	Opts []GnuplotOpt
}

A PlotData holds the data required for plotting.

If you want to plot an arbitrary function, leave Df and Columns as nil. Otherwise, populate Df and Columns, and leave Function as "".

type Series

type Series struct {
	// contains filtered or unexported fields
}

A Series represents a column of data.

func NewSeries

func NewSeries(data []interface{}, name string, index *IndexData) (Series, error)

NewSeries creates a new Series object from given parameters. Generally, NewSeriesFromFile will be used more often. The index parameter can be set to nil when calling NewSeries on its own. This field is for passing in the DataFrame's index data in NewDataFrame.

func (*Series) At

func (s *Series) At(ind ...interface{}) (interface{}, error)

At returns an element at a given index. For multiindex, you need to pass in the whole index tuple.

func (*Series) Count

func (s *Series) Count() StatsResult

Count counts the number of non-NaN elements in a column.

func (Series) Data added in v0.2.2

func (s Series) Data() []interface{}

func (*Series) Describe

func (s *Series) Describe() ([]StatsResult, error)

Describe runs through the most commonly used statistics functions and prints the output.

func (Series) Dtype added in v0.2.2

func (s Series) Dtype() string

func (*Series) Head

func (s *Series) Head(howMany int)

Head prints the first howMany items in a Series object.

func (*Series) IAt

func (s *Series) IAt(ind int) (interface{}, error)

IAt returns an element at a given integer index.

func (*Series) ILoc

func (s *Series) ILoc(min, max int) ([]interface{}, error)

ILoc returns an array of elements at a given integer index range.

func (Series) Index added in v0.2.2

func (s Series) Index() IndexData

func (*Series) IndexHasDuplicateValues

func (s *Series) IndexHasDuplicateValues() (bool, error)

IndexHasDuplicateValues checks if the Series has duplicate index values.

func (Series) Len

func (s Series) Len() int

Len is used to implement sort.Interface.

func (Series) Less

func (s Series) Less(i, j int) bool

Less is used to implement sort.Interface.

func (*Series) Loc

func (s *Series) Loc(idx ...[]interface{}) (Series, error)

Loc accepts index tuples and returns a Series object containing data at the given rows. Each idx item should contain the index of the data you would like to query. For multiindex Series, you can either pass in the whole index tuple, or the first index.

func (*Series) LocItems

func (s *Series) LocItems(idx ...[]interface{}) ([]interface{}, error)

LocItems behaves exactly like Loc, but returns data as []interface{} instead of Series.

func (*Series) Max

func (s *Series) Max() StatsResult

Max returns the largest element in a column.

func (*Series) Mean

func (s *Series) Mean() StatsResult

Mean returns the mean of the elements in a column.

func (*Series) Median

func (s *Series) Median() StatsResult

Median returns the median of the elements in a column.

func (*Series) Min

func (s *Series) Min() StatsResult

Min returns the smallest element in a column.

func (Series) Name added in v0.2.2

func (s Series) Name() string

func (*Series) Print

func (s *Series) Print()

Print prints all data in a Series object.

func (*Series) PrintRange

func (s *Series) PrintRange(start, end int)

PrintRange prints data in a Series object at a given range. Index starts at 0.

func (*Series) Q1

func (s *Series) Q1() StatsResult

Q1 returns the lower quartile (25%) of the elements in a column. This does not include the median during calculation.

func (*Series) Q2

func (s *Series) Q2() StatsResult

Q2 returns the middle quartile (50%) of the elements in a column. This accomplishes the same thing as Median.

func (*Series) Q3

func (s *Series) Q3() StatsResult

Q3 returns the upper quartile (75%) of the elements in a column. This does not include the median during calculation.

func (*Series) RenameCol

func (s *Series) RenameCol(newName string)

RenameCol renames the series.

func (*Series) RenameIndex

func (s *Series) RenameIndex(newNames map[string]string) error

RenameIndex renames the index of the series. Input should be a map, where key is the index name to change and value is a new name.

func (*Series) SortByGivenIndex

func (s *Series) SortByGivenIndex(index IndexData, withId bool) error

SortByGivenIndex sorts the Series by a given index.

func (*Series) SortByIndex

func (s *Series) SortByIndex(ascending bool) error

SortByIndex sorts the elements in a Series by index. Pass in true if you want to sort in ascending order, and false for descending order.

func (*Series) SortByValues

func (s *Series) SortByValues(ascending bool) error

SortByValues sorts the Series by its values. Pass in true if you want to sort in ascending order, and false for descending order.

func (*Series) Std

func (s *Series) Std() StatsResult

Std returns the sample standard deviation of the elements in a column.

func (Series) Swap

func (s Series) Swap(i, j int)

Swap is used to implement sort.Interface.

func (*Series) Tail

func (s *Series) Tail(howMany int)

Tail prints the last howMany items in a Series object.

func (*Series) ValueCounts

func (s *Series) ValueCounts() (Series, error)

ValueCounts returns a Series containing the number of occurrences of each unique value in a given Series.
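
Assuming ValueCounts follows the pandas `value_counts` convention (one count per distinct value), the underlying computation can be sketched with a map. `valueCounts` is a hypothetical standalone function, not the gambas method.

```go
package main

import "fmt"

// valueCounts returns how many times each distinct item occurs.
func valueCounts(items []string) map[string]int {
	counts := make(map[string]int)
	for _, item := range items {
		counts[item]++
	}
	return counts
}

func main() {
	colors := []string{"red", "blue", "red", "green", "red"}
	fmt.Println(valueCounts(colors))
}
```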

type StatsFunc

type StatsFunc func(dataset []interface{}) StatsResult

StatsFunc represents any function that accepts dataset as input and returns StatsResult as output.

type StatsResult

type StatsResult struct {
	UsedFunc string
	Result   float64
	Err      error
}

StatsResult holds the results of calculation from a statistics function such as Mean or Median.
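
Because StatsFunc is an ordinary function type, a custom statistic only needs a matching signature. The sketch below redeclares StatsFunc and StatsResult locally, copied from the declarations above, so that it compiles without the gambas module. `Range` is a hypothetical statistic, and treating dataset elements as float64 is an assumption about how values are stored.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Local copies of the gambas declarations, so the sketch is self-contained.
// With gambas imported, use gambas.StatsResult and gambas.StatsFunc instead.
type StatsResult struct {
	UsedFunc string
	Result   float64
	Err      error
}

type StatsFunc func(dataset []interface{}) StatsResult

// Range is a hypothetical custom statistic: the largest value minus the
// smallest. Non-float64 items and NaN values are skipped (an assumption).
func Range(dataset []interface{}) StatsResult {
	lo, hi := math.Inf(1), math.Inf(-1)
	n := 0
	for _, item := range dataset {
		v, ok := item.(float64)
		if !ok || math.IsNaN(v) {
			continue
		}
		lo = math.Min(lo, v)
		hi = math.Max(hi, v)
		n++
	}
	if n == 0 {
		return StatsResult{UsedFunc: "range", Result: math.NaN(), Err: errors.New("no numeric data")}
	}
	return StatsResult{UsedFunc: "range", Result: hi - lo}
}

func main() {
	var f StatsFunc = Range
	res := f([]interface{}{3.0, 9.0, "n/a", 5.0})
	fmt.Println(res.UsedFunc, res.Result, res.Err)
}
```

A function of this shape could be passed wherever a StatsFunc is expected, such as the aggFunc parameter of PivotTable or GroupBy.Agg.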

func Count

func Count(dataset []interface{}) StatsResult

Count counts the number of non-NaN elements in a dataset.

func Max

func Max(dataset []interface{}) StatsResult

Max returns the largest element in a dataset.

func Mean

func Mean(dataset []interface{}) StatsResult

Mean returns the mean of the elements in a dataset.

func Median

func Median(dataset []interface{}) StatsResult

Median returns the median of the elements in a dataset.

func Min

func Min(dataset []interface{}) StatsResult

Min returns the smallest element in a dataset.

func Q1

func Q1(dataset []interface{}) StatsResult

Q1 returns the lower quartile (25%) of the elements in a dataset. This does not include the median during calculation.

func Q2

func Q2(dataset []interface{}) StatsResult

Q2 returns the middle quartile (50%) of the elements in a dataset. This accomplishes the same thing as Median.

func Q3

func Q3(dataset []interface{}) StatsResult

Q3 returns the upper quartile (75%) of the elements in a dataset. This does not include the median during calculation.
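
One common reading of "does not include the median" is the exclusive method: sort the data, drop the middle element when the count is odd, and take the median of each half. The sketch below implements that reading with the standard library. Whether gambas computes quartiles exactly this way is an assumption; `median` and `quartiles` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// median returns the median of an already sorted slice, or NaN if empty.
func median(sorted []float64) float64 {
	n := len(sorted)
	if n == 0 {
		return math.NaN()
	}
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// quartiles returns Q1, Q2, and Q3 using the exclusive method: for an odd
// number of elements the middle element belongs to neither half.
func quartiles(data []float64) (q1, q2, q3 float64) {
	sorted := append([]float64(nil), data...) // copy; leave the input untouched
	sort.Float64s(sorted)
	n := len(sorted)
	half := n / 2
	lower := sorted[:half]
	upper := sorted[n-half:] // skips the middle element when n is odd
	return median(lower), median(sorted), median(upper)
}

func main() {
	q1, q2, q3 := quartiles([]float64{7, 1, 3, 5, 2, 6, 4})
	fmt.Println(q1, q2, q3)
}
```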

func Std

func Std(dataset []interface{}) StatsResult

Std returns the sample standard deviation of the elements in a dataset.
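
The sample standard deviation divides the sum of squared deviations by n-1 rather than n. A minimal standard-library sketch of that formula follows. `sampleStd` is a hypothetical helper operating on []float64 and is not the gambas implementation.

```go
package main

import (
	"fmt"
	"math"
)

// sampleStd returns the sample standard deviation of data, dividing by
// n-1 (Bessel's correction). It returns NaN for fewer than two elements.
func sampleStd(data []float64) float64 {
	n := len(data)
	if n < 2 {
		return math.NaN()
	}
	sum := 0.0
	for _, v := range data {
		sum += v
	}
	mean := sum / float64(n)
	squares := 0.0
	for _, v := range data {
		d := v - mean
		squares += d * d
	}
	return math.Sqrt(squares / float64(n-1))
}

func main() {
	fmt.Println(sampleStd([]float64{0, 2, 4}))
}
```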
