Documentation ¶
Index ¶
- func Concat(objs []*dataframe.DataFrame, opts ...ConcatOptions) (*dataframe.DataFrame, error)
- func FloatColumn(col []any) ([]float64, error)
- func NewDataFrameFromSeries(columns map[string]collection.Series, columnOrder []string) (*dataframe.DataFrame, error)
- func NewEmptyDataFrame(columns []string, columnTypes map[string]reflect.Type) *dataframe.DataFrame
- type BoolCol
- type Column
- type ConcatAxis
- type ConcatJoin
- type ConcatOptions
- func DefaultConcatOptions() ConcatOptions
- type DbConfig
- type FloatCol
- type GoPandas
- func (GoPandas) DataFrame(columns []string, data []Column, columns_types map[string]any) (*dataframe.DataFrame, error)
- func (GoPandas) From_gbq(query string, projectID string) (*dataframe.DataFrame, error)
- func (GoPandas) Read_csv(filepath string) (*dataframe.DataFrame, error)
- func (gp GoPandas) Read_csv_typed(filepath string, columnTypes map[string]any) (*dataframe.DataFrame, error)
- func (GoPandas) Read_sql(query string, db_config DbConfig) (*dataframe.DataFrame, error)
- type IntCol
- type StringCol
- type TypeColumn
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Concat ¶
Concat concatenates pandas objects along a particular axis.
This function mirrors the behavior of pandas.concat for ease of switching from Python to Go for data scientists.
Parameters:
- objs: A slice of DataFrame pointers to concatenate. Nil DataFrames are skipped.
- opts: Optional ConcatOptions. If not provided, defaults are used.
Returns:
- A new DataFrame containing the concatenated data, or an error if the operation fails.
Example:
df1 := &dataframe.DataFrame{Columns: map[string]collection.Series{"A": ...}, ColumnOrder: []string{"A"}}
df2 := &dataframe.DataFrame{Columns: map[string]collection.Series{"A": ...}, ColumnOrder: []string{"A"}}
result, err := gpandas.Concat([]*dataframe.DataFrame{df1, df2})
// result contains all rows from df1 followed by rows from df2
// With options:
result, err = gpandas.Concat([]*dataframe.DataFrame{df1, df2}, gpandas.ConcatOptions{Axis: gpandas.AxisColumns, Join: gpandas.JoinInner})
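To make the difference between the outer and inner join modes concrete, here is a minimal, self-contained sketch of row-wise concatenation. It does not use gpandas: the `table` type and the `concatRows` and `names` helpers are hypothetical stand-ins, and the real Concat may differ in how it aligns and pads columns.

```go
package main

import (
	"fmt"
	"sort"
)

// table is a hypothetical stand-in for a DataFrame: column name -> values.
// A nil entry represents a null.
type table struct {
	cols map[string][]any
	rows int
}

// concatRows stacks tables vertically. With inner=false (outer join) the
// result has the union of all columns, padded with nil where a table lacks
// a column. With inner=true only columns present in every table are kept.
func concatRows(inner bool, tables ...table) table {
	seen := map[string]int{}
	for _, t := range tables {
		for name := range t.cols {
			seen[name]++
		}
	}
	out := table{cols: map[string][]any{}}
	for name, n := range seen {
		if inner && n < len(tables) {
			continue // column is missing from at least one table
		}
		col := make([]any, 0)
		for _, t := range tables {
			if vals, ok := t.cols[name]; ok {
				col = append(col, vals...)
			} else {
				col = append(col, make([]any, t.rows)...) // nil padding
			}
		}
		out.cols[name] = col
	}
	for _, t := range tables {
		out.rows += t.rows
	}
	return out
}

// names returns the column names in sorted order for stable printing.
func names(t table) []string {
	out := make([]string, 0, len(t.cols))
	for name := range t.cols {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	a := table{cols: map[string][]any{"A": {1, 2}, "B": {"x", "y"}}, rows: 2}
	b := table{cols: map[string][]any{"A": {3}}, rows: 1}

	outer := concatRows(false, a, b)
	inner := concatRows(true, a, b)
	fmt.Println("outer columns:", names(outer), "B:", outer.cols["B"])
	fmt.Println("inner columns:", names(inner), "A:", inner.cols["A"])
}
```

In this sketch the outer result keeps column B and fills the row contributed by the second table with nil, while the inner result drops B entirely.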
func FloatColumn ¶
func FloatColumn(col []any) ([]float64, error)
func NewDataFrameFromSeries ¶
func NewDataFrameFromSeries(columns map[string]collection.Series, columnOrder []string) (*dataframe.DataFrame, error)
NewDataFrameFromSeries creates a DataFrame from a map of Series.
Parameters:
- columns: A map of column names to Series.
- columnOrder: Optional slice specifying column order (uses map order if nil).
Returns:
A pointer to a DataFrame, or an error if validation fails
func NewEmptyDataFrame ¶
NewEmptyDataFrame creates an empty DataFrame with specified column names and types.
Parameters:
- columns: A slice of column names.
- columnTypes: A map of column names to their types (uses reflect.Type).
Returns:
A pointer to an empty DataFrame with the specified structure
Types ¶
type ConcatAxis ¶
type ConcatAxis int
ConcatAxis specifies the axis along which to concatenate.
const (
// AxisIndex (0) concatenates along rows (stacking DataFrames vertically).
AxisIndex ConcatAxis = 0
// AxisColumns (1) concatenates along columns (joining DataFrames horizontally).
AxisColumns ConcatAxis = 1
)
type ConcatJoin ¶
type ConcatJoin string
ConcatJoin specifies how to handle indexes on the non-concatenation axis.
const (
// JoinOuter takes the union of indexes (all columns/rows, with nulls for missing).
JoinOuter ConcatJoin = "outer"
// JoinInner takes the intersection of indexes (only common columns/rows).
JoinInner ConcatJoin = "inner"
)
type ConcatOptions ¶
type ConcatOptions struct {
// Axis is the axis to concatenate along. Default: AxisIndex (0).
Axis ConcatAxis
// Join determines how to handle indexes on other axis. Default: JoinOuter.
Join ConcatJoin
// IgnoreIndex if true, do not use the index values along the concatenation axis.
// The resulting axis will be labeled 0, 1, ..., n-1. Default: false.
IgnoreIndex bool
// VerifyIntegrity if true, check whether the new concatenated axis contains duplicates.
// This can be expensive. Default: false.
VerifyIntegrity bool
// Sort if true, sort non-concatenation axis if it is not already aligned. Default: false.
Sort bool
}
ConcatOptions configures the behavior of the Concat function.
func DefaultConcatOptions ¶
func DefaultConcatOptions() ConcatOptions
DefaultConcatOptions returns the default options for Concat.
type DbConfig ¶
type DbConfig struct {
Database_server string
Server string
Port string
Database string
Username string
Password string
}
DbConfig stores the connection settings used by Read_sql.
NOTE: Prefer using env vars instead of hardcoding values
type GoPandas ¶
type GoPandas struct{}
func (GoPandas) DataFrame ¶
func (GoPandas) DataFrame(columns []string, data []Column, columns_types map[string]any) (*dataframe.DataFrame, error)
DataFrame creates a new DataFrame from the provided columns, data, and column types.
It validates the input parameters to ensure data consistency and proper type definitions.
The function performs several validation checks:
- Ensures the columns_types map is provided
- Verifies at least one column name is present
- Checks that data is not empty
- Confirms the number of column names matches the number of data columns
- Validates that all columns have the same length
- Ensures type definitions exist for all columns
The data is then converted to the internal DataFrame format, creating typed Series based on the specified column types (FloatCol, IntCol, StringCol, BoolCol). Null values (nil) are properly tracked using the boolean mask approach.
Parameters:
- columns: A slice of strings representing column names.
- data: A slice of Columns containing the actual data.
- columns_types: A map defining the expected type for each column.
Returns:
A pointer to a DataFrame containing the processed data, or an error if validation fails
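The boolean mask approach mentioned above can be pictured as a typed value slice paired with a parallel slice of null flags. The following is a minimal sketch of that idea, not the library's actual Series implementation: `floatSeries`, `newFloatSeries`, and `mean` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// floatSeries is a hypothetical typed column: values plus a parallel mask.
// isNull[i] == true means values[i] holds no data and must be ignored.
type floatSeries struct {
	values []float64
	isNull []bool
}

// newFloatSeries converts untyped input into a floatSeries, recording nil
// entries in the mask instead of storing a sentinel value.
func newFloatSeries(raw []any) (floatSeries, error) {
	s := floatSeries{
		values: make([]float64, len(raw)),
		isNull: make([]bool, len(raw)),
	}
	for i, v := range raw {
		switch x := v.(type) {
		case nil:
			s.isNull[i] = true
		case float64:
			s.values[i] = x
		case int:
			s.values[i] = float64(x)
		default:
			return floatSeries{}, fmt.Errorf("row %d: cannot use %T as float64", i, v)
		}
	}
	return s, nil
}

// mean averages the non-null values; ok is false when every entry is null.
func (s floatSeries) mean() (avg float64, ok bool) {
	var sum float64
	var n int
	for i, v := range s.values {
		if s.isNull[i] {
			continue
		}
		sum += v
		n++
	}
	if n == 0 {
		return 0, false
	}
	return sum / float64(n), true
}

func main() {
	s, err := newFloatSeries([]any{1.5, nil, 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	avg, ok := s.mean()
	fmt.Println(s.isNull, avg, ok)
}
```

Keeping the mask separate lets the value slice stay a plain []float64, so a legitimate 0 is never confused with a missing value.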
func (GoPandas) From_gbq ¶
From_gbq executes a BigQuery SQL query and returns the results as a DataFrame.
Parameters:
- query: The BigQuery SQL query string to execute.
- projectID: The Google Cloud Project ID where the BigQuery dataset resides.
Returns:
- A pointer to a DataFrame containing the query results.
- An error if the query execution fails or if there are issues with the BigQuery client.
The DataFrame's structure will match the query results:
- Columns will be named according to the SELECT statement
- Data types will be converted from BigQuery types to Go types
- NULL values are properly tracked using the boolean mask approach
Examples:
gp := gpandas.GoPandas{}
query := `SELECT name, age, city
FROM dataset.users
WHERE age > 25`
df, err := gp.From_gbq(query, "my-project-id")
// Result DataFrame:
// name | age | city
// Alice | 30 | New York
// Bob | 35 | Chicago
// Charlie | 28 | Boston
Note: Requires appropriate Google Cloud credentials to be configured in the environment.
func (GoPandas) Read_csv ¶
Read_csv reads a CSV file from the specified filepath and converts it into a DataFrame.
It opens the CSV file, reads the header to determine the column names, and then reads all the records.
The function checks for errors during file operations and ensures that the CSV file is not empty.
It initializes data columns based on the number of headers and populates them with the corresponding values from the records.
If the number of columns in any row is inconsistent with the header, that row is skipped.
All values are stored as strings in StringSeries with proper null handling.
Parameters:
filepath: A string representing the path to the CSV file to be read.
Returns:
A pointer to a DataFrame containing the data from the CSV file, or an error if the operation fails.
func (GoPandas) Read_csv_typed ¶
func (gp GoPandas) Read_csv_typed(filepath string, columnTypes map[string]any) (*dataframe.DataFrame, error)
Read_csv_typed reads a CSV file and creates typed Series based on the provided column types.
This is similar to Read_csv but allows specifying column types for automatic type conversion. Empty strings in the CSV are treated as null values for non-string types.
Parameters:
- filepath: A string representing the path to the CSV file to be read.
- columnTypes: A map defining the expected type for each column (FloatCol, IntCol, StringCol, BoolCol).
Returns:
A pointer to a DataFrame containing the typed data from the CSV file, or an error if the operation fails.
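The rule that empty strings become nulls for non-string types can be illustrated for a float column as follows. This is a sketch of the conversion idea only: `parseFloatColumn` is a hypothetical helper, and the error behavior for unparseable text is an assumption, since the description above does not specify it.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseFloatColumn converts raw CSV cells into float64 values plus a null
// mask. An empty cell is recorded as null rather than parsed.
// Returning an error for malformed text is an assumption of this sketch.
func parseFloatColumn(raw []string) ([]float64, []bool, error) {
	values := make([]float64, len(raw))
	isNull := make([]bool, len(raw))
	for i, cell := range raw {
		if cell == "" {
			isNull[i] = true
			continue
		}
		f, err := strconv.ParseFloat(cell, 64)
		if err != nil {
			return nil, nil, fmt.Errorf("row %d: %w", i, err)
		}
		values[i] = f
	}
	return values, isNull, nil
}

func main() {
	values, isNull, err := parseFloatColumn([]string{"1.5", "", "42"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(values, isNull)
}
```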
func (GoPandas) Read_sql ¶
Read_sql executes a SQL query against a database and returns the results as a DataFrame.
Parameters:
query: The SQL query string to execute.
db_config: A DbConfig struct containing database connection parameters:
- Database_server: Type of database ("sqlserver" or other)
- Server: Database server hostname or IP
- Port: Database server port
- Database: Database name
- Username: Database user
- Password: Database password
Returns:
- A pointer to a DataFrame containing the query results.
- An error if the database connection, query execution, or data processing fails.
The DataFrame's structure will match the query results:
- Columns will be named according to the SELECT statement
- Data types will be inferred from the database types
- NULL values are properly tracked using the boolean mask approach
Examples:
gp := gpandas.GoPandas{}
config := gpandas.DbConfig{
Database_server: "sqlserver",
Server: "localhost",
Port: "1433",
Database: "mydb",
Username: "user",
Password: "pass",
}
query := `SELECT employee_id, name, department
FROM employees
WHERE department = 'Sales'`
df, err := gp.Read_sql(query, config)
// Result DataFrame:
// employee_id | name | department
// 1 | John | Sales
// 2 | Alice | Sales
// 3 | Bob | Sales
type TypeColumn ¶
type TypeColumn[T comparable] []T
TypeColumn represents a slice of a comparable type T.
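Because T is constrained to comparable, elements of such a column can be compared with == and used as map keys. The sketch below shows one use of that property, collecting distinct values. `typeColumn` is a local copy of the definition above so the example compiles without gpandas, and `unique` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// typeColumn mirrors the documented TypeColumn definition.
type typeColumn[T comparable] []T

// unique returns the distinct values of col in first-seen order.
// It relies on T being comparable so values can serve as map keys.
func unique[T comparable](col typeColumn[T]) typeColumn[T] {
	seen := make(map[T]struct{}, len(col))
	var out typeColumn[T]
	for _, v := range col {
		if _, ok := seen[v]; ok {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	cities := typeColumn[string]{"Boston", "Chicago", "Boston"}
	fmt.Println(unique(cities))
}
```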