v17.0.0
Published: Jul 11, 2024 License: Apache-2.0, BSD-2-Clause, BSD-3-Clause, + 7 more

Apache Arrow for Go

Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication.

A note about FlightSQL drivers

Go FlightSQL drivers live in the ADBC repository. In particular, to use the Golang database/sql interface:

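package main

import (
	"database/sql"
	"log"

	// Registers the "flightsql" driver with database/sql.
	_ "github.com/apache/arrow-adbc/go/adbc/sqldriver/flightsql"
)

func main() {
	dsn := "uri=grpc://localhost:12345;username=mickeymouse;password=p@55w0RD"
	db, err := sql.Open("flightsql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// ...
}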

DSN options are expressed as k=v pairs, delimited with ;. Some option keys are defined in ADBC, others are defined in the FlightSQL ADBC driver.
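As a rough illustration of the k=v;k=v layout, the sketch below assembles a DSN from a map. The buildDSN helper is hypothetical and not part of ADBC or Arrow; it uses only the three keys shown in the example above and does no escaping, so it assumes values contain neither ; nor =.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildDSN is a hypothetical helper (not an ADBC API). It joins options
// as k=v pairs separated by ';'. Keys are sorted so the output is stable.
// It performs no escaping of ';' or '=' inside values.
func buildDSN(opts map[string]string) string {
	keys := make([]string, 0, len(opts))
	for k := range opts {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+opts[k])
	}
	return strings.Join(pairs, ";")
}

func main() {
	dsn := buildDSN(map[string]string{
		"uri":      "grpc://localhost:12345",
		"username": "mickeymouse",
		"password": "p@55w0RD",
	})
	fmt.Println(dsn)
	// prints: password=p@55w0RD;uri=grpc://localhost:12345;username=mickeymouse
}
```

The resulting string can be passed as the second argument to sql.Open. Whether the driver is sensitive to option order is not stated here, so the sorted order is only a convenience for reproducible output.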

Reference Counting

The library uses reference counting to track when memory buffers are no longer in use. This allows Arrow to update resource accounting, pool memory, and track overall memory usage as objects are created and released. Types expose two methods for this pattern: Retain increases the reference count by 1, and Release decreases it by 1. Once the reference count of an object reaches zero, any associated object is freed. Retain and Release are safe to call from multiple goroutines.

When to call Retain / Release?
  • If you are passed an object and wish to take ownership of it, you must call Retain. You must later pair this with a call to Release when you no longer need the object. "Taking ownership" typically means you wish to access the object outside the scope of the current function call.

  • You own any object you create via a function whose name begins with New or Copy, and any object you receive over a channel. Therefore you must call Release once you no longer need the object.

  • If you send an object over a channel, you must call Retain before sending it as the receiver is assumed to own the object and will later call Release when it no longer needs the object.
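The ownership rules above can be sketched with a small stand-in type. The Buffer type below is hypothetical and uses only the standard library; it is not an Arrow type, and it only mimics the Retain / Release contract so the channel hand-off is easy to follow.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Buffer is a hypothetical reference-counted object used for illustration.
type Buffer struct {
	refs int64
	data []byte
}

// NewBuffer returns a Buffer owned by the caller (reference count 1).
func NewBuffer(n int) *Buffer {
	return &Buffer{refs: 1, data: make([]byte, n)}
}

// Retain increases the reference count by 1.
func (b *Buffer) Retain() { atomic.AddInt64(&b.refs, 1) }

// Release decreases the reference count by 1 and drops the data at zero.
func (b *Buffer) Release() {
	if atomic.AddInt64(&b.refs, -1) == 0 {
		b.data = nil
	}
}

// Freed reports whether the reference count has reached zero.
func (b *Buffer) Freed() bool { return atomic.LoadInt64(&b.refs) == 0 }

// consume owns every Buffer it receives, so it calls Release on each one.
func consume(ch <-chan *Buffer, done chan<- struct{}) {
	for b := range ch {
		_ = len(b.data) // use the buffer
		b.Release()
	}
	close(done)
}

func main() {
	buf := NewBuffer(64) // created via New: we own it
	ch := make(chan *Buffer, 1)
	done := make(chan struct{})
	go consume(ch, done)

	buf.Retain() // the receiver will own this extra reference
	ch <- buf
	close(ch)
	<-done

	fmt.Println("freed after receiver's Release:", buf.Freed()) // false
	buf.Release()                                                // our own reference
	fmt.Println("freed after creator's Release:", buf.Freed())  // true
}
```

With real Arrow objects the same shape applies: the sender calls Retain before the send, and each owner pairs its reference with exactly one Release.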

Performance

The arrow package makes extensive use of c2goasm to leverage LLVM's advanced optimizer and generate Plan 9 assembly functions from C/C++ code. The arrow package can be compiled without these optimizations using the noasm build tag. Alternatively, an environment variable (INTEL_DISABLE_EXT in the examples below) can be set to choose which architecture optimizations are used at runtime. We use the cpu package (https://pkg.go.dev/golang.org/x/sys/cpu) to check dynamically for these features.

Example Usage

The following benchmarks demonstrate summing an array of 8192 values using various optimizations.

Disable no architecture optimizations (thus using AVX2):

$ INTEL_DISABLE_EXT=NONE go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	 2000000	       687 ns/op	95375.41 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	 2000000	       719 ns/op	91061.06 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	 2000000	       691 ns/op	94797.29 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	6.444s

NOTE: NONE is simply ignored, so the optimizations for AVX2 and SSE4 remain enabled.


Disable AVX2 architecture optimizations:

$ INTEL_DISABLE_EXT=AVX2 go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	 1000000	      1912 ns/op	34263.63 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	 1000000	      1392 ns/op	47065.57 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	 1000000	      1405 ns/op	46636.41 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	4.786s

Disable ALL architecture optimizations, thus using pure Go implementation:

$ INTEL_DISABLE_EXT=ALL go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	  200000	     10285 ns/op	6371.41 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	  500000	      3892 ns/op	16837.37 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	  500000	      3929 ns/op	16680.00 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	6.179s
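The three runs above suggest a tiered selection: AVX2 when available, a slower tier when AVX2 is disabled, and pure Go when everything is disabled. The sketch below shows one way such a selection could work. It is not the arrow package's actual code: pickImpl is a hypothetical function, the comma-separated parsing and the SSE4 fallback order are assumptions, and the hardware flags are passed in as plain booleans where a real implementation would query the cpu package.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// pickImpl is a hypothetical selector (not the arrow package's code).
// disable is the value of INTEL_DISABLE_EXT; hasAVX2 and hasSSE4 stand in
// for hardware feature checks. Assumptions: the value is a comma-separated
// list, unknown names such as NONE are ignored, and SSE4 is the fallback
// tier when AVX2 is unavailable or disabled.
func pickImpl(disable string, hasAVX2, hasSSE4 bool) string {
	disabled := map[string]bool{}
	for _, ext := range strings.Split(disable, ",") {
		disabled[strings.ToUpper(strings.TrimSpace(ext))] = true
	}
	switch {
	case disabled["ALL"]:
		return "go"
	case hasAVX2 && !disabled["AVX2"]:
		return "avx2"
	case hasSSE4 && !disabled["SSE4"]:
		return "sse4"
	default:
		return "go"
	}
}

func main() {
	// Pretend the machine supports both extensions.
	impl := pickImpl(os.Getenv("INTEL_DISABLE_EXT"), true, true)
	fmt.Println("selected implementation:", impl)
}
```

Under these assumptions, NONE selects avx2, AVX2 selects sse4, and ALL selects go, which lines up with the ordering of the benchmark results above.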

Directories

Path: Synopsis
arrow: Package arrow provides an implementation of Apache Arrow.
  array: Package array provides implementations of various Arrow array types.
  arrio: Package arrio exposes functions to manipulate records, exposing and using interfaces not unlike the ones defined in the stdlib io package.
  avro: Package avro reads Avro OCF files and presents the extracted data as records
  compute: Package compute is a native-go implementation of an Acero-like arrow compute engine.
  compute/internal/kernels: Package kernels defines all of the computation kernels for the compute library.
  csv: Package csv reads CSV files and presents the extracted data as records, also writes data as record into CSV files
  flight/flightsql/driver: Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements.
  flight/flightsql/example: Package example contains a FlightSQL Server implementation using sqlite as the backing engine.
  flight/flightsql/schema_ref: Package schema_ref contains the expected reference Schemas to be used by FlightSQL servers and clients.
  flight/session: Package session provides server middleware and reference implementations for Flight session management.
  internal/arrdata: Package arrdata exports arrays and records data ready to be used for tests.
  internal/arrjson: Package arrjson provides types and functions to encode and decode ARROW types and data to and from JSON files.
  internal/debug: Package debug provides APIs for conditional runtime assertions and debug logging.
  internal/flight_integration/cmd/arrow-flight-integration-client: Client for use with Arrow Flight Integration tests via archery
  ipc
  ipc/cmd/arrow-cat: Command arrow-cat displays the content of an Arrow stream or file.
  ipc/cmd/arrow-ls: Command arrow-ls displays the listing of an Arrow file.
  math: Package math provides optimized mathematical functions for processing Arrow arrays.
  memory: Package memory provides support for allocating and manipulating memory at a low level.
  memory/mallocator: Package mallocator defines an allocator implementation for memory.Allocator which defers to libc malloc.
  tensor: Package tensor provides types that implement n-dimensional arrays.
internal
  hashing: Package hashing provides utilities for and an implementation of a hash table which is more performant than the default go map implementation by leveraging xxh3 and some custom hash functions.
  types: Package types contains user-defined types for use in the tests for the arrow package
parquet: Package parquet provides an implementation of Apache Parquet for Go.
  compress: Package compress contains the interfaces and implementations for handling compression/decompression of parquet data at the column levels.
  internal/bmi: Package bmi contains helpers for manipulating bitmaps via BMI2 extensions properly falling back to pure go implementations if the CPU doesn't support BMI2.
  internal/debug: Package debug provides APIs for conditional runtime assertions and debug logging.
  internal/encryption: Package encryption contains the internal helpers for the parquet AES encryption/decryption handling.
  internal/testutils: Package testutils contains utilities for generating random data and other helpers that are used for testing the various aspects of the parquet library.
  internal/thrift: Package thrift is just some useful helpers for interacting with thrift to make other code easier to read/write and centralize interactions.
  internal/utils: Package utils contains various internal utilities for the parquet library that aren't intended to be exposed to external consumers such as interfaces and bitmap readers/writers including the RLE encoder/decoder and so on.
  pqarrow: Package pqarrow provides the implementation for connecting Arrow directly with the Parquet implementation, allowing isolation of all the explicitly arrow related code to this package which has the interfaces for reading and writing directly to and from arrow Arrays/Tables/Records
  schema: Package schema provides types and functions for manipulating and building parquet file schemas.
