bullseye

v0.0.0-...-77951e0
Published: Jul 7, 2020 License: Apache-2.0 Imports: 0 Imported by: 0

README

bullseye

NOTICE: THIS PROJECT IS DEPRECATED

This project has been merged into gomem. bullseye is now under the "dataframe" package in the gomem project.

A DataFrame built on Apache Arrow.

Installation

Add the package to your go.mod file:

require github.com/go-bullseye/bullseye master

Or, clone the repository:

git clone --branch master https://github.com/go-bullseye/bullseye.git $GOPATH/src/github.com/go-bullseye/bullseye

A complete example:

mkdir my-dataframe-app && cd my-dataframe-app

cat > go.mod <<-END
  module my-dataframe-app

  require github.com/go-bullseye/bullseye master
END

cat > main.go <<-END
  package main

  import (
    "fmt"

    "github.com/apache/arrow/go/arrow/memory"
    "github.com/go-bullseye/bullseye/dataframe"
  )

  func main() {
    pool := memory.NewGoAllocator()
    df, err := dataframe.NewDataFrameFromMem(pool, dataframe.Dict{
      "col1": []int32{1, 2, 3, 4, 5},
      "col2": []float64{1.1, 2.2, 3.3, 4.4, 5},
      "col3": []string{"foo", "bar", "ping", "", "pong"},
      "col4": []interface{}{2, 4, 6, nil, 8},
    })
    if err != nil {
      // Stop here rather than calling Release on an unusable DataFrame.
      panic(err)
    }
    defer df.Release()
    fmt.Printf("DataFrame:\n%s\n", df.Display(0))
  }

  // DataFrame:
  // rec[0]["col1"]: [1 2 3 4 5]
  // rec[0]["col2"]: [1.1 2.2 3.3 4.4 5]
  // rec[0]["col3"]: ["foo" "bar" "ping" "" "pong"]
  // rec[0]["col4"]: [2 4 6 (null) 8]
END

go run main.go

Usage

See the DataFrame tests for extensive usage examples.

Reference Counting

From the arrow/go README...

The library uses reference counting so that it can track when memory buffers are no longer in use. This allows Arrow to update resource accounting, pool memory, and track overall memory usage as objects are created and released. Types expose two methods for this pattern: the Retain method increases the reference count by 1, and the Release method decreases it by 1. Once the reference count of an object reaches zero, any associated memory is freed. Retain and Release are safe to call from multiple goroutines.

When to call Retain / Release?
  • If you are passed an object and wish to take ownership of it, you must call Retain. You must later pair this with a call to Release when you no longer need the object. "Taking ownership" typically means you wish to access the object outside the scope of the current function call.

  • You own any object you create via a function whose name begins with New or Copy, any object returned by an operation that produces a new immutable DataFrame, and any object you receive over a channel. In each case you must call Release once you no longer need the object.

  • If you send an object over a channel, you must call Retain before sending it as the receiver is assumed to own the object and will later call Release when it no longer needs the object.
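
The ownership rules above can be sketched with a toy reference-counted type. This is only an illustration built on the standard library: Buffer, NewBuffer, Refs, and consume are hypothetical names, not Arrow or bullseye types, and the real libraries may manage memory differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Buffer is a hypothetical reference-counted resource (not an Arrow type).
type Buffer struct {
	refs int64
	data []byte
}

// NewBuffer returns a Buffer owned by the caller, with a reference count of 1.
func NewBuffer(n int) *Buffer {
	return &Buffer{refs: 1, data: make([]byte, n)}
}

// Retain increases the reference count by 1.
func (b *Buffer) Retain() { atomic.AddInt64(&b.refs, 1) }

// Release decreases the reference count by 1 and drops the data at zero.
func (b *Buffer) Release() {
	if atomic.AddInt64(&b.refs, -1) == 0 {
		b.data = nil
	}
}

// Refs reports the current reference count.
func (b *Buffer) Refs() int64 { return atomic.LoadInt64(&b.refs) }

// consume owns every Buffer it receives over the channel, so it calls
// Release on each one when it is finished with it.
func consume(ch <-chan *Buffer, done chan<- int) {
	total := 0
	for b := range ch {
		total += len(b.data)
		b.Release()
	}
	done <- total
}

func main() {
	buf := NewBuffer(8) // created via New, so main owns it
	defer buf.Release() // paired with the implicit reference from NewBuffer

	ch := make(chan *Buffer, 1)
	done := make(chan int)
	go consume(ch, done)

	buf.Retain() // retain before sending; the receiver will call Release
	ch <- buf
	close(ch)

	fmt.Println("bytes seen by receiver:", <-done)
	fmt.Println("references still held by main:", buf.Refs())
}
```

Each Retain is matched by exactly one Release, so the count returns to zero only after both the sender and the receiver are done with the object.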

Note: You can write a test using memory.NewCheckedAllocator to assert that you have released all resources properly. See the tests for examples.
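
The idea behind such a leak check can be sketched without any dependencies. The code below is not memory.NewCheckedAllocator; TrackingAllocator, Table, NewTable, and LeakedBytes are hypothetical names used only to show how counting outstanding bytes exposes a missing Release. It is not safe for concurrent use.

```go
package main

import "fmt"

// TrackingAllocator is a hypothetical allocator that counts live bytes.
type TrackingAllocator struct {
	live int // bytes allocated and not yet freed
}

// Allocate hands out n bytes and records them as live.
func (a *TrackingAllocator) Allocate(n int) []byte {
	a.live += n
	return make([]byte, n)
}

// Free records that the bytes in b are no longer live.
func (a *TrackingAllocator) Free(b []byte) { a.live -= len(b) }

// Live reports how many bytes are still outstanding.
func (a *TrackingAllocator) Live() int { return a.live }

// Table is a hypothetical object that owns memory from the allocator.
type Table struct {
	alloc *TrackingAllocator
	buf   []byte
	refs  int
}

// NewTable allocates n bytes; the caller owns the result and must Release it.
func NewTable(a *TrackingAllocator, n int) *Table {
	return &Table{alloc: a, buf: a.Allocate(n), refs: 1}
}

// Release drops one reference and frees the memory when none remain.
func (t *Table) Release() {
	t.refs--
	if t.refs == 0 {
		t.alloc.Free(t.buf)
		t.buf = nil
	}
}

// LeakedBytes runs fn against a fresh allocator and reports what it left behind.
func LeakedBytes(fn func(a *TrackingAllocator)) int {
	a := &TrackingAllocator{}
	fn(a)
	return a.Live()
}

func main() {
	released := LeakedBytes(func(a *TrackingAllocator) {
		t := NewTable(a, 64)
		defer t.Release()
	})
	forgotten := LeakedBytes(func(a *TrackingAllocator) {
		_ = NewTable(a, 64) // Release is never called
	})
	fmt.Println("leaked with Release:", released)
	fmt.Println("leaked without Release:", forgotten)
}
```

A test would assert that the leaked byte count is zero once the code under test returns.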

TODO

This DataFrame currently implements most of the scalar types we've come across. There is still work to be done on some of the list and struct types. Feel free to submit a PR if you find you need them. The library will let you know when you do.

  • Implement all Arrow DataTypes.
  • Add a filter function to DataFrame.
  • Add an order by function to DataFrame.

License

(c) 2019 Nick Poorman. Licensed under the Apache License, Version 2.0.

Documentation

Overview

Package bullseye provides an implementation of a DataFrame using Apache Arrow.

Basics

The DataFrame is an immutable, heterogeneous tabular data structure with labeled columns. It stores its raw bytes using a provided Arrow Allocator, with each column held in Arrow's fundamental data structure, the Array, which is a sequence of values of the same type. An array consists of memory holding the data and an additional validity bitmap that indicates whether the corresponding entry in the array is valid (not null).
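
A minimal sketch of a column backed by a validity bitmap may make this layout concrete. Int32Column and its methods are hypothetical and written only for illustration; they are not the Arrow Array implementation. The bitmap is stored least-significant bit first, as in the Arrow format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Int32Column is a hypothetical column: values plus a validity bitmap.
type Int32Column struct {
	values []int32
	valid  []byte // bit i is set when values[i] is valid (not null)
}

// NewInt32Column builds a column; entries without a true flag in valid are null.
func NewInt32Column(vals []int32, valid []bool) *Int32Column {
	c := &Int32Column{
		values: append([]int32(nil), vals...),
		valid:  make([]byte, (len(vals)+7)/8),
	}
	for i := range vals {
		if i < len(valid) && valid[i] {
			c.valid[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return c
}

// Len returns the number of entries, including nulls.
func (c *Int32Column) Len() int { return len(c.values) }

// IsNull reports whether entry i is null according to the bitmap.
func (c *Int32Column) IsNull(i int) bool {
	return c.valid[i/8]&(1<<(uint(i)%8)) == 0
}

// NullCount counts the entries whose validity bit is unset.
func (c *Int32Column) NullCount() int {
	n := 0
	for i := 0; i < c.Len(); i++ {
		if c.IsNull(i) {
			n++
		}
	}
	return n
}

// String renders the column, printing (null) for null entries.
func (c *Int32Column) String() string {
	parts := make([]string, c.Len())
	for i := range parts {
		if c.IsNull(i) {
			parts[i] = "(null)"
		} else {
			parts[i] = strconv.Itoa(int(c.values[i]))
		}
	}
	return "[" + strings.Join(parts, " ") + "]"
}

func main() {
	col := NewInt32Column(
		[]int32{2, 4, 6, 0, 8},
		[]bool{true, true, true, false, true},
	)
	fmt.Println(col)             // [2 4 6 (null) 8]
	fmt.Println(col.NullCount()) // 1
}
```

The value slot of a null entry still occupies space; only the bitmap decides whether it is read.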

Every DataFrame you create should be released with Release() to decrement its reference count and free the memory managed by the Arrow implementation.

Getting Started

Look in the dataframe package to get started.

Directories

Path                    Synopsis
dataframe               Package dataframe provides the DataFrame implementation.
internal/cast           Package cast provides casting for sparse and dense arrays.
internal/constructors   Package constructors provides constructors for arrow types.
internal/debug          Package debug provides compiled assertions, debug and warn level logging.
iterator                Package iterator provides iterators for chunks and values.