sneller

package module
v0.0.0-...-86e9f11 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 7, 2024 License: Apache-2.0 Imports: 28 Imported by: 0

README

Become a test partner

Please reach out to frank@sneller.io if you are interested in becoming a test partner of our serverless cloud offering.

SQL for JSON at scale: fast, simple, schemaless

Sneller is a high-performance SQL engine built to analyze petabyte-scale unstructured logs and other event data.

Here are a couple major differentiators between Sneller and other SQL solutions:

Sneller Cloud gives you access to a hosted version of the Sneller SQL engine that runs directly on data stored entirely in your S3 buckets. Our cloud platform offers excellent performance and is priced at an extremely competitive $150 per petabyte of data scanned.

Browser Demo

You can run queries for free against Sneller Cloud from your browser through our playground. We've created a public table containing about 1 billion rows from the GitHub archive data set. Additionally, you can create new ephemeral tables by uploading your own JSON data (but please don't upload anything sensitive!).

The Sneller playground is also usable directly with a local HTTP client like curl:

[asciicast recording]

Local Demo

[asciicast recording]

If you have Go installed on a machine with AVX512, you can build tables from JSON files and run the query engine locally:

$ grep -q avx512 /proc/cpuinfo && echo "yes, I have AVX512"
yes, I have AVX512
$ # install the sdb tool (make sure $GOBIN is in your $PATH)
$ go install github.com/SnellerInc/sneller/cmd/sdb@latest
$ # pack a JSON object into a table that can be queried;
$ # here we're using some github archive JSON:
$ wget https://data.gharchive.org/2015-01-01-15.json.gz
$ sdb pack -o github.zion 2015-01-01-15.json.gz
$ # run a query, using JSON as the output format:
$ sdb query -v -fmt=json "select count(*), type from read_file('github.zion') group by type"
{"type": "CreateEvent", "count": 1471}
{"type": "PushEvent", "count": 5815}
{"type": "WatchEvent", "count": 1230}
{"type": "ReleaseEvent", "count": 60}
{"type": "PullRequestEvent", "count": 474}
{"type": "IssuesEvent", "count": 545}
{"type": "ForkEvent", "count": 355}
{"type": "GollumEvent", "count": 61}
{"type": "IssueCommentEvent", "count": 844}
{"type": "DeleteEvent", "count": 260}
{"type": "PullRequestReviewCommentEvent", "count": 136}
{"type": "CommitCommentEvent", "count": 73}
{"type": "MemberEvent", "count": 25}
{"type": "PublicEvent", "count": 2}
18874368 bytes (18.000 MiB) scanned in 1.475857ms 12.5GiB/s

See our SQL reference for more information on the Sneller SQL dialect.

If you don't have access to a physical machine with AVX512 support, we recommend renting a VM from one of the major cloud providers with one of these instance families:

  • AWS: c6i, m6i, r6i
  • GCP: N2, M2, C2, C3
  • Azure: Dv4, Ev4

Sneller Cloud

Our cloud platform simplifies the Sneller SQL user experience by giving you instant access to thousands of CPU cores to run your queries. Sneller Cloud also provides automatic synchronization between your source data and your SQL tables, so you don't have any batch processes to manage in order to keep your tables up-to-date. Our cloud solution has a simple usage-based pricing model that depends entirely on the amount of data your queries scan. (Since Sneller Cloud doesn't store any of your data, there are no additional storage charges.)

Performance

Sneller is generally able to provide end-to-end scanning performance in excess of 1GB/s/core on high-core-count machines. The core SQL engine is typically able to saturate the memory bandwidth of the machine; generally about half of the query execution time is spent decompressing the source data, and the other half is spent in the SQL engine itself. Scanning performance scales linearly with the number of CPU cores available, so for example a 1000-CPU cluster would generally provide scanning performance in excess of 1TB/s.

The zion compression format that the SQL engine consumes is "bucketized" so that queries that don't touch all of the fields in the source data consume fewer cycles during decompression. Concretely, the top-level fields in each record are hashed into one of 16 buckets, and each of these buckets is compressed separately. The query planner determines which fields are referenced by each query, and at execution time only the buckets that contain fields necessary to compute the final query result are actually decompressed. (Strictly columnar formats like Parquet stripe data into one bucket per column, with the restriction that the columns and their types are known in advance. Since Sneller operates on unstructured data, our solution needed to be completely agnostic to the structure of the data itself.)
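
The bucket-selection idea can be sketched in a few lines of Go. This is an illustration only: FNV-1a stands in for whatever hash zion actually uses (it is not described here), and `bucketOf` and `neededBuckets` are hypothetical names rather than functions from the zion package.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numBuckets is the bucket count stated above.
const numBuckets = 16

// bucketOf maps a top-level field name to a bucket index.
// FNV-1a is a stand-in; the real zion hash is an unknown here.
func bucketOf(field string) int {
	h := fnv.New32a()
	h.Write([]byte(field))
	return int(h.Sum32() % numBuckets)
}

// neededBuckets returns a 16-bit mask with one bit set for each bucket
// that holds at least one field referenced by the query.
func neededBuckets(fields []string) uint16 {
	var mask uint16
	for _, f := range fields {
		mask |= 1 << bucketOf(f)
	}
	return mask
}

func main() {
	// Fields referenced by: select count(*), type from ... group by type
	mask := neededBuckets([]string{"type"})
	for b := 0; b < numBuckets; b++ {
		if mask&(1<<b) != 0 {
			fmt.Printf("decompress bucket %d\n", b)
		}
	}
}
```

A query that references a single field touches one bucket out of 16, so under this scheme the other 15 compressed streams are never decoded.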

License

Sneller is released under the Apache 2.0 license. See the LICENSE file for more information.

Documentation

Overview

Package sneller is the root package for the open source part of Sneller. This package contains core functions and data types shared by the sneller and snellerd executables.

Index

Constants

This section is empty.

Variables

var CacheLimit = memTotal / 2

CacheLimit defines a limit such that blob segments will not be cached if the total scan size of a request in bytes exceeds the limit.
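
The rule described above amounts to a single comparison. Here is a minimal sketch with stand-in names: `cacheLimit` plays the role of `CacheLimit`, its `int64` type and the 64 GiB memory total are assumptions for the example, and `shouldCache` is a hypothetical helper rather than part of this package.

```go
package main

import "fmt"

// cacheLimit stands in for sneller.CacheLimit (memTotal / 2 in the
// declaration above). The 64 GiB total is an assumed example value.
var cacheLimit int64 = (64 << 30) / 2

// shouldCache reports whether blob segments for a request that scans
// scanSize bytes in total are eligible for caching under the limit.
func shouldCache(scanSize int64) bool {
	return scanSize <= cacheLimit
}

func main() {
	fmt.Println(shouldCache(1 << 30))   // a 1 GiB scan is under the limit
	fmt.Println(shouldCache(100 << 30)) // a 100 GiB scan exceeds it
}
```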

var CanVMOpen = false

Functions

func BuildInfo

func BuildInfo() (*debug.BuildInfo, bool)

BuildInfo returns the build info data of the binary.

func Version

func Version() (string, bool)

Version returns the version of the binary, based on BuildInfo data.
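
Both signatures follow the shape of `debug.ReadBuildInfo` from the standard `runtime/debug` package. The sketch below shows one way a Version-style helper can be derived from build info. It is an assumption about the general approach, not a copy of this package's source, and `version` is a hypothetical name.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// version is a hypothetical stand-in for a Version-style helper.
// It reports the main module version recorded in the binary, if any.
func version() (string, bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "", false
	}
	return info.Main.Version, true
}

func main() {
	if v, ok := version(); ok {
		fmt.Println("version:", v)
	} else {
		fmt.Println("no build info embedded in this binary")
	}
}
```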

Types

type CachedEnv

type CachedEnv interface {
	plan.Env
	CacheValues() ([]byte, time.Time)
}
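
An interface that extends another with one extra method is typically detected with a type assertion. The sketch below shows that pattern with stand-in types only: `env` replaces `plan.Env` (whose method set is not shown here), and `staticEnv` and `cacheValuesOf` are hypothetical. How Sneller itself consumes `CacheValues` is not documented on this page.

```go
package main

import (
	"fmt"
	"time"
)

// env is a stand-in for plan.Env, whose method set is not shown here.
type env interface{}

// cachedEnv mirrors the shape of CachedEnv using the stand-in above.
type cachedEnv interface {
	env
	CacheValues() ([]byte, time.Time)
}

// staticEnv is a hypothetical implementation used only for this example.
type staticEnv struct {
	key     []byte
	modtime time.Time
}

func (s staticEnv) CacheValues() ([]byte, time.Time) { return s.key, s.modtime }

// cacheValuesOf returns the cache values of e if e provides them.
func cacheValuesOf(e env) ([]byte, time.Time, bool) {
	if c, ok := e.(cachedEnv); ok {
		b, t := c.CacheValues()
		return b, t, true
	}
	return nil, time.Time{}, false
}

func main() {
	e := staticEnv{key: []byte("k"), modtime: time.Unix(0, 0)}
	_, _, ok := cacheValuesOf(e)
	fmt.Println(ok) // true: staticEnv has a CacheValues method
}
```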

type FSEnv

type FSEnv struct {
	Root db.InputFS
	// contains filtered or unexported fields
}

FSEnv provides a plan.Env from a db.FS.

func Environ

func Environ(t db.Tenant, dbname string) (*FSEnv, error)

func (*FSEnv) CacheValues

func (f *FSEnv) CacheValues() ([]byte, time.Time)

CacheValues implements CachedEnv.CacheValues.

func (*FSEnv) Index

func (f *FSEnv) Index(p expr.Node) (plan.Index, error)

func (*FSEnv) Key

func (f *FSEnv) Key() *blockfmt.Key

func (*FSEnv) ListTables

func (f *FSEnv) ListTables(dbname string) ([]string, error)

ListTables implements plan.TableLister.ListTables

func (*FSEnv) MaxScanned

func (f *FSEnv) MaxScanned() int64

MaxScanned returns the maximum number of bytes that need to be scanned to satisfy this query.

func (*FSEnv) Stat

func (f *FSEnv) Stat(e expr.Node, h *plan.Hints) (*plan.Input, error)

Stat implements plan.Env.Stat

func (*FSEnv) Uploader

func (f *FSEnv) Uploader() plan.UploadFS

type Splitter

type Splitter struct {
	WorkerID  tnproto.ID
	WorkerKey tnproto.Key
	Peers     []*net.TCPAddr
	SelfAddr  string
}

func (*Splitter) Geometry

func (s *Splitter) Geometry() *plan.Geometry

type TenantEnv

type TenantEnv struct {
	*FSEnv
}

TenantEnv implements plan.Decoder for use with snellerd in tenant mode. It also implements plan.Env, though it must have the embedded FSEnv initialized in order to be used as such.
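
The caveat about the embedded FSEnv follows from how Go promotes methods through an embedded pointer: if the pointer is nil, promoted methods that touch its fields will panic. Here is a minimal sketch with stand-in types. `fsEnv`, `tenantEnv`, and `usable` are hypothetical and mirror only the embedding relationship, not the real types.

```go
package main

import "fmt"

// fsEnv and tenantEnv are stand-ins that mirror only the embedding
// relationship between FSEnv and TenantEnv; they are not the real types.
type fsEnv struct{ maxScanned int64 }

func (f *fsEnv) MaxScanned() int64 { return f.maxScanned }

type tenantEnv struct {
	*fsEnv
}

// usable reports whether the promoted fsEnv methods are safe to call.
func (t *tenantEnv) usable() bool { return t.fsEnv != nil }

func main() {
	var t tenantEnv         // the embedded pointer starts out nil
	fmt.Println(t.usable()) // false; calling t.MaxScanned() here would panic

	t.fsEnv = &fsEnv{maxScanned: 1 << 20}
	fmt.Println(t.usable(), t.MaxScanned())
}
```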

type TenantRunner

type TenantRunner struct {
	Events *os.File
	Cache  *dcache.Cache
}

func (*TenantRunner) Post

func (r *TenantRunner) Post()

func (*TenantRunner) Run

func (r *TenantRunner) Run(dst vm.QuerySink, in *plan.Input, ep *plan.ExecParams) error

Directories

Path            Synopsis
auth            Package auth describes some implementations of Provider that can be used in snellerd.
aws             Package aws is a lightweight implementation of the AWS API signature algorithms.
  s3            Package s3 implements a lightweight client of the AWS S3 API.
cgroup          Package cgroup implements a thin wrapper around the Linux cgroupv2 filesystem API.
cmd
  sdb
compr           Package compr provides a unified interface wrapping third-party compression libraries.
date            Package date implements optimized date-parsing routines specific to the date formats that we support.
db              Package db implements the policy layout of databases, tables, and indices as a virtual filesystem tree.
debug           Package debug provides remote debugging tools.
elasticproxy    (module)
expr            Package expr implements the AST representation of query expressions.
  partiql       Package partiql implements a SQL-compatible (and somewhat PartiQL-compatible) query text parser.
fastdate        Package fastdate implements low-level unix time stamp manipulation that follows the semantics of the Sneller SQL datetime functions.
fsutil          Package fsutil defines functions and interfaces for working with file systems.
fuzzy           Package fuzzy implements fuzzy equal/contains reference implementations.
heap            Package heap implements generic heap functions.
internal
  aes           Package aes provides access to the hardware AES encryption/decryption accelerator and supports basic key expansion functionality.
  asmutils      Package asmutils provides helpers for assembly integration.
  atomicext     Package atomicext provides extensions complementing the built-in atomic package.
  memops        Package memops implements accelerated memory block manipulation primitives.
  percentile    Package percentile provides a pure Go implementation of tDigest aggregation and the computation of percentiles.
  simd          Package simd provides selected intrinsics for the AVX512 SIMD extension emulation.
  stringext     Package stringext defines extra string functions.
ints            Package ints provides int-related common functions.
ion             Package ion implements a subset of the Amazon ion binary format: https://amzn.github.io/ion-docs/
  blockfmt      Package blockfmt implements routines for reading and writing compressed and aligned ion blocks to/from backing storage.
  versify       Package versify implements an ion "versifier:" code that performs procedural data generation based on example input.
  zion          Package zion implements a "zipped" ion encoding that compresses streams of ion structures in a manner such that fields within structures in the stream can be decompressed without decompressing the entire input stream.
  zion/iguana   Package iguana implements a Lizard-derived compression/decompression pipeline.
  zion/zll      Package zll exposes types and procedures related to low-level zion decoding.
jsonrl          Package jsonrl implements a Ragel-generated JSON parser that converts JSON data into ion data (see Convert).
plan            Package plan is the primary interface to the query planner.
  pir           Package pir manages the low-level query plan intermediate representation.
regexp2         Package regexp2 implements a regular expression engine.
rules           Package rules defines a syntax for rule-based re-writing DSLs.
tenant          Package tenant encapsulates the logic and protocol-level details for managing tenant sub-processes.
  dcache        Package dcache provides a cache for table data by storing files in a directory.
  tnproto       Package tnproto defines the types and functions necessary to speak the tenant control protocol.
testquery       Package testquery provides common functions used in query tests.
tests           Package tests provides common functions used in tests.
usock           Package usock implements a wrapper around the unix(7) SCM_RIGHTS API, which allows processes to exchange file handles over a unix(7) control socket.
utf8            Package utf8 provides additional UTF-8 related functions.
vm              Package vm implements the core query-processing "physical operators" that process streams of ion-encoded data.
  _generate     Alternative to-{upper,lower} approach
xsv             Package xsv implements parsing/converting CSV (RFC 4180) and TSV (tab separated values) files to binary ION format.
