adbc-driver-spark

module

Version: v0.2.0 | Published: Jun 15, 2026 | License: Apache-2.0

Repository

github.com/HyukjinKwon/adbc-driver-spark

README

ADBC Driver for Spark Connect

An Apache Arrow ADBC driver for Apache Spark Connect. It speaks the Spark Connect gRPC protocol and exposes it through the standard ADBC API, so you get Arrow-native result sets from Spark with zero copy into pandas, Polars, DuckDB, or any Arrow consumer.

Documentation: https://hyukjinkwon.github.io/adbc-driver-spark/

Why this driver

ADBC (Arrow Database Connectivity) is a vendor-neutral API for moving Arrow data in and out of databases, in the same spirit as JDBC and ODBC but columnar from end to end. Spark Connect already returns query results as Arrow IPC batches over gRPC, which makes it a natural fit for ADBC: there is no row-by-row conversion and no driver-side reshaping.

Arrow native, end to end. Results stream from Spark as Arrow batches and reach your application as Arrow record batches. No per-row boxing.
One driver, every language. The driver is built in Go and compiled to a C-ABI shared library that exposes the standard AdbcDriverInit entrypoint. It loads through the ADBC driver manager from C/C++, Python, R, Ruby, Rust, and Go.
Standard surface. Python users get a PEP 249 (DBAPI 2.0) interface and fetch_arrow_table() / fetch_df() helpers. C/C++ users get the plain ADBC C API. No bespoke client to learn.
Production focused. TLS and bearer-token auth, session and configuration options, metadata introspection (catalogs, schemas, tables, columns), prepared statements with parameter binding, and a CI matrix across Linux, macOS, and Windows.

Install

Python

pip install adbc-driver-spark

This pulls in the prebuilt shared library for your platform, plus adbc-driver-manager and pyarrow.

Go

go get github.com/HyukjinKwon/adbc-driver-spark

C, C++, R, Rust, Ruby, and other languages

These ecosystems all load the same C-ABI shared library through their existing ADBC driver manager, so there is no separate package to install for each. Get the shared library (libadbc_driver_spark.{so,dylib,dll}) from the Releases page, or build it from source (see Installation). Then load it with the driver manager for your language:

Language	Driver manager	Guide
C / C++	`libadbc_driver_manager`	Using from C and C++
R	`adbcdrivermanager` (CRAN)	Using from R
Rust	`adbc_driver_manager` (crates.io)	Using from Rust
Ruby	`red-adbc` (RubyGems)	Using from Ruby

Point the driver manager's driver option at the shared library; it resolves the standard AdbcDriverInit entrypoint automatically.
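
As a small illustration of the file-name pattern above, the sketch below picks the shared-library name for an operating system. The helper shared_library_name is hypothetical and not part of the driver or any driver manager. It only applies the libadbc_driver_spark.{so,dylib,dll} pattern from this README to the names returned by Python's platform.system().

```python
import platform

# Extensions follow the README's libadbc_driver_spark.{so,dylib,dll} pattern.
_EXTENSIONS = {"Linux": "so", "Darwin": "dylib", "Windows": "dll"}


def shared_library_name(system=None):
    """Return the driver's shared-library file name for an OS name.

    `system` takes the values produced by platform.system(). This helper is
    hypothetical: it is not shipped with the driver.
    """
    if system is None:
        system = platform.system()
    try:
        ext = _EXTENSIONS[system]
    except KeyError:
        raise ValueError(f"no library name known for {system!r}") from None
    return f"libadbc_driver_spark.{ext}"


for name in ("Linux", "Darwin", "Windows"):
    print(name, "->", shared_library_name(name))
```

The resulting name still has to be joined with the directory where you placed the library before it is handed to the driver manager's driver option.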

Quickstart

Start a Spark Connect server (Spark 3.5.x, 4.0.x, or 4.1.x):

# From a Spark 4.x distribution (the Connect server is bundled)
./sbin/start-connect-server.sh
# Spark Connect listens on sc://localhost:15002 by default
# (On Spark 3.5.x, which does not bundle it, add:
#  --packages org.apache.spark:spark-connect_2.13:3.5.8)

Python

import adbc_driver_spark.dbapi as dbapi

with dbapi.connect("sc://localhost:15002") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, id * id AS square FROM range(5)")
        table = cur.fetch_arrow_table()   # pyarrow.Table
        print(table.to_pandas())

Go

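package main

import (
	"context"
	"fmt"
	"log"

	"github.com/apache/arrow-go/v18/arrow/memory"
	spark "github.com/HyukjinKwon/adbc-driver-spark/driver/spark"
)

func main() {
	ctx := context.Background()

	// Create the driver and a database handle for the Spark Connect server.
	drv := spark.NewDriver(memory.DefaultAllocator)
	db, err := drv.NewDatabase(map[string]string{
		"uri": "sc://localhost:15002",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Open a connection (one Spark Connect session).
	cnxn, err := db.Open(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer cnxn.Close()

	stmt, err := cnxn.NewStatement()
	if err != nil {
		log.Fatal(err)
	}
	defer stmt.Close()

	if err := stmt.SetSqlQuery("SELECT id, id * id AS square FROM range(5)"); err != nil {
		log.Fatal(err)
	}

	// ExecuteQuery returns a streaming reader of Arrow record batches.
	reader, _, err := stmt.ExecuteQuery(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer reader.Release()

	for reader.Next() {
		fmt.Println(reader.Record())
	}
	// Next returns false on both end of stream and failure, so check Err.
	if err := reader.Err(); err != nil {
		log.Fatal(err)
	}
}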

Runnable examples for Python, Go, C, R, Rust, and Ruby live in the examples directory. The Python, C, R, Rust, and Ruby examples run against a live Spark Connect server on every CI run. See the documentation for per-language guides.

Connecting and authentication

Connections use the standard Spark Connect connection string, passed as the ADBC uri option:

sc://host:port/;token=<jwt>;use_ssl=true;user_id=<id>;user_agent=<ua>
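
To make the shape of that string concrete, here is a minimal sketch that splits a connection string into host, port, and parameters. The function parse_spark_connect_uri is hypothetical and is not the driver's own parser. It assumes the layout shown above, falls back to port 15002 (the default mentioned in the Quickstart), and does not handle IPv6 hosts or percent-encoding.

```python
DEFAULT_PORT = 15002  # default Spark Connect port noted in the Quickstart


def parse_spark_connect_uri(uri):
    """Split sc://host:port/;key=value;... into (host, port, params).

    Illustrative only: no IPv6 or percent-decoding support.
    """
    scheme, sep, rest = uri.partition("://")
    if scheme != "sc" or not sep:
        raise ValueError("expected a connection string starting with sc://")

    # Everything before the first "/" is host[:port]; the rest is parameters.
    authority, _, param_str = rest.partition("/")
    host, _, port = authority.partition(":")
    if not host:
        raise ValueError("missing host")

    params = {}
    for item in param_str.split(";"):
        if not item:
            continue  # skip the empty piece before the leading ";"
        key, eq, value = item.partition("=")
        if not eq or not key:
            raise ValueError(f"malformed parameter: {item!r}")
        params[key] = value

    return host, int(port) if port else DEFAULT_PORT, params


print(parse_spark_connect_uri("sc://localhost:15002/;use_ssl=true;user_id=alice"))
```

Splitting each parameter on the first "=" only keeps values that themselves contain "=" intact.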

Common options:

Option	Meaning
`uri`	Spark Connect connection string (required)
`adbc.spark.token`	Bearer token for authentication
`adbc.spark.tls.enabled`	`true` or `false`
`adbc.spark.user_id`	Spark Connect user id
`adbc.spark.user_agent`	Custom user agent string
`adbc.spark.headers.<NAME>`	Extra gRPC metadata header

See the Configuration Reference for the full list.
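
The option keys in the table can be assembled into a string-to-string map like the one the Go quickstart passes to NewDatabase. The sketch below is one way to build such a map. The helper build_options and its parameter names are hypothetical. Only the option keys come from the table above, and encoding the TLS flag as the strings "true" and "false" follows the values listed there.

```python
def build_options(uri, token=None, tls=None, user_id=None,
                  user_agent=None, headers=None):
    """Build an ADBC option map from the keys listed in the table.

    Hypothetical helper: unset arguments are simply left out of the map.
    """
    options = {"uri": uri}
    if token is not None:
        options["adbc.spark.token"] = token
    if tls is not None:
        options["adbc.spark.tls.enabled"] = "true" if tls else "false"
    if user_id is not None:
        options["adbc.spark.user_id"] = user_id
    if user_agent is not None:
        options["adbc.spark.user_agent"] = user_agent
    # Each extra gRPC metadata header gets its own adbc.spark.headers.<NAME> key.
    for name, value in (headers or {}).items():
        options[f"adbc.spark.headers.{name}"] = value
    return options


opts = build_options(
    "sc://localhost:15002",
    tls=True,
    user_id="alice",
    headers={"x-request-source": "nightly-report"},
)
for key in sorted(opts):
    print(key, "=", opts[key])
```

How options that appear both in the connection string and as separate keys are reconciled is not covered here; see the Configuration Reference for that.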

Features

SQL execution returning Arrow record batches.
Prepared statements with Arrow parameter binding.
DML and DDL via ExecuteUpdate.
Metadata: GetObjects, GetTableSchema, GetTableTypes, GetInfo.
Full Spark to Arrow type mapping, including decimal, timestamp, timestamp_ntz, array, map, and struct. See Type Mapping.
TLS and bearer-token authentication.
Works against Spark Connect on Spark 3.5.x, 4.0.x, and 4.1.x, and against Databricks Connect-compatible endpoints. Each of these Spark release lines is exercised in CI against a live server.

Compatibility

Component	Supported
Spark Connect	Spark 3.5.x, 4.0.x, 4.1.x (protos pinned to v4.1.2)
ADBC API	1.1.0
Python	3.9 - 3.13
Go	1.25+
Platforms	Linux (x86_64, aarch64), macOS (x86_64, arm64), Windows (x86_64)

See Compatibility and Conformance for the ADBC conformance matrix and known limitations.
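
If you want to guard a deployment script against an unsupported server, the Spark row of the table can be encoded as a simple check. This is a sketch: is_supported_spark is a hypothetical helper, not something the driver provides, and it only compares the major and minor components of a version string against the release lines listed above.

```python
# Release lines from the Compatibility table: 3.5.x, 4.0.x, 4.1.x.
SUPPORTED_SPARK_LINES = {(3, 5), (4, 0), (4, 1)}


def is_supported_spark(version):
    """Return True if `version` (e.g. "4.0.1") is on a listed release line."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        raise ValueError(f"unrecognized Spark version: {version!r}") from None
    return (major, minor) in SUPPORTED_SPARK_LINES


for v in ("3.5.8", "4.1.2", "3.4.1"):
    print(v, is_supported_spark(v))
```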

Documentation

Full guides live at https://hyukjinkwon.github.io/adbc-driver-spark/.

Get started

Guides

Querying Data
Python DBAPI
Ecosystem Integrations (pandas, Polars, DuckDB, PyArrow)
Metadata and Catalogs
Type Mapping
Configuration Reference

Using from each language

Reference

Contributing

Contributions are welcome. See CONTRIBUTING.md for how to set up a development environment, run the tests, and submit changes. By participating you agree to the Code of Conduct.

Directories

Path	Synopsis
driver/spark	Package spark implements an Apache Arrow Database Connectivity (ADBC) driver for Apache Spark Connect.
examples/go/metadata (command)	Command metadata inspects catalog metadata through the ADBC connection API: GetObjects walks catalogs/schemas/tables/columns, GetTableSchema returns one table's Arrow schema, and GetTableTypes lists the table types Spark exposes.
examples/go/parameters (command)	Command parameters runs a prepared statement with bound positional parameters.
examples/go/quickstart (command)	Command quickstart connects to a Spark Connect server with the native Go ADBC driver, runs a query, and streams the Apache Arrow results.
internal/sparkconnect	Package sparkconnect implements a minimal, focused Spark Connect gRPC client tailored to the needs of the ADBC driver.
internal/sparkconnect/proto/spark/connect	
