v0.4.2 · Published: May 13, 2026 · License: MIT


BigQuery Emulator


BigQuery emulator server implemented in Go.
BigQuery emulator provides a way to launch a BigQuery server on your local machine for testing and development.

Note: this is a fork of goccy/bigquery-emulator. The SQL analyzer has been swapped from the CGO-based go-zetasql / go-zetasqlite stack to the pure-Go zetasql-wasm (ZetaSQL compiled to WebAssembly, executed via wazero). The vendored internal/zetasqlite layer replaces the external go-zetasqlite dependency. Huge thanks to @goccy for the original work this fork builds on.

Practical differences:

  • go install requires no CGO toolchain (no clang++, no CGO_ENABLED=1, no multi-minute ZetaSQL build).
  • Cross-compilation works out of the box.
  • Some runtime corners are still being filled in — see the test status of internal/zetasqlite/ for the current gap list.

Features

  • If your BigQuery client is written in Go, you can launch the emulator in the same process as your tests via httptest.
  • The emulator builds as a single static binary and runs as a standalone process, so programs written in other languages, or tools such as the bq command, can use it by pointing at the emulator's address.
  • The emulator uses SQLite for storage. At startup you can choose either in-memory or file-backed storage; with file-backed storage, data is persisted across restarts.
  • Seed data can be loaded from a YAML file on startup.
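As a sketch of what such a seed file can look like (the field names below follow the upstream goccy/bigquery-emulator testdata and are an assumption; check server/testdata/data.yaml in this repository for the authoritative shape):

```yaml
# Hypothetical minimal seed file; verify field names against server/testdata/data.yaml.
projects:
  - id: test
    datasets:
      - id: dataset1
        tables:
          - id: table_a
            columns:
              - name: id
                type: INTEGER
              - name: name
                type: STRING
            data:
              - id: 1
                name: alice
```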

Status

Although this project is still in beta, many features are already available.

BigQuery API

All of the BigQuery APIs are implemented except the APIs for manipulating IAM resources. Some options may not be supported yet; if you find one, please report it in an issue.

Google Cloud Storage linkage

BigQuery emulator supports loading data from Google Cloud Storage and extracting table data to it. Currently, only the CSV and JSON formats are supported for extraction. If you use a Google Cloud Storage emulator, set the STORAGE_EMULATOR_HOST environment variable.

BigQuery Storage API

Supports gRPC-based read/write using BigQuery Storage API. Supports both Apache Avro and Arrow formats.

Google Standard SQL

BigQuery emulator supports much of the Google Standard SQL specification, including:

  • 200+ standard functions
  • Wildcard table
  • Templated Argument Function
  • JavaScript UDF

The supported feature set tracks the bundled internal/zetasqlite layer, which is built on top of zetasql-wasm.

Goals and Sponsors

The goal of this project is to build a server that behaves exactly like BigQuery from the BigQuery client's perspective. To do so, we need to support all features present in BigQuery (Model API, Connection API, INFORMATION_SCHEMA, and so on) in addition to evaluating Google Standard SQL.

However, this is a personal project that I develop on my days off and after work. I work full time and maintain a lot of OSS, so the time available for this project is limited. I will keep adding features and fixing bugs to move toward these goals, but if you want a particular feature implemented, please consider sponsoring me. You can of course use this project for free, but sponsorship is great motivation. If you are part of a commercial company that could use this project, I would be especially glad if you would consider sponsoring it.

Install

If Go is installed, you can install the latest version with the following command:

$ go install github.com/glassmonkey/bigquery-emulator/cmd/bigquery-emulator@latest

The BigQuery emulator embeds the SQL analyzer through zetasql-wasm, so the install is a pure-Go go install — no CGO toolchain or ZetaSQL build is required.

Run via Docker

Container images are published to GitHub Container Registry, so the emulator can be brought up without building anything locally.

$ docker run --rm -p 9050:9050 -p 9060:9060 \
    ghcr.io/glassmonkey/bigquery-emulator:v0.1 \
    --project=test

Mount a YAML seed file into the container to preload datasets and tables on startup:

$ docker run --rm -p 9050:9050 -p 9060:9060 \
    -v $(pwd)/data.yaml:/data.yaml \
    ghcr.io/glassmonkey/bigquery-emulator:v0.1 \
    --project=test --data-from-yaml=/data.yaml

Multi-arch images are published for linux/amd64 and linux/arm64. Tag aliases:

  • :v<major>.<minor> (e.g. :v0.1) — tracks the latest patch within a minor series.
  • :v<major>.<minor>.<patch> (e.g. :v0.1.6) — pinned to an exact release.

How to start the standalone server

Once the bigquery-emulator CLI is installed, you can start the server with the following options.

$ ./bigquery-emulator -h
Usage:
  bigquery-emulator [OPTIONS]

Application Options:
      --project=        specify the project name
      --dataset=        specify the dataset name
      --port=           specify the http port number. this port used by bigquery api (default: 9050)
      --grpc-port=      specify the grpc port number. this port used by bigquery storage api (default: 9060)
      --log-level=      specify the log level (debug/info/warn/error) (default: error)
      --log-format=     specify the log format (console/json) (default: console)
      --database=       specify the database file if required. if not specified, it will be on memory
      --data-from-yaml= specify the path to the YAML file that contains the initial data
  -v, --version         print version

Help Options:
  -h, --help            Show this help message

Start the server by specifying the project name

$ ./bigquery-emulator --project=test
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060

How to use from bq client

1. Start the standalone server
$ ./bigquery-emulator --project=test --data-from-yaml=./server/testdata/data.yaml
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
  • the seed file server/testdata/data.yaml is available in the repository
2. Call endpoint from bq client
$ bq --api http://0.0.0.0:9050 query --project_id=test "SELECT * FROM dataset1.table_a WHERE id = 1"

+----+-------+---------------------------------------------+------------+----------+---------------------+
| id | name  |                  structarr                  |  birthday  | skillNum |     created_at      |
+----+-------+---------------------------------------------+------------+----------+---------------------+
|  1 | alice | [{"key":"profile","value":"{\"age\": 10}"}] | 2012-01-01 |        3 | 2022-01-01 12:00:00 |
+----+-------+---------------------------------------------+------------+----------+---------------------+

How to use from python client

A working compose-based example that pins the v0.1 image and runs example.py end-to-end lives at _examples/python. The snippet below is the same client-side wiring extracted for inline reference.

1. Start the standalone server
$ ./bigquery-emulator --project=test --dataset=dataset1
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
2. Call endpoint from python client

Create a ClientOptions with the api_endpoint option and use AnonymousCredentials to disable authentication.

from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig

client_options = ClientOptions(api_endpoint="http://0.0.0.0:9050")
client = bigquery.Client(
  "test",
  client_options=client_options,
  credentials=AnonymousCredentials(),
)
client.query(query="...", job_config=QueryJobConfig())

If you download query results into a DataFrame, you must either disable the BigQuery Storage client with create_bqstorage_client=False or create a BigQuery Storage client that points at the local gRPC port (default 9060).

https://cloud.google.com/bigquery/docs/samples/bigquery-query-results-dataframe?hl=en

result = client.query(sql).to_dataframe(create_bqstorage_client=False)

or

from google.cloud import bigquery_storage

client_options = ClientOptions(api_endpoint="0.0.0.0:9060")
read_client = bigquery_storage.BigQueryReadClient(client_options=client_options)
result = client.query(sql).to_dataframe(bqstorage_client=read_client)

Synopsis

If your BigQuery client is written in Go, you can launch the BigQuery emulator in the same process as your tests.
Import github.com/glassmonkey/bigquery-emulator/server (and github.com/glassmonkey/bigquery-emulator/types), then use the server.New API to create an emulator server instance.

See the API reference for more information: https://pkg.go.dev/github.com/glassmonkey/bigquery-emulator

package main

import (
  "context"
  "fmt"

  "cloud.google.com/go/bigquery"
  "github.com/glassmonkey/bigquery-emulator/server"
  "github.com/glassmonkey/bigquery-emulator/types"
  "google.golang.org/api/iterator"
  "google.golang.org/api/option"
)

func main() {
  ctx := context.Background()
  const (
    projectID = "test"
    datasetID = "dataset1"
    routineID = "routine1"
  )
  bqServer, err := server.New(server.TempStorage)
  if err != nil {
    panic(err)
  }
  if err := bqServer.Load(
    server.StructSource(
      types.NewProject(
        projectID,
        types.NewDataset(
          datasetID,
        ),
      ),
    ),
  ); err != nil {
    panic(err)
  }
  if err := bqServer.SetProject(projectID); err != nil {
    panic(err)
  }
  testServer := bqServer.TestServer()
  defer testServer.Close()

  client, err := bigquery.NewClient(
    ctx,
    projectID,
    option.WithEndpoint(testServer.URL),
    option.WithoutAuthentication(),
  )
  if err != nil {
    panic(err)
  }
  defer client.Close()
  routineName, err := client.Dataset(datasetID).Routine(routineID).Identifier(bigquery.StandardSQLID)
  if err != nil {
    panic(err)
  }
  sql := fmt.Sprintf(`
CREATE FUNCTION %s(
  arr ARRAY<STRUCT<name STRING, val INT64>>
) AS (
  (SELECT SUM(IF(elem.name = "foo",elem.val,null)) FROM UNNEST(arr) AS elem)
)`, routineName)
  job, err := client.Query(sql).Run(ctx)
  if err != nil {
    panic(err)
  }
  status, err := job.Wait(ctx)
  if err != nil {
    panic(err)
  }
  if err := status.Err(); err != nil {
    panic(err)
  }

  it, err := client.Query(fmt.Sprintf(`
SELECT %s([
  STRUCT<name STRING, val INT64>("foo", 10),
  STRUCT<name STRING, val INT64>("bar", 40),
  STRUCT<name STRING, val INT64>("foo", 20)
])`, routineName)).Read(ctx)
  if err != nil {
    panic(err)
  }

  var row []bigquery.Value
  if err := it.Next(&row); err != nil {
    if err == iterator.Done {
        return
    }
    panic(err)
  }
  fmt.Println(row[0]) // 30
}

How it works

BigQuery Emulator Architecture Overview

After receiving a ZetaSQL query via the REST API from bq or a client SDK, the bundled internal/zetasqlite layer (built on zetasql-wasm) parses and analyzes the query to produce an AST. The AST is lowered into a SQLite query, which is then executed through go-sqlite3 against the SQLite database.

Diagram credit: original by @goccy for the upstream go-zetasqlite-based architecture. The boxes labelled "go-zetasqlite" / "go-zetasql" map onto internal/zetasqlite and zetasql-wasm in this fork; the surrounding data flow is unchanged.

Type Conversion Flow

BigQuery has a number of types that do not exist in SQLite (e.g. ARRAY and STRUCT). In order to handle them in SQLite, internal/zetasqlite encodes every type except INT64 / FLOAT64 / BOOL as a (type info, data) pair and stores the encoded blob in SQLite. When the encoded data is read back, a custom function registered with go-sqlite3 decodes it before use.

Diagram credit: original by @goccy; the encoding strategy is unchanged in this fork.

Observability

The emulator can export OpenTelemetry traces over OTLP/gRPC. Tracing is off by default — the SDK only initialises when an endpoint is supplied, so unconfigured builds carry no tracing overhead.

Enable from the CLI

bigquery-emulator --project=test --otel-endpoint=otel-collector:4317
# or via env
BIGQUERY_EMULATOR_OTEL_ENDPOINT=otel-collector:4317 bigquery-emulator --project=test

--otel-endpoint="" (empty, the default) leaves the no-op tracer in place.

Enable from a library embedder

srv, _ := server.New(server.MemoryStorage)
if err := srv.SetOTel(ctx, "otel-collector:4317"); err != nil {
    log.Fatal(err)
}
// SetOTel is idempotent and accepts "" to revert to no-op.

Bundled collector (docker compose)

A pre-wired OpenTelemetry Collector with the fileexporter is shipped behind the otel compose profile. It is off in the default make docker/up flow.

BIGQUERY_EMULATOR_OTEL_ENDPOINT=otel-collector:4317 \
  docker compose -f e2e/compose.yml --profile otel up -d

Spans land in e2e/otel-output/traces.jsonl (one OTLP-JSON resourceSpans document per line) via the bind mount declared on the collector service. The directory is tracked via .gitkeep; the file itself is gitignored.

What is instrumented

  • tracingMiddleware wraps every HTTP request with an otelhttp server span (parent), then injects the server's tracer into ctx so handlers can pull a tracer through tracing.FromContext(ctx) and open child spans without depending on a global TracerProvider.
  • sequentialAccessMiddleware records its lock-wait time on the request span as bqemu.mutex.wait_ms. The whole emulator is gated behind one mutex, so this attribute is the single biggest knob for explaining tail latency under concurrent load.
  • The hot-path job handlers (server.jobs.insert, server.jobs.get, server.jobs.getQueryResults, server.jobs.query, server.jobs.list) and the cross-cutting lookups in withProjectMiddleware / withJobMiddleware open named child spans with bqemu.project, bqemu.job_id, etc. attached.
  • Custom spans from caller code: tracing.Start(ctx, "your.span") returns the new ctx and an EndFunc(*error) that records the error and closes the span. defer end(&err) together with a named-return err is the intended use.

A real-world example of using these spans to attribute a regression lives at #90.

Reference

The following articles cover the story behind bigquery-emulator.

License

MIT

Documentation


Constants

const Version = "0.4.2"

Variables

This section is empty.

Functions

This section is empty.

Types

This section is empty.

Directories

Path Synopsis
cmd
e2e
cmd/mergefixture command
mergefixture combines a per-caseset fixture directory into a single emulator-compatible YAML that bigquery-emulator loads via --data-from-yaml.
internal
tracing
Package tracing wraps an OpenTelemetry tracer in the same context-injected style as internal/logger so handler and library code can pull a tracer out of ctx and Start spans without having to know whether tracing is configured.
Code generated by internal/cmd/generator.
