bq

package
v0.0.0-...-51f9457 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 9, 2021 License: Apache-2.0 Imports: 31 Imported by: 0

Documentation

Overview

Package bq is a library for working with BigQuery.

Limits

Please see BigQuery docs: https://cloud.google.com/bigquery/quotas#streaminginserts for the most updated limits for streaming inserts. It is expected that the client is responsible for ensuring their usage will not exceed these limits through bq usage. A note on maximum rows per request: Put() batches rows per request, ensuring that no more than 10,000 rows are sent per request, and allowing for custom batch size. BigQuery recommends using 500 as a practical limit (so we use this as a default), and experimenting with your specific schema and data sizes to determine the batch size with the ideal balance of throughput and latency for your use case.

Authentication

Authentication for the Cloud projects happens during client creation: https://godoc.org/cloud.google.com/go#pkg-examples. What form this takes depends on the application.

Monitoring

You can use tsmon (https://godoc.org/go.chromium.org/luci/common/tsmon) to track upload latency and errors.

If Uploader.UploadsMetricName field is not zero, Uploader will create a counter metric to track successes and failures.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AddMissingFields

func AddMissingFields(dest *bigquery.Schema, src bigquery.Schema)

AddMissingFields copies fields from src to dest if they are not present in dest.

func SchemaDiff

func SchemaDiff(before, after bigquery.Schema) string

SchemaDiff returns unified diff of two schemas. Returns "" if there is no difference.

func SchemaString

func SchemaString(s bigquery.Schema) string

SchemaString returns schema in string format.

Types

type InsertIDGenerator

type InsertIDGenerator struct {
	// Counter is an atomically-managed counter used to differentiate Insert
	// IDs produced by the same process.
	Counter int64
	// Prefix should be able to uniquely identify this specific process,
	// to differentiate Insert IDs produced by different processes.
	//
	// If empty, prefix will be derived from system and process specific
	// properties.
	Prefix string
}

InsertIDGenerator generates unique Insert IDs.

BigQuery uses Insert IDs to deduplicate rows in the streaming insert buffer. The association between Insert ID and row persists only for the time the row is in the buffer.

InsertIDGenerator is safe for concurrent use.

ID is the global InsertIDGenerator

func (*InsertIDGenerator) Generate

func (id *InsertIDGenerator) Generate() string

Generate returns a unique Insert ID.

type Row

type Row struct {
	proto.Message // embedded

	// InsertID is unique per insert operation to handle deduplication.
	InsertID string
}

Row implements bigquery.ValueSaver

func (*Row) Save

func (r *Row) Save() (map[string]bigquery.Value, string, error)

Save is used by bigquery.Uploader.Put when inserting values into a table.

type SchemaConverter

type SchemaConverter struct {
	Desc           *descriptorpb.FileDescriptorSet
	SourceCodeInfo map[*descriptorpb.FileDescriptorProto]SourceCodeInfoMap
}

func (*SchemaConverter) Schema

func (c *SchemaConverter) Schema(messageName string) (schema bigquery.Schema, description string, err error)

Schema constructs a bigquery.Schema from a named message.

type SourceCodeInfoMap

type SourceCodeInfoMap map[interface{}]*descriptorpb.SourceCodeInfo_Location

SourceCodeInfoMap maps descriptor proto messages to source code info, if available. See also descutil.IndexSourceCodeInfo.

type Uploader

type Uploader struct {
	*bigquery.Uploader
	// Uploader is bound to a specific table. DatasetID and Table ID are
	// provided for reference.
	DatasetID string
	TableID   string
	// UploadsMetricName is a string used to create a tsmon Counter metric
	// for event upload attempts via Put, e.g.
	// "/chrome/infra/commit_queue/events/count". If unset, no metric will
	// be created.
	UploadsMetricName string

	// BatchSize is the max number of rows to send to BigQuery at a time.
	// The default is 500.
	BatchSize int
	// contains filtered or unexported fields
}

Uploader contains the necessary data for streaming data to BigQuery.

func NewUploader

func NewUploader(ctx context.Context, c *bigquery.Client, datasetID, tableID string) *Uploader

NewUploader constructs a new Uploader struct.

DatasetID and TableID are provided to the BigQuery client to gain access to a particular table.

You may want to change the default configuration of the bigquery.Uploader. Check the documentation for more details.

Set UploadsMetricName on the resulting Uploader to use the default counter metric.

Set BatchSize to set a custom batch size.

func (*Uploader) Put

func (u *Uploader) Put(ctx context.Context, messages ...proto.Message) error

Put uploads one or more rows to the BigQuery service. Put takes care of adding InsertIDs, used by BigQuery to deduplicate rows.

If any rows do now match one of the expected types, Put will not attempt to upload any rows and returns an InvalidTypeError.

Put returns a PutMultiError if one or more rows failed to be uploaded. The PutMultiError contains a RowInsertionError for each failed row.

Put will retry on temporary errors. If the error persists, the call will run indefinitely. Because of this, if ctx does not have a timeout, Put will add one.

See bigquery documentation and source code for detailed information on how struct values are mapped to rows.

Directories

Path Synopsis
Package pb contains helper protobuf messages used to define BQ schemas.
Package pb contains helper protobuf messages used to define BQ schemas.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL