README

bqschemaupdater is a tool for creating and updating BigQuery table schemas.

Usage

Schemas should be written in .proto format.

bqschemaupdater invokes the protoc binary found in $PATH, so please make sure an up-to-date protobuf installation is available there.

For more information, run:

bqschemaupdater --help

Supported Uses

The operations supported by this tool include:

  • Creating a new table
  • Adding NULLABLE or REPEATED columns to an existing table
  • Making REQUIRED fields NULLABLE in an existing table
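
The supported operations above can be sketched as a small check. This is only an illustration, not the tool's actual implementation: the function name unsupported_changes and the dict-based schema model (column name mapped to a (type, mode) pair) are hypothetical and chosen for brevity.

```python
# Hypothetical model: a schema is a dict of column name -> (type, mode).
# `old` is None when the table does not exist yet.
def unsupported_changes(old, new):
    """Return the reasons why updating `old` to `new` is not supported."""
    if old is None:
        return []  # creating a new table is always supported
    problems = []
    for name, (old_type, old_mode) in old.items():
        if name not in new:
            problems.append(f"{name}: columns cannot be deleted")
            continue
        new_type, new_mode = new[name]
        if new_type != old_type:
            problems.append(f"{name}: type cannot change")
        # The only supported mode change is REQUIRED -> NULLABLE.
        if new_mode != old_mode and (old_mode, new_mode) != ("REQUIRED", "NULLABLE"):
            problems.append(f"{name}: mode cannot change from {old_mode} to {new_mode}")
    for name, (_, mode) in new.items():
        if name not in old and mode not in ("NULLABLE", "REPEATED"):
            problems.append(f"{name}: new columns must be NULLABLE or REPEATED")
    return problems


old = {"id": ("STRING", "REQUIRED"), "ts": ("TIMESTAMP", "NULLABLE")}
new = {
    "id": ("STRING", "NULLABLE"),    # REQUIRED relaxed to NULLABLE
    "ts": ("TIMESTAMP", "NULLABLE"),
    "tags": ("STRING", "REPEATED"),  # newly added column
}
print(unsupported_changes(old, new))  # []
```

Any change outside the three listed operations, such as dropping a column or changing its type, shows up as an entry in the returned list.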

Standard Practices

Table IDs and Dataset IDs should be underscore-delimited, e.g. test_results.

Schema Definitions

Columns in BigQuery tables cannot be modified or deleted once they have been created. Adding a column later is easy, however, so if you are unsure whether a column is needed, err on the side of leaving it out.

BigQuery provides useful types such as RECORD and TIMESTAMP. It is a good idea to take advantage of these types.

Events usually have an associated timestamp field recording the time the event took place.

BigQuery discourages JOINs and encourages denormalizing data.

Flattening repeated fields can be costly. If you know that a repeated field will only ever contain 0 or 1 value, consider not making that field repeated.

BigQuery Limits

Please see the BigQuery docs for the most up-to-date limits on creating and modifying tables. bqschemaupdater usage is not expected to exceed these limits. If you are planning a project which might, please contact the Monitoring Team.

Documentation

Overview

    Command bqschemaupdater accepts location and schema of a BigQuery table and creates or updates the table.

    When converting a proto message to BigQuery schema, the following rules apply, in order of precedence:

    - one message field becomes at most one BigQuery field
    - if a field has leading comments, common indentation is trimmed
      and the result becomes the BigQuery field description
    - if a field is of enum type, the BigQuery type is string
      and valid values are appended to the BigQuery field description
    - if a field is google.protobuf.Duration, the BigQuery type is FLOAT64
    - if a field is google.protobuf.Timestamp, the BigQuery type is TIMESTAMP
    - if a field is google.protobuf.Struct, it is persisted as a JSONPB string
    - if a field is of message type, the BigQuery type is RECORD
      with schema corresponding to the proto field type, recursively.
      However, if the resulting RECORD schema is empty, the field is omitted.
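
The conversion rules above can be illustrated with a simplified sketch. It does not use real protobuf descriptors and is not the tool's implementation: the dict-based field model, the to_bq_field name, the scalar type table, the STRING type for Struct, and the "Valid values: ..." wording are all assumptions made for the example.

```python
import textwrap

# Assumed scalar mapping; the rules above do not spell these out.
SCALAR_TYPES = {"string": "STRING", "int64": "INTEGER", "bool": "BOOLEAN"}

WELL_KNOWN_TYPES = {
    "google.protobuf.Duration": "FLOAT64",
    "google.protobuf.Timestamp": "TIMESTAMP",
    "google.protobuf.Struct": "STRING",  # assumed: persisted as a JSONPB string
}


def to_bq_field(field):
    """Convert one hypothetical proto field dict; return None if it is omitted."""
    # Leading comments become the description, with common indentation trimmed.
    description = textwrap.dedent(field.get("comment", "")).strip()
    bq = {"name": field["name"], "description": description}
    if field["kind"] == "enum":
        bq["type"] = "STRING"
        values = "Valid values: " + ", ".join(field["enum_values"]) + "."
        bq["description"] = (description + "\n" + values).strip()
    elif field["kind"] == "message":
        if field["type_name"] in WELL_KNOWN_TYPES:
            bq["type"] = WELL_KNOWN_TYPES[field["type_name"]]
        else:
            # Recurse into the message; drop the field if nothing remains.
            sub = [to_bq_field(f) for f in field["fields"]]
            sub = [f for f in sub if f is not None]
            if not sub:
                return None
            bq["type"] = "RECORD"
            bq["fields"] = sub
    else:
        bq["type"] = SCALAR_TYPES[field["type_name"]]
    return bq


event = {
    "name": "event", "kind": "message", "type_name": "example.Event",
    "fields": [
        {"name": "created", "kind": "message",
         "type_name": "google.protobuf.Timestamp",
         "comment": "    When the event happened."},
        {"name": "status", "kind": "enum", "type_name": "example.Status",
         "enum_values": ["OK", "FAILED"]},
        # A message with no fields yields an empty RECORD, so it is omitted.
        {"name": "marker", "kind": "message", "type_name": "example.Empty",
         "fields": []},
    ],
}
result = to_bq_field(event)
```

In this sketch, result is a RECORD with two fields, created (TIMESTAMP) and status (STRING), while marker is dropped because its RECORD schema would be empty.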