bqschemaupdater is a tool for creating and updating BigQuery table schemas.
Schemas should be written in .proto format.
Bqschemaupdater invokes the protoc binary found in $PATH, so please make the latest protobuf release (which provides protoc) available in $PATH.
More information can be had with:
The operations supported by this tool include:
- Creating a new table
- Adding NULLABLE or REPEATED columns to an existing table
- Making REQUIRED fields NULLABLE in an existing table
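The supported operations can be read as a compatibility check between an existing schema and a proposed one. The following is a minimal sketch of such a check, not the tool's actual implementation: the Column type, the Mode constants, and the checkUpdate function are hypothetical names introduced for illustration, and only column names and modes are compared (column types are ignored).

```go
package main

import "fmt"

// Mode mirrors the BigQuery column modes named in the list above.
type Mode string

const (
	Nullable Mode = "NULLABLE"
	Repeated Mode = "REPEATED"
	Required Mode = "REQUIRED"
)

// Column is a hypothetical, simplified column description (name and mode only).
type Column struct {
	Name string
	Mode Mode
}

// checkUpdate returns an error for the first change from old to updated that
// falls outside the supported operations, or nil if every change is allowed.
func checkUpdate(old, updated []Column) error {
	want := make(map[string]Mode, len(updated))
	for _, c := range updated {
		want[c.Name] = c.Mode
	}
	have := make(map[string]bool, len(old))
	for _, c := range old {
		have[c.Name] = true
		newMode, ok := want[c.Name]
		switch {
		case !ok:
			// Existing columns cannot be deleted.
			return fmt.Errorf("column %q cannot be deleted", c.Name)
		case newMode == c.Mode:
			// Unchanged column: nothing to check.
		case c.Mode == Required && newMode == Nullable:
			// Relaxing REQUIRED to NULLABLE is supported.
		default:
			return fmt.Errorf("column %q: mode change %s -> %s is not supported",
				c.Name, c.Mode, newMode)
		}
	}
	for _, c := range updated {
		// New columns must be NULLABLE or REPEATED.
		if !have[c.Name] && c.Mode == Required {
			return fmt.Errorf("new column %q must be NULLABLE or REPEATED", c.Name)
		}
	}
	return nil
}

func main() {
	old := []Column{{"id", Required}, {"tags", Repeated}}
	relaxed := []Column{{"id", Nullable}, {"tags", Repeated}, {"note", Nullable}}
	dropped := []Column{{"id", Required}}

	fmt.Println("relax and add:", checkUpdate(old, relaxed))
	fmt.Println("drop a column:", checkUpdate(old, dropped))
}
```

Creating a new table corresponds to calling checkUpdate with an empty old schema, in which case only the rule about new REQUIRED columns applies in this sketch; whether the real tool restricts REQUIRED columns at creation time is not stated here.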
Table IDs and Dataset IDs should be underscore-delimited, e.g.
Columns in BigQuery tables cannot be modified or deleted once they have been created. Adding a column later is easy, however, so if you are unsure whether a column is needed, err on the side of leaving it out.
BigQuery provides useful types such as RECORD and TIMESTAMP. It is a good idea to take advantage of these types.
Events usually have an associated timestamp field recording the time the event took place.
BigQuery discourages JOINs and encourages denormalizing data.
Flattening repeated fields can be costly. If you know that a repeated field will only ever contain 0 or 1 value, consider not making that field repeated.
Please see the BigQuery docs for the most up-to-date limits on creating and modifying tables. We do not expect to exceed these limits through bqschemaupdater usage. If you are planning a project that might, please contact the Monitoring Team.
Command bqschemaupdater accepts the location and schema of a BigQuery table and creates or updates the table.
When converting a proto message to a BigQuery schema, the following rules apply, in order of precedence:
- one message field becomes at most one BigQuery field
- if a field has leading comments, common indentation is trimmed and the result becomes the BigQuery field description
- if a field is of enum type, the BigQuery type is STRING and valid values are appended to the BigQuery field description
- if a field is google.protobuf.Duration, the BigQuery type is FLOAT64
- if a field is google.protobuf.Timestamp, the BigQuery type is TIMESTAMP
- if a field is google.protobuf.Struct, it is persisted as a JSONPB string
- if a field is of message type, the BigQuery type is RECORD with schema corresponding to the proto field type, recursively. However, if the resulting RECORD schema is empty, the field is omitted.
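The conversion rules above can be sketched as a small recursive function. This is an illustration only, not the tool's code: ProtoField, BQField, and convert are hypothetical names, ProtoField is a hand-written stand-in for a real proto descriptor, the exact wording of the appended enum values is an assumption, and comment indentation trimming is left out. Scalar kinds are not covered by the rules above, so the sketch passes them through upper-cased as a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// ProtoField is a hypothetical, simplified stand-in for a proto field descriptor.
type ProtoField struct {
	Name       string
	Kind       string       // "enum", "message", or a scalar kind such as "string"
	TypeName   string       // full message type name when Kind == "message"
	EnumValues []string     // value names when Kind == "enum"
	Comment    string       // leading comment, assumed already trimmed
	Fields     []ProtoField // nested fields when Kind == "message"
}

// BQField is a hypothetical, simplified BigQuery field description.
type BQField struct {
	Name, Type, Description string
	Schema                  []BQField // populated only for RECORD fields
}

// convert maps one proto field to at most one BigQuery field.
// The second result is false when the field is omitted.
func convert(f ProtoField) (BQField, bool) {
	out := BQField{Name: f.Name, Description: f.Comment}
	switch {
	case f.Kind == "enum":
		out.Type = "STRING"
		// Assumed format for the appended list of valid values.
		values := "Valid values: " + strings.Join(f.EnumValues, ", ") + "."
		out.Description = strings.TrimSpace(out.Description + "\n" + values)
	case f.Kind == "message" && f.TypeName == "google.protobuf.Duration":
		out.Type = "FLOAT64"
	case f.Kind == "message" && f.TypeName == "google.protobuf.Timestamp":
		out.Type = "TIMESTAMP"
	case f.Kind == "message" && f.TypeName == "google.protobuf.Struct":
		out.Type = "STRING" // persisted as a JSONPB string
	case f.Kind == "message":
		for _, sub := range f.Fields {
			if bq, ok := convert(sub); ok {
				out.Schema = append(out.Schema, bq)
			}
		}
		if len(out.Schema) == 0 {
			return BQField{}, false // empty RECORD schema: omit the field
		}
		out.Type = "RECORD"
	default:
		// Placeholder: the scalar type mapping is not described above.
		out.Type = strings.ToUpper(f.Kind)
	}
	return out, true
}

func main() {
	msg := ProtoField{Name: "event", Kind: "message", TypeName: "example.Event", Fields: []ProtoField{
		{Name: "when", Kind: "message", TypeName: "google.protobuf.Timestamp", Comment: "Time the event took place."},
		{Name: "status", Kind: "enum", EnumValues: []string{"OK", "FAILED"}},
		{Name: "empty", Kind: "message", TypeName: "example.Empty"},
	}}
	if bq, ok := convert(msg); ok {
		for _, f := range bq.Schema {
			fmt.Printf("%s %s %q\n", f.Name, f.Type, f.Description)
		}
	}
}
```

The order of the switch cases mirrors the order of precedence: the well-known types are checked before the generic message case, so a Timestamp field never becomes a RECORD.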