molt

MOLT (Migrate Off Legacy Things) is CockroachDB's suite of tools to assist migrations off legacy databases. This repo contains the open-source MOLT tooling.

Certain packages can be re-used by external tools and are subject to the Apache License.

Build

All commands require molt to be built. Example:

# Build molt for the local machine (goes to artifacts/molt)
go build -o artifacts/molt .

# Cross-compile for another platform, e.g. linux/amd64.
GOOS=linux GOARCH=amd64 go build -v -o artifacts/molt .

Verification

molt verify does the following:

  • Verifies that the two data sources contain the same tables.
  • Verifies that table column definitions are the same between the two data sources.
  • Verifies that rows contain the same values between the two data sources.

It currently supports PostgreSQL and MySQL comparisons with CockroachDB. It takes two connection strings, --source and --target; the source is assumed to be the source of truth.

# Compare postgres and CRDB instance.
molt verify \
  --source 'postgres://user:pass@url:5432/db' \
  --target 'postgres://root@localhost:26257?sslmode=disable'

# Compare mysql and CRDB instance.
molt verify \
  --source 'jdbc:mysql://root@tcp(localhost:3306)/defaultdb' \
  --target 'postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable'

See molt verify --help for all available parameters.

Filters

To verify specific tables or schemas, use --table-filter or --schema-filter.
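
For example, to limit verification to a single table in the public schema (a sketch; $SOURCE and $TARGET stand in for the connection strings shown above, and the exact filter-matching semantics are an assumption):

molt verify \
  --source $SOURCE \
  --target $TARGET \
  --schema-filter 'public' \
  --table-filter 'good_table'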

Continuous verification

If you want all tables to be verified in a loop, you can use --continuous.
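
For example (same placeholders as above):

molt verify \
  --source $SOURCE \
  --target $TARGET \
  --continuous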

Live verification

If you expect data to change as you do data verification, you can use --live. This makes the verifier re-check rows before marking them as problematic.
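
For example, verifying while the source is still taking writes (same placeholders as above):

molt verify \
  --source $SOURCE \
  --target $TARGET \
  --live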

Limitations
  • MySQL enums and set types are not supported.
  • Supports only comparing one MySQL database vs a whole CRDB schema (which is assumed to be "public").
  • Geospatial types cannot yet be compared.
  • Schema changes made between command runs are not handled well.

Data movement

The supported paths, as a mermaid flowchart:

flowchart LR
    LegacyDB[legacy database<br/>i.e. PG, MySQL]
    S3[(Amazon S3)]
    GCP[(Google Cloud<br/>Bucket)]
    Local[(Local File Server)]
    CRDB[CockroachDB]

    LegacyDB -- CSV Dump --> S3
    LegacyDB -- CSV Dump --> GCP
    LegacyDB -- CSV Dump --> Local
    S3 -- IMPORT INTO<br/>or COPY FROM --> CRDB
    GCP -- IMPORT INTO<br/>or COPY FROM --> CRDB
    Local -- "IMPORT INTO (exposes HTTP server)"<br/>or COPY FROM --> CRDB

    LegacyDB -- COPY FROM --> CRDB

molt datamove migrates data from your PG or MySQL tables to CockroachDB without taking those tables offline. It takes --source and --target as arguments (see the molt verify documentation above for examples).

It outputs a cdc_cursor, which can be fed to CDC programs (e.g. cdc-sink, AWS DMS) to migrate live data without taking your database offline.

It currently supports the following:

  • Pulling a table, uploading CSVs to S3/GCP/the local machine (--local-path-listen-addr must be set) and running IMPORT INTO on CockroachDB for you.
  • Pulling a table, uploading CSVs to S3/GCP/the local machine and running COPY FROM on CockroachDB using those CSVs.
  • Pulling a table and running COPY FROM directly into the CRDB table without an intermediate store.

By default, data is imported using IMPORT INTO. You can use --live if you need the target data to be queryable during loading, which uses COPY FROM instead.

Destination tables can be truncated automatically before loading by running with --truncate.

For now, schemas must be identical on both sides. This is not verified upfront: tables with mismatching columns may only be partially migrated.

Example invocations

S3 usage:

# Ensure access tokens are appropriately set in the environment.
export AWS_REGION='us-east-1'
export AWS_SECRET_ACCESS_KEY='key'
export AWS_ACCESS_KEY_ID='id'
# Ensure the S3 bucket is created and accessible from CRDB.
# --truncate automatically truncates destination tables before importing.
# --cleanup cleans up any created S3 files.
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --s3-bucket 'otan-test-bucket' \
  --truncate \
  --cleanup

GCP usage:

# Ensure credentials are loaded using `gcloud init`.
# Ensure the GCP bucket is created and accessible from CRDB.
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --gcp-bucket 'otan-test-bucket' \
  --cleanup # cleans up any created GCP files

Using a direct COPY FROM without storing intermediate files:

molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --direct-copy

Storing CSVs locally before running COPY FROM:

molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --live

Storing CSVs locally and running a file server:

# Set --local-path-crdb-access-addr if the automatic IP detection is incorrect.
molt datamove \
  --source 'postgres://postgres@localhost:5432/replicationload' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' \
  --table-filter 'good_table' \
  --local-path /tmp/basic \
  --local-path-listen-addr '0.0.0.0:9005'

Local Setup

Running Tests
  • Ensure a local Postgres instance is set up and can be logged into using postgres://postgres:postgres@localhost:5432/testdb (this can be overridden with the POSTGRES_URL env var):
CREATE USER postgres WITH PASSWORD 'postgres' SUPERUSER;
CREATE DATABASE testdb;
  • Ensure a local, insecure CockroachDB instance is set up (this can be overridden with the COCKROACH_URL env var): cockroach demo --insecure --empty.
  • Ensure a local MySQL is set up with username root and an empty password, and with a defaultdb database created (this can be overridden with the MYSQL_URL env var):
CREATE DATABASE defaultdb;
  • Run the tests: go test ./...
    • Data-driven tests can be run with -rewrite, e.g. go test ./verification -rewrite.
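
A sketch of pointing the tests at non-default instances before running them (the URL formats, particularly for MYSQL_URL, are assumptions based on the connection strings used earlier in this README):

export POSTGRES_URL='postgres://postgres:postgres@localhost:5432/testdb'
export COCKROACH_URL='postgres://root@localhost:26257/defaultdb?sslmode=disable'
export MYSQL_URL='jdbc:mysql://root@tcp(localhost:3306)/defaultdb'
go test ./...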
