pg-upgrade-data-check
Installing
go install github.com/processout/pg-upgrade-data-check@latest
Description
pg-upgrade-data-check will compare Postgres table contents of two databases (source and target) to ensure there are no differences
in data.
It will detect discrepancies like;
- Row missing in the target database table
- Extra row in the target database table
- Difference in any column value for the same id
Note: It's designed to work purely on tables content (it will not check indexes, etc).
Note 2: It will only work for tables that have numeric id that is incremented for each row.
Configuration
All configurations related to tables are contained in the examples/config.yaml file. You must specify the table that needs to be checked in .tables and then configure a function for collect and compare:
collect: is executed in first and second stage in order to identify range of id's to compare, usually the max of the current table.
compare: should return two values: id and hash; hash is used to compare rows of the same id between two databases.
A full example would be:
tables:
users:
collect: select max(id) from %s limit 1;
compare: select id, md5(%s::TEXT) as hash from %s where id>=%d and id<=%d order by id asc;
Each command will require the database DSN, which might be indicated by --source-url, --target-url or both.
Usage
As pg-upgrade-data-check has been designed primarily for verifying effects of cloning & replication, it operates in three stages: before, after and compare. Each of these are commands in the cli itself, e.g: pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres
Stage 1: before
pg-upgrade-data-check before --source-url postgres://postgres@localhost:6001/postgres --config config.yaml
For each configured table find the starting id. This query is executed on the source database, and output is
saved to data/before_ids.json file. For safety reasons it is not possible to execute this step if the destination
file already exist. Delete that file first if you don't care about previous result.
When using pg-upgrade-data-check during Postgres upgrade this stage should be executed before you create a replication
slot on the source database.
Stage 2: after
pg-upgrade-data-check after --target-url postgres://postgres@localhost:6002/postgres --config config.yaml
For each configured table find the ending id. Again, this is executed on the source database, with output written
this time to data/after_ids.json file. It has the same safety mechanism as previous stage.
When using pg-upgrade-data-check during Postgres upgrade this stage should be executed after target database has been restored
from snapshot and logical replication caught up to source.
Stage 3: compare
pg-upgrade-data-check after --source-url postgres://postgres@localhost:6001/postgres --target-url postgres://postgres@localhost:6002/postgres --config config.yaml
Load the contents of after_ids.json (generated by the previous stages) and for all tables from configuration compare
rows between startId and stopId, detecting additionally any missing or extra rows on the target db.