datadiff

command module
v0.0.0-...-b105dfa
Published: Feb 7, 2024 License: MIT Imports: 3 Imported by: 0


Datadiff is a library and CLI tool to find differences between two data sources. This is useful when there is a primary data source and a secondary data source and they both need to contain the same records.

This tool considers two data sources equal if they contain the same numeric IDs. No other field values are compared.

Strategy

Rather than comparing record by record, this library compares the histograms of the numeric IDs from both sources. These are the steps taken:

  • Create a histogram of the numeric IDs from the primary data source.
  • Create a histogram of the numeric IDs from the secondary data source.
  • Merge and compare the histograms.
  • If a bin is filled to capacity in both sources, mark its range as resolved.
  • Fetch the histogram of the unresolved bins with smaller bin sizes.
  • Merge and compare the histograms.
  • Fetch the IDs of the unresolved bins.
  • Compare the numeric IDs of unresolved bins and output the results.
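
The steps above can be sketched in Go roughly as follows. This is a minimal in-memory illustration, not the library's actual code: the names `Histogram`, `BuildHistogram`, `UnresolvedBins`, and `DiffIDs` are hypothetical, IDs are assumed to be unique non-negative integers, and the intermediate pass with smaller bin sizes is omitted for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Histogram maps a bin's lower bound to the number of IDs that fall in it.
type Histogram map[int64]int64

// BuildHistogram buckets non-negative IDs into bins of the given width.
func BuildHistogram(ids []int64, interval int64) Histogram {
	h := make(Histogram)
	for _, id := range ids {
		h[(id/interval)*interval]++
	}
	return h
}

// UnresolvedBins merges both histograms and returns, in ascending order,
// the lower bounds of bins that are not full on both sides.
func UnresolvedBins(primary, secondary Histogram, interval int64) []int64 {
	starts := make(map[int64]struct{})
	for s := range primary {
		starts[s] = struct{}{}
	}
	for s := range secondary {
		starts[s] = struct{}{}
	}
	var out []int64
	for s := range starts {
		if primary[s] == interval && secondary[s] == interval {
			continue // full on both sides: every ID in the range exists in both
		}
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// DiffIDs reports the IDs found only in a and the IDs found only in b.
func DiffIDs(a, b []int64) (onlyA, onlyB []int64) {
	inA := make(map[int64]bool, len(a))
	for _, id := range a {
		inA[id] = true
	}
	inB := make(map[int64]bool, len(b))
	for _, id := range b {
		inB[id] = true
		if !inA[id] {
			onlyB = append(onlyB, id)
		}
	}
	for _, id := range a {
		if !inB[id] {
			onlyA = append(onlyA, id)
		}
	}
	return onlyA, onlyB
}

func main() {
	primary := []int64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	secondary := []int64{0, 1, 2, 3, 4, 5, 6, 8, 9} // ID 7 is missing
	const interval = 5

	bins := UnresolvedBins(
		BuildHistogram(primary, interval),
		BuildHistogram(secondary, interval),
		interval,
	)
	fmt.Println(bins) // [5]

	// Only the IDs inside the unresolved bin [5, 10) need to be compared.
	onlyPrimary, onlySecondary := DiffIDs(primary[5:], secondary[5:])
	fmt.Println(onlyPrimary, onlySecondary) // [7] []
}
```

The point of the histogram pass is that a full bin never needs a record-level comparison, so only the IDs in partially filled or mismatched ranges have to be fetched.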

Supported Data Sources

  • mysql
  • elasticsearch
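
The README does not show how each source produces its histogram. As a rough sketch of one plausible approach, a MySQL source could group IDs with `FLOOR`, and an Elasticsearch source could use the built-in `histogram` aggregation. The function names `MySQLHistogramQuery` and `ESHistogramBody`, the aggregation name `ids`, and the column aliases are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MySQLHistogramQuery returns a query that counts IDs per bin of width interval.
// table and field are assumed to be trusted identifiers, not user input.
func MySQLHistogramQuery(table, field string, interval int64) string {
	return fmt.Sprintf(
		"SELECT FLOOR(`%s` / %d) * %d AS bin, COUNT(*) AS n FROM `%s` GROUP BY bin ORDER BY bin",
		field, interval, interval, table)
}

// ESHistogramBody returns a search request body that asks Elasticsearch for a
// histogram aggregation over field and suppresses the hits themselves.
func ESHistogramBody(field string, interval int64) (string, error) {
	body := map[string]interface{}{
		"size": 0,
		"aggs": map[string]interface{}{
			"ids": map[string]interface{}{
				"histogram": map[string]interface{}{
					"field":    field,
					"interval": interval,
				},
			},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	fmt.Println(MySQLHistogramQuery("my_table_name", "my_id_field_name", 1000))

	body, err := ESHistogramBody("my_id_field_path", 1000)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Both forms return one row or bucket per bin, so the two sources can be compared bin by bin regardless of which driver is on which side.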

Usage

Run datadiff -h to print usage information:

$ ./datadiff -h
Usage of ./datadiff:
  -interval int
        Initial histogram interval (default 1000)
  -mconf string
        Primary configuration string (default "{}")
  -mconn string
        Primary connection string
  -mdriver string
        Primary driver [elasticsearch|mysql]
  -sconf string
        Secondary configuration string (default "{}")
  -sconn string
        Secondary connection string
  -sdriver string
        Secondary driver [elasticsearch|mysql]

Sample Command Line Usage

 datadiff -interval 200 \
 -mdriver 'mysql' \
 -mconn 'root:root@(localhost:3306)/my_db_name?charset=utf8' \
 -mconf '{"table_name":"my_table_name", "field_name":"my_id_field_name", "conditions":["`active` = 1", "`user_id` = 100"]}' \
 -sdriver 'elasticsearch' \
 -sconn 'http://localhost:9200' \
 -sconf '{"index":"my_index_name", "type":"my_type_name", "field":"my_id_field_path"}'

mysql://root:root@localhost:3306/dbname?table=tablename&field=id
es://http://localhost:9200?index=indexname&field=id
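
The `-mconf` value in the sample command is a JSON object. A minimal sketch of decoding it in Go is shown below. The struct `MySQLConf` and its methods are hypothetical; only the JSON keys (`table_name`, `field_name`, `conditions`) come from the sample above, and joining the conditions with `AND` is an assumption about how they are meant to combine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// MySQLConf mirrors the keys used in the sample -mconf value.
type MySQLConf struct {
	TableName  string   `json:"table_name"`
	FieldName  string   `json:"field_name"`
	Conditions []string `json:"conditions"`
}

// ParseMySQLConf decodes a JSON configuration string.
func ParseMySQLConf(raw string) (MySQLConf, error) {
	var c MySQLConf
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return c, fmt.Errorf("parse mysql conf: %w", err)
	}
	return c, nil
}

// WhereClause joins the conditions with AND (an assumption), or returns an
// empty string when no conditions are configured.
func (c MySQLConf) WhereClause() string {
	if len(c.Conditions) == 0 {
		return ""
	}
	return "WHERE " + strings.Join(c.Conditions, " AND ")
}

func main() {
	raw := "{\"table_name\":\"my_table_name\", \"field_name\":\"my_id_field_name\", " +
		"\"conditions\":[\"`active` = 1\", \"`user_id` = 100\"]}"

	conf, err := ParseMySQLConf(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(conf.TableName, conf.FieldName)
	fmt.Println(conf.WhereClause())
}
```

Because the flag defaults to "{}", decoding an empty object succeeds and leaves every field at its zero value.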
