pg_dump_sample

command module

v0.0.0-...-ae0445c Latest Latest Go to latest Published: Oct 22, 2021 License: MIT Imports: 13 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/calazans10/pg_dump_sample

Links

Open Source Insights

README ¶

pg_dump_sample

This is a simple tool for dumping a sample of data from a PostgreSQL database. The resulting dump can be loaded back into an new database using standard tools (e.g. psql(1)).

Why would I want this?

It is useful if you have a huge PostgreSQL database and you need a database with a small dataset for testing or development.

Features

Data sampling: Dump either all rows of a table or only rows matching the specified query.
Easy anonymization of data: You have full control over what's in the dump, it's very easy to create anonymized DB dumps.
Templated queries: You can extract common parts of queries into one place. That's especially useful if you have e.g. a list of IDs the dump should contain.
Table dependency awareness - Table A will be dumped before table B automatically if table B has foreign keys pointing to table A.
Standard SQL - The produced dump files contain just a bunch of SQL commands. They can be loaded by standard tools such as psql(1) or processed further.

How to build

The pg_dump_sample is written in Go. You need to setup the Go compiler and setup environment first to build it.

go get github.com/calazans10/pg_dump_sample

If it went well you should be able to run it:

pg_dump_sample

If not check that you have $GOPATH/bin in your $PATH.

How to use

A quick example:

pg_dump_sample -f mydb.yaml -h mydbhost.dev -U postgres -o mydb_dump.sql mydb

Available command-line options:

Usage:
  pg_dump_sample [options] database

Application Options:
  -h, --host=          Database server host or socket directory (default: local socket) [$PGHOST]
  -p, --port=          Database server port (default: 5432) [$PGPORT]
  -U, --username=      Database user name (default: current user) [$PGUSER]
  -w, --no-password    Don't prompt for password
  -f, --manifest-file= Path to manifest file
  -o, --output-file=   Path to the output file
  -s, --tls            Use SSL/TLS database connection
      --help           Show help

The available command-line options are heavily inspired by pg_dump(1). Anyone familiar with it should feel right at home.

It is also possible to use environmental variables to set the options. Note that the environmental variables have lower precenence than command-line options. The supported variables are:

Environmental variable	Description
`PGHOST`	`-h, --host`
`PGPORT`	`-p, --port`
`PGUSER`	`-U, --username`
`PGPASSWORD`	Used to set the password. Use of this environment variable is not recommended for security reasons (some operating systems allow non-root users to see process environment variables via ps)
`PGDATABASE`	database

Manifest file

The main difference between pg_dump_sample and pg_dump(1) is that pg_dump_sample requires a manifest file describing how to dump the database. The manifest file is a YAML file describing what tables to dump and how to dump them.

A quick example:

---
vars:
  # Condition to dump only certain users
  matching_user_id: "(users.id BETWEEN 1000 AND 2000)"

tables:
  # Dump everything from table "consts"
  - table: consts

  # Dump only matching users
  - table: users
    query: "SELECT * FROM users WHERE {{matching_user_id}}"
    post_actions:
      - "SELECT pg_catalog.setval('users_id_seq', MAX(id) + 1, true) FROM users"

  # Dump only tickets that were bought by matching users
  - table: tickets
    query: >
      SELECT purchases.* FROM purchases, users
      WHERE
        purchases.buyer_id = users.id
        AND {{matching_user_id}}

Currently these top-level keys are available:

`vars`

Definitions of variables which will be used to replace placeholders in queries.

`tables`

List of tables to dump. Tables are dumped in the order they are specified in the manifest file, with one exception: if the table contains foreign keys referencing another table, the referenced table will be dumped first. This is to ensure that the dump can be loaded later without errors.

By default all rows of the table will be dumped. If you don't want to dump all the rows use the query to specify a SELECT SQL statement which returns the rows you want to dump.

TODO

Use separate vars files to override vars from manifest?
Allow setting vars using command-line options?

Contribute

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Submit a pull request

License

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL