mr-plow

command module
v1.0.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 31, 2022 License: Apache-2.0 Imports: 7 Imported by: 0

README

example workflow codecov

Mr-Plow

Tiny and minimal tool to export data from relational db (postgres or mysql) to elasticsearch.

The tool does not implement all the logstash features, but its goal is to be an alternative to logstash when keeping in-sync elastic and a relational database.

Goal

Low memory usage: (~15 MB when idle, great to be deployed on cloud environments).

Stateless: a timestamp/date column is used in order to filter inserted/update data and to avoid fetching already seen data. During the startup Mr-Plow checks the data inserted into elasticsearch to check the last timestamp/date of the transferred data, and so it does not require a local state.

image

Usage:

Mr-Plow essentially executes queries on a relational database and writes these data to ElasticSearch. The configured queries are run in parallel, and data are written incrementally, it's only sufficient to specify a timestamp/date column in the queries in order to get only newly updated/inserted data.

This is a basic configuration template example, where we only specify two queries and the endpoint configuration (one Postgres database and one ElasticSearch cluster:

# example of config.yml
pollingSeconds: 5 #database polling interval
database: "postgres://user:pwd@localhost:5432/postgres?sslmode=disable" #specify here the db connection
queries: #put here one of more queries (each one will be executed in parallel):
  - query: "select * from my_table1 where last_update > $1" #please add a filter on an incrementing date/ts column using the $1 value as param
    index: "table1_index" #name of the elastic output index
    updateDate: "last_update" #name of the incrementing date column
    id: "customer_id" #optional, column to use as elasticsearch id
  - query: "select * from my_table2 where ts > $1"
    index: "table2_index"
    updateDate: "ts"
elastic:
  url: "http://localhost:9200"
  user: "elastic_user" #optional
  password: "my_secret" #optional
  numWorker: 10 #optional, number of worker for indexing each query
  caCertPath: "my/path/ca" #optional, path of custom CA file (it may be needed in some HTTPS connection..)

Anyway, Mr Plow has also additional features, for example interacting with a database like Postgres, supporting JSON columns, we can specify JSON fields, in order to create a complex (nested) object to be created in Elastic. In the following, we show an example where in the Employee table we store two dynamic JSON fields, one containing the Payment Data and another one containing additional informations for the employee:

pollingSeconds: 5
database: databaseValue
queries:
  - index: index_1
    query: select * from employees
    updateDate: last_update
    JSONFields:
      - fieldName: payment_data
      - fieldName: additional_infos
    id: MyId_1

And additionally, we can specify the type expected for some specific fields. Please note hat field type is optional and if not specified, the field is casted as String.

Actually supported type are: String, Integer, Float and Boolean

pollingSeconds: 5
database: databaseValue
queries:
  - index: index_1
    query: select * from employees
    updateDate: last_update
    fields: # Optional config, casting standard sql columns to specific data type
      - name: name
        type: String
      - name: working_hours
        type: Integer

Merging the previous two examples, we can apply the type casting also to inner JSON fields, here is a complete example of configuration:

pollingSeconds: 5
database: databaseValue
queries:
  - index: index_1
    query: select * from employees
    updateDate: last_update
    fields: # Optional, casting standard sql columns to specific data type
      - name: name
        type: String
      - name: working_hours
        type: Integer
    JSONFields:
      - fieldName: payment_info
        fields: # Optional, casting json fields to specific data type
          - name: bank_account
            type: String
          - name: validated
            type: Boolean
    id: MyId_1

Download or build the binary (docker images will be released soon):

go build

Run the tool:

./mr-plow -config /path/to/my/config.yml

To build as docker image, create a config.yml and put into the root folder of the project. Then run:

docker build .

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL