tipo

Version: v0.1.0
Published: Apr 3, 2023 License: GPL-3.0

README

TIPO: Tidy Input Permuted Output

This tool performs reproducible data swapping (permutation of values between records) on a JSON Lines (JSONL) stream.

Configuration File

Here is an example YAML configuration file:

version: 1
seed: 42
frameSize: 1000
selectors:
  - group1:
    - attribute1.*
    - attribute2.*
  - attribute3.*
  - attribute4.*
  • The seed parameter is the starting seed of the pseudo-random process. The same seed applied to the same input gives the same output.
  • The frameSize parameter is the size of the processing window, and it strongly affects the quality of the permutation. A large value makes more values available to be permuted at once and reduces the chance that a value is left identical to the original after the swap.
  • The selectors parameter lists what to swap. Each entry is either a named group of attributes that are swapped together, or a single attribute that is swapped independently of the others.
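
To make the roles of seed and frameSize concrete, here is a minimal Go sketch of a seeded, frame-by-frame shuffle. It is not TIPO's implementation: the helper name shuffleFrames is hypothetical, and it assumes that frames are consecutive, non-overlapping windows, which this README does not state.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffleFrames permutes values frame by frame: each window of at most
// frameSize values is shuffled on its own, using one generator seeded once.
// Hypothetical helper for illustration, not part of TIPO's API.
func shuffleFrames(values []string, seed int64, frameSize int) []string {
	out := make([]string, len(values))
	copy(out, values) // leave the caller's slice untouched
	if frameSize < 1 {
		return out // no usable window size, return the values unchanged
	}
	rng := rand.New(rand.NewSource(seed))
	for start := 0; start < len(out); start += frameSize {
		end := start + frameSize
		if end > len(out) {
			end = len(out)
		}
		frame := out[start:end]
		rng.Shuffle(len(frame), func(i, j int) {
			frame[i], frame[j] = frame[j], frame[i]
		})
	}
	return out
}

func main() {
	names := []string{"agathe", "Pierre", "Damien", "Patrice", "Alex", "Lilie", "Alice", "Maélie"}
	fmt.Println(shuffleFrames(names, 42, 1000)) // one frame holds all eight values
	fmt.Println(shuffleFrames(names, 42, 2))    // values can only move within pairs
}
```

With a frame of 2, each value has a single possible swap partner, which illustrates why a small frameSize gives a weak permutation. Calling the function twice with the same seed returns the same order.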

Execution example

Suppose the input JSON Lines stream is stored in a file named "stream.jsonl":

{"company":"Ese1","employees":[{"name":"Martin","firstname":"Lebaron","age":36,"children":[{"name":"agathe","age":14}]},{"name":"Josselin","firstname":"Jireau","age":57,"children":[{"name":"Pierre","age":14},{"name":"Damien","age":9}]}]}
{"company":"Ese2","employees":[{"name":"Jérémie","firstname":"Namie","age":42,"children":[{"name":"Patrice","age":25},{"name":"Alex","age":10},{"name":"Lilie","age":2}]}]}
{"company":"Ese3","employees":[{"name":"Océane","firstname":"Dupont","age":42,"children":[{"name":"Alice","age":25},{"name":"Maélie","age":10}]}]}

and the following configuration file is named "swapConf.yml":

version: 1
seed: 42
frameSize: 1000
selectors:
   - employees.children.name.*

Here no attribute group is swapped; only the names of the employees' children are permuted.
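
As an illustration of what a selector such as employees.children.name.* designates, the Go sketch below collects the matching values from one JSON line. It is only a sketch under stated assumptions: the function names are hypothetical, the trailing ".*" is assumed to mean "every occurrence" (this README does not define it), and the sample line is a shortened record whose name keys are all spelled "name".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// collect walks a decoded JSON value along the given keys and returns every
// value found at the end of the path. Arrays are traversed element by
// element, so a final value that is itself an array is flattened.
func collect(node any, keys []string) []any {
	if items, ok := node.([]any); ok {
		var found []any
		for _, item := range items {
			found = append(found, collect(item, keys)...)
		}
		return found
	}
	if len(keys) == 0 {
		return []any{node}
	}
	obj, ok := node.(map[string]any)
	if !ok {
		return nil // path continues but the value is not an object
	}
	child, ok := obj[keys[0]]
	if !ok {
		return nil // key absent from this object
	}
	return collect(child, keys[1:])
}

// selectValues applies a dotted selector to one JSON line.
// Assumption: the trailing ".*" is simply dropped before walking the path.
func selectValues(line, selector string) ([]any, error) {
	var doc any
	if err := json.Unmarshal([]byte(line), &doc); err != nil {
		return nil, err
	}
	path := strings.TrimSuffix(selector, ".*")
	return collect(doc, strings.Split(path, ".")), nil
}

func main() {
	line := `{"company":"Ese1","employees":[{"name":"Martin","children":[{"name":"agathe","age":14}]},{"name":"Josselin","children":[{"name":"Pierre","age":14},{"name":"Damien","age":9}]}]}`
	names, err := selectValues(line, "employees.children.name.*")
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [agathe Pierre Damien]
}
```

The employee names (Martin, Josselin) are not collected, because the path goes through children before reaching name.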

First way to run

tipo -c swapConf.yml < stream.jsonl

The result is the following:

{"company":"Ese1","employees":[{"name":"Martin","firstname":"Lebaron","age":36,"children":[{"name":"Damien","age":14}]},{"name":"Josselin","firstname":"Jireau","age":57,"children":[{"name":"Alex","age":14},{"name":"Pierre","age":9}]}]}
{"company":"Ese2","employees":[{"name":"Jérémie","firstname":"Namie","age":42,"children":[{"name":"Patrice","age":25},{"name":"agathe","age":10},{"name":"Maélie","age":2}]}]}
{"company":"Ese3","employees":[{"name":"Océane","firstname":"Dupont","age":42,"children":[{"name":"Lilie","age":25},{"name":"Alice","age":10}]}]}

N.B.: The tipo command takes the path to the configuration file via the -c flag. If no path is provided, it looks by default for a file named swap.yml, which must be at the root of the project.
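
The default described in this note can be pictured with Go's standard flag package. This is only a sketch: the helper configPath is hypothetical, and TIPO's real command-line handling may be built differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// configPath returns the value of -c, or "swap.yml" when the flag is absent.
// Hypothetical helper: it only mirrors the behaviour described in the note.
func configPath(args []string) (string, error) {
	fs := flag.NewFlagSet("tipo", flag.ContinueOnError)
	path := fs.String("c", "swap.yml", "path to the YAML configuration file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *path, nil
}

func main() {
	path, err := configPath(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("configuration file:", path)
}
```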

Second way to run

If the configuration file is named "swap.yml", the -c flag can be omitted:

tipo < stream.jsonl

Permutation of an attribute group

Some attributes must be permuted together to stay coherent. For example, to swap the name and first name of the employees as a pair, while permuting the names of the employees' children independently, the configuration file named "swap.yml" has the following content:

version: 1
seed: 42
frameSize: 1000
selectors:
  - group1:
    - employees.name.*
    - employees.firstname.*
  - employees.children.name.*

The command is the same as before:

tipo < stream.jsonl

The result will be the following

{"company":"Ese1","employees":[{"name":"Josselin","firstname":"Jireau","age":36,"children":[{"name":"Patrice","age":14}]},{"name":"Jérémie","firstname":"Namie","age":57,"children":[{"name":"Alex","age":14},{"name":"agathe","age":9}]}]}
{"company":"Ese2","employees":[{"name":"Martin","firstname":"Lebaron","age":42,"children":[{"name":"Lilie","age":25},{"name":"Damien","age":10},{"name":"Pierre","age":2}]}]}
{"company":"Ese3","employees":[{"name":"Océane","firstname":"Dupont","age":42,"children":[{"name":"Alice","age":25},{"name":"Maélie","age":10}]}]}

Swap Multiple Attribute Groups

Suppose the following incoming stream is stored in a file named stream.jsonl

{"company":"company1","employees":[{"name":"Martin","firstname":"Lebaron","nationality":"italian","age":36,"children":[{"name":"Damien","age":14}]}]}
{"company":"company2","employees":[{"name":"Jérémie","firstname":"Namie","nationality":"French","age":44,"children":[{"name":"Patrice","age":25},{"name":"agathe","age":10},{"name":"Maélie","age":2}]}]}
{"company":"company3","employees":[{"name":"Océane","firstname":"Dupont","nationality":"Spanish","age":41,"children":[{"name":"Lilie","age":25},{"name":"Alice","age":10}]}]}

The corresponding configuration file is named configuration.yml:

version: 1
seed: 42
frameSize: 1000
selectors:
  - group1:
    - employees.name.*
    - employees.firstname.*
  - group2:
    - employees.age.*
    - employees.nationality.*
  - employees.children.*

The two groups are permuted independently of each other. Because the configuration file is not named swap.yml, its path is passed with the -c flag:

tipo -c configuration.yml < stream.jsonl

And the result will be the following

{"company":"company1","employees":[{"name":"Jérémie","firstname":"Namie","nationality":"Spanish","age":41,"children":[{"name":"Damien","age":14}]}]}
{"company":"company2","employees":[{"name":"Océane","firstname":"Dupont","nationality":"italian","age":36,"children":[{"name":"Patrice","age":25},{"name":"agathe","age":10},{"name":"Maélie","age":2}]}]}
{"company":"company3","employees":[{"name":"Martin","firstname":"Lebaron","nationality":"French","age":44,"children":[{"name":"Lilie","age":25},{"name":"Alice","age":10}]}]}

Note that the age and nationality fields have been swapped together, independently of the name and firstname fields, which have also been swapped together.
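
The idea of swapping a group consistently can be sketched in Go as follows: one permutation is drawn per group and applied to every field of that group, and a separate draw is made for the next group. This is an illustration only, not TIPO's code. The names record, newRNG and swapGroup are hypothetical, and ages are stored as strings to keep the example short.

```go
package main

import (
	"fmt"
	"math/rand"
)

// record is one employee reduced to the fields used in the example.
type record map[string]string

// newRNG returns a generator whose sequence depends only on the seed.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// swapGroup draws one permutation of the record positions and applies it to
// every field of the group, so grouped values always travel together.
func swapGroup(records []record, fields []string, rng *rand.Rand) {
	perm := rng.Perm(len(records))
	// Snapshot the grouped values first, since records are updated in place.
	snapshot := make([]record, len(records))
	for i, r := range records {
		snapshot[i] = record{}
		for _, f := range fields {
			snapshot[i][f] = r[f]
		}
	}
	for i, src := range perm {
		for _, f := range fields {
			records[i][f] = snapshot[src][f]
		}
	}
}

func main() {
	employees := []record{
		{"name": "Martin", "firstname": "Lebaron", "nationality": "italian", "age": "36"},
		{"name": "Jérémie", "firstname": "Namie", "nationality": "French", "age": "44"},
		{"name": "Océane", "firstname": "Dupont", "nationality": "Spanish", "age": "41"},
	}
	rng := newRNG(42)
	swapGroup(employees, []string{"name", "firstname"}, rng)  // group1
	swapGroup(employees, []string{"age", "nationality"}, rng) // group2, separate draw
	for _, e := range employees {
		fmt.Println(e["name"], e["firstname"], e["nationality"], e["age"])
	}
}
```

Whatever permutation is drawn, "Martin" stays paired with "Lebaron" and "italian" stays paired with "36", but the two pairs may land on different records.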

Contributors

License

Copyright (C) 2023 CGI France

TIPO is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

TIPO is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with TIPO. If not, see http://www.gnu.org/licenses/.

Directories

Path Synopsis
cmd
pkg
