flowtocsv

This is an extremely lightweight MRASCO ETL library that converts MRASCO dataflows into CSV files, which you can then load into SQL for automation and analysis. Released under the MIT license (2021); feel free to use it!

DOCUMENTATION: https://pkg.go.dev/github.com/tbal999/flowtocsv

FOR: Data analysts & BI specialists in small energy industry retail firms who want to develop BI around energy industry data (i.e. gas and electricity data) but do not have the capability to automate the parsing of MRASCO dataflows into their databases.

EXAMPLE USE CASE: You want to actively monitor read disputes, tracking when they are raised, when they are rejected and so on, but you don't want to hire an expensive contractor to build you a workflow.

  1. Ensure you have an SFTP server (or API) up and running which gives you access to daily incoming D0300 dataflows
  2. Set up a pipeline that downloads all D0300 dataflows daily to a flowcrunch_inputfiles folder on your server
  3. Set up flowtocsv so that it can parse D0300 dataflows into CSV format
  4. Upload the CSV file into SQL using staging/production tables and a merge methodology
  5. Use Tableau / Power BI / Qlik to load the table into a view and join the necessary tables (this will require energy industry knowledge!)

You now have an up-and-running BI pipeline that keeps track of read disputes and whether they are successful, rejected or ignored.

The number of use cases for this tool is at least the number of UK energy industry dataflows that exist for suppliers/retailers (128), each of varying significance and impact.

This particular library can also handle other flat file types; my stance would be to try it out, and if it works, great! Through building this I discovered new ways to wrangle data using recursion in Go. The library can also be inserted into packaged ETL solutions.

At the moment there are two main workflows for this library.

  1. SFTP server -> download dataflows to a folder -> parse dataflows into CSV -> load into SQL.
  2. API -> load clobs into Go as strings -> parse clobs into CSV -> load into SQL (see the clob sketch at the end of this README).

Example code:

package main

import (
	flow "github.com/tbal999/flowtocsv"
)

func main() {
	f := flow.Instructions{}
	f.Init()         // create the flowcrunch_* folders next to the executable
	f.LearnFile("|") // learn pipe-delimited flows; use f.LearnFile(",") for Gas SPAA dataflows
	f.StartFiles()   // convert everything in flowcrunch_inputfiles into CSV
}

This will create four folders: flowcrunch_inputfiles, flowcrunch_instructions, flowcrunch_learn and flowcrunch_outputfiles. Simply save an example dataflow (i.e. a D0150001) in the 'learn' folder and it will generate the necessary instructions to parse that dataflow into CSV format:

A fake dataflow clob:
ZHV|99999|D0150001|M|LLLL|X|AAAA|12345||||TR01|
288|12345|20179568|D|x|
289|0158|20179568|Z||
290|XXXTEST011|||40|F|A2Z 5000|LLLL|||||||||||K|20179568|20179568|20200101|||H|20179568|
293|L|C|AI|9.00||9||DE|
293|N|C|AI|9.00||7|||
08A|KXXA 12387|20179568|LLLL|
ZPT|99999|6||1|12345|

Then it will learn the structure of the dataflow and create a set of instructions in the 'instructions' folder:

{
	"Dataflow": "150",
	"Delimiter": "|",
	"DataItems": [
		"288",
		"289",
		"290",
		"293",
		"08A"
	],
	"Spaces": [
		4,
		4,
		25,
		8,
		3
	],
	"Headers": [
		"150_COLUMN_1",
		"150_COLUMN_2",
		"150_COLUMN_3",
		"150_COLUMN_4",
		...
		...
	],
	"Outputname": "150_Converted"
}

Then you can place as many D0150 dataflows as you want in the 'inputfiles' folder and it will create a single aggregated CSV file from all of them. Learning can sometimes add too many columns, so just remove some headers from the JSON if necessary.

This methodology works for nearly all UK energy industry data flows.
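
For the second workflow, here is a minimal sketch built on the documented LearningClob and ConvertClob calls. The inline sample clob, the 'D0150' name and the output filename are illustrative placeholders; in practice the clob string would come from your API response.

package main

import (
	"encoding/csv"
	"log"
	"os"

	flow "github.com/tbal999/flowtocsv"
)

func main() {
	// In practice this clob would come from your API; a tiny fake flow is used here.
	clob := "ZHV|99999|D0150001|M|LLLL|X|AAAA|12345||||TR01|\n288|12345|20179568|D|x|\nZPT|99999|6||1|12345|"

	f := flow.Instructions{}
	f.Init()
	f.LearningClob("D0150", clob, "|") // learn this flow type and save the instructions as JSON
	rows := f.ConvertClob(clob)        // convert the clob into a 2D array of rows

	// Write the rows to CSV - or pass them straight to your database/warehouse layer.
	out, err := os.Create("150_Converted.csv") // hypothetical output path
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	w := csv.NewWriter(out)
	if err := w.WriteAll(rows); err != nil { // WriteAll flushes internally
		log.Fatal(err)
	}
}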

Documentation


Constants

This section is empty.

Variables

var (
	// Testing - set this bool to 'true' if you want to test the conversion of flows step by step (in case you spot a bug or something)
	Testing = false
)
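
As a quick sketch, you might flip the flag before kicking off a run; this simply reuses the setup from the README example above:

package main

import (
	flow "github.com/tbal999/flowtocsv"
)

func main() {
	flow.Testing = true // step through each conversion, useful when chasing a bug
	f := flow.Instructions{}
	f.Init()
	f.StartFiles()
}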

Functions

This section is empty.

Types

type Instructions

type Instructions struct {
	Dataflow   string
	Delimiter  string
	DataItems  []string
	Spaces     []int
	Headers    []string
	Outputname string
	// contains filtered or unexported fields
}

Instructions is an exported struct that contains the necessary configuration to convert an MRASCO flow into a CSV file. These instructions are stored as JSON. When you generate instructions, you can adjust the header names (the columns of the output CSV) to match the contents of each column.

func (Instructions) ConvertClob

func (i Instructions) ConvertClob(clob string) [][]string

ConvertClob begins the conversion of clob strings into a 2D array which can either be written to CSV or passed to a database/warehouse. Before using ConvertClob, you'll want to use the LearningClob function.

func (*Instructions) ConvertFile

func (in *Instructions) ConvertFile(filename string)

ConvertFile begins the process of converting the dataflow (file) into a CSV file, depending on what instruction has been loaded. Usually you just need to use StartFiles to convert all dataflows, but ConvertFile is exported so you can target one file if necessary.
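
A minimal sketch of targeting a single file; the filename is a hypothetical example, assumed to sit in the flowcrunch_inputfiles folder with matching instructions already learned:

package main

import (
	flow "github.com/tbal999/flowtocsv"
)

func main() {
	f := flow.Instructions{}
	f.Init()
	f.ConvertFile("D0150001.txt") // convert just this one dataflow
}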

func (Instructions) Init

func (i Instructions) Init()

Init ensures the three necessary folders are created in the same folder as the executable/main package; always start with instructions.Init(). flowcrunch_instructions is where JSON instructions are kept, flowcrunch_inputfiles is where any dataflows that need to be converted are kept, and flowcrunch_outputfiles is where any generated CSVs are placed. Init also clears any files contained within the 'flowcrunch_outputfiles' folder every time it runs.

func (Instructions) LearnFile

func (i Instructions) LearnFile(delimiter string)

LearnFile takes in dataflows with no duplicate items but at least one of every significant item, so that it can learn the dataflow structure for converting to CSV. All you need to do is save one energy industry dataflow, with the filename set to the dataflow identifier (i.e. D0150001.txt), in the flowcrunch_learn folder. LearnFile then saves the instructions as a JSON file in the 'flowcrunch_instructions' folder. It requires you to pass in the delimiter of the files to learn, i.e. a pipe '|' or a comma.

func (*Instructions) LearningClob

func (in *Instructions) LearningClob(dataflowname, content, delimiter string)

LearningClob takes in a dataflow name, i.e. D0150 (string), a dataflow clob (string) and a delimiter (string), and creates instructions on how to convert that clob into a CSV version of the dataflow. Instructions are saved to the flowcrunch_instructions folder as JSON.

func (Instructions) StartFiles

func (i Instructions) StartFiles()

StartFiles begins the conversion of energy industry dataflows into CSV files. Before using StartFiles, you'll want to use the LearnFile function.
