sqltocsvgzip


README


A library designed to convert the sql.Rows result of a query into a CSV.GZIP file and/or upload it to AWS S3.

Features

  • Multi-threaded gzip compression
  • Concurrent multipart S3 uploads
  • Upload retries for resiliency
  • Uploads to S3 without requiring local storage
  • Consistent memory, CPU, and network usage regardless of the number of sql.Rows
Installation
go get github.com/thatInfrastructureGuy/sqltocsvgzip@v0.0.13

Note: Please do not use the master branch; it may contain breaking changes. Use tags instead.

Functions
  • Uploading to S3
    • One-Liner: UploadToS3(rows)
    • Set up config: UploadConfig(rows) + Upload()
  • Writing to File
    • One-Liner: WriteFile(rows, filename)
    • Set up config: WriteConfig(rows) + WriteFile(filename)
Usage

Note: Check out the examples directory for more examples.

Importing the package

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql" // or the driver of your choice
    "github.com/thatinfrastructureguy/sqltocsvgzip"
)
  1. Dumping a query to a file
rows, _ := db.Query("SELECT * FROM users WHERE something=72")

_, err := sqltocsvgzip.WriteFile("~/important_user_report.csv.gzip", rows)
if err != nil {
    panic(err)
}
  2. Upload to AWS S3 with environment variables
rows, _ := db.Query("SELECT * FROM users WHERE something=72")

// UploadToS3 looks for the following environment variables.
// Required: S3_BUCKET, S3_PATH, S3_REGION
// Optional: S3_ACL (default => bucket-owner-full-control)
_, err := sqltocsvgzip.UploadToS3(rows)
if err != nil {
    panic(err)
}
  3. Upload to AWS S3 without environment variables
rows, _ := db.Query("SELECT * FROM users WHERE something=72")

config := sqltocsvgzip.UploadConfig(rows)
config.S3Bucket = "mybucket"
config.S3Path = "/myfolder/file.csv.gzip"
config.S3Region = "us-west-1"

_, err := config.Upload()
if err != nil {
    panic(err)
}
  4. Return a query as a GZIP download on the world wide web
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    rows, err := db.Query("SELECT * FROM users WHERE something=72")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()

    w.Header().Set("Content-Type", "application/javascript")
    w.Header().Set("Content-Encoding", "gzip")
    w.Header().Set("Content-Disposition", "attachment; filename=\"report.csv.gzip\"")

    config := sqltocsvgzip.WriteConfig(rows)
    err = config.Write(w)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
})
http.ListenAndServe(":8080", nil)

If you need more flexibility you can get an instance of a config and fiddle with a few settings.

rows, _ := db.Query("SELECT * FROM users WHERE something=72")

config := sqltocsvgzip.WriteConfig(rows)

config.TimeFormat = time.RFC822

cols, _ := rows.Columns()
config.Headers = append(cols, "extra_column_one", "extra_column_two")

config.SetRowPreProcessor(func(columns []string, columnNames []string) (bool, []string) {
    // exclude admins from report
    // NOTE: this is a dumb example because "where role != 'admin'" is better
    // but every now and then you need to exclude stuff because of calculations
    // that are a pain in sql and this is a contrived README
    if columns[3] == "admin" {
        return false, []string{}
    }

    extraColumnOne := generateSomethingHypotheticalFromColumn(columns[2])
    extraColumnTwo := lookupSomeApiThingForColumn(columns[4])

    return true, append(columns, extraColumnOne, extraColumnTwo)
})

_, err := config.WriteFile("~/important_user_report.csv.gzip")
if err != nil {
    panic(err)
}
Defaults
  • 10 MB default CSV buffer size.
  • 50 MB default gzip buffer size.
  • Gzipping: runtime.GOMAXPROCS(0) goroutines by default, with 512 KB of data per goroutine.
  • 4 concurrent upload goroutines.
Caveats
  • The minimum UploadPartSize must be at least 5 MB (an AWS S3 multipart requirement).
  • AWS allows a maximum of 10,000 parts per multipart upload. Hence, the default settings support up to 50 MB x 10,000 = 500 GB of gzipped data.
  • Increase the buffer size if you want fewer parts or have more than 500 GB of gzipped data; see the tuning sketch below.
  • Currently only supports uploads to AWS S3 API-compatible storage.
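
A minimal tuning sketch built on the Converter fields documented below; the values are illustrative assumptions (and the byte units are inferred from the defaults above), not recommendations from the library:

config := sqltocsvgzip.UploadConfig(rows)
config.S3Bucket = "mybucket"
config.S3Path = "/myfolder/file.csv.gzip"
config.S3Region = "us-west-1"

config.UploadPartSize = 100 * 1024 * 1024 // assumed bytes: 100 MB parts => up to ~1 TB across 10,000 parts
config.UploadThreads = 8                  // more concurrent part uploads
config.GzipGoroutines = 4                 // cap gzip parallelism

if _, err := config.Upload(); err != nil {
    panic(err)
}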
System Requirements
  • Minimum:
    • CPU: 2 vCPUs
    • Memory: 500 MB (depends on SQL data size)
    • Disk: only needed if you're writing to a file locally (greater than the size of the gzip file)

Documentation

Overview

Package sqltocsvgzip converts database query results (in the form of database/sql Rows) into CSV.GZIP output.

Source and README at https://github.com/thatInfrastructureGuy/sqltocsvgzip

Constants

This section is empty.

Variables

This section is empty.

Functions

func UploadToS3 added in v0.0.3

func UploadToS3(rows *sql.Rows) (rowCount int64, err error)

UploadToS3 will upload a CSV.GZIP file (with headers) to an AWS S3 bucket based on whatever is in the sql.Rows you pass in. UploadToS3 looks for the following environment variables. Required: S3_BUCKET, S3_PATH, S3_REGION. Optional: S3_ACL (default: bucket-owner-full-control).

func WriteFile

func WriteFile(csvGzipFileName string, rows *sql.Rows) (rowCount int64, err error)

WriteFile will write a CSV.GZIP file to the file name specified (with headers) based on whatever is in the sql.Rows you pass in.

Types

type Converter

type Converter struct {
	LogLevel              LogLevel
	Headers               []string // Column headers to use (default is rows.Columns())
	WriteHeaders          bool     // Flag to output headers in your CSV (default is true)
	TimeFormat            string   // Format string for any time.Time values (default is time's default)
	Delimiter             rune     // Delimiter to use in your CSV (default is comma)
	CsvBufferSize         int
	CompressionLevel      int
	GzipGoroutines        int
	GzipBatchPerGoroutine int
	S3Bucket              string
	S3Region              string
	S3Acl                 string
	S3Path                string
	S3Upload              bool
	UploadThreads         int
	UploadPartSize        int
	RowCount              int64
	// contains filtered or unexported fields
}

Converter does the actual work of converting the rows to CSV. There are a few settings you can override if you want to do some fancy stuff to your CSV.

func UploadConfig added in v0.0.6

func UploadConfig(rows *sql.Rows) *Converter

UploadConfig sets the default values for the Converter struct.

func WriteConfig added in v0.0.6

func WriteConfig(rows *sql.Rows) *Converter

WriteConfig will return a Converter which will write your CSV however you like, but will allow you to set a bunch of non-default behaviour like overriding headers or injecting a pre-processing step into your conversion.

func (*Converter) AddToQueue added in v0.0.3

func (c *Converter) AddToQueue(buf *bytes.Buffer, lastPart bool)

AddToQueue sends the buffer over the upload queue. It is currently designed to work with AWS multipart uploads. If the part body is smaller than 5 MB, two parts are combined before sending.

func (*Converter) SetRowPreProcessor

func (c *Converter) SetRowPreProcessor(processor CsvPreProcessorFunc)

SetRowPreProcessor lets you specify a CsvPreProcessorFunc for this conversion.

func (*Converter) Upload added in v0.0.4

func (c *Converter) Upload() (rowCount int64, err error)

Upload uploads the csv.gzip to AWS S3, returning an error if there is a problem. It creates an AWS multipart upload request, completes the multipart request if all uploads are successful, and aborts the operation when an error is received.

func (*Converter) UploadObjectToS3 added in v0.0.3

func (c *Converter) UploadObjectToS3(w io.Writer) error

UploadObjectToS3 uploads a file to AWS S3 without batching.

func (*Converter) UploadPart added in v0.0.5

func (c *Converter) UploadPart() (err error)

UploadPart listens to the upload queue. Whenever an object is received, it is uploaded to AWS. The abort operation is called if any error is received.

func (*Converter) Write

func (c *Converter) Write(w io.Writer) error

Write writes the csv.gzip to the io.Writer provided.
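
For illustration, a minimal sketch (assuming rows from an earlier query) that streams the csv.gzip to an arbitrary io.Writer, in this case a local file:

f, err := os.Create("report.csv.gzip")
if err != nil {
    panic(err)
}
defer f.Close()

converter := sqltocsvgzip.WriteConfig(rows)
// Write streams the gzipped CSV into any io.Writer.
if err := converter.Write(f); err != nil {
    panic(err)
}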

func (*Converter) WriteFile

func (c *Converter) WriteFile(csvGzipFileName string) (rowCount int64, err error)

WriteFile writes the csv.gzip to the filename specified, returning an error if a problem occurs.

type CsvPreProcessorFunc

type CsvPreProcessorFunc func(row []string, columnNames []string) (outputRow bool, processedRow []string)

CsvPreProcessorFunc is a function type for preprocessing your CSV. It takes the columns after they've been munged into strings but before they've been passed into the CSV writer.

Return an outputRow of false if you want the row skipped; otherwise return the processed row slice as you want it written to the CSV.
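
For example, a sketch of a pre-processor that uses both arguments, redacting a hypothetical "email" column by name rather than by position:

converter := sqltocsvgzip.WriteConfig(rows)
converter.SetRowPreProcessor(func(row []string, columnNames []string) (bool, []string) {
    for i, name := range columnNames {
        if name == "email" { // hypothetical column name
            row[i] = "[redacted]"
        }
    }
    return true, row // true => write the processed row
})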

type LogLevel added in v0.0.4

type LogLevel int
const (
	Error   LogLevel = 1
	Warn    LogLevel = 2
	Info    LogLevel = 3
	Debug   LogLevel = 4
	Verbose LogLevel = 5
)
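
For instance, a minimal sketch that raises logging verbosity via the LogLevel field on Converter shown above:

config := sqltocsvgzip.WriteConfig(rows)
config.LogLevel = sqltocsvgzip.Verbose // one of: Error, Warn, Info, Debug, Verbose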
