sqltocsvgzip


README


A library designed to convert the sql.Rows result of a query into a CSV.GZIP file and/or upload it to AWS S3.

Features

  • Multi-threaded gzip compression
  • Concurrent multipart S3 uploads
  • Upload retries for resiliency
  • Uploads to S3 without requiring local storage
  • Consistent memory, CPU, and network usage regardless of the number of sql.Rows
Installation
go get github.com/thatInfrastructureGuy/sqltocsvgzip@v0.0.13

Note: Please do not use the master branch; it may contain breaking changes. Use tags instead.

Functions
  • Uploading to S3
    • One-Liner: UploadToS3(rows)
    • Set up config: UploadConfig(rows) + Upload()
  • Writing to File
    • One-Liner: WriteFile(rows, filename)
    • Set up config: WriteConfig(rows) + WriteFile(filename)
Usage

Note: Check out the examples directory for more examples.

Importing the package

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql" // or the driver of your choice
    "github.com/thatinfrastructureguy/sqltocsvgzip"
)
  1. Dumping a query to a file
rows, _ := db.Query("SELECT * FROM users WHERE something=72")

_, err := sqltocsvgzip.WriteFile("~/important_user_report.csv.gzip", rows)
if err != nil {
    panic(err)
}
  2. Upload to AWS S3 with environment variables
rows, _ := db.Query("SELECT * FROM users WHERE something=72")

// UploadToS3 looks for the following environment variables.
// Required: S3_BUCKET, S3_PATH, S3_REGION
// Optional: S3_ACL (default => bucket-owner-full-control)
_, err := sqltocsvgzip.UploadToS3(rows)
if err != nil {
    panic(err)
}
  3. Upload to AWS S3 without environment variables
rows, _ := db.Query("SELECT * FROM users WHERE something=72")

config := sqltocsvgzip.UploadConfig(rows)
config.S3Bucket = "mybucket"
config.S3Path = "/myfolder/file.csv.gzip"
config.S3Region = "us-west-1"

_, err := config.Upload()
if err != nil {
    panic(err)
}
  4. Return a query as a GZIP download on the world wide web
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    rows, err := db.Query("SELECT * FROM users WHERE something=72")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()

    w.Header().Set("Content-Type", "application/javascript")
    w.Header().Set("Content-Encoding", "gzip")
    w.Header().Set("Content-Disposition", "attachment; filename=\"report.csv.gzip\"")

    config := sqltocsvgzip.WriteConfig(rows)
    err = config.Write(w)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
})
http.ListenAndServe(":8080", nil)

If you need more flexibility you can get an instance of a config and fiddle with a few settings.

rows, _ := db.Query("SELECT * FROM users WHERE something=72")

config := sqltocsvgzip.WriteConfig(rows)

config.TimeFormat = time.RFC822

cols, _ := rows.Columns()
config.Headers = append(cols, "extra_column_one", "extra_column_two")

config.SetRowPreProcessor(func(columns []string, columnNames []string) (bool, []string) {
    // exclude admins from report
    // NOTE: this is a dumb example because "where role != 'admin'" is better
    // but every now and then you need to exclude stuff because of calculations
    // that are a pain in sql and this is a contrived README
    if columns[3] == "admin" {
        return false, []string{}
    }

    extraColumnOne := generateSomethingHypotheticalFromColumn(columns[2])
    extraColumnTwo := lookupSomeApiThingForColumn(columns[4])

    return true, append(columns, extraColumnOne, extraColumnTwo)
})

_, err := config.WriteFile("~/important_user_report.csv.gzip")
if err != nil {
    panic(err)
}
Defaults
  • 10 MB default CSV buffer size.
  • 50 MB default gzip buffer size.
  • Gzipping: runtime.GOMAXPROCS(0) goroutines by default, with 512 KB of data per goroutine.
  • 4 concurrent upload goroutines.
Caveats
  • The minimum UploadPartSize must be at least 5 MB (an AWS S3 multipart requirement).
  • AWS allows a maximum of 10,000 parts per multipart upload. Hence, the default settings support up to 50 MB x 10,000 = 500 GB of gzipped data.
  • Increase the buffer size if you want fewer parts or have more than 500 GB of gzipped data; see the tuning sketch below.
  • Currently only supports uploads to AWS S3 API-compatible storage.
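
A minimal tuning sketch built on the Converter fields documented below; the values are illustrative assumptions (and the byte units are inferred from the defaults above), not recommendations from the library:

config := sqltocsvgzip.UploadConfig(rows)
config.S3Bucket = "mybucket"
config.S3Path = "/myfolder/file.csv.gzip"
config.S3Region = "us-west-1"

config.UploadPartSize = 100 * 1024 * 1024 // assumed bytes: 100 MB parts => up to ~1 TB across 10,000 parts
config.UploadThreads = 8                  // more concurrent part uploads
config.GzipGoroutines = 4                 // cap gzip parallelism

if _, err := config.Upload(); err != nil {
    panic(err)
}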
System Requirements
  • Minimum:
    • CPU: 2 vCPUs
    • Memory: 500 MB (depends on SQL data size)
    • Disk: only needed if you're writing to a file locally (greater than the size of the gzip file)

Documentation

Overview

Package sqltocsvgzip converts database query results (in the form of database/sql Rows) into CSV.GZIP output.

Source and README at https://github.com/thatInfrastructureGuy/sqltocsvgzip

Constants

This section is empty.

Variables

This section is empty.

Functions

func UploadToS3 added in v0.0.3

func UploadToS3(rows *sql.Rows) (rowCount int64, err error)

UploadToS3 will upload a CSV.GZIP file (with headers) to an AWS S3 bucket based on whatever is in the sql.Rows you pass in. UploadToS3 looks for the following environment variables. Required: S3_BUCKET, S3_PATH, S3_REGION. Optional: S3_ACL (default: bucket-owner-full-control).

func WriteFile

func WriteFile(csvGzipFileName string, rows *sql.Rows) (rowCount int64, err error)

WriteFile will write a CSV.GZIP file to the file name specified (with headers) based on whatever is in the sql.Rows you pass in.

Types

type Converter

type Converter struct {
	LogLevel              LogLevel
	Headers               []string // Column headers to use (default is rows.Columns())
	WriteHeaders          bool     // Flag to output headers in your CSV (default is true)
	TimeFormat            string   // Format string for any time.Time values (default is time's default)
	Delimiter             rune     // Delimiter to use in your CSV (default is comma)
	CsvBufferSize         int
	CompressionLevel      int
	GzipGoroutines        int
	GzipBatchPerGoroutine int
	S3Bucket              string
	S3Region              string
	S3Acl                 string
	S3Path                string
	S3Upload              bool
	UploadThreads         int
	UploadPartSize        int
	RowCount              int64
	// contains filtered or unexported fields
}

Converter does the actual work of converting the rows to CSV. There are a few settings you can override if you want to do some fancy stuff to your CSV.

func UploadConfig added in v0.0.6

func UploadConfig(rows *sql.Rows) *Converter

UploadConfig sets the default values for the Converter struct.

func WriteConfig added in v0.0.6

func WriteConfig(rows *sql.Rows) *Converter

WriteConfig will return a Converter which will write your CSV however you like, but will allow you to set a bunch of non-default behaviour like overriding headers or injecting a pre-processing step into your conversion.

func (*Converter) AddToQueue added in v0.0.3

func (c *Converter) AddToQueue(buf *bytes.Buffer, lastPart bool)

AddToQueue sends the buffer over the upload queue. It is currently designed to work with AWS multipart uploads. If the part body is smaller than 5 MB, two parts are combined before sending.

func (*Converter) SetRowPreProcessor

func (c *Converter) SetRowPreProcessor(processor CsvPreProcessorFunc)

SetRowPreProcessor lets you specify a CsvPreProcessorFunc for this conversion.

func (*Converter) Upload added in v0.0.4

func (c *Converter) Upload() (rowCount int64, err error)

Upload uploads the csv.gzip to AWS S3, returning an error if there is a problem. It creates an AWS multipart upload request, completes the multipart request if all uploads are successful, and aborts the operation when an error is received.

func (*Converter) UploadObjectToS3 added in v0.0.3

func (c *Converter) UploadObjectToS3(w io.Writer) error

UploadObjectToS3 uploads a file to AWS S3 without batching.

func (*Converter) UploadPart added in v0.0.5

func (c *Converter) UploadPart() (err error)

UploadPart listens to the upload queue. Whenever an object is received, it is uploaded to AWS. The abort operation is called if any error is received.

func (*Converter) Write

func (c *Converter) Write(w io.Writer) error

Write writes the csv.gzip to the io.Writer provided.
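
For illustration, a minimal sketch (assuming rows from an earlier query) that streams the csv.gzip to an arbitrary io.Writer, in this case a local file:

f, err := os.Create("report.csv.gzip")
if err != nil {
    panic(err)
}
defer f.Close()

converter := sqltocsvgzip.WriteConfig(rows)
// Write streams the gzipped CSV into any io.Writer.
if err := converter.Write(f); err != nil {
    panic(err)
}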

func (*Converter) WriteFile

func (c *Converter) WriteFile(csvGzipFileName string) (rowCount int64, err error)

WriteFile writes the csv.gzip to the filename specified, returning an error if a problem occurs.

type CsvPreProcessorFunc

type CsvPreProcessorFunc func(row []string, columnNames []string) (outputRow bool, processedRow []string)

CsvPreProcessorFunc is a function type for preprocessing your CSV. It takes the columns after they've been munged into strings but before they've been passed into the CSV writer.

Return an outputRow of false if you want the row skipped; otherwise return the processed row slice as you want it written to the CSV.
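
For example, a sketch of a pre-processor that uses both arguments, redacting a hypothetical "email" column by name rather than by position:

converter := sqltocsvgzip.WriteConfig(rows)
converter.SetRowPreProcessor(func(row []string, columnNames []string) (bool, []string) {
    for i, name := range columnNames {
        if name == "email" { // hypothetical column name
            row[i] = "[redacted]"
        }
    }
    return true, row // true => write the processed row
})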

type LogLevel added in v0.0.4

type LogLevel int
const (
	Error   LogLevel = 1
	Warn    LogLevel = 2
	Info    LogLevel = 3
	Debug   LogLevel = 4
	Verbose LogLevel = 5
)
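
For instance, a minimal sketch that raises logging verbosity via the LogLevel field on Converter shown above:

config := sqltocsvgzip.WriteConfig(rows)
config.LogLevel = sqltocsvgzip.Verbose // one of: Error, Warn, Info, Debug, Verbose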
