csv-reader

command module
v0.0.0-...-62af069
Published: Dec 14, 2023 | License: Unlicense | Imports: 4 | Imported by: 0

README

CSV file reader

A CSV file reader that counts how often each e-mail domain occurs.

Optimization research & ideas

  1. Use buffered reading (bufio package).
  2. Process e-mail domains concurrently with worker goroutines.
  3. Optimize data structures (use structs to improve readability and maintainability).
  4. Run benchmark tests against input data files of different sizes.
  5. Profile the code with the pprof tool to identify specific bottlenecks.
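
Ideas 1 and 2 can be sketched together as follows. This is an illustration under stated assumptions, not the project's actual implementation: the names domainOf and countDomains, the channel capacity, and the CSV layout (e-mail in the third column) are all hypothetical.

```go
package main

import (
	"bufio"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
	"sync"
)

// domainOf returns the lower-cased text after the last '@'.
// It reports false when the value does not look like an e-mail address.
func domainOf(email string) (string, bool) {
	at := strings.LastIndex(email, "@")
	if at <= 0 || at == len(email)-1 {
		return "", false
	}
	return strings.ToLower(email[at+1:]), true
}

// countDomains reads CSV rows through a buffered reader and counts e-mail
// domains using `workers` goroutines. emailCol is a zero-based column index
// (an assumption of this sketch).
func countDomains(r io.Reader, workers, bufSize, emailCol int) (map[string]int, error) {
	if workers < 1 {
		workers = 1
	}
	// Buffered reader with a configurable size (compare READ_BUFFER_SIZE_IN_BYTES).
	cr := csv.NewReader(bufio.NewReaderSize(r, bufSize))
	cr.FieldsPerRecord = -1 // tolerate rows with a varying number of fields

	emails := make(chan string, 1024)
	partials := make([]map[string]int, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		partials[i] = make(map[string]int)
		wg.Add(1)
		// Each worker owns its map, so no locking is needed while counting.
		go func(local map[string]int) {
			defer wg.Done()
			for e := range emails {
				if d, ok := domainOf(e); ok {
					local[d]++
				}
			}
		}(partials[i])
	}

	var readErr error
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			readErr = err
			break
		}
		if emailCol < len(rec) {
			emails <- rec[emailCol]
		}
	}
	close(emails)
	wg.Wait()
	if readErr != nil {
		return nil, readErr
	}

	// Merge the per-worker maps into one result.
	total := make(map[string]int)
	for _, p := range partials {
		for d, n := range p {
			total[d] += n
		}
	}
	return total, nil
}

func main() {
	// Hypothetical sample data; the header row is skipped because "email" has no '@'.
	input := "first_name,last_name,email\nAnn,Lee,ann@example.com\nBob,Ray,bob@Example.com\nCy,Fox,cy@test.org\n"
	counts, err := countDomains(strings.NewReader(input), 4, 4096, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["example.com"], counts["test.org"]) // prints "2 1"
}
```

Giving each worker its own map and merging at the end avoids a shared mutex on the hot path. Whether this beats a single goroutine depends on the input, which is what the benchmarks are meant to show.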

Environment variables

  • To override the configuration variables, change their values in the .env file. The default values are:

    CONCURRENCY=4
    INPUT_CSV_FILE_PATH_DEFAULT=./data/test/customers_3k_lines.csv
    INPUT_CSV_FILE_PATH_0_LINES=../data/test/customers_0_lines.csv
    INPUT_CSV_FILE_PATH_10_LINES=../data/test/customers_10_lines.csv
    INPUT_CSV_FILE_PATH_3K_LINES=../data/test/customers_3k_lines.csv
    INPUT_CSV_FILE_PATH_10M_LINES=../data/test/customers_10m_lines.csv*
    READ_BUFFER_SIZE_IN_BYTES=4096
    

* The customers_10m_lines.csv file is stored locally because of its size (over 500 MB). It is used in the benchmark tests.
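
A minimal sketch of how two of these variables could be read with fallbacks to the defaults listed above. It assumes the values have already been exported into the process environment; how the project itself loads the .env file is not shown here. Config, positiveIntOr, and loadConfig are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors two of the variables listed above (hypothetical type).
type Config struct {
	Concurrency    int
	ReadBufferSize int
}

// positiveIntOr parses s as a positive integer and falls back to def
// when s is empty, malformed, or not positive.
func positiveIntOr(s string, def int) int {
	n, err := strconv.Atoi(s)
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// loadConfig reads the process environment only; it does not parse .env itself.
func loadConfig() Config {
	return Config{
		Concurrency:    positiveIntOr(os.Getenv("CONCURRENCY"), 4),
		ReadBufferSize: positiveIntOr(os.Getenv("READ_BUFFER_SIZE_IN_BYTES"), 4096),
	}
}

func main() {
	cfg := loadConfig()
	fmt.Printf("concurrency=%d buffer=%d\n", cfg.Concurrency, cfg.ReadBufferSize)
}
```

With this sketch, running for example `CONCURRENCY=12 go run .` would override only the worker count and leave the buffer size at its default.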

Screenshots from benchmark execution

  • CONCURRENCY=1, READ_BUFFER_SIZE_IN_BYTES=4096

  • CONCURRENCY=6, READ_BUFFER_SIZE_IN_BYTES=4096

  • CONCURRENCY=12, READ_BUFFER_SIZE_IN_BYTES=4096

  • CONCURRENCY=1, READ_BUFFER_SIZE_IN_BYTES=8192

  • CONCURRENCY=6, READ_BUFFER_SIZE_IN_BYTES=8192

  • CONCURRENCY=12, READ_BUFFER_SIZE_IN_BYTES=8192

  • CONCURRENCY=1, READ_BUFFER_SIZE_IN_BYTES=16384

  • CONCURRENCY=6, READ_BUFFER_SIZE_IN_BYTES=16384

  • CONCURRENCY=12, READ_BUFFER_SIZE_IN_BYTES=16384
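
A comparison across buffer sizes could be approximated outside the project's own benchmark suite with testing.Benchmark from the standard library. The sketch below is only an assumption about how such a measurement might look: countLines is a hypothetical stand-in for the real importer, the input is synthetic and held in memory, and only the buffer size is varied.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"testing"
)

// countLines counts newline-terminated lines read through a buffered
// reader of the given size.
func countLines(input string, bufSize int) int {
	br := bufio.NewReaderSize(strings.NewReader(input), bufSize)
	n := 0
	for {
		if _, err := br.ReadString('\n'); err != nil {
			break // io.EOF (or any other error) ends the loop
		}
		n++
	}
	return n
}

func main() {
	// Synthetic data; the real benchmarks use the customers_*.csv files.
	input := strings.Repeat("Ann,Lee,ann@example.com\n", 10000)
	for _, size := range []int{4096, 8192, 16384} {
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				countLines(input, size)
			}
		})
		fmt.Printf("READ_BUFFER_SIZE_IN_BYTES=%d: %s\n", size, res)
	}
}
```

Because the input never touches the disk here, differences between buffer sizes will likely be smaller than with a 500 MB file.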

Makefile

  • Run program

    make run
    
  • Run tests

    make test
    
  • Run benchmark

    make benchmark
    


Directories

Path Synopsis
Package customerimporter reads the given customers.csv file and returns a sorted data structure (of your choice) of e-mail domains, each with the number of customers whose e-mail addresses use that domain.
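
As an illustration of one possible "sorted data structure", the sketch below turns a map of counts into a slice of structs ordered by domain name. The sort key is an assumption (the synopsis does not say what to sort by), and DomainCount and sortedDomains are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// DomainCount pairs an e-mail domain with its number of customers.
type DomainCount struct {
	Domain    string
	Customers int
}

// sortedDomains converts the counts map into a slice sorted by domain name.
func sortedDomains(counts map[string]int) []DomainCount {
	out := make([]DomainCount, 0, len(counts))
	for d, n := range counts {
		out = append(out, DomainCount{Domain: d, Customers: n})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Domain < out[j].Domain })
	return out
}

func main() {
	for _, dc := range sortedDomains(map[string]int{"test.org": 1, "example.com": 2}) {
		fmt.Printf("%s %d\n", dc.Domain, dc.Customers)
	}
	// prints:
	// example.com 2
	// test.org 1
}
```

A slice of structs keeps a stable order for output, unlike a Go map, whose iteration order is unspecified.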
