gharchive

package module
v0.1.0
Published: Oct 21, 2020 License: MIT Imports: 11 Imported by: 0

README

gharchive-client

A command line client and Go package for iterating over events from gharchive.

Installation

Download binaries from the latest release

Command line usage

Usage: gharchive <start> [<end>]

Arguments:
  <start>    start time formatted as YYYY-MM-DD, or as an RFC3339 date
  [<end>]    end time formatted as YYYY-MM-DD, or as an RFC3339 date. default is an hour past start

Flags:
  -h, --help                     Show context-sensitive help.
      --type=TYPE,...            include only these event types
      --not-type=NOT-TYPE,...    exclude these event types
      --strict-created-at        only output events with a created_at between start and end
      --no-empty-lines           skip empty lines
      --only-valid-json          skip lines that are not valid json objects
      --preserve-order           ensure that events are output in the same order they exist on data.gharchive.org
      --concurrency=INT          max number of concurrent downloads to run. Ignored if --preserve-order is set. Default is the number of cpus available.
      --debug                    output debug logs

Performance

I can iterate over about 200k events per second on an 8-core MacBook Pro with a cable modem. On an 80-core server in a data center, that increases to about 450k events per second.

Documentation

Types

type JSONFieldValidator

type JSONFieldValidator struct {
	Field     string
	Validator JSONValueValidator
}

JSONFieldValidator validates the value of a json field

type JSONValueValidator

type JSONValueValidator func(val interface{}) bool

JSONValueValidator validates a json value

func StringValueValidator

func StringValueValidator(validate func(val string) bool) JSONValueValidator

StringValueValidator validates a string value

func TimeValueValidator

func TimeValueValidator(validate func(val time.Time) bool) JSONValueValidator

TimeValueValidator validates a time value

type Options

type Options struct {
	Validators    []Validator     // list of validators to check each line
	SingleHour    bool            // ignore end time and just scan the file containing the hour in which start occurs.
	EndTime       time.Time       // end of the timespan to scan. events up to the second before EndTime will be scanned. ignored when SingleHour is set. default: start time + 1 hour
	PreserveOrder bool            // run a single process so that the output order is preserved
	Concurrency   int             // number of concurrent downloads to run. ignored when PreserveOrder is set. default: 1
	Bucket        string          // the GCP bucket for gharchive. default: data.gharchive.org
	StorageClient *storage.Client // a client to use instead of the default.
}

Options are options for a Scanner

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner scans lines from gharchive

func New

func New(ctx context.Context, startTime time.Time, opts *Options) (*Scanner, error)

New returns a new Scanner

func (*Scanner) Bytes added in v0.1.0

func (s *Scanner) Bytes() []byte

Bytes returns the most recent token generated by a call to Scan. The underlying array may point to data that will be overwritten by a subsequent call to Scan.

func (*Scanner) Close

func (s *Scanner) Close() error

Close closes the scanner.

func (*Scanner) Err added in v0.1.0

func (s *Scanner) Err() error

Err returns the first non-EOF error that was encountered by the Scanner.

func (*Scanner) Scan added in v0.1.0

func (s *Scanner) Scan(ctx context.Context) bool

Scan advances the scanner to the next token, which will then be available through the Bytes method. It returns false when the scan stops by reaching the end of the output. After Scan returns false, the Err method will return any error that occurred during scanning.

type Validator

type Validator func(line []byte) bool

Validator is a function that returns true when a line passes validation

func ValidateIsJSONObject

func ValidateIsJSONObject() Validator

ValidateIsJSONObject returns true if the first non-whitespace byte is '{'

func ValidateJSONFields

func ValidateJSONFields(validators []JSONFieldValidator) Validator

ValidateJSONFields uses the given validators to validate JSON fields
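
As a rough illustration of field-level validation, the sketch below decodes a line and checks each listed field with its validator. The three types are redeclared locally from the signatures above; validateFields is an illustrative stand-in, not the package's ValidateJSONFields, and treating a missing field as a failure is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Types copied from the signatures documented above.
type JSONValueValidator func(val interface{}) bool

type JSONFieldValidator struct {
	Field     string
	Validator JSONValueValidator
}

type Validator func(line []byte) bool

// validateFields decodes the line as a JSON object and requires every
// listed field to be present and to pass its validator.
// Assumption: a missing field fails; the real package may differ.
func validateFields(validators []JSONFieldValidator) Validator {
	return func(line []byte) bool {
		var obj map[string]interface{}
		if err := json.Unmarshal(line, &obj); err != nil {
			return false
		}
		for _, fv := range validators {
			val, ok := obj[fv.Field]
			if !ok || !fv.Validator(val) {
				return false
			}
		}
		return true
	}
}

func main() {
	onlyPush := validateFields([]JSONFieldValidator{{
		Field: "type",
		Validator: func(val interface{}) bool {
			s, ok := val.(string)
			return ok && s == "PushEvent"
		},
	}})
	fmt.Println(onlyPush([]byte(`{"type":"PushEvent"}`)))
	fmt.Println(onlyPush([]byte(`{"type":"ForkEvent"}`)))
}
```

Decoding the whole object is the simplest approach; a production implementation would likely avoid a full decode per line for speed.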

func ValidateNotEmpty

func ValidateNotEmpty() Validator

ValidateNotEmpty validates that the line contains at least one non-whitespace character
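
Because Validator is a plain function type, custom validators are easy to write and combine. The sketch below re-implements the two documented checks from their descriptions and adds a combinator. notEmpty, isJSONObject and all are hypothetical names, not the package's functions, and requiring every validator to pass is an assumption about how Options.Validators is applied.

```go
package main

import (
	"bytes"
	"fmt"
)

// Validator is copied from the signature documented above.
type Validator func(line []byte) bool

// notEmpty passes lines with at least one non-whitespace character.
func notEmpty() Validator {
	return func(line []byte) bool {
		return len(bytes.TrimSpace(line)) > 0
	}
}

// isJSONObject passes lines whose first non-whitespace byte is '{'.
// Like the documented check, it does not parse the rest of the line.
func isJSONObject() Validator {
	return func(line []byte) bool {
		trimmed := bytes.TrimLeft(line, " \t\r\n")
		return len(trimmed) > 0 && trimmed[0] == '{'
	}
}

// all passes only when every validator passes (assumed semantics).
func all(vs ...Validator) Validator {
	return func(line []byte) bool {
		for _, v := range vs {
			if !v(line) {
				return false
			}
		}
		return true
	}
}

func main() {
	valid := all(notEmpty(), isJSONObject())
	for _, line := range []string{`{"id":"1"}`, "   ", `[1,2,3]`} {
		fmt.Printf("%q -> %v\n", line, valid([]byte(line)))
	}
}
```

A first-byte check is cheap, which matters at hundreds of thousands of lines per second, but it will accept a truncated object such as `{"id":`.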

Directories

Path Synopsis
cmd
