gonx

gonx is an Nginx access log reader library for Go. In fact, you can use it for any log format.

Usage

The library provides the Reader type and two constructors for it.

The common constructor NewReader takes an opened file (any io.Reader, in fact) and a log format string as arguments. The format follows the form of an nginx log_format string.

reader := gonx.NewReader(file, format)

NewNginxReader provides more magic. It takes a log file io.Reader, an nginx config file io.Reader, and a log_format name string as its third argument. The actual format for the Parser is extracted from the given nginx config.

reader, err := gonx.NewNginxReader(file, nginxConfig, format_name)
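
For illustration, here is a minimal runnable sketch of NewNginxReader fed with an in-memory config; the config snippet, the main format name, and the access.log path are invented for this example:

package main

import (
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	// Invented nginx config fragment defining the log_format we want.
	config := strings.NewReader(`
		http {
			log_format main '$remote_addr [$time_local] "$request"';
		}
	`)
	file, err := os.Open("access.log") // hypothetical log file
	if err != nil {
		panic(err)
	}
	defer file.Close()

	reader, err := gonx.NewNginxReader(file, config, "main")
	if err != nil {
		panic(err)
	}
	for {
		rec, err := reader.Read()
		if err == io.EOF {
			break
		} else if err != nil {
			continue // skip lines that do not match the format
		}
		request, _ := rec.Field("request")
		fmt.Println(request)
	}
}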

Reader follows the io.Reader reading convention, returning io.EOF when the log is exhausted. Here is an example usage:

for {
	rec, err := reader.Read()
	if err == io.EOF {
		break
	} else if err != nil {
		continue // skip lines that do not match the format
	}
	// Process the record, e.g. read one of its fields.
	remoteAddr, _ := rec.Field("remote_addr")
	fmt.Println(remoteAddr)
}

See more examples in the example/*.go sources.

Performance

NOTE: All benchmarks were made on my old 11" MacBook Air (2011), so you should get better results on your brand new hardware ;-)

I have a few benchmarks for parsing a string log record into an Entry using gonx.Parser:

BenchmarkParseSimpleLogRecord      100000            19457 ns/op
BenchmarkParseLogRecord             20000            84425 ns/op

And here are some real-world stats. I took a ~300MB log file with ~700K records and processed it with simple scripts:

  • Reading the whole file line by line with bufio.Scanner, without any other processing, takes about one second.
  • Reading in the same manner plus parsing with gonx.Parser takes about 80 seconds.
  • But reading this file with gonx.Reader, which parses records in separate goroutines, takes about 45 seconds (and I want to make it faster).

Format

As I said above, this library is primarily for nginx access log parsing, but it can be configured to parse any other format. NewReader accepts a format argument, which is transformed into a regular expression and used to parse the log line by line. The format is nginx-like; here is an example:

`$remote_addr [$time_local] "$request"`

It should contain variables in the form $name. The regular expression created from this format string representation is:

`^(?P<remote_addr>[^ ]+) \[(?P<time_local>[^]]+)\] "(?P<request>[^"]+)"$`

Reader.Read returns a record of type Entry (which is a customized map[string]string). For this example, the returned record map will contain remote_addr, time_local, and request keys filled with the parsed values.
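
As a sketch of the same idea at the parser level, here is how a single line in this format could be parsed with gonx.NewParser (the sample log line is invented):

package main

import (
	"fmt"

	"github.com/satyrius/gonx"
)

func main() {
	format := `$remote_addr [$time_local] "$request"`
	parser := gonx.NewParser(format)

	// Invented sample line matching the format above.
	line := `94.23.43.17 [26/Apr/2014:19:32:37 +0400] "GET /api/users HTTP/1.1"`
	entry, err := parser.ParseString(line)
	if err != nil {
		panic(err)
	}
	remoteAddr, _ := entry.Field("remote_addr")
	request, _ := entry.Field("request")
	fmt.Println(remoteAddr, request)
}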

Stability

This library's API and internal representation can change at any moment, but I guarantee that backward compatibility will be maintained for the following public interfaces:

  • func NewReader(logFile io.Reader, format string) *Reader
  • func NewNginxReader(logFile io.Reader, nginxConf io.Reader, formatName string) (reader *Reader, err error)
  • func (r *Reader) Read() (record *Entry, err error)

Changelog

All major changes will be noted in the release notes.

Roadmap

I have no roadmap for this project at the moment, for a few reasons. First, it is a simple library and I want to keep it that way. Second, there are no feature requests, and for me this library does its job. A few things may happen: a default binary, so that this can be used as more than just a library, and performance improvements if they turn out to be needed.

Contributing

Fork the repo, create a feature branch, then send me a pull request. Feel free to create new issues or contact me by email.

Documentation


Functions

func MapReduce added in v1.1.0

func MapReduce(file io.Reader, parser StringParser, reducer Reducer) chan *Entry

MapReduce iterates over the given file, maps each of its lines into an Entry record using the parser, and applies the reducer to the Entries channel. Execution terminates when the result is read from the reducer's output channel, but the mapper keeps working and filling the input Entries channel until all lines have been read from the given file.
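
A minimal sketch of wiring MapReduce up with a parser and the Count reducer; the log lines are invented, and the "count" result field name is my assumption about the reducer's output:

package main

import (
	"fmt"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	// Invented log lines: a path and a response size.
	log := strings.NewReader("/foo 510\n/bar 1024\n")
	parser := gonx.NewParser("$request $body_bytes_sent")

	output := gonx.MapReduce(log, parser, &gonx.Count{})
	for res := range output {
		// Assumption: Count writes its result under a "count" field.
		count, _ := res.Field("count")
		fmt.Println("records:", count)
	}
}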

Types

type Avg added in v1.2.0

type Avg struct {
	Fields []string
}

Avg implements the Reducer interface to calculate the average of entry field values.

func (*Avg) Reduce added in v1.2.0

func (r *Avg) Reduce(input chan *Entry, output chan *Entry)

Reduce calculates the average value of the configured Fields over the input channel's Entries and writes the result to the output channel, one float value per configured field.
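
For example, a sketch that averages an invented $request_time field; the assumption is that the average is written back under the same field name:

package main

import (
	"fmt"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	// Invented log lines: a path and a request time in seconds.
	log := strings.NewReader("/foo 0.120\n/bar 0.240\n")
	parser := gonx.NewParser("$request $request_time")

	output := gonx.MapReduce(log, parser, &gonx.Avg{Fields: []string{"request_time"}})
	for res := range output {
		avg, _ := res.FloatField("request_time") // assumption: same field name
		fmt.Println("avg request_time:", avg)
	}
}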

type Chain added in v1.2.0

type Chain struct {
	// contains filtered or unexported fields
}

Chain implements the Reducer interface to chain other reducers.

func NewChain added in v1.2.0

func NewChain(reducers ...Reducer) *Chain

NewChain creates a new chain of Reducers

func (*Chain) Reduce added in v1.2.0

func (r *Chain) Reduce(input chan *Entry, output chan *Entry)

Reduce applies a chain of reducers to the input channel of entries and merges their results.
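
A sketch of chaining Count and Sum over the same input, assuming Chain merges each reducer's output fields into one result Entry:

package main

import (
	"fmt"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	// Invented log lines: a path and a response size in bytes.
	log := strings.NewReader("/foo 510\n/bar 1024\n")
	parser := gonx.NewParser("$request $body_bytes_sent")

	chain := gonx.NewChain(
		&gonx.Count{},
		&gonx.Sum{Fields: []string{"body_bytes_sent"}},
	)
	output := gonx.MapReduce(log, parser, chain)
	for res := range output {
		// Expecting both a "count" field and the summed "body_bytes_sent".
		fmt.Println(res.Fields())
	}
}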

type Count added in v1.2.0

type Count struct {
}

Count implements the Reducer interface to count entries

func (*Count) Reduce added in v1.2.0

func (r *Count) Reduce(input chan *Entry, output chan *Entry)

Reduce simply counts the entries and writes the total to the output channel.

type Datetime added in v1.3.0

type Datetime struct {
	Field  string
	Format string
	Start  time.Time
	End    time.Time
}

Datetime implements the Filter interface to filter Entries with timestamp fields within the specified datetime interval.

func (*Datetime) Filter added in v1.3.0

func (i *Datetime) Filter(entry *Entry) (validEntry *Entry)

Filter checks a field value to be in desired datetime range.

func (*Datetime) Reduce added in v1.3.0

func (i *Datetime) Reduce(input chan *Entry, output chan *Entry)

Reduce implements the Reducer interface. It goes through the input channel and applies Filter to each entry.
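
A sketch of a Datetime filter that keeps only April 2014 entries, assuming the standard nginx $time_local layout; since Datetime also implements Reducer, it can be passed to MapReduce directly:

package main

import (
	"fmt"
	"strings"
	"time"

	"github.com/satyrius/gonx"
)

func main() {
	// Invented log lines with a $time_local field.
	log := strings.NewReader(
		"[26/Apr/2014:19:32:37 +0400] /foo\n" +
			"[01/May/2014:08:01:00 +0400] /bar\n")
	parser := gonx.NewParser("[$time_local] $request")

	filter := &gonx.Datetime{
		Field:  "time_local",
		Format: "02/Jan/2006:15:04:05 -0700", // common nginx time layout
		Start:  time.Date(2014, time.April, 1, 0, 0, 0, 0, time.UTC),
		End:    time.Date(2014, time.May, 1, 0, 0, 0, 0, time.UTC),
	}
	output := gonx.MapReduce(log, parser, filter)
	for res := range output {
		fmt.Println(res.Fields()) // only the April entry should pass
	}
}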

type Entry

type Entry struct {
	// contains filtered or unexported fields
}

Entry is a parsed log record. Use the Field method to retrieve a value by name instead of treating this as a map, because the inner representation is subject to change.

func NewEmptyEntry added in v1.2.0

func NewEmptyEntry() *Entry

NewEmptyEntry creates an empty Entry to be filled later

func NewEntry added in v1.2.0

func NewEntry(fields Fields) *Entry

NewEntry creates an Entry with the given fields.

func (*Entry) Field added in v1.2.0

func (entry *Entry) Field(name string) (value string, err error)

Field returns an entry field value by name, or an empty string and an error if it does not exist.

func (*Entry) Fields added in v1.3.1

func (entry *Entry) Fields() Fields

Fields returns all fields of an entry

func (*Entry) FieldsHash added in v1.2.0

func (entry *Entry) FieldsHash(fields []string) string

FieldsHash returns a hash of all fields

func (*Entry) FloatField added in v1.2.0

func (entry *Entry) FloatField(name string) (value float64, err error)

FloatField returns an entry field value as a float64. It returns an error if the field does not exist or its value cannot be converted.

func (*Entry) GetFields added in v1.3.1

func (entry *Entry) GetFields() Fields

func (*Entry) Merge added in v1.2.0

func (entry *Entry) Merge(merge *Entry)

Merge merges two entries by updating the receiver's values with those of the given entry.

func (*Entry) Partial added in v1.2.0

func (entry *Entry) Partial(fields []string) *Entry

Partial returns a partial field entry with the specified fields

func (*Entry) SetField added in v1.2.0

func (entry *Entry) SetField(name string, value string)

SetField sets the value of a field

func (*Entry) SetFloatField added in v1.2.0

func (entry *Entry) SetFloatField(name string, value float64)

SetFloatField is a float field value setter. It accepts a float64 but still stores it as a string in the same fields map. The precision is 2, which is enough for the log parsing task.

func (*Entry) SetUintField added in v1.2.0

func (entry *Entry) SetUintField(name string, value uint64)

SetUintField is an integer field value setter. It accepts a uint64 but still stores it as a string in the same fields map.
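
A short sketch of building and reading an Entry by hand using the setters and getters above:

package main

import (
	"fmt"

	"github.com/satyrius/gonx"
)

func main() {
	entry := gonx.NewEmptyEntry()
	entry.SetField("remote_addr", "127.0.0.1")
	entry.SetUintField("status", 200)
	entry.SetFloatField("request_time", 0.256)

	// All values are stored as strings internally.
	status, _ := entry.Field("status")
	fmt.Println(status)

	rt, _ := entry.FloatField("request_time")
	fmt.Println(rt) // 0.26, since the setter keeps two digits of precision

	// Partial keeps only the listed fields.
	fmt.Println(entry.Partial([]string{"remote_addr"}).Fields())
}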

type Fields added in v1.2.0

type Fields map[string]string

Fields is a shortcut for a map of strings.

type Filter added in v1.3.0

type Filter interface {
	Reducer
	Filter(*Entry) *Entry
}

Filter is an interface for limiting the Entries channel.

The Filter method should accept an *Entry and return it if it meets the filter condition; otherwise it should return nil.

type GroupBy added in v1.2.0

type GroupBy struct {
	Fields []string
	// contains filtered or unexported fields
}

GroupBy implements the Reducer interface to apply other reducers and get data grouped by given fields.

func NewGroupBy added in v1.2.0

func NewGroupBy(fields []string, reducers ...Reducer) *GroupBy

NewGroupBy creates a new GroupBy Reducer

func (*GroupBy) Reduce added in v1.2.0

func (r *GroupBy) Reduce(input chan *Entry, output chan *Entry)

Reduce applies the related reducers and groups data by Fields.
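
A sketch of grouping invented log lines by $remote_addr and counting requests per address (again assuming Count reports a "count" field):

package main

import (
	"fmt"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	// Invented log lines: address and path.
	log := strings.NewReader(
		"10.0.0.1 /foo\n" +
			"10.0.0.1 /bar\n" +
			"10.0.0.2 /foo\n")
	parser := gonx.NewParser("$remote_addr $request")

	groupBy := gonx.NewGroupBy([]string{"remote_addr"}, &gonx.Count{})
	output := gonx.MapReduce(log, parser, groupBy)
	for res := range output {
		addr, _ := res.Field("remote_addr")
		count, _ := res.Field("count")
		fmt.Println(addr, count)
	}
}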

type Parser added in v1.1.0

type Parser struct {
	// contains filtered or unexported fields
}

Parser is a log record parser. Use specific constructors to initialize it.

func NewNginxParser added in v1.1.0

func NewNginxParser(conf io.Reader, name string) (parser *Parser, err error)

NewNginxParser parses the nginx conf file to find the log_format with the given name and returns a parser for this format. It returns an error if it cannot find the given log format.

func NewParser added in v1.1.0

func NewParser(format string) *Parser

NewParser returns a new Parser, using the given log format to create its internal string-parsing regexp.

func (*Parser) ParseString added in v1.1.0

func (parser *Parser) ParseString(line string) (entry *Entry, err error)

ParseString parses a log file line using the internal format regexp. If a line does not match the given format, an error will be returned.

type ReadAll added in v1.1.0

type ReadAll struct {
}

ReadAll implements the Reducer interface; it simply redirects input entries to the output channel.

func (*ReadAll) Reduce added in v1.1.0

func (r *ReadAll) Reduce(input chan *Entry, output chan *Entry)

Reduce redirects the input Entries channel directly to the output without any modification. It is useful when you just want to read a file fast, using the asynchronous mapper routines.
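
For instance, a minimal sketch that uses ReadAll just to stream parsed entries out of MapReduce:

package main

import (
	"fmt"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	log := strings.NewReader("/foo\n/bar\n") // invented log lines
	parser := gonx.NewParser("$request")

	for entry := range gonx.MapReduce(log, parser, &gonx.ReadAll{}) {
		fmt.Println(entry.Fields())
	}
}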

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a log file reader. Use specific constructors to create it.

func NewNginxReader

func NewNginxReader(logFile io.Reader, nginxConf io.Reader, formatName string) (reader *Reader, err error)

NewNginxReader creates a reader for the nginx log format. The nginx config parser is used to extract the particular log format from the conf file.

func NewParserReader added in v1.3.1

func NewParserReader(logFile io.Reader, parser StringParser) *Reader

NewParserReader creates a reader with the given parser
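
A sketch of reusing an existing Parser via NewParserReader; the log line is invented:

package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/satyrius/gonx"
)

func main() {
	parser := gonx.NewParser("$remote_addr $request")
	log := strings.NewReader("10.0.0.1 /foo\n")

	reader := gonx.NewParserReader(log, parser)
	for {
		entry, err := reader.Read()
		if err == io.EOF {
			break
		} else if err != nil {
			continue // skip lines that do not match the format
		}
		fmt.Println(entry.Fields())
	}
}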

func NewReader

func NewReader(logFile io.Reader, format string) *Reader

NewReader creates a reader for a custom log format.

func (*Reader) Read

func (r *Reader) Read() (entry *Entry, err error)

Read returns the next parsed Entry from the log file. It returns io.EOF when there are no more entries to read.

type Reducer added in v1.1.0

type Reducer interface {
	Reduce(input chan *Entry, output chan *Entry)
}

Reducer is an interface for reducing the Entries channel.

Each Reduce method should accept an input channel of Entries, do its job, and write the result to the output channel.

It does not return values because it usually runs in a separate goroutine, and it is handy to use a channel for retrieving the reduced data.
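
For illustration, here is a sketch of a custom Reducer that keeps the maximum value of one field. The Max type is hypothetical, and closing the output channel at the end mirrors what the built-in reducers appear to do:

package main

import (
	"fmt"
	"strings"

	"github.com/satyrius/gonx"
)

// Max is a hypothetical reducer that keeps the largest value of one field.
type Max struct {
	Field string
}

func (m *Max) Reduce(input chan *gonx.Entry, output chan *gonx.Entry) {
	maxVal := 0.0
	for entry := range input {
		if v, err := entry.FloatField(m.Field); err == nil && v > maxVal {
			maxVal = v
		}
	}
	result := gonx.NewEmptyEntry()
	result.SetFloatField(m.Field, maxVal)
	output <- result
	close(output) // assumption: a reducer closes its own output channel
}

func main() {
	// Invented log lines: a path and a request time.
	log := strings.NewReader("/foo 0.120\n/bar 0.240\n")
	parser := gonx.NewParser("$request $request_time")

	for res := range gonx.MapReduce(log, parser, &Max{Field: "request_time"}) {
		fmt.Println(res.Fields())
	}
}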

type StringParser added in v1.3.0

type StringParser interface {
	ParseString(line string) (entry *Entry, err error)
}

StringParser is the interface that wraps the ParseString method.

type Sum added in v1.2.0

type Sum struct {
	Fields []string
}

Sum implements the Reducer interface to sum Entry values for the given fields.

func (*Sum) Reduce added in v1.2.0

func (r *Sum) Reduce(input chan *Entry, output chan *Entry)

Reduce sums the given Entry fields and returns a result for each field.

Directories

Path       Synopsis
example    Example program that reads a big nginx file from stdin line by line and measures reading time.
