grok

package module
v0.3.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 8, 2024 License: Apache-2.0 Imports: 5 Imported by: 11

README

Grok

Grok is grok parsing library based on re2 regexp.

Usage

Basic usage:
g := grok.New()

// use custom patterns
patternDefinitions := map[string]string{
    // patterns can be nested
    "NGINX_HOST":         `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?`,
    // NGINX_NOTSEPARATOR is used in NGINX_HOST. IP and NUMBER are part of default pattern set
    "NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)

// compile grok before use, this will generate regex.Regex based on pattern and 
// subpatterns provided.
// this needs to be performed just once.
err := g.Compile("%{NGINX_HOST}", true)

res, err := g.ParseString("127.0.0.1:1234")

results in:

map[string]string {
    "destination.ip": "127.0.0.1", 
    "destination.port": "1234", 
}
Unnamed usage:

In this case we changed err := g.Compile("%{NGINX_HOST}", false) to err := g.Compile("%{NGINX_HOST}", true) allowing unnamed return matches. In case of unnamed match, definition name is used.

g := grok.New()

// use custom patterns
patternDefinitions := map[string]string{
    // patterns can be nested
    "NGINX_HOST":         `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?`,
    // NGINX_NOTSEPARATOR is used in NGINX_HOST. IP and NUMBER are part of default pattern set
    "NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)

// compile grok before use, this will generate regex.Regex based on pattern and 
// subpatterns provided
err := g.Compile("%{NGINX_HOST}", false)

res, err := g.ParseString("127.0.0.1:1234")

results in:

map[string]string {
    "NGINX_HOST": "127.0.0.1:1234", 
    "destination.ip": "127.0.0.1", 
    "IPV4": "127.0.0.1", 
    "destination.port": "1234", 
    "BASE10NUM": "1234", 
}
Typed arguments usage:

In this case we're marking destination.port as int using definition %{NUMBER:destination.port:int}.

g := grok.New()

// use custom patterns
patternDefinitions := map[string]string{
    "NGINX_HOST":         `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port:int})?`,
    "NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)

// compile grok before use, this will generate regex.Regex based on pattern and 
// subpatterns provided
err := g.Compile("%{NGINX_HOST}", true)

res, err := g.ParseTypedString("127.0.0.1:1234")

See type changed from map[string]string to map[string]interface{} and destination.port is now a number:

map[string]interface {} {
    "destination.ip": "127.0.0.1", 
    "destination.port": 1234, 
}

Benchmarks

Comparing to github.com/vjeantet/grok and more optimized version based on previous one github.com/trivago/grok

BenchmarkParseString-10                 	   15466	     76811 ns/op	    4578 B/op	       5 allocs/op
BenchmarkParseStringRegexp-10           	   15351	     77109 ns/op	    3840 B/op	       3 allocs/op
BenchmarkParseStringTrivago-10          	   15868	     76416 ns/op	    4593 B/op	       5 allocs/op
BenchmarkParseStringVjeanet-10          	   15548	     77111 ns/op	    5897 B/op	       6 allocs/op

BenchmarkNestedParseString-10           	   42201	     28908 ns/op	    3463 B/op	       4 allocs/op
BenchmarkNestedParseStringTrivago-10    	   41937	     28836 ns/op	    3449 B/op	       4 allocs/op
BenchmarkNestedParseStringVjeanet-10    	   41080	     29174 ns/op	    4045 B/op	       5 allocs/op

BenchmarkTypedParseString-10            	   39934	     29707 ns/op	    3851 B/op	       9 allocs/op
BenchmarkTypedParseStringTrivago-10     	   40146	     29238 ns/op	    3475 B/op	       6 allocs/op
BenchmarkTypedParseStringVjeanet-10     	   39931	     30616 ns/op	    4196 B/op	      14 allocs/op

Default set of patterns

This library comes with a default set of patterns defined in patterns/default.go file. You can include more predefined patterns from patterns/*.go like so

g := grok.New()
g.AddPatterns(patterns.Rails) // to include whole set
g.AddPattern(patterns.Ruby["RUBY_LOGLEVEL"]) // to include specific one

Default set consists of:

Name Example
WORD "hello", "world123", "test_data"
NOTSPACE "example", "text-with-dashes", "12345"
SPACE " ", "\t", " "
INT "123", "-456", "+789"
NUMBER "123", "456.789", "-0.123"
BOOL "true", "false", "true"
BASE10NUM "123", "-123.456", "0.789"
BASE16NUM "1a2b", "0x1A2B", "-0x1a2b3c"
BASE16FLOAT "0x1.a2b3", "-0x1A2B3C.D"
POSINT "123", "456", "789"
NONNEGINT "0", "123", "456"
GREEDYDATA "anything goes", "literally anything", "123 #@!"
QUOTEDSTRING ""This is a quote"", "'single quoted'"
UUID "123e4567-e89b-12d3-a456-426614174000"
URN "urn:isbn:0451450523", "urn:ietf:rfc:2648"
Network patterns
Name Example
IP "192.168.1.1", "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
IPV6 "2001:0db8:85a3:0000:0000:8a2e:0370:7334", "
IPV4 "192.168.1.1", "10.0.0.1", "172.16.254.1"
IPORHOST "example.com", "192.168.1.1", "fe80::1ff:fe23:4567:890a"
HOSTNAME "example.com", "sub.domain.co.uk", "localhost"
EMAILLOCALPART "john.doe", "alice123", "bob-smith"
EMAILADDRESS "john.doe@example.com", "alice123@domain.co.uk"
USERNAME "user1", "john.doe", "alice_123"
USER "user1", "john.doe", "alice_123"
MAC "00:1A:2B:3C:4D:5E", "001A.2B3C.4D5E"
CISCOMAC "001A.2B3C.4D5E", "001B.2C3D.4E5F", "001C.2D3E.4F5A"
WINDOWSMAC "00-1A-2B-3C-4D-5E", "00-1B-2C-3D-4E-5F"
COMMONMAC "00:1A:2B:3C:4D:5E", "00:1B:2C:3D:4E:5F"
HOSTPORT "example.com:80", "192.168.1.1:8080"
Paths patterns
Name Example
UNIXPATH "/home/user", "/var/log/syslog", "/tmp/abc_123"
TTY "/dev/pts/1", "/dev/tty0", "/dev/ttyS0"
WINPATH "C:\Program Files\App", "D:\Work\project\file.txt"
URIPROTO "http", "https", "ftp"
URIHOST "example.com", "192.168.1.1:8080"
URIPATH "/path/to/resource", "/another/path", "/root"
URIQUERY "key=value", "search=query&active=true"
URIPARAM "?key=value", "?search=query&active=true"
URIPATHPARAM "/path?query=1", "/folder/path?valid=true"
PATH "/home/user/documents", "C:\Windows\system32", "/var/log/syslog"
Datetime patterns
Name Example
MONTH "January", "Feb", "March", "Apr", "May", "Jun", "Jul", "August", "September", "October", "Nov", "December"
MONTHNUM "01", "02", "03", ... "11", "12"
DAY "Monday", "Tuesday", ... "Sunday"
YEAR "1999", "2000", "2021"
HOUR "00", "12", "23"
MINUTE "00", "30", "59"
SECOND "00", "30", "60"
TIME "14:30", "23:59:59", "12:00:00", "12:00:60"
DATE_US "04/21/2022", "12-25-2020", "07/04/1999"
DATE_EU "21.04.2022", "25/12/2020", "04-07-1999"
ISO8601_TIMEZONE "Z", "+02:00", "-05:00"
ISO8601_SECOND "59", "30", "60.123"
TIMESTAMP_ISO8601 "2022-04-21T14:30:00Z", "2020-12-25T23:59:59+02:00", "1999-07-04T12:00:00-05:00"
DATE "04/21/2022", "21.04.2022", "12-25-2020"
DATESTAMP "04/21/2022 14:30", "21.04.2022 23:59", "12-25-2020 12:00"
TZ "EST", "CET", "PDT"
DATESTAMP_RFC822 "Wed Jan 12 2024 14:33 EST"
DATESTAMP_RFC2822 "Tue, 12 Jan 2022 14:30 +0200", "Fri, 25 Dec 2020 23:59 -0500", "Sun, 04 Jul 1999 12:00 Z"
DATESTAMP_OTHER "Tue Jan 12 14:30 EST 2022", "Fri Dec 25 23:59 CET 2020", "Sun Jul 04 12:00 PDT 1999"
DATESTAMP_EVENTLOG "20220421143000", "20201225235959", "19990704120000"
Syslog patterns
Name Example
SYSLOGTIMESTAMP "Jan 1 00:00:00", "Mar 15 12:34:56", "Dec 31 23:59:59"
PROG "sshd", "kernel", "cron"
SYSLOGPROG "sshd[1234]", "kernel", "cron[5678]"
SYSLOGHOST "example.com", "192.168.1.1", "localhost"
SYSLOGFACILITY "<1.2>", "<12345.13456>"
HTTPDATE "25/Dec/2024:14:33 4"

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrParseFailure    = fmt.Errorf("parsing failed")
	ErrTypeNotProvided = fmt.Errorf("type not specified")
	ErrUnsupportedName = fmt.Errorf("name contains unsupported character ':'")
)

Functions

This section is empty.

Types

type Grok

type Grok struct {
	// contains filtered or unexported fields
}

func New

func New() *Grok

func NewComplete

func NewComplete(additionalPatterns ...map[string]string) (*Grok, error)

NewComplete creates a grok parser with full set of patterns

func NewWithPatterns

func NewWithPatterns(patterns ...map[string]string) (*Grok, error)

func NewWithoutDefaultPatterns

func NewWithoutDefaultPatterns() *Grok

func (*Grok) AddPattern

func (grok *Grok) AddPattern(name, patternDefinition string) error

func (*Grok) AddPatterns

func (grok *Grok) AddPatterns(patternDefinitions map[string]string) error

func (*Grok) Compile

func (grok *Grok) Compile(pattern string, namedCapturesOnly bool) error

func (*Grok) HasCaptureGroups

func (grok *Grok) HasCaptureGroups() bool

func (*Grok) Match

func (grok *Grok) Match(text []byte) bool

func (*Grok) MatchString

func (grok *Grok) MatchString(text string) bool

func (*Grok) Parse

func (grok *Grok) Parse(text []byte) (map[string][]byte, error)

Parse parses text in a form of []byte and returns map[string][]byte with values not converted to types according to hints. When expression is not a match nil map is returned.

func (*Grok) ParseString

func (grok *Grok) ParseString(text string) (map[string]string, error)

ParseString parses text in a form of string and returns map[string]string with values not converted to types according to hints. When expression is not a match nil map is returned.

func (*Grok) ParseTyped

func (grok *Grok) ParseTyped(text []byte) (map[string]interface{}, error)

ParseTyped parses text and returns map[string]interface{} with values typed according to type hints generated at compile time. If hint is not found error returned is TypeNotProvided. When expression is not a match nil map is returned.

func (*Grok) ParseTypedString

func (grok *Grok) ParseTypedString(text string) (map[string]interface{}, error)

ParseTypedString parses text and returns map[string]interface{} with values typed according to type hints generated at compile time. If hint is not found error returned is TypeNotProvided. When expression is not a match nil map is returned.

Directories

Path Synopsis
dev-tools

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL