awk

v1.0.0
Published: Jul 17, 2020 License: BSD-3-Clause Imports: 9 Imported by: 3

README

awk

Description

awk is a package for the Go programming language that provides an AWK-style text processing capability. The package facilitates splitting an input stream into records (default: newline-separated lines) and fields (default: whitespace-separated columns) then applying a sequence of statements of the form "if 〈pattern〉 then 〈action〉" to each record in turn. For example, the following is a complete Go program that adds up the first two columns of a CSV file to produce a third column:

package main

import (
    "github.com/spakin/awk"
    "os"
)

func main() {
    s := awk.NewScript()
    s.Begin = func(s *awk.Script) {
        s.SetFS(",")
        s.SetOFS(",")
    }
    s.AppendStmt(nil, func(s *awk.Script) {
        s.SetF(3, s.NewValue(s.F(1).Int()+s.F(2).Int()))
        s.Println()
    })
    s.Run(os.Stdin)
}

In the above, the awk package handles the mundane details: reading lines from the file, checking for EOF, splitting lines into columns, and handling errors. With the help of awk, Go can easily be applied to the sorts of text-processing tasks that one would normally implement in a scripting language, without sacrificing Go's speed, safety, or flexibility.

Installation

The awk package has opted into the Go module system so installation is in fact unnecessary if your program or package has done likewise. Otherwise, a traditional

go get github.com/spakin/awk

will install the package.

Documentation

Descriptions and examples of the awk API can be found online in the GoDoc documentation of package awk.

Author

Scott Pakin, scott+awk@pakin.org

Documentation

Overview

Package awk implements AWK-style processing of input streams.

Introduction

The awk package can be considered a shallow EDSL (embedded domain-specific language) for Go that facilitates text processing. It aims to implement the core semantics provided by AWK, a pattern scanning and processing language defined as part of the POSIX 1003.1 standard (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) and therefore part of all standard Linux/Unix distributions.

AWK's forte is simple transformations of tabular data. For example, the following is a complete AWK program that reads an entire file from the standard input device, splits each line into whitespace-separated columns, and outputs all lines in which the fifth column is an odd number:

$5 % 2 == 1

Here's a typical Go analogue of that one-line AWK program:

package main

import (
        "bufio"
        "fmt"
        "io"
        "os"
        "strconv"
        "strings"
)

func main() {
        input := bufio.NewReader(os.Stdin)
        for {
                line, err := input.ReadString('\n')
                if err != nil {
                        if err != io.EOF {
                                panic(err)
                        }
                        break
                }
                scanner := bufio.NewScanner(strings.NewReader(line))
                scanner.Split(bufio.ScanWords)
                cols := make([]string, 0, 10)
                for scanner.Scan() {
                        cols = append(cols, scanner.Text())
                }
                if err := scanner.Err(); err != nil {
                        panic(err)
                }
                if len(cols) < 5 {
                        continue
                }
                // A non-numeric field converts to 0, as it would in AWK.
                num, _ := strconv.Atoi(cols[4])
                if num%2 == 1 {
                        fmt.Print(line)
                }
        }
}

The goal of the awk package is to emulate AWK's simplicity while simultaneously taking advantage of Go's speed, safety, and flexibility. With the awk package, the preceding code reduces to the following:

package main

import (
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(func(s *awk.Script) bool { return s.F(5).Int()%2 == 1 }, nil)
	if err := s.Run(os.Stdin); err != nil {
		panic(err)
	}
}

While not a one-liner like the original AWK program, the above is conceptually close to it. The AppendStmt method defines a script in terms of patterns and actions exactly as in the AWK program. The Run method then runs the script on an input stream, which can be any io.Reader.

Usage

For those programmers unfamiliar with AWK, an AWK program consists of a sequence of pattern/action pairs. Each pattern that matches a given line causes the corresponding action to be performed. AWK programs tend to be terse because AWK implicitly reads the input file, splits it into records (default: newline-terminated lines), and splits each record into fields (default: whitespace-separated columns), saving the programmer from having to express such operations explicitly. Furthermore, AWK provides a default pattern, which matches every record, and a default action, which outputs a record unmodified.

The awk package attempts to mimic those semantics in Go. Basic usage consists of three steps:

1. Script allocation (awk.NewScript)

2. Script definition (Script.AppendStmt)

3. Script execution (Script.Run)

In Step 2, AppendStmt is called once for each pattern/action pair that is to be appended to the script. The same script can be applied to multiple input streams by re-executing Step 3. Actions to be executed on every run of Step 3 can be supplied by assigning the script's Begin and End fields. The Begin action is typically used to initialize script state by calling methods such as SetRS and SetFS and assigning user-defined data to the script's State field (what would be global variables in AWK). The End action is typically used to store or report final results.

To mimic AWK's dynamic type system, the awk package provides the Value and ValueArray types. Value represents a scalar that can be coerced without error to a string, an int, or a float64. ValueArray represents a (possibly multidimensional) associative array of Values.
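As a rough illustration of what "coerced without error" can mean, the sketch below stores a scalar as text and falls back to zero whenever a numeric reading is impossible. The `value` type is hypothetical and the conversion rules are an assumption chosen for brevity; the package's Value type may convert differently (for example, AWK itself accepts a leading numeric prefix).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// value is a hypothetical stand-in for a weakly typed scalar stored as text.
type value string

// Float64 returns the numeric reading of v, or 0 if strconv.ParseFloat
// rejects the trimmed text.
func (v value) Float64() float64 {
	f, err := strconv.ParseFloat(strings.TrimSpace(string(v)), 64)
	if err != nil {
		return 0
	}
	return f
}

// Int truncates the numeric reading of v toward zero.
func (v value) Int() int { return int(v.Float64()) }

// String returns the text of v unchanged.
func (v value) String() string { return string(v) }

func main() {
	for _, v := range []value{"42", " 3.9 ", "abc", ""} {
		fmt.Printf("%q -> int %d, float64 %g\n", v.String(), v.Int(), v.Float64())
	}
}
```

None of the conversions returns an error, so callers can chain them freely in patterns and actions.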

Both patterns and actions can access the current record's fields via the script's F method, which takes a 1-based index and returns the corresponding field as a Value. An index of 0 returns the entire record as a Value.

Features

The following AWK features and GNU AWK extensions are currently supported by the awk package:

• the basic pattern/action structure of an AWK script, including BEGIN and END rules and range patterns

• control over record separation (RS), including regular expressions and null strings (implying blank lines as separators)

• control over field separation (FS), including regular expressions and null strings (implying single-character fields)

• fixed-width fields (FIELDWIDTHS)

• fields defined by a regular expression (FPAT)

• control over case-sensitive vs. case-insensitive comparisons (IGNORECASE)

• control over the number conversion format (CONVFMT)

• automatic enumeration of records (NR) and fields (NF)

• "weak typing"

• multidimensional associative arrays

• premature termination of record processing (next) and script processing (exit)

• explicit record reading (getline) from either the current stream or a specified stream

• maintenance of regular-expression status variables (RT, RSTART, and RLENGTH)

For more information about AWK and its features, see the awk(1) manual page on any Linux/Unix system (available online from, e.g., http://linux.die.net/man/1/awk) or read the book, "The AWK Programming Language" by Aho, Kernighan, and Weinberger.

Examples

A number of examples ported from the POSIX 1003.1 standard document (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) are presented below.

Functions

func RunPipeline

func RunPipeline(r io.Reader, ss ...*Script) error

RunPipeline chains together a set of scripts into a pipeline, with each script sending its output to the next. (Implication: Script.Output will be overwritten in all but the last script.) If any script in the pipeline fails, a non-nil error will be returned.

Types

type ActionFunc

type ActionFunc func(*Script)

An ActionFunc represents an action to perform when the corresponding PatternFunc returns true.

type PatternFunc

type PatternFunc func(*Script) bool

A PatternFunc represents a pattern to match against. It is expected to examine the state of the given Script then return either true or false. If it returns true, the corresponding ActionFunc is executed. Otherwise, the corresponding ActionFunc is not executed.

func Auto

func Auto(v ...interface{}) PatternFunc

Auto provides a simplified mechanism for creating various common-case PatternFunc functions. It accepts zero, one, or an even number of arguments. If given no arguments, it matches every record. If given a single argument, its behavior depends on that argument's type:

• A PatternFunc is returned as is.

• A *regexp.Regexp returns a function that matches that regular expression against the entire record.

• A string is treated as a regular expression and behaves likewise.

• An int returns a function that matches that int against NR.

• Any other type causes a run-time panic.

If given an even number of arguments, pairs of arguments are treated as ranges (cf. the Range function). The PatternFunc returns true if the record lies within any of the ranges.

Example (Int)

Delete the fifth line of the input stream but output all other lines.

package main

import (
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(awk.Auto(5), func(s *awk.Script) { s.Next() })
	s.AppendStmt(nil, nil)
	s.Run(os.Stdin)
}

Example (String)

Output only those lines containing the string, "fnord".

package main

import (
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(awk.Auto("fnord"), nil)
	s.Run(os.Stdin)
}

func Range

func Range(p1, p2 PatternFunc) PatternFunc

Range combines two patterns into a single stateful pattern. It returns true from the record on which the first pattern becomes true through the record on which the second pattern becomes true, both inclusive.
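A stateful range pattern of this kind can be written as a closure that remembers whether it is currently inside a range. The sketch below uses only the standard library; `pattern` and `rangeOf` are hypothetical names, and it follows the AWK convention that a record matching both patterns forms a one-record range, which is an assumption about the package's behavior.

```go
package main

import "fmt"

// pattern is a hypothetical predicate over a record.
type pattern func(rec string) bool

// rangeOf returns a pattern that is true from a record matching start
// through the next record matching end, inclusive.
func rangeOf(start, end pattern) pattern {
	inside := false // state carried from one record to the next
	return func(rec string) bool {
		if !inside {
			if !start(rec) {
				return false
			}
			inside = true
		}
		if end(rec) {
			inside = false // this record still matches; the next one does not
		}
		return true
	}
}

func main() {
	is := func(want string) pattern {
		return func(rec string) bool { return rec == want }
	}
	p := rangeOf(is("BEGIN"), is("END"))
	for _, rec := range []string{"x", "BEGIN", "y", "END", "z"} {
		fmt.Println(rec, p(rec))
	}
}
```

Because the closure holds state, each call to `rangeOf` yields an independent range, and a single range pattern should be evaluated exactly once per record.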

Example

Output all input lines that appear between "BEGIN" and "END" inclusive.

package main

import (
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(awk.Range(func(s *awk.Script) bool { return s.F(1).StrEqual("BEGIN") },
		func(s *awk.Script) bool { return s.F(1).StrEqual("END") }),
		nil)
	s.Run(os.Stdin)
}

type Script

type Script struct {
	State         interface{} // Arbitrary, user-supplied data
	Output        io.Writer   // Output stream (defaults to os.Stdout)
	Begin         ActionFunc  // Action to perform before any input is read
	End           ActionFunc  // Action to perform after all input is read
	ConvFmt       string      // Conversion format for numbers, "%.6g" by default
	SubSep        string      // Separator for simulated multidimensional arrays
	NR            int         // Number of input records seen so far
	NF            int         // Number of fields in the current input record
	RT            string      // Actual string terminating the current record
	RStart        int         // 1-based index of the previous regexp match (Value.Match)
	RLength       int         // Length of the previous regexp match (Value.Match)
	MaxRecordSize int         // Maximum number of characters allowed in each record
	MaxFieldSize  int         // Maximum number of characters allowed in each field
	// contains filtered or unexported fields
}

A Script encapsulates all of the internal state for an AWK-like script.

func NewScript

func NewScript() *Script

NewScript initializes a new Script with default values.

func (*Script) AppendStmt

func (s *Script) AppendStmt(p PatternFunc, a ActionFunc)

AppendStmt appends a pattern-action pair to a Script. If the pattern function is nil, the action will be performed on every record. If the action function is nil, the record will be output verbatim to the standard output device. It is invalid to call AppendStmt from a running script.

Example

For all rows of the form "Total: <number>", accumulate <number>. Once all rows have been read, output the grand total.

package main

import (
	"fmt"
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.State = 0.0
	s.AppendStmt(func(s *awk.Script) bool { return s.NF == 2 && s.F(1).StrEqual("Total:") },
		func(s *awk.Script) { s.State = s.State.(float64) + s.F(2).Float64() })
	s.End = func(s *awk.Script) { fmt.Printf("The grand total is %.2f\n", s.State.(float64)) }
	s.Run(os.Stdin)
}

Example (NilAction)

Output only rows in which the first column contains a larger number than the second column.

package main

import (
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(func(s *awk.Script) bool { return s.F(1).Int() > s.F(2).Int() }, nil)
	s.Run(os.Stdin)
}

Example (NilPattern)

Output each line preceded by its line number.

package main

import (
	"fmt"
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(nil, func(s *awk.Script) { fmt.Printf("%4d %v\n", s.NR, s.F(0)) })
	s.Run(os.Stdin)
}

func (*Script) Copy

func (s *Script) Copy() *Script

Copy returns a copy of a Script.

func (*Script) Exit

func (s *Script) Exit()

Exit stops processing the entire script, causing the Run method to return.

func (*Script) F

func (s *Script) F(i int) *Value

F returns a specified field of the current record. Field numbers are 1-based. Field 0 refers to the entire record. Requesting a field greater than NF returns a zero value. Requesting a negative field number panics with an out-of-bounds error.

Example

Output each line with its columns in reverse order.

package main

import (
	"fmt"
	"github.com/spakin/awk"
	"os"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(nil, func(s *awk.Script) {
		for i := s.NF; i > 0; i-- {
			if i > 1 {
				fmt.Printf("%v ", s.F(i))
			} else {
				fmt.Printf("%v\n", s.F(i))
			}
		}
	})
	s.Run(os.Stdin)
}

func (*Script) FFloat64s

func (s *Script) FFloat64s() []float64

FFloat64s returns all fields in the current record as a []float64 of length NF.

Example

Sort each line's columns, which are assumed to be floating-point numbers.

package main

import (
	"fmt"
	"github.com/spakin/awk"
	"os"
	"sort"
)

func main() {
	s := awk.NewScript()
	s.AppendStmt(nil, func(s *awk.Script) {
		nums := s.FFloat64s()
		sort.Float64s(nums)
		for _, n := range nums[:len(nums)-1] {
			fmt.Printf("%.5g ", n)
		}
		fmt.Printf("%.5g\n", nums[len(nums)-1])
	})
	s.Run(os.Stdin)
}

func (*Script) FInts

func (s *Script) FInts() []int

FInts returns all fields in the current record as a []int of length NF.

func (*Script) FStrings

func (s *Script) FStrings() []string

FStrings returns all fields in the current record as a []string of length NF.

func (*Script) GetLine

func (s *Script) GetLine(r io.Reader) (*Value, error)

GetLine reads the next record from an input stream and returns it. If the argument to GetLine is nil, GetLine reads from the current input stream and increments NR. Otherwise, it reads from the given io.Reader and does not increment NR. Pass the Value returned by GetLine to SetF(0, ...) to perform the equivalent of AWK's getline with no variable argument.

func (*Script) IgnoreCase

func (s *Script) IgnoreCase(ign bool)

IgnoreCase specifies whether regular-expression and string comparisons should be performed in a case-insensitive manner.

func (*Script) NewValue

func (s *Script) NewValue(v interface{}) *Value

NewValue creates a Value from an arbitrary Go data type. Data types that do not map straightforwardly to one of {int, float64, string} are represented by a zero value.

func (*Script) NewValueArray

func (s *Script) NewValueArray() *ValueArray

NewValueArray creates and returns an associative array of Values.

func (*Script) Next

func (s *Script) Next()

Next stops processing the current record and proceeds with the next record.

func (*Script) Println

func (s *Script) Println(args ...interface{})

Println is like fmt.Println but honors the current output stream, output field separator, and output record separator. If called with no arguments, Println outputs all fields in the current record.

func (*Script) Run

func (s *Script) Run(r io.Reader) (err error)

Run executes a script against a given input stream. It is perfectly valid to run the same script on multiple input streams.

func (*Script) SetF

func (s *Script) SetF(i int, v *Value)

SetF sets a field of the current record to the given Value. Field numbers are 1-based. Field 0 refers to the entire record. Setting it causes the entire line to be reparsed (and NF recomputed). Setting a field numbered larger than NF extends NF to that value. Setting a negative field number panics with an out-of-bounds error.

func (*Script) SetFPat

func (s *Script) SetFPat(fp string)

SetFPat defines a "field pattern", a regular expression that matches fields. This lies in contrast to providing a regular expression to SetFS, which matches the separation between fields, not the fields themselves.
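The difference between matching fields and matching separators shows up with quoted CSV data. The sketch below uses the standard regexp package rather than the awk package; the pattern and the `fieldsByPattern` helper are illustrative assumptions, not taken from the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldPattern matches either a double-quoted string or a run of non-commas.
// The quoted alternative comes first because Go's regexp prefers earlier
// alternatives when several could match at the same position.
var fieldPattern = regexp.MustCompile(`"[^"]*"|[^,]+`)

// fieldsByPattern returns every substring of rec that matches fieldPattern.
func fieldsByPattern(rec string) []string {
	return fieldPattern.FindAllString(rec, -1)
}

func main() {
	for i, f := range fieldsByPattern(`Robbins,"Smith, John",42`) {
		fmt.Printf("field %d: %s\n", i+1, f)
	}
}
```

Splitting the same record on "," would cut the quoted name in two, which is the case a field pattern is meant to handle.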

func (*Script) SetFS

func (s *Script) SetFS(fs string)

SetFS sets the input field separator. As in AWK, if the field separator is a single space (the default), fields are separated by runs of whitespace; if the field separator is any other single character, that character is used to separate fields; if the field separator is an empty string, each individual character becomes a separate field; and if the field separator is multiple characters, it's treated as a regular expression (subject to the current setting of Script.IgnoreCase).
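The four separator cases can be sketched with the standard library as follows. The `splitFields` function is a hypothetical helper written for illustration; it ignores IgnoreCase and is not the package's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitFields splits rec according to the AWK-style rules for fs.
func splitFields(rec, fs string) []string {
	switch {
	case rec == "":
		return nil // an empty record has no fields
	case fs == " ":
		return strings.Fields(rec) // runs of whitespace separate fields
	case fs == "":
		return strings.Split(rec, "") // every character is its own field
	case len([]rune(fs)) == 1:
		return strings.Split(rec, fs) // a single literal character
	default:
		return regexp.MustCompile(fs).Split(rec, -1) // a regular expression
	}
}

func main() {
	fmt.Printf("%q\n", splitFields("  a  b c ", " "))
	fmt.Printf("%q\n", splitFields("a,b,,c", ","))
	fmt.Printf("%q\n", splitFields("abc", ""))
	fmt.Printf("%q\n", splitFields("a::b:::c", ":+"))
}
```

Note that a single-character separator preserves empty fields between adjacent separators, whereas the default single space does not.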

func (*Script) SetFieldWidths

func (s *Script) SetFieldWidths(fw []int)

SetFieldWidths indicates that each record is composed of fixed-width columns and specifies the width in characters of each column. It is invalid to pass SetFieldWidths a nil argument or a non-positive field width.
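Fixed-width splitting can be pictured as slicing the record at cumulative character offsets. The sketch below is a standard-library illustration with a hypothetical `splitWidths` helper; it assumes every width is positive, and its handling of records shorter than the total width (it stops early) is an assumption that may not match the package.

```go
package main

import "fmt"

// splitWidths cuts rec into consecutive fields of the given widths,
// measured in characters (runes). All widths are assumed to be positive.
// Characters beyond the last width are dropped.
func splitWidths(rec string, widths []int) []string {
	runes := []rune(rec)
	var fields []string
	for _, w := range widths {
		if len(runes) == 0 {
			break // record exhausted: no more fields
		}
		if w > len(runes) {
			w = len(runes) // final field is shorter than requested
		}
		fields = append(fields, string(runes[:w]))
		runes = runes[w:]
	}
	return fields
}

func main() {
	fmt.Printf("%q\n", splitWidths("2020-07-17 spakin/awk", []int{4, 1, 2, 1, 2}))
}
```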

func (*Script) SetOFS

func (s *Script) SetOFS(ofs string)

SetOFS sets the output field separator.

func (*Script) SetORS

func (s *Script) SetORS(ors string)

SetORS sets the output record separator.

func (*Script) SetRS

func (s *Script) SetRS(rs string)

SetRS sets the input record separator (really, a record terminator). It is invalid to call SetRS after the first record is read. (It is acceptable to call SetRS from a Begin action, though.) As in AWK, if the record separator is a single character, that character is used to separate records; if the record separator is multiple characters, it's treated as a regular expression (subject to the current setting of Script.IgnoreCase); and if the record separator is an empty string, records are separated by blank lines. That last case implicitly causes newlines to be accepted as a field separator in addition to whatever was specified by SetFS.
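The empty-string case (records separated by blank lines, with newlines also acting as field separators) can be sketched with the standard library. The `paragraphRecords` helper is hypothetical, assumes the default whitespace field separator, and reads the whole input as a string for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blankLines matches the run of newlines that separates two paragraphs.
var blankLines = regexp.MustCompile(`\n{2,}`)

// paragraphRecords splits input into blank-line-separated records and
// each record into whitespace-separated fields, so that newlines inside
// a record act as field separators.
func paragraphRecords(input string) [][]string {
	input = strings.Trim(input, "\n")
	if input == "" {
		return nil
	}
	var recs [][]string
	for _, rec := range blankLines.Split(input, -1) {
		recs = append(recs, strings.Fields(rec))
	}
	return recs
}

func main() {
	input := "Jane Doe\n123 Main St\n\nJohn Roe\n456 Oak Ave\n"
	for i, fields := range paragraphRecords(input) {
		fmt.Printf("record %d: %q\n", i+1, fields)
	}
}
```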

type Value

type Value struct {
	// contains filtered or unexported fields
}

A Value represents an immutable datum that can be converted to an int, float64, or string in best-effort fashion (i.e., never returning an error).

func (*Value) Float64

func (v *Value) Float64() float64

Float64 converts a Value to a float64.

func (*Value) Int

func (v *Value) Int() int

Int converts a Value to an int.

func (*Value) Match

func (v *Value) Match(expr string) bool

Match says whether a given regular expression, provided as a string, matches the Value. If the associated script set IgnoreCase(true), the match is tested in a case-insensitive manner.
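A case-insensitive match that also reports a 1-based start position and a length, in the spirit of the RStart and RLength fields, could be sketched as below. The `match` function is hypothetical and uses only the standard library; the values returned when nothing matches (0 and -1) follow the AWK convention for RSTART and RLENGTH and are an assumption about the package. Offsets are counted in bytes, which equals characters only for ASCII input.

```go
package main

import (
	"fmt"
	"regexp"
)

// match reports whether expr matches s, along with the 1-based byte
// offset and byte length of the leftmost match. It panics if expr is
// not a valid regular expression.
func match(s, expr string, ignoreCase bool) (ok bool, rstart, rlength int) {
	if ignoreCase {
		expr = "(?i)" + expr // Go's flag for case-insensitive matching
	}
	loc := regexp.MustCompile(expr).FindStringIndex(s)
	if loc == nil {
		return false, 0, -1
	}
	return true, loc[0] + 1, loc[1] - loc[0]
}

func main() {
	fmt.Println(match("foobar", "O+", true))
	fmt.Println(match("foobar", "O+", false))
}
```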

func (*Value) StrEqual

func (v *Value) StrEqual(v2 interface{}) bool

StrEqual says whether a Value, treated as a string, has the same contents as a given Value, which can be provided either as a Value or as any type that can be converted to a Value. If the associated script called IgnoreCase(true), the comparison is performed in a case-insensitive manner.

func (*Value) String

func (v *Value) String() string

String converts a Value to a string.

type ValueArray

type ValueArray struct {
	// contains filtered or unexported fields
}

A ValueArray maps Values to Values.

func (*ValueArray) Delete

func (va *ValueArray) Delete(args ...interface{})

Delete deletes a key and associated value from a ValueArray. Multiple indexes can be specified to simulate multidimensional arrays. (In fact, the indexes are concatenated into a single string with intervening Script.SubSep characters.) The arguments can be provided either as Values or as any types that can be converted to Values. If no argument is provided, the entire ValueArray is emptied.

func (*ValueArray) Get

func (va *ValueArray) Get(args ...interface{}) *Value

Get returns the Value associated with a given index into a ValueArray. Multiple indexes can be specified to simulate multidimensional arrays. (In fact, the indexes are concatenated into a single string with intervening Script.SubSep characters.) The arguments can be provided either as Values or as any types that can be converted to Values. If the index doesn't appear in the array, a zero value is returned.
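The index-concatenation scheme can be sketched with an ordinary Go map keyed by a joined string. Everything below is illustrative: `array`, `key`, `set`, and `get` are hypothetical names, values are plain strings rather than Values, and the separator "\x1c" is an assumption (it is gawk's default SUBSEP; the package's default for Script.SubSep is not stated here).

```go
package main

import (
	"fmt"
	"strings"
)

// subSep is the assumed separator placed between indexes.
const subSep = "\x1c"

// array simulates a multidimensional associative array with a flat map.
type array map[string]string

// key joins the textual form of each index with subSep.
func key(idx ...interface{}) string {
	parts := make([]string, len(idx))
	for i, x := range idx {
		parts[i] = fmt.Sprint(x)
	}
	return strings.Join(parts, subSep)
}

// set stores val under the given indexes.
func (a array) set(val string, idx ...interface{}) { a[key(idx...)] = val }

// get returns the value under the given indexes, or "" if absent.
func (a array) get(idx ...interface{}) string { return a[key(idx...)] }

func main() {
	a := array{}
	a.set("Dasher", 0, 0)
	a.set("0.5", 0, 1)
	fmt.Printf("%q %q %q\n", a.get(0, 0), a.get(0, 1), a.get(7, 7))
	fmt.Printf("%q\n", key(0, 1))
}
```

One consequence of this design is that an index containing the separator itself can collide with a multi-index key, which is why the separator is normally an unlikely control character.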

func (*ValueArray) Keys

func (va *ValueArray) Keys() []*Value

Keys returns all keys in the associative array in undefined order.

func (*ValueArray) Set

func (va *ValueArray) Set(args ...interface{})

Set (index, value) assigns a Value to an index of a ValueArray. Multiple indexes can be specified to simulate multidimensional arrays. (In fact, the indexes are concatenated into a single string with intervening Script.SubSep characters.) The final argument is always the value to assign. Arguments can be provided either as Values or as any types that can be converted to Values.

Example

Allocate and populate a 2-D array. The diagonal is made up of strings while the rest of the array consists of float64 values.

package main

import (
	"github.com/spakin/awk"
)

func main() {
	// Allocate a script first; a ValueArray must belong to a non-nil Script.
	s := awk.NewScript()
	va := s.NewValueArray()
	diag := []string{"Dasher", "Dancer", "Prancer", "Vixen", "Comet", "Cupid", "Dunder", "Blixem"}
	for i := 0; i < 8; i++ {
		for j := 0; j < 8; j++ {
			if i == j {
				va.Set(i, j, diag[i])
			} else {
				va.Set(i, j, float64(i*8+j)/63.0)
			}
		}
	}
}

func (*ValueArray) Values

func (va *ValueArray) Values() []*Value

Values returns all values in the associative array in undefined order.
