strum

v0.2.0
Published: Jan 3, 2022 License: Apache-2.0 Imports: 12 Imported by: 1

README

strum – String Unmarshaler

The strum package provides line-oriented text decoding into simple Go variables, slices, and structs.

  • Splits on whitespace, a delimiter, a regular expression, or a custom tokenizer.
  • Supports basic primitive types: strings, booleans, ints, uints, floats.
  • Supports decoding time.Time using the dateparse library.
  • Supports decoding time.Duration.
  • Supports encoding.TextUnmarshaler types.
  • Decodes a line into a single variable, a slice, or a struct.
  • Decodes all lines into a slice of the above.

Synopsis

	d := strum.NewDecoder(os.Stdin)

	// Decode a line to a single int
	var x int
	err = d.Decode(&x)

	// Decode a line to a slice of int
	var xs []int
	err = d.Decode(&xs)

	// Decode a line to a struct
	type person struct {
		Name string
		Age  int
	}
	var p person
	err = d.Decode(&p)

	// Decode all lines to a slice of struct
	var people []person
	err = d.DecodeAll(&people)

Copyright 2021 by David A. Golden. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Documentation

Overview

Package strum provides a string unmarshaler to tokenize line-oriented text (such as from stdin) and convert tokens into simple Go types.

Tokenization defaults to whitespace-separated fields, but strum supports using delimiters, regular expressions, or a custom tokenizer.

A line with a single token can be unmarshaled into a single variable of any supported type.

A line with multiple tokens can be unmarshaled into a slice or a struct of supported types. It can also be unmarshaled into a single string, in which case tokenization is skipped.

Unmarshaling multiple tokens into a single non-string variable, or more tokens than a struct has fields, results in an error. Having fewer tokens than struct fields is allowed; the remaining fields are zeroed. When unmarshaling to a slice, decoded values are appended; existing values are untouched.

strum supports the following types:

  • strings
  • booleans (like strconv.ParseBool but case insensitive)
  • integers (signed and unsigned, all widths)
  • floats (32-bit and 64-bit)
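
As a rough illustration of what "like strconv.ParseBool but case insensitive" could mean, the stdlib-only sketch below lowercases a token before parsing it. The helper name `parseBoolFold` is hypothetical and this is not necessarily how strum implements it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBoolFold is a hypothetical helper, not part of strum. It lowercases
// the token before handing it to strconv.ParseBool, so mixed-case spellings
// such as "TRUE" or "tRuE" are accepted as well.
func parseBoolFold(s string) (bool, error) {
	return strconv.ParseBool(strings.ToLower(s))
}

func main() {
	for _, tok := range []string{"true", "TRUE", "tRuE", "F", "0", "yes"} {
		b, err := parseBoolFold(tok)
		fmt.Printf("%q -> %v (err: %v)\n", tok, b, err)
	}
}
```

Tokens that strconv.ParseBool does not recognize in any case, such as "yes", still produce an error in this sketch.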

Additionally, there is special support for certain types:

  • time.Duration
  • time.Time
  • any type implementing encoding.TextUnmarshaler
  • pointers to supported types (which will auto-instantiate)
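
To make the encoding.TextUnmarshaler entry concrete, here is a minimal sketch of a user-defined type that satisfies the interface. The `Level` type and its constants are hypothetical and exist only for illustration; the sketch uses the standard library alone and does not call strum.

```go
package main

import (
	"encoding"
	"fmt"
	"strings"
)

// Level is a hypothetical log-level type used only for illustration.
type Level int

const (
	Debug Level = iota
	Info
	Warn
)

// UnmarshalText implements encoding.TextUnmarshaler by mapping a
// case-insensitive name onto a Level value.
func (l *Level) UnmarshalText(text []byte) error {
	switch strings.ToLower(string(text)) {
	case "debug":
		*l = Debug
	case "info":
		*l = Info
	case "warn":
		*l = Warn
	default:
		return fmt.Errorf("unknown level %q", text)
	}
	return nil
}

// Compile-time check that *Level satisfies the interface.
var _ encoding.TextUnmarshaler = (*Level)(nil)

func main() {
	var l Level
	if err := l.UnmarshalText([]byte("INFO")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(l == Info)
}
```

Going by the list above, a struct field or slice element of such a type should be decodable by strum, with the token text handed to `UnmarshalText`.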

For numeric types, all Go literal formats are supported, including base prefixes (`0xff`) and underscores (`1_000_000`) for integers.
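
For readers unfamiliar with Go literal syntax in input data, the sketch below shows the standard library accepting the same forms via `strconv.ParseInt` with base 0. It illustrates the syntax only; whether strum uses this exact call internally is an assumption, and `parseIntLiteral` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIntLiteral is a hypothetical helper, not part of strum. With base 0,
// strconv.ParseInt infers the base from the prefix (0x, 0o, 0b) and permits
// underscores as in Go integer literals.
func parseIntLiteral(s string) (int64, error) {
	return strconv.ParseInt(s, 0, 64)
}

func main() {
	for _, tok := range []string{"42", "0xff", "1_000_000", "0b1010", "0o17"} {
		n, err := parseIntLiteral(tok)
		fmt.Printf("%-10s -> %d (err: %v)\n", tok, n, err)
	}
}
```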

For time.Time, strum detects and parses a wide variety of formats using the github.com/araddon/dateparse library. By default, it favors the United States interpretation of MM/DD/YYYY and has time zone semantics equivalent to `time.Parse`. strum allows specifying a custom parser instead.

strum provides `DecodeAll` to unmarshal all lines of input at once.

Example (Synopsis)
package main

import (
	"log"
	"os"

	"github.com/xdg-go/strum"
)

func main() {
	var err error
	d := strum.NewDecoder(os.Stdin)

	// Decode a line to a single int
	var x int
	err = d.Decode(&x)
	if err != nil {
		log.Fatal(err)
	}

	// Decode a line to a slice of int
	var xs []int
	err = d.Decode(&xs)
	if err != nil {
		log.Fatal(err)
	}

	// Decode a line to a struct
	type person struct {
		Name string
		Age  int
	}
	var p person
	err = d.Decode(&p)
	if err != nil {
		log.Fatal(err)
	}

	// Decode all lines to a slice of struct
	var people []person
	err = d.DecodeAll(&people)
	if err != nil {
		log.Fatal(err)
	}
}

Functions

func Unmarshal added in v0.1.0

func Unmarshal(data []byte, v interface{}) error

Unmarshal parses the input data as newline-delimited strings and appends the result to the value pointed to by `v`, where `v` must be a pointer to a slice of a type that would be valid for Decode. If `v` points to an uninitialized slice, the slice will be created.

Types

type DateParser added in v0.1.0

type DateParser func(s string) (time.Time, error)

A DateParser parses a string into a time.Time struct.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

A Decoder converts an input stream into Go types.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder returns a Decoder that reads from r. The default Decoder tokenizes with the `strings.Fields` function. The default date parser uses github.com/araddon/dateparse.ParseAny.

func (*Decoder) Decode

func (d *Decoder) Decode(v interface{}) error

Decode reads the next line of input and stores it in the value pointed to by `v`. It returns `io.EOF` when no more data is available.

Example (Struct)
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
	"time"

	"github.com/xdg-go/strum"
)

func main() {
	type person struct {
		Name   string
		Age    int
		Active bool
		Joined time.Time
	}

	lines := []string{
		"John 42 true  2020-03-01T00:00:00Z",
		"Jane 23 false 2022-02-22T00:00:00Z",
	}

	r := bytes.NewBufferString(strings.Join(lines, "\n"))
	d := strum.NewDecoder(r)

	for {
		var p person
		err := d.Decode(&p)
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(p)
	}

}
Output:

{John 42 true 2020-03-01 00:00:00 +0000 UTC}
{Jane 23 false 2022-02-22 00:00:00 +0000 UTC}

func (*Decoder) DecodeAll added in v0.0.2

func (d *Decoder) DecodeAll(v interface{}) error

DecodeAll reads the remaining lines of input into `v`, where `v` must be a pointer to a slice of a type that would be valid for Decode. It works as if `Decode` were called for all lines and the resulting values were appended to the slice. If `v` points to an uninitialized slice, the slice will be created. DecodeAll returns `nil` when EOF is reached.

Example (Ints)
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"

	"github.com/xdg-go/strum"
)

func main() {
	lines := []string{
		"42",
		"23",
	}

	r := bytes.NewBufferString(strings.Join(lines, "\n"))
	d := strum.NewDecoder(r)

	var xs []int
	err := d.DecodeAll(&xs)
	if err != nil {
		log.Fatalf("decoding error: %v", err)
	}

	for _, x := range xs {
		fmt.Printf("%d\n", x)
	}

}
Output:

42
23
Example (Struct)
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"
	"time"

	"github.com/xdg-go/strum"
)

func main() {
	type person struct {
		Name   string
		Age    int
		Active bool
		Joined time.Time
	}

	lines := []string{
		"John 42 true  2020-03-01T00:00:00Z",
		"Jane 23 false 2022-02-22T00:00:00Z",
	}

	r := bytes.NewBufferString(strings.Join(lines, "\n"))
	d := strum.NewDecoder(r)

	var people []person
	err := d.DecodeAll(&people)
	if err != nil {
		log.Fatalf("decoding error: %v", err)
	}

	for _, p := range people {
		fmt.Printf("%v\n", p)
	}

}
Output:

{John 42 true 2020-03-01 00:00:00 +0000 UTC}
{Jane 23 false 2022-02-22 00:00:00 +0000 UTC}

func (*Decoder) Tokens

func (d *Decoder) Tokens() ([]string, error)

Tokens consumes a line of input and returns all strings generated by the tokenizer. It is used internally by `Decode`, but is also available for testing or for skipping over a line of input that should not be decoded.

func (*Decoder) WithDateParser added in v0.1.0

func (d *Decoder) WithDateParser(dp DateParser) *Decoder

WithDateParser modifies a Decoder to use a custom date parsing function.

func (*Decoder) WithSplitOn

func (d *Decoder) WithSplitOn(sep string) *Decoder

WithSplitOn modifies a Decoder to split fields on a separator string.

Example
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"

	"github.com/xdg-go/strum"
)

func main() {
	type person struct {
		Last  string
		First string
	}

	text := "Doe,John"
	r := bytes.NewBufferString(text)

	d := strum.NewDecoder(r).WithSplitOn(",")

	var p person
	err := d.Decode(&p)
	if err != nil && err != io.EOF {
		log.Fatal(err)
	}

	fmt.Println(p)

}
Output:

{Doe John}

func (*Decoder) WithTokenRegexp

func (d *Decoder) WithTokenRegexp(re *regexp.Regexp) *Decoder

WithTokenRegexp modifies a Decoder to use a regular expression to extract tokens. The regular expression is applied with `FindStringSubmatch` to each line of input, so it must encompass an entire line of input. If the line fails to match or if the regular expression has no subexpressions, an error is returned.

Example
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"regexp"

	"github.com/xdg-go/strum"
)

func main() {
	type jeans struct {
		Color  string
		Waist  int
		Inseam int
	}

	text := "Blue 36x32"
	r := bytes.NewBufferString(text)

	re := regexp.MustCompile(`^(\S+)\s+(\d+)x(\d+)`)
	d := strum.NewDecoder(r).WithTokenRegexp(re)

	var j jeans
	err := d.Decode(&j)
	if err != nil && err != io.EOF {
		log.Fatal(err)
	}

	fmt.Println(j)

}
Output:

{Blue 36 32}
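
To show what the regular expression from the example above yields on its own, the following stdlib-only sketch extracts the subexpression matches from a line. The helper `submatchTokens` is hypothetical; it mirrors the documented error conditions but is not strum's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// jeansRe is the same pattern used in the example above.
var jeansRe = regexp.MustCompile(`^(\S+)\s+(\d+)x(\d+)`)

// submatchTokens is a hypothetical helper, not part of strum. It returns the
// subexpression matches for a line, and an error if the pattern has no
// subexpressions or the line does not match.
func submatchTokens(re *regexp.Regexp, line string) ([]string, error) {
	if re.NumSubexp() == 0 {
		return nil, errors.New("regexp has no subexpressions")
	}
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil, fmt.Errorf("line %q does not match", line)
	}
	// m[0] is the whole match; the capture groups start at m[1].
	return m[1:], nil
}

func main() {
	tokens, err := submatchTokens(jeansRe, "Blue 36x32")
	fmt.Printf("%q (err: %v)\n", tokens, err)
}
```

Each capture group becomes one token, which is why the three groups line up with the three fields of the `jeans` struct.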

func (*Decoder) WithTokenizer

func (d *Decoder) WithTokenizer(t Tokenizer) *Decoder

WithTokenizer modifies a Decoder to use a custom tokenizing function.

type Tokenizer

type Tokenizer func(s string) ([]string, error)

A Tokenizer is a function that breaks an input string into tokens.
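
As one possible custom tokenizer, the sketch below handles quoted, comma-separated fields by delegating to the standard `encoding/csv` package. The function name `csvTokenizer` is hypothetical; only its signature, which matches the `Tokenizer` type, comes from this documentation.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// csvTokenizer is a hypothetical tokenizer with the Tokenizer signature:
// func(s string) ([]string, error). It parses one line as a CSV record, so
// quoted fields may contain the separator.
func csvTokenizer(s string) ([]string, error) {
	return csv.NewReader(strings.NewReader(s)).Read()
}

func main() {
	tokens, err := csvTokenizer(`"Doe, John",42`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, tok := range tokens {
		fmt.Printf("token %d: %s\n", i, tok)
	}
}
```

Given the signatures documented here, it could presumably be attached with `strum.NewDecoder(r).WithTokenizer(csvTokenizer)`. Unlike `WithSplitOn(",")`, this approach keeps a quoted field such as "Doe, John" together as one token.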
