README

jstream

jstream is a streaming JSON parser and value extraction library for Go.

Unlike most JSON parsers, jstream is document position- and depth-aware. This enables the extraction of values at a specified depth, eliminating the overhead of allocating the encompassing arrays or objects.

For example, given the document below (shown here in compact form):

[{"desc":"RGB","colors":["red","green","blue"]},{"desc":"CMYK","colors":["cyan","magenta","yellow","black"]}]

we can choose to extract and act only on the objects within the top-level array:

f, err := os.Open("input.json")
if err != nil {
  log.Fatal(err) // stop early if the input file cannot be opened
}
defer f.Close()
decoder := jstream.NewDecoder(f, 1) // extract JSON values at a depth level of 1
for mv := range decoder.Stream() {
  fmt.Printf("%v\n", mv.Value)
}

output:

map[desc:RGB colors:[red green blue]]
map[desc:CMYK colors:[cyan magenta yellow black]]

likewise, increasing depth level to 3 yields:

red
green
blue
cyan
magenta
yellow
black
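
The idea behind depth-based extraction can be illustrated with the standard library alone. The sketch below is not jstream's implementation: it assumes a top-level array and uses the token API of encoding/json to decode one depth-1 value at a time, so the enclosing array is never materialized. The names streamTopLevel and descs are hypothetical helpers introduced for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamTopLevel decodes each element of a top-level JSON array one at a
// time and hands it to emit, so the enclosing array is never held in memory.
func streamTopLevel(r io.Reader, emit func(v interface{})) error {
	dec := json.NewDecoder(r)

	// Consume the opening '[' of the top-level array.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected top-level array, got %v", tok)
	}

	// Decode one depth-1 value per iteration.
	for dec.More() {
		var v interface{}
		if err := dec.Decode(&v); err != nil {
			return err
		}
		emit(v)
	}

	// Consume the closing ']'.
	_, err = dec.Token()
	return err
}

// descs collects the "desc" field of every streamed object (demo helper).
func descs(doc string) (string, error) {
	var out []string
	err := streamTopLevel(strings.NewReader(doc), func(v interface{}) {
		if obj, ok := v.(map[string]interface{}); ok {
			out = append(out, fmt.Sprint(obj["desc"]))
		}
	})
	return strings.Join(out, ","), err
}

func main() {
	doc := `[{"desc":"RGB","colors":["red","green","blue"]},{"desc":"CMYK","colors":["cyan","magenta","yellow","black"]}]`
	s, err := descs(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints "RGB,CMYK"
}
```

This sketch handles only a fixed depth of 1; a configurable emit depth, as jstream provides, would need to track nesting while scanning.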

optionally, key:value pairs can be emitted as individual structs:

decoder := jstream.NewDecoder(f, 2).EmitKV() // enable KV streaming at a depth level of 2

output:

jstream.KV{desc RGB}
jstream.KV{colors [red green blue]}
jstream.KV{desc CMYK}
jstream.KV{colors [cyan magenta yellow black]}
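
To make the key:value idea concrete, here is a minimal standard-library sketch that walks a single JSON object and collects its members as pairs in input order. It is an illustration only, not how jstream does it; the kv type and the objectPairs and pairsString functions are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// kv is a local stand-in for a key:value pair (it is not jstream.KV).
type kv struct {
	Key   string
	Value interface{}
}

// objectPairs reads one JSON object from r and returns its members as
// key:value pairs in the order they appear in the input.
func objectPairs(r io.Reader) ([]kv, error) {
	dec := json.NewDecoder(r)

	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected object, got %v", tok)
	}

	var pairs []kv
	for dec.More() {
		// Inside an object, the next token is the member key.
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		key, ok := keyTok.(string)
		if !ok {
			return nil, fmt.Errorf("expected string key, got %v", keyTok)
		}
		// Decode the member value that follows the key.
		var val interface{}
		if err := dec.Decode(&val); err != nil {
			return nil, err
		}
		pairs = append(pairs, kv{Key: key, Value: val})
	}

	// Consume the closing '}'.
	_, err = dec.Token()
	return pairs, err
}

// pairsString renders the pairs of doc as a single string (demo helper).
func pairsString(doc string) (string, error) {
	pairs, err := objectPairs(strings.NewReader(doc))
	if err != nil {
		return "", err
	}
	parts := make([]string, 0, len(pairs))
	for _, p := range pairs {
		parts = append(parts, fmt.Sprintf("%v", p))
	}
	return strings.Join(parts, ";"), nil
}

func main() {
	s, err := pairsString(`{"desc":"RGB","colors":["red","green","blue"]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints "{desc RGB};{colors [red green blue]}"
}
```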

Installing

go get github.com/bcicen/jstream

Commandline

jstream comes with a CLI tool for quick viewing of parsed values from JSON input:

jstream -d 1 < input.json
{"colors":["red","green","blue"],"desc":"RGB"}
{"colors":["cyan","magenta","yellow","black"],"desc":"CMYK"}

detailed output with -v option:

cat input.json | jstream -v -d -1

depth	start	end	type   | value
2	018	023	string | "RGB"
3	041	046	string | "red"
3	048	055	string | "green"
3	057	063	string | "blue"
2	039	065	array  | ["red","green","blue"]
1	004	069	object | {"colors":["red","green","blue"],"desc":"RGB"}
2	087	093	string | "CMYK"
3	111	117	string | "cyan"
3	119	128	string | "magenta"
3	130	138	string | "yellow"
3	140	147	string | "black"
2	109	149	array  | ["cyan","magenta","yellow","black"]
1	073	153	object | {"colors":["cyan","magenta","yellow","black"],"desc":"CMYK"}
0	000	155	array  | [{"colors":["red","green","blue"],"desc":"RGB"},{"colors":["cyan","magenta","yellow","black"],"desc":"CMYK"}]

Options

Opt      Description
-d <n>   emit values at depth n. if n < 0, all values will be emitted
-kv      output inner key value pairs as newly formed objects
-v       output depth and offset details for each value
-h       display help dialog

Benchmarks

Obligatory benchmarks performed on files with arrays of objects, where the decoded objects are to be extracted.

Two file sizes are used: regular (1.6 MB, 1000 objects) and large (128 MB, 100000 objects)

input size   lib        MB/s   Allocated
regular      standard   97     3.6MB
regular      jstream    175    2.1MB
large        standard   92     305MB
large        jstream    404    69MB

In a real-world scenario, including initialization and reader overhead from varying blob sizes, performance can be expected as below:

[Figure: jstream performance chart]

Documentation

Constants

This section is empty.

Variables

var (
	ErrSyntax        = DecoderError{/* contains filtered or unexported fields */}
	ErrUnexpectedEOF = DecoderError{/* contains filtered or unexported fields */}
)

Predefined errors

Functions

This section is empty.

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder wraps an io.Reader to provide incremental decoding of JSON values

func NewDecoder

func NewDecoder(r io.Reader, emitDepth int) *Decoder

NewDecoder creates a new Decoder to read JSON values at the provided emitDepth from the provided io.Reader. If emitDepth is < 0, values at every depth will be emitted.

func (*Decoder) EmitKV

func (d *Decoder) EmitKV() *Decoder

EmitKV enables emitting a jstream.KV struct when the item(s) parsed at the configured emit depth are within a JSON object. By default, only the object values are emitted.

func (*Decoder) Err

func (d *Decoder) Err() error

Err returns the most recent decoder error, if any, or nil

func (*Decoder) ObjectAsKVS

func (d *Decoder) ObjectAsKVS() *Decoder

ObjectAsKVS - by default, JSON objects are returned as map[string]interface{}. This is fine in most cases, but a map does not preserve the input order of keys. Use this option when the input order must be preserved.

func (*Decoder) Pos

func (d *Decoder) Pos() int

Pos returns the number of bytes consumed from the underlying reader

func (*Decoder) Recursive

func (d *Decoder) Recursive() *Decoder

Recursive enables emitting all values at a depth greater than the configured emit depth; e.g. if an array is found at emit depth, all values within the array are emitted to the stream, then the array containing those values is emitted.
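
The emission order described for Recursive (nested values first, then the container that holds them) is a post-order traversal. The sketch below only illustrates that ordering: it decodes the whole document up front with encoding/json, which a streaming parser would not do, and for brevity it descends into arrays only. The walk and trace functions are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// walk emits every value at or below v in post-order: the items of an
// array are emitted before the array itself. Objects are treated as
// leaves to keep the sketch short and its output deterministic.
func walk(v interface{}, depth int, emit func(depth int, v interface{})) {
	if arr, ok := v.([]interface{}); ok {
		for _, item := range arr {
			walk(item, depth+1, emit)
		}
	}
	emit(depth, v)
}

// trace returns one "depth value" line per emitted value (demo helper).
func trace(doc string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	var lines []string
	walk(v, 0, func(depth int, v interface{}) {
		b, _ := json.Marshal(v) // re-encode the value for display
		lines = append(lines, fmt.Sprintf("%d %s", depth, b))
	})
	return strings.Join(lines, "\n"), nil
}

func main() {
	out, err := trace(`[["red","green"],"x"]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints:
	// 2 "red"
	// 2 "green"
	// 1 ["red","green"]
	// 1 "x"
	// 0 [["red","green"],"x"]
}
```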

func (*Decoder) Stream

func (d *Decoder) Stream() chan *MetaValue

Stream begins decoding from the underlying reader and returns a streaming MetaValue channel for JSON values at the configured emitDepth.
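
A channel-returning Stream paired with a separate Err method is a common Go pattern: the producer goroutine closes the channel when it stops, and the consumer checks the error after the range loop ends. The following is a minimal sketch of that pattern using encoding/json over a sequence of top-level values; valueStream, newValueStream and collect are hypothetical names, and nothing here is taken from jstream's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// valueStream is a hypothetical type that streams decoded values over a channel.
type valueStream struct {
	dec *json.Decoder
	err error
}

func newValueStream(r io.Reader) *valueStream {
	return &valueStream{dec: json.NewDecoder(r)}
}

// Stream starts a goroutine that decodes consecutive top-level JSON values
// and sends them on the returned channel. The channel is closed at EOF or
// on the first decoding error.
func (s *valueStream) Stream() chan interface{} {
	ch := make(chan interface{})
	go func() {
		defer close(ch)
		for {
			var v interface{}
			if err := s.dec.Decode(&v); err != nil {
				if err != io.EOF {
					s.err = err // recorded before the channel is closed
				}
				return
			}
			ch <- v
		}
	}()
	return ch
}

// Err reports the error that ended the stream, if any. It should be called
// only after the channel returned by Stream has been drained.
func (s *valueStream) Err() error { return s.err }

// collect drains the stream and joins the values into one string (demo helper).
func collect(doc string) (string, error) {
	s := newValueStream(strings.NewReader(doc))
	var out []string
	for v := range s.Stream() {
		out = append(out, fmt.Sprint(v))
	}
	return strings.Join(out, ","), s.Err()
}

func main() {
	got, err := collect(`1 "a" true`)
	fmt.Println(got, err) // prints "1,a,true <nil>"
}
```

Checking the error after the loop matters because a closed channel alone does not distinguish a clean end of input from a failure partway through.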

type DecoderError

type DecoderError struct {
	// contains filtered or unexported fields
}

func (DecoderError) Error

func (e DecoderError) Error() string

func (DecoderError) ReaderErr

func (e DecoderError) ReaderErr() error

type KV

type KV struct {
	Key   string      `json:"key"`
	Value interface{} `json:"value"`
}

KV contains a key and value pair parsed from a decoded object

type KVS

type KVS []KV

KVS - represents key values in a JSON object

func (KVS) MarshalJSON

func (kvs KVS) MarshalJSON() ([]byte, error)

MarshalJSON - implements converting a KVS data structure into a JSON object with multiple keys and values.
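
As an illustration of why an ordered slice of pairs needs its own marshaling method, the sketch below defines local pair and orderedPairs types (hypothetical stand-ins that mirror the shapes of KV and KVS above) and encodes them as a JSON object whose keys keep their slice order. It is one possible approach, not a copy of jstream's code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// pair and orderedPairs are local stand-ins mirroring the KV and KVS shapes.
type pair struct {
	Key   string
	Value interface{}
}

type orderedPairs []pair

// MarshalJSON writes the pairs as a single JSON object, keeping slice order.
// A map[string]interface{} would instead be encoded with sorted keys.
func (ps orderedPairs) MarshalJSON() ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, p := range ps {
		if i > 0 {
			buf.WriteByte(',')
		}
		k, err := json.Marshal(p.Key)
		if err != nil {
			return nil, err
		}
		v, err := json.Marshal(p.Value)
		if err != nil {
			return nil, err
		}
		buf.Write(k)
		buf.WriteByte(':')
		buf.Write(v)
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}

// encode marshals ps through encoding/json and returns the text (demo helper).
func encode(ps orderedPairs) (string, error) {
	b, err := json.Marshal(ps)
	return string(b), err
}

func main() {
	s, err := encode(orderedPairs{
		{Key: "desc", Value: "RGB"},
		{Key: "colors", Value: []string{"red", "green", "blue"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints {"desc":"RGB","colors":["red","green","blue"]}
}
```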

type MetaValue

type MetaValue struct {
	Offset    int
	Length    int
	Depth     int
	Value     interface{}
	ValueType ValueType
}

MetaValue wraps a decoded interface value with the document position and depth at which the value was parsed

type ValueType

type ValueType int

ValueType - defines the type of each JSON value

const (
	Unknown ValueType = iota
	Null
	String
	Number
	Boolean
	Array
	Object
)

Different types of JSON value
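
A type tag like this usually corresponds to a type switch over the decoded Go value. The sketch below shows that mapping for values decoded the way encoding/json decodes into interface{}; the valueKind type and its constants are local, hypothetical mirrors of the enumeration above, and the Go types jstream itself produces (for numbers in particular) may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// valueKind is a local mirror of the enumeration above (hypothetical names).
type valueKind int

const (
	kindUnknown valueKind = iota
	kindNull
	kindString
	kindNumber
	kindBoolean
	kindArray
	kindObject
)

var kindNames = [...]string{"unknown", "null", "string", "number", "boolean", "array", "object"}

// String returns a readable name for the kind.
func (k valueKind) String() string {
	if k < 0 || int(k) >= len(kindNames) {
		return "unknown"
	}
	return kindNames[k]
}

// kindOf classifies a value decoded by encoding/json into interface{}.
func kindOf(v interface{}) valueKind {
	switch v.(type) {
	case nil:
		return kindNull
	case string:
		return kindString
	case float64, json.Number:
		return kindNumber
	case bool:
		return kindBoolean
	case []interface{}:
		return kindArray
	case map[string]interface{}:
		return kindObject
	default:
		return kindUnknown
	}
}

// kindOfJSON decodes doc and returns the name of its kind (demo helper).
func kindOfJSON(doc string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	return kindOf(v).String(), nil
}

func main() {
	for _, doc := range []string{`null`, `"a"`, `1.5`, `true`, `[]`, `{}`} {
		name, err := kindOfJSON(doc)
		if err != nil {
			panic(err)
		}
		fmt.Println(doc, "->", name)
	}
}
```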

Directories

Path   Synopsis
cmd