djson

package module
v0.0.0-...-4dd8773 Latest Latest
Published: Oct 3, 2016 License: MIT Imports: 4 Imported by: 0

README

DJSON

DJSON is a JSON decoder for Go that is roughly 2 to 3 times faster than the standard encoding/json and the existing alternatives when dealing with arbitrary JSON payloads. See the benchmarks below.
It is a good approach for people who are using json.Unmarshal together with interface{}, don't know what the schema is, and still want good performance with minimal changes.

Motivation

I was searching for a JSON parser for my projects that is faster than the standard library, with zero reflection, that allocates less memory and is still safe (I didn't want the "unsafe" package in my production code just to reduce memory consumption).
I found that almost all implementations are just wrappers around the standard library and aren't fast enough for my needs.
I encountered two projects: ujson, which is the UltraJSON implementation, and jsonparser, which is a pretty awesome project.
ujson seems to be faster than encoding/json but still doesn't meet my requirements.
jsonparser seems to be really fast, and I even use it for some of my new projects.
However, its API is different, and I would need to change too much of my code in order to work with it.
Also, my processing work involves ETL, changing and setting new fields on the JSON object, so I would need to transform the jsonparser result to map[string]interface{}, and it seems that it loses its power at that point.

Advantages and Stability

As you can see in the benchmark below, DJSON is faster and allocates less memory than the other alternatives.
The current version is 1.0.0-alpha.1, and I'm waiting to hear about any issues or bug reports before marking it stable.
(Note: there is a test file named decode_test that contains a test case comparing the results to encoding/json. Feel free to add more values if you find them important.)
I'm also planning to add a DecodeStream(io.ReadCloser) method (or NewDecoder(io.ReadCloser)) to support stream decoding without hurting performance.

Benchmark

There are three benchmark types: small, medium and large payloads.
All three are taken from the jsonparser project and try to simulate real-life usage. The results for each benchmark type are shown in a table below. Lower is better for every metric. Time/op is in nanoseconds, B/op is the number of bytes allocated per op, and allocs/op is the number of memory allocations per op.
Benchmark results that are better than encoding/json are marked in bold text.
The benchmark tests ran on an AWS EC2 instance (c4.xlarge). See: screenshots
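
The three metrics described above are the ones Go's standard benchmark harness reports. As a rough sketch of how such numbers can be collected (this is not the project's benchmark code; `payload`, `decodeAny` and `measure` are made-up names, the payload is not one of the benchmark fixtures, and encoding/json stands in for the library under test):

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// payload is a made-up sample document, not one of the benchmark fixtures.
var payload = []byte(`{"id": 1, "tags": ["a", "b"], "user": {"name": "x"}}`)

// decodeAny decodes into interface{}, the usage pattern the benchmarks target.
func decodeAny(data []byte) (interface{}, error) {
	var v interface{}
	err := json.Unmarshal(data, &v)
	return v, err
}

// measure runs decodeAny under the standard benchmark harness and
// returns the collected timing and allocation statistics.
func measure(data []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := decodeAny(data); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	r := measure(payload)
	// The three columns used in the tables: Time/op, B/op, allocs/op.
	fmt.Printf("%d ns/op\t%d B/op\t%d allocs/op\n",
		r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```

The absolute numbers depend on the machine and the Go version, so only comparisons made on the same setup are meaningful.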

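Compared libraries:

encoding/json
ugorji/go/codec
antonholmquist/jason
bitly/go-simplejson
Jeffail/gabs
mreiferson/go-ujson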

Small payload

Each library in the test gets a small payload to process that weighs 134 bytes.
You can see the payload here, and the test screenshot here.

Library                 Time/op (ns)     B/op   allocs/op
encoding/json                   8646     1993          60
ugorji/go/codec                 9272     4513          41
antonholmquist/jason            7336     3201          49
bitly/go-simplejson             5253     2241          36
Jeffail/gabs                    4788     1409          33
mreiferson/go-ujson             3897     1393          35
a8m/djson                       2534     1137          25
a8m/djson.AllocString           2195     1169          13

Medium payload

Each library in the test gets a medium payload to process that weighs 1.7KB.
You can see the payload here, and the test screenshot here.

Library                 Time/op (ns)     B/op   allocs/op
encoding/json                  42029    10652         218
ugorji/go/codec                65007    15267         313
antonholmquist/jason           45676    17476         224
bitly/go-simplejson            45164    17156         219
Jeffail/gabs                   41045    10515         211
mreiferson/go-ujson            33213    11506         267
a8m/djson                      22871    10100         195
a8m/djson.AllocString          19296    10619          87

Large payload

Each library in the test gets a large payload to process that weighs 28KB.
You can see the payload here, and the test screenshot here.

Library                 Time/op (ns)     B/op   allocs/op
encoding/json                 717882   212827        3247
ugorji/go/codec              1052347   239130        4426
antonholmquist/jason          751910   277931        3257
bitly/go-simplejson           753663   277628        3252
Jeffail/gabs                  714304   212740        3241
mreiferson/go-ujson           599868   235789        4057
a8m/djson                     437031   210997        2932
a8m/djson.AllocString         372382   214053        1413

LICENSE

MIT

Documentation

Overview

Most of the code here is copied from the Go standard library, encoding/json/decode.go.

Variables

var (
	ErrUnexpectedEOF    = &SyntaxError{"unexpected end of JSON input", -1}
	ErrInvalidHexEscape = &SyntaxError{"invalid hexadecimal escape sequence", -1}
	ErrStringEscape     = &SyntaxError{"encountered an invalid escape sequence in a string", -1}
)

Predefined errors

Functions

func Decode

func Decode(data []byte) (interface{}, error)

Decode parses the JSON-encoded data and returns an interface value. The interface value could be one of these:

bool, for JSON booleans
float64, for JSON numbers
string, for JSON strings
[]interface{}, for JSON arrays
map[string]interface{}, for JSON objects
nil for JSON null

Note that Decode is compatible with the following instructions:

var v interface{}
err := json.Unmarshal(data, &v)
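
Since the result is one of the six Go types listed above, callers typically inspect it with a type switch. The following is a minimal sketch of that pattern; `describe` is a hypothetical helper that is not part of djson, and the values are produced with encoding/json, which the note above describes as compatible.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// describe names the JSON kind of a decoded value.
// It is a hypothetical helper for illustration only.
func describe(v interface{}) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case float64:
		return "number"
	case string:
		return "string"
	case []interface{}:
		return fmt.Sprintf("array of %d elements", len(x))
	case map[string]interface{}:
		return fmt.Sprintf("object with %d keys", len(x))
	default:
		return "unknown"
	}
}

func main() {
	var v interface{}
	data := []byte(`[true, 1.5, "s", null, [1, 2], {"k": "v"}]`)
	if err := json.Unmarshal(data, &v); err != nil {
		log.Fatal(err)
	}
	// The top-level value is an array; describe each of its elements.
	for _, item := range v.([]interface{}) {
		fmt.Println(describe(item))
	}
}
```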
Example
package main

import (
	"fmt"
	"log"

	"github.com/a8m/djson"
)

func main() {
	var data = []byte(`[
		{"Name": "Platypus", "Order": "Monotremata"},
		{"Name": "Quoll",    "Order": "Dasyuromorphia"}
	]`)

	val, err := djson.Decode(data)
	if err != nil {
		log.Fatal("error:", err)
	}

	fmt.Printf("%+v", val)

	// - Output:
	// [map[Name:Platypus Order:Monotremata] map[Name:Quoll Order:Dasyuromorphia]]
}

func DecodeArray

func DecodeArray(data []byte) ([]interface{}, error)

DecodeArray is the same as Decode but it returns []interface{}. You should use it to parse JSON arrays.

Example
package main

import (
	"fmt"
	"log"

	"github.com/a8m/djson"
)

func main() {
	var data = []byte(`[
		"John",
		"Dan",
		"Kory",
		"Ariel"
	]`)

	users, err := djson.DecodeArray(data)
	if err != nil {
		log.Fatal("error:", err)
	}
	for i, user := range users {
		fmt.Printf("[%d]: %v\n", i, user)
	}
}
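
Because DecodeArray returns []interface{}, each element still needs a type assertion before it can be used as a concrete type. A small sketch of that step, assuming an array of strings like the one above; `toStrings` is a hypothetical helper, not part of djson, and the input slice is built by hand to mirror the shape DecodeArray returns.

```go
package main

import (
	"fmt"
	"log"
)

// toStrings converts a decoded JSON array into []string and reports
// the first element that is not a string.
// It is a hypothetical helper for illustration only.
func toStrings(items []interface{}) ([]string, error) {
	out := make([]string, 0, len(items))
	for i, item := range items {
		s, ok := item.(string)
		if !ok {
			return nil, fmt.Errorf("element %d is %T, not string", i, item)
		}
		out = append(out, s)
	}
	return out, nil
}

func main() {
	// Same shape as the result of decoding ["John", "Dan", "Kory", "Ariel"].
	users := []interface{}{"John", "Dan", "Kory", "Ariel"}

	names, err := toStrings(users)
	if err != nil {
		log.Fatal(err)
	}
	for i, name := range names {
		fmt.Printf("[%d]: %s\n", i, name)
	}
}
```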

func DecodeObject

func DecodeObject(data []byte) (map[string]interface{}, error)

DecodeObject is the same as Decode but it returns map[string]interface{}. You should use it to parse JSON objects.

Example

This example demonstrates the basic transformations I do on each incoming event. `lowerKeys` and `fixEncoding` are two generic methods that don't care about the schema. The other three (`maxMindGeo`, `dateFormat` and `refererURL`) process and extend the events dynamically based on the "APP_ID" field.

package main

import (
	"fmt"
	"log"

	"github.com/a8m/djson"
)

func main() {
	var data = []byte(`{
		"ID": 76523,
		"IP": "69.89.31.226",
		"APP_ID": "BD311",
		"Name": "Ariel",
		"Username": "a8m",
		"Score": 99,
		"Date": 1475332371532,
		"Image": {
			"Src": "images/67.png",
			"Height": 450,
			"Width":  370,
			"Alignment": "center"
		},
		"RefererURL": "http://..."
	}`)
	event, err := djson.DecodeObject(data)
	if err != nil {
		log.Fatal("error:", err)
	}

	fmt.Printf("Value: %v", event)

	// Process the event
	//
	// lowerKeys(event)
	// fixEncoding(event)
	// dateFormat(event)
	// maxMindGeo(event)
	// refererURL(event)
	//
	// pipeline.Pipe(event)
}
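
The processing functions named in the example are not shown in the source. To illustrate what a schema-agnostic step over a map[string]interface{} might look like, here is one guess at `lowerKeys`; the behavior (lowercasing keys recursively, in place) is an assumption based only on the name, not the author's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerKeys lowercases every object key in place, descending into
// nested objects and arrays. Hypothetical implementation: if two keys
// differ only by case, one value overwrites the other.
func lowerKeys(v interface{}) {
	switch x := v.(type) {
	case map[string]interface{}:
		// Collect the keys first so the map is not modified while ranging over it.
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		for _, k := range keys {
			val := x[k]
			lowerKeys(val)
			if lk := strings.ToLower(k); lk != k {
				delete(x, k)
				x[lk] = val
			}
		}
	case []interface{}:
		for _, val := range x {
			lowerKeys(val)
		}
	}
}

func main() {
	// Same shape as a value returned by DecodeObject.
	event := map[string]interface{}{
		"ID":     76523.0,
		"APP_ID": "BD311",
		"Image":  map[string]interface{}{"Src": "images/67.png"},
	}
	lowerKeys(event)
	fmt.Printf("%v\n", event)
}
```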

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder is the object that holds the state of the decoding

func NewDecoder

func NewDecoder(data []byte) *Decoder

NewDecoder creates a new Decoder from the JSON-encoded data.

func (*Decoder) AllocString

func (d *Decoder) AllocString()

AllocString pre-allocates a string version of the data before decoding starts. It makes the decode operation faster (see below) by doing a single allocation for the bytes-to-string conversion and then slicing that string to create the non-escaped strings in the "Decoder.string" method. However, a string is a read-only slice of bytes, and since each slice references the original array, the garbage collector can't release the array as long as any slice of it is kept around. For this reason, use this method only when the Decoder's result is treated as read-only or when you only add more elements to it. See the example below.

Here are the improvements:

small payload  - about 13% faster, with about 45% fewer memory allocations, but
		 the total number of bytes allocated is about 3% higher

medium payload - about 16% faster, with about 50% fewer memory allocations, but
		 the total number of bytes allocated is about 5% higher

large payload  - about 13% faster, with about 50% fewer memory allocations, but
		 the total number of bytes allocated is about 2% higher

Here is an example that illustrates when you don't want to use this method:

str := fmt.Sprintf(`{"foo": "bar", "baz": "%s"}`, strings.Repeat("#", 1024*1024))
dec := djson.NewDecoder([]byte(str))
dec.AllocString()
ev, err := dec.DecodeObject()
if err != nil {
	log.Fatal(err)
}

// inspect memory stats here; MemStats.Alloc ~= 1M

delete(ev, "baz") // or ev["baz"] = "qux"

// inspect memory stats again; MemStats.Alloc ~= 1M
// this means that the chunk that held the "baz" value has not been freed
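
If only a small part of such a result has to outlive the rest, one possible workaround is to copy the strings you keep so that they no longer point into the large pre-allocated string. The sketch below assumes Go 1.18 or later for strings.Clone; `detach` is a hypothetical helper, not part of djson, and the sample data is built by hand rather than decoded.

```go
package main

import (
	"fmt"
	"strings"
)

// detach returns a deep copy of a decoded value in which every string
// (object keys included) is freshly allocated, so the copy holds no
// reference to the buffer the original strings were sliced from.
// It is a hypothetical helper for illustration only.
func detach(v interface{}) interface{} {
	switch x := v.(type) {
	case string:
		return strings.Clone(x)
	case []interface{}:
		out := make([]interface{}, len(x))
		for i, e := range x {
			out[i] = detach(e)
		}
		return out
	case map[string]interface{}:
		out := make(map[string]interface{}, len(x))
		for k, e := range x {
			out[strings.Clone(k)] = detach(e)
		}
		return out
	default:
		// bool, float64 and nil carry no reference to the buffer.
		return v
	}
}

func main() {
	// big plays the role of the pre-allocated string; "bar" is a slice of it.
	big := strings.Repeat("#", 1024*1024) + "bar"
	ev := map[string]interface{}{"foo": big[len(big)-3:]}

	kept := detach(ev).(map[string]interface{})
	fmt.Println(kept["foo"])
}
```

The copy costs extra allocations, so it only makes sense when the retained part is small compared to the payload.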
Example
package main

import (
	"fmt"
	"log"

	"github.com/a8m/djson"
)

func main() {
	var data = []byte(`{"event_type":"click","count":"93","userid":"4234A"}`)
	dec := djson.NewDecoder(data)
	dec.AllocString()

	val, err := dec.DecodeObject()
	if err != nil {
		log.Fatal("error:", err)
	}

	fmt.Printf("Value: %+v", val)

	// - Output:
	// map[count:93 userid:4234A event_type:click]
}

func (*Decoder) Decode

func (d *Decoder) Decode() (interface{}, error)

Decode parses the JSON-encoded data and returns an interface value. The interface value could be one of these:

bool, for JSON booleans
float64, for JSON numbers
string, for JSON strings
[]interface{}, for JSON arrays
map[string]interface{}, for JSON objects
nil for JSON null

Note that Decode is compatible with the following instructions:

var v interface{}
err := json.Unmarshal(data, &v)

func (*Decoder) DecodeArray

func (d *Decoder) DecodeArray() ([]interface{}, error)

DecodeArray is the same as Decode but it returns []interface{}. You should use it to parse JSON arrays.

func (*Decoder) DecodeObject

func (d *Decoder) DecodeObject() (map[string]interface{}, error)

DecodeObject is the same as Decode but it returns map[string]interface{}. You should use it to parse JSON objects.

type SyntaxError

type SyntaxError struct {
	Offset int // error occurred after reading Offset bytes
	// contains filtered or unexported fields
}

A SyntaxError is a description of a JSON syntax error.

func (*SyntaxError) Error

func (e *SyntaxError) Error() string
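
The Offset field makes it possible to report where in the input a syntax error was found. The sketch below shows the general pattern with errors.As, using encoding/json's *json.SyntaxError as a stand-in (its Offset is an int64, while the type above declares an int); `parse` and `errorOffset` are hypothetical helper names. Whether the djson functions return their errors in a form that errors.As can match is an assumption that should be checked against the package.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parse decodes data into interface{} and returns only the error.
func parse(data []byte) error {
	var v interface{}
	return json.Unmarshal(data, &v)
}

// errorOffset reports the byte offset carried by a JSON syntax error,
// and false if err is nil or some other kind of error.
func errorOffset(err error) (int64, bool) {
	var se *json.SyntaxError
	if errors.As(err, &se) {
		return se.Offset, true
	}
	return 0, false
}

func main() {
	err := parse([]byte(`{"foo": "bar", "baz": }`))
	if off, ok := errorOffset(err); ok {
		fmt.Printf("syntax error after %d bytes: %v\n", off, err)
	}
}
```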

type ValueType

type ValueType int

ValueType identifies the type of a parsed value.

const (
	Null ValueType = iota
	Bool
	String
	Number
	Object
	Array
	Unknown
)

func Type

func Type(v interface{}) ValueType

Type returns the JSON-type of the given value

func (ValueType) String

func (v ValueType) String() string
