collatejson

package
v0.0.0-...-d8c7374 Latest Latest
Published: Aug 24, 2017 License: Apache-2.0, Apache-2.0 Imports: 10 Imported by: 0

README

The collatejson library, written in Go, provides encoding and decoding functions
to transform JSON text into a binary representation without losing information.
That is,

* the binary representation should preserve the sort order, such that sorting
  binary-encoded JSON documents must match sorting by functions that parse
  and compare JSON documents.
* it must be possible to get back the original document, in semantically
  correct form, from its binary representation.

Notes:

* items in a property object are sorted by property name before the object
  is compared with another property object.

For API documentation and benchmarking, try:

.. code-block:: bash

    godoc github.com/couchbaselabs/go-collatejson | less
    cd go-collatejson
    go test -test.bench=.

To measure the relative difference between sorting 100K elements with the
encoding/json library and with this library, try:

.. code-block:: bash

    go test -test.bench=Sort

examples/* contains reference sort ordering for different json elements.

For known issues refer to `TODO.rst`

Documentation

Overview

Package collatejson supplies encoding and decoding functions to transform JSON text into a binary representation without losing information. That is,

  • the binary representation should preserve the sort order, such that sorting binary-encoded JSON documents must match sorting by functions that parse and compare JSON documents.
  • it must be possible to get back the original document, in semantically correct form, from its binary representation.

Notes:

  • items in a property object are sorted by property name before they are compared by property value.

Index

Constants

const (
	PLUS  = 43
	MINUS = 45
	LT    = 60
	GT    = 62
	DOT   = 46
	ZERO  = 48
)

Constants used in text representation of basic data types.

const (
	Terminator byte = iota
	TypeMissing
	TypeNull
	TypeFalse
	TypeTrue
	TypeNumber
	TypeString
	TypeLength
	TypeArray
	TypeObj
)

While encoding a JSON data element, whether basic or composite, the encoded string is prefixed with a type byte. `Terminator` terminates the encoded datum.
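The effect of the type-byte prefix on ordering can be illustrated with a small, self-contained sketch. The constants are copied from the block above; `frame` is a hypothetical helper written for this illustration and is not part of collatejson, and the payload strings are arbitrary placeholders rather than real encoded values.

```go
package main

import (
	"bytes"
	"fmt"
)

// Type bytes, copied from the documented constant block.
const (
	Terminator byte = iota
	TypeMissing
	TypeNull
	TypeFalse
	TypeTrue
	TypeNumber
	TypeString
	TypeLength
	TypeArray
	TypeObj
)

// frame is a hypothetical helper (not part of collatejson): it prefixes a
// payload with a type byte and appends Terminator.
func frame(typ byte, payload string) []byte {
	out := make([]byte, 0, len(payload)+2)
	out = append(out, typ)
	out = append(out, payload...)
	return append(out, Terminator)
}

func main() {
	num := frame(TypeNumber, "999")
	str := frame(TypeString, "0")
	// The type byte is compared first, so the number frame sorts before
	// the string frame regardless of the payloads.
	fmt.Println(bytes.Compare(num, str) < 0)
}
```

Because the constants are assigned with iota, their declaration order is also their sort order: missing, null, false, true, number, string, length, array, object.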

const MinBufferSize = 16

MinBufferSize for target buffer to encode or decode.

const MissingLiteral = Missing("~[]{}falsenilNA~")

MissingLiteral is special string to denote missing item. IMPORTANT: we are assuming that MissingLiteral will not occur in the keyspace.

Variables

var ErrLenPrefixUnsupported = errors.New("arrayLenPrefix is unsupported")
var ErrNotAnArray = errors.New("not an array")
var ErrorNumberType = errors.New("collatejson.numberType")

ErrorNumberType means configured number type is not supported by codec.

var ErrorOutputLen = errors.New("collatejson.outputLen")

ErrorOutputLen means output buffer has insufficient length.

var ErrorSuffixDecoding = errors.New("collatejson.suffixDecoding")

error codes

Functions

func DecodeFloat

func DecodeFloat(code, text []byte) []byte

DecodeFloat complements EncodeFloat. It returns the `exponent` and `mantissa` in text format.

func DecodeInt

func DecodeInt(code, text []byte) (int, []byte)

DecodeInt complements EncodeInt. It returns the integer as text, which can be converted to an integer value using strconv.Atoi(string(return_value)).

func DecodeLD

func DecodeLD(code, text []byte) []byte

DecodeLD complements EncodeLD. It returns the decimal as text, which can be converted to a float64 using strconv.ParseFloat(string(return_value), 64).

func DecodeSD

func DecodeSD(code, text []byte) []byte

DecodeSD complements EncodeSD. It returns the decimal as text, which can be converted to a float64 using strconv.ParseFloat(string(return_value), 64).

func EncodeFloat

func EncodeFloat(text, code []byte) []byte

EncodeFloat encodes floating point number such that their natural order is preserved as lexicographic order of their representation. Additionally it must be possible to get back the natural representation from its lexical representation.

A floating point number f takes a mantissa m ∈ [1/10 , 1) and an integer exponent e such that f = (10^e) * ±m.

encoding −0.1 × 10^11    - --7888+
encoding −0.1 × 10^10    - --7898+
encoding -1.4            - -885+
encoding -1.3            - -886+
encoding -1              - -88+
encoding -0.123          - 0876+
encoding -0.0123         - +1876+
encoding -0.001233       - +28766+
encoding -0.00123        - +2876+
encoding 0               0
encoding +0.00123        + -7123-
encoding +0.001233       + -71233-
encoding +0.0123         + -8123-
encoding +0.123          + 0123-
encoding +1              + +11-
encoding +1.3            + +113-
encoding +1.4            + +114-
encoding +0.1 × 10^10    + ++2101-
encoding +0.1 × 10^11    + ++2111-
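The decomposition f = (10^e) * ±m with m ∈ [1/10, 1) can be sketched with the standard library alone. `decompose` is a hypothetical helper written for this illustration, not a collatejson function, and it stops at the sign, exponent and mantissa digits; it does not reproduce the final encoded strings listed in the table.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decompose is a hypothetical helper (not part of collatejson). For a finite,
// non-zero f it returns the sign, the decimal exponent e and the mantissa
// digits, so that |f| = 0.<mantissa> × 10^e with the mantissa in [1/10, 1).
func decompose(f float64) (negative bool, exponent int, mantissa string) {
	// Shortest scientific form, for example "-1.23e-03".
	s := strconv.FormatFloat(f, 'e', -1, 64)
	if strings.HasPrefix(s, "-") {
		negative = true
		s = s[1:]
	}
	parts := strings.SplitN(s, "e", 2)
	if len(parts) != 2 {
		return negative, 0, "" // Inf or NaN: out of scope for this sketch
	}
	exp, err := strconv.Atoi(parts[1])
	if err != nil {
		return negative, 0, ""
	}
	// "d.ddd × 10^exp" equals "0.dddd × 10^(exp+1)".
	mantissa = strings.Replace(parts[0], ".", "", 1)
	return negative, exp + 1, mantissa
}

func main() {
	for _, f := range []float64{1.3, -0.00123, 1e9} {
		neg, e, m := decompose(f)
		fmt.Printf("%v: negative=%v exponent=%d mantissa=0.%s\n", f, neg, e, m)
	}
}
```

For example, 1.3 decomposes into exponent 1 and mantissa digits 13, and -0.00123 into exponent -2 and mantissa digits 123, which are the exponent and mantissa values visible in the corresponding table rows.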

func EncodeInt

func EncodeInt(text, code []byte) []byte

EncodeInt encodes integer such that their natural order is preserved as a lexicographic order of their representation. Additionally it must be possible to get back the natural representation from its lexical representation.

Input `text` is also in textual representation, that is, strconv.Atoi(string(text)) is the actual integer that is encoded.

Zero is encoded as '0'

func EncodeLD

func EncodeLD(text, code []byte) []byte

EncodeLD encodes large decimals, values that are greater than or equal to +1.0 or less than or equal to -1.0, such that their natural order is preserved as the lexicographic order of their representation. Additionally, it must be possible to get back the natural representation from its lexical representation.

Input `text` is also in textual representation, that is, strconv.ParseFloat(string(text), 64) is the actual number that is encoded.

encoding -100.5         --68994>
encoding -10.5          --7>
encoding -3.145         -3854>
encoding -3.14          -385>
encoding -1.01          -198>
encoding -1             -1>
encoding -0.0001233     -09998766>
encoding -0.000123      -0999876>
encoding +0.000123      >0000123-
encoding +0.0001233     >00001233-
encoding +1             >1-
encoding +1.01          >101-
encoding +3.14          >314-
encoding +3.145         >3145-
encoding +10.5          >>2105-
encoding +100.5         >>31005-

func EncodeSD

func EncodeSD(text, code []byte) []byte

EncodeSD encodes small decimals, values that are greater than -1.0 and less than +1.0, such that their natural order is preserved as the lexicographic order of their representation. Additionally, it must be possible to get back the natural representation from its lexical representation.

Input `text` is also in textual representation, that is, strconv.ParseFloat(string(text), 64) is the actual number that is encoded.

encoding -0.9995    -0004>
encoding -0.999     -000>
encoding -0.0123    -9876>
encoding -0.00123   -99876>
encoding -0.0001233 -9998766>
encoding -0.000123  -999876>
encoding +0.000123  >000123-
encoding +0.0001233 >0001233-
encoding +0.00123   >00123-
encoding +0.0123    >0123-
encoding +0.999     >999-
encoding +0.9995    >9995-

Caveats:

-0.0, 0.0 and +0.0 must be filtered out as integer ZERO `0`.
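The small-decimal table suggests a simple scheme: positive values keep their fractional digits between `>` and `-`, while negative values take the nine's complement of each digit between `-` and `>`. The sketch below is inferred from the table only. `encodeSmallDecimal` is a hypothetical re-implementation for illustration, not the library's EncodeSD, and it assumes non-zero input of the form "+0.ddd", "0.ddd" or "-0.ddd".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// encodeSmallDecimal is a hypothetical re-implementation inferred from the
// documented table; it is not the library's EncodeSD. The input must look
// like "+0.123", "0.123" or "-0.123" and must not be zero.
func encodeSmallDecimal(text string) string {
	negative := strings.HasPrefix(text, "-")
	text = strings.TrimLeft(text, "+-")
	digits := strings.TrimPrefix(text, "0.")
	if !negative {
		return ">" + digits + "-"
	}
	// Nine's complement of each digit, so that larger magnitudes sort first.
	comp := []byte(digits)
	for i, d := range comp {
		comp[i] = '9' - (d - '0')
	}
	return "-" + string(comp) + ">"
}

func main() {
	inputs := []string{"+0.0123", "-0.999", "+0.9995", "-0.0123", "+0.000123"}
	encoded := make([]string, len(inputs))
	for i, in := range inputs {
		encoded[i] = encodeSmallDecimal(in)
	}
	sort.Strings(encoded) // plain byte-wise sort, no numeric parsing
	fmt.Println(encoded)
	// prints [-000> -9876> >000123- >0123- >9995-]
}
```

The sorted output corresponds to -0.999, -0.0123, +0.000123, +0.0123, +0.9995, which is the numeric order. The trailing `>` on negatives and `-` on positives is what makes a shorter digit string sort correctly against a longer one that shares its prefix.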

Types

type Codec

type Codec struct {
	// contains filtered or unexported fields
}

Codec structure

func NewCodec

func NewCodec(propSize int) *Codec

NewCodec creates a new codec object and returns a reference to it.

func (*Codec) Decode

func (codec *Codec) Decode(code, text []byte) ([]byte, error)

Decode a slice of bytes into a JSON string and return it as a slice of bytes. `text` is the output buffer for decoding and is expected to have enough capacity: at least 3x the size of the input `code` and greater than MinBufferSize.

func (*Codec) Encode

func (codec *Codec) Encode(text, code []byte) ([]byte, error)

Encode JSON documents to an order-preserving binary representation. `code` is the output buffer for encoding and is expected to have enough capacity: at least 3x the size of the input `text` and greater than MinBufferSize.
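The buffer-size rule stated for Encode and Decode can be sketched as a small allocation helper. `newOutputBuffer` is a hypothetical name, not part of collatejson. The documentation speaks of capacity, so the sketch assumes a zero-length slice with reserved capacity is what the codec expects; that is an assumption, not something the documentation confirms.

```go
package main

import "fmt"

// MinBufferSize mirrors the documented package constant.
const MinBufferSize = 16

// newOutputBuffer is a hypothetical helper (not part of collatejson). It
// reserves at least 3x the input length and strictly more than MinBufferSize.
// Assumption: the codec wants a zero-length slice with spare capacity.
func newOutputBuffer(inputLen int) []byte {
	capacity := 3 * inputLen
	if capacity <= MinBufferSize {
		capacity = MinBufferSize + 1
	}
	return make([]byte, 0, capacity)
}

func main() {
	text := []byte(`{"a": [1, 2, 3]}`)
	code := newOutputBuffer(len(text))
	// code would be passed as the second argument of Encode.
	fmt.Println(len(code), cap(code))
}
```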

func (*Codec) EncodeN1QLValue

func (codec *Codec) EncodeN1QLValue(val n1ql.Value, buf []byte) (bs []byte, err error)

The caller is responsible for providing a sufficiently sized buffer; otherwise it may panic.

func (*Codec) ExplodeArray

func (codec *Codec) ExplodeArray(code []byte, tmp []byte) ([][]byte, error)

func (*Codec) JoinArray

func (codec *Codec) JoinArray(vals [][]byte, code []byte) ([]byte, error)

func (*Codec) NumberType

func (codec *Codec) NumberType(what string)

NumberType chooses type of encoding / decoding for JSON numbers. Can be "float64", "int64", "decimal". Default is "float64"

func (*Codec) ReverseCollate

func (codec *Codec) ReverseCollate(code []byte, desc []bool) []byte

ReverseCollate reverses the bits in an encoded byte stream based on the fields specified as desc. Calling reverse on an already reversed stream gives back the original stream.
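Why bit reversal yields a descending order, and why applying it twice restores the input, can be shown with a minimal sketch. `flipBits` and `before` are hypothetical helpers for this illustration. Unlike ReverseCollate, the sketch inverts every byte of the input and ignores the per-field `desc` selection, and it only compares inputs of equal length.

```go
package main

import (
	"bytes"
	"fmt"
)

// flipBits is a hypothetical helper (not part of collatejson): it returns a
// copy of code with every bit inverted.
func flipBits(code []byte) []byte {
	out := make([]byte, len(code))
	for i, b := range code {
		out[i] = ^b
	}
	return out
}

// before reports whether a sorts strictly before b in byte-wise order.
func before(a, b []byte) bool {
	return bytes.Compare(a, b) < 0
}

func main() {
	a := []byte{0x05, 0x31}
	b := []byte{0x05, 0x32}
	fmt.Println(before(a, b))                          // a sorts before b
	fmt.Println(before(flipBits(b), flipBits(a)))      // order is reversed after flipping
	fmt.Println(bytes.Equal(flipBits(flipBits(a)), a)) // flipping twice is the identity
}
```

All three lines print true for these inputs.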

func (*Codec) SortbyArrayLen

func (codec *Codec) SortbyArrayLen(what bool)

SortbyArrayLen sorts array by length before sorting by array elements. Use `false` to sort only by array elements. Default is `true`.

func (*Codec) SortbyPropertyLen

func (codec *Codec) SortbyPropertyLen(what bool)

SortbyPropertyLen sorts property by length before sorting by property items. Use `false` to sort only by property items. Default is `true`.

func (*Codec) UseMissing

func (codec *Codec) UseMissing(what bool)

UseMissing will interpret the special string MissingLiteral and encode it as TypeMissing. Default is `true`.

type Integer

type Integer struct{}

func (*Integer) ConvertToScientificNotation

func (i *Integer) ConvertToScientificNotation(val int64) (string, error)

Formats an int64 in scientific notation. Examples:

  • 75284 converts to 7.5284e+04
  • 1200000 converts to 1.200000e+06
  • -612988654 converts to -6.12988654e+08

This is used in the encode path.
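The three documented conversions are consistent with a purely textual rule: keep every digit, place the decimal point after the first one, and use the remaining digit count as the exponent. The sketch below follows that rule. `toScientific` is a hypothetical function written for illustration and is not the library's implementation; its handling of single-digit values is an assumption, since the documentation gives no such example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toScientific is a hypothetical sketch (not the library's implementation).
// It keeps all digits of val, trailing zeros included, and moves the decimal
// point behind the first digit.
func toScientific(val int64) string {
	digits := strconv.FormatInt(val, 10)
	sign := ""
	if strings.HasPrefix(digits, "-") {
		sign, digits = "-", digits[1:]
	}
	exponent := len(digits) - 1
	if exponent == 0 {
		// Single digit: this output form is an assumption.
		return fmt.Sprintf("%s%se+00", sign, digits)
	}
	return fmt.Sprintf("%s%s.%se+%02d", sign, digits[:1], digits[1:], exponent)
}

func main() {
	for _, v := range []int64{75284, 1200000, -612988654} {
		fmt.Println(v, "converts to", toScientific(v))
	}
}
```

Keeping the trailing zeros, as in 1.200000e+06, means the original integer text can be rebuilt from the digits and the exponent without any floating-point rounding.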

func (*Integer) TryConvertFromScientificNotation

func (i *Integer) TryConvertFromScientificNotation(val []byte) (ret []byte)

If the value is a float, returns it in e notation. If it is an integer, converts it from e notation to standard notation. This is used in the decode path.

type Length

type Length int64

Length is an internal type used for prefixing length of arrays and properties.

type Missing

type Missing string

Missing denotes a special type for an item that evaluates to _nothing_.

func (Missing) Equal

func (m Missing) Equal(n string) bool

Equal checks whether n is MissingLiteral.

Directories

Path Synopsis
tools
