badjson

package module
v0.0.0-...-114974b Latest
Published: Oct 2, 2022 License: GPL-3.0 Imports: 5 Imported by: 3

README

BadJSON text recognizer.

The world has no shortage of pretty good text parsers, but here's an idiosyncratic text chopper anyway. It will split GET /docs/index.html HTTP/1.1 into three byte arrays, and ["GET","/docs/index.html","HTTP/1.1"] into the same three. It can parse {"key1":"val1","key2":"val2"} into 4 strings using a minimum of code.

For example, all the forms below parse into the same 4 byte arrays:

abc:def,ghi:jkl and

"abc":"def","ghi":"jkl" and

abc,def,ghi,jkl" and

abc def ghi "jkl and

"abc":"def","ghi":"jkl"

and the bizarre form:

"abc""def""ghi""jkl"

and also the common form

{"abc":"def","ghi":"jkl"}

We can also declare the bytes directly in hex (with a $ prefix) or base64 (with an = prefix):

$616263 $646566 =Z2hp =amts also becomes the same four byte arrays (which is the true reason I wrote it).
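The hex and base64 tokens above can be checked with the standard library alone. The following is a minimal sketch that does not call badjson; `decodeToken` is a hypothetical helper, and the choice of the standard base64 alphabet is an assumption (for these inputs the standard and URL alphabets agree).

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeToken is a hypothetical helper, not part of badjson. It decodes a
// token written as $<hex> or =<base64> and returns any other token as-is.
func decodeToken(tok string) ([]byte, error) {
	switch {
	case strings.HasPrefix(tok, "$"):
		return hex.DecodeString(tok[1:])
	case strings.HasPrefix(tok, "="):
		// Assumption: standard base64 alphabet.
		return base64.StdEncoding.DecodeString(tok[1:])
	default:
		return []byte(tok), nil
	}
}

func main() {
	for _, tok := range strings.Fields("$616263 $646566 =Z2hp =amts") {
		b, err := decodeToken(tok)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%s -> %q\n", tok, b)
	}
}
```

Run as written, the four tokens decode to abc, def, ghi and jkl, which matches the four fields of the text forms listed earlier.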

Delimiters are space, comma, semicolon, and '}' or ']' in some circumstances. The examples above also split on ':'.

TODO: Replace with something more formal.

Documentation

Overview

Package badjson is a very bad JSON parser. It will take almost anything. It respects a notation to specify byte arrays by hex or base64. See parse_test.go and the readme. It will parse a lot of JSON, and the output from `String()` resembles JSON, but it isn't really: objects in key:value notation are just alternating fields in a list, and there's no map here. 3/2020: Commented out all the number recognition since we're not using it.

Constants

This section is empty.

Variables

var B64DecodeMap [256]byte

B64DecodeMap maps ASCII characters to their base64 values.

var HexMap [256]byte

HexMap maps ASCII characters to their hex digit values.
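A 256-entry table indexed by byte value is a common way to decode hex without branching on character ranges. The sketch below shows the idea only; `hexTable`, `invalid` and `decodeHex` are hypothetical names, and the actual contents of HexMap (including how it marks non-hex bytes) are not documented here.

```go
package main

import "fmt"

// hexTable is a hypothetical stand-in for a table like HexMap.
var hexTable [256]byte

// invalid is an assumed sentinel meaning "not a hex digit".
const invalid = 0xFF

func init() {
	for i := range hexTable {
		hexTable[i] = invalid
	}
	for c := byte('0'); c <= '9'; c++ {
		hexTable[c] = c - '0'
	}
	for c := byte('a'); c <= 'f'; c++ {
		hexTable[c] = c - 'a' + 10
		hexTable[c-'a'+'A'] = c - 'a' + 10 // accept upper case too
	}
}

// decodeHex converts pairs of hex digits to bytes using the table.
// It reports false on odd length or on any non-hex character.
func decodeHex(s string) ([]byte, bool) {
	if len(s)%2 != 0 {
		return nil, false
	}
	out := make([]byte, 0, len(s)/2)
	for i := 0; i < len(s); i += 2 {
		hi, lo := hexTable[s[i]], hexTable[s[i+1]]
		if hi == invalid || lo == invalid {
			return nil, false
		}
		out = append(out, hi<<4|lo)
	}
	return out, true
}

func main() {
	b, ok := decodeHex("616263")
	fmt.Printf("%q %v\n", b, ok)
}
```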

Functions

func IsASCII

func IsASCII(bstr []byte) (bool, bool)

IsASCII reports whether all chars are >= ' ' and <= 127. The second bool reports whether the string has delimiters, in which case it would *need quotes*.
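The two return values can be pictured with a small sketch. `isPrintableASCII` is a hypothetical function, not the package's code, and the delimiter set it checks is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isPrintableASCII is a hypothetical version of the check described above.
// ascii is true when every byte is in the range ' ' through 127.
// needsQuotes is true when any byte is in the assumed delimiter set.
func isPrintableASCII(b []byte) (ascii bool, needsQuotes bool) {
	const delims = " ,:;{}[]\"" // assumption, not taken from badjson
	ascii = true
	for _, c := range b {
		if c < ' ' || c > 127 {
			ascii = false
		}
		if strings.IndexByte(delims, c) >= 0 {
			needsQuotes = true
		}
	}
	return ascii, needsQuotes
}

func main() {
	for _, s := range []string{"abc", "a b", "caf\xc3\xa9"} {
		ascii, quotes := isPrintableASCII([]byte(s))
		fmt.Printf("%q ascii=%v needsQuotes=%v\n", s, ascii, quotes)
	}
}
```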

func MakeEscaped

func MakeEscaped(str string) string

MakeEscaped returns an 'escaped' version of the string when the string contains \ or ". This is the usual escaping for JSON values and keys.

func MakeUnescaped

func MakeUnescaped(str string, theQuote rune) string

MakeUnescaped skips a \ when it is followed by a \ or a ".
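The escaping pair can be sketched as a round trip. `escapeSketch` and `unescapeSketch` are hypothetical functions written from the descriptions above, not the package's code; in particular, treating the quote rune parameter as the character whose escape gets removed is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeSketch is hypothetical: it puts a backslash before every \ and ".
func escapeSketch(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r == '\\' || r == '"' {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

// unescapeSketch is hypothetical: it drops a backslash that precedes a \ or
// the given quote rune and leaves every other backslash alone.
func unescapeSketch(s string, quote rune) string {
	var b strings.Builder
	runes := []rune(s)
	for i := 0; i < len(runes); i++ {
		if runes[i] == '\\' && i+1 < len(runes) &&
			(runes[i+1] == '\\' || runes[i+1] == quote) {
			i++ // skip the backslash, keep the escaped rune
		}
		b.WriteRune(runes[i])
	}
	return b.String()
}

func main() {
	in := `say "hi" \ bye`
	esc := escapeSketch(in)
	fmt.Println(esc)
	fmt.Println(unescapeSketch(esc, '"') == in)
}
```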

func ToString

func ToString(segment Segment) string

ToString wraps the list in `[` and `]` and outputs it like a child list. TODO: move to testing.

Types

type Base

type Base struct {
	// contains filtered or unexported fields
}

func (*Base) Next

func (b *Base) Next() Segment

Next returns the next segment or nil

type Base64Bytes

type Base64Bytes struct {
	Base
}

Base64Bytes is for when there's a block of data in base64.

func (*Base64Bytes) GetBytes

func (b *Base64Bytes) GetBytes() []byte

GetBytes tries to parse the base64 into bytes.

func (*Base64Bytes) GetQuoted

func (b *Base64Bytes) GetQuoted() string

func (*Base64Bytes) Raw

func (b *Base64Bytes) Raw() string

Raw decodes and then re-encodes, because the input can be weird.

type HexBytes

type HexBytes struct {
	Base
}

HexBytes is for when there's a block of data in hex.

func (*HexBytes) GetBytes

func (b *HexBytes) GetBytes() []byte

GetBytes tries to parse the hex into bytes.

func (*HexBytes) GetQuoted

func (b *HexBytes) GetQuoted() string

func (*HexBytes) Raw

func (b *HexBytes) Raw() string

Raw is unquoted

type Parent

type Parent struct {
	Base
	// contains filtered or unexported fields
}

Parent has a sub-list

func AsParent

func AsParent(s Segment) *Parent

AsParent returns a pointer to the Parent if s is a Parent.

func (*Parent) GetQuoted

func (b *Parent) GetQuoted() string

func (*Parent) Raw

func (b *Parent) Raw() string

Raw is

type RuneArray

type RuneArray struct {
	Base
	// contains filtered or unexported fields
}

RuneArray is a span of runes with quoting hints

func (*RuneArray) GetQuoted

func (b *RuneArray) GetQuoted() string

GetQuoted returns the string in JSON format, so we always quote with " and never '.

func (*RuneArray) Raw

func (b *RuneArray) Raw() string

Raw returns the 'original' string with no escaping

type Segment

type Segment interface {
	Next() Segment

	GetQuoted() string // as json so 123 or "abc" or "=ABC" or "$414243"
	Raw() string       // unquoted
	// contains filtered or unexported methods
}

Segment is what a chunk of text will become, and we'll be returning a list of them. Number or string or []byte are the only types.
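Because the result is a linked list rather than a map, a caller who wants key:value pairs has to walk the list two segments at a time. The sketch below uses its own tiny `segment` interface and `text` type (both hypothetical, mirroring only the `Next` and `Raw` methods shown above) so that it runs without the package.

```go
package main

import "fmt"

// segment is a hypothetical, cut-down mirror of the Segment interface.
type segment interface {
	Next() segment
	Raw() string
}

// text is a hypothetical list node holding one string.
type text struct {
	next segment
	raw  string
}

func (t *text) Next() segment { return t.next }
func (t *text) Raw() string   { return t.raw }

// fromStrings builds a linked list, back to front, and returns its head.
func fromStrings(parts []string) segment {
	var head segment
	for i := len(parts) - 1; i >= 0; i-- {
		head = &text{next: head, raw: parts[i]}
	}
	return head
}

// pairs treats alternating segments as key, value, key, value.
// A trailing key with no value is dropped.
func pairs(head segment) map[string]string {
	m := map[string]string{}
	for s := head; s != nil; s = s.Next() {
		v := s.Next()
		if v == nil {
			break
		}
		m[s.Raw()] = v.Raw()
		s = v // the loop increment then moves on to the next key
	}
	return m
}

func main() {
	m := pairs(fromStrings([]string{"abc", "def", "ghi", "jkl"}))
	fmt.Println(m["abc"], m["ghi"])
}
```

With the four fields from the README examples this prints def jkl. Building the map is left to the caller on purpose here, since the package itself only promises a flat list.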

func Chop

func Chop(inputLineOfText string) (Segment, error)

Chop chops up a line of text into segments. Calling it a parser would be overstating. Returns the head of a list and maybe an error.

Example
package main

import (
	"fmt"
	"reflect"

	"github.com/awootton/knotfreeiot/badjson"
)

func main() {

	someText := `abc:def,ghi:jkl` // an array of 4 strings

	// parse the text
	segment, err := badjson.Chop(someText)
	if err != nil {
		fmt.Println(err)
	}
	// traverse the result
	for s := segment; s != nil; s = s.Next() {
		fmt.Println(reflect.TypeOf(s))
	}
	// output it
	output := badjson.ToString(segment)
	fmt.Println(output)

	someText = `"abc""def""ghi""jkl"` // a quoted array of 4 strings
	segment, err = badjson.Chop(someText)
	if err != nil {
		fmt.Println(err)
	}
	output = badjson.ToString(segment)
	fmt.Println(output)

	// Expect: *badjson.RuneArray
	// *badjson.RuneArray
	// *badjson.RuneArray
	// *badjson.RuneArray
	// ["abc","def","ghi","jkl"]
	// ["abc","def","ghi","jkl"]
}