jsonc

Version: v0.1.1
Published: Aug 5, 2023 License: Apache-2.0 Imports: 4 Imported by: 0

README

jsonc - JSON with comments for Go

jsonc is a lightweight, dependency-free package for working with JSON with comments, built on top of encoding/json. It can remove comments to produce valid JSON-encoded data, and it can unmarshal JSON with comments into Go values.

The dependencies listed in go.mod are only used for testing and benchmarking or to support alternative libraries.

Features

  • Full support for comment lines and block comments
  • Preserve the content of strings that contain comment characters
  • Sanitize JSON with comments data by removing comments
  • Unmarshal JSON with comments into Go values

Installation

Install the jsonc package:

go get github.com/marcozac/go-jsonc

Usage

Sanitize - Remove comments from JSON data

Sanitize removes all comments from JSON data, returning a JSON-encoded byte slice that is compatible with the standard library's json.Unmarshal.

It works with comment lines and block comments anywhere in the JSONC data, preserving the content of strings that contain comment characters.

Example
package main

import (
    "encoding/json"

    "github.com/marcozac/go-jsonc"
)

func main() {
    invalidData := []byte(`{
        // a comment
        "foo": "bar" /* a comment in a weird place */,
        /*
            a block comment
        */
        "hello": "world" // another comment
    }`)

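    // Remove comments from the JSONC data.
    data, err := jsonc.Sanitize(invalidData)
    if err != nil {
        panic(err) // handle the error as appropriate
    }

    var v struct {
        Foo   string
        Hello string
    }

    // Unmarshal with encoding/json or any other JSON library.
    if err := json.Unmarshal(data, &v); err != nil {
        panic(err) // handle the error as appropriate
    }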
}
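
To make the string-preservation behavior concrete, the following is a minimal sketch of a comment-stripping pass written only against the standard library. It is not the package's Sanitize implementation: stripComments is a hypothetical helper, it skips UTF-8 validation, and it only illustrates why a scanner has to track whether it is inside a string literal.

```go
package main

import "fmt"

// stripComments is a hypothetical helper, NOT the package's Sanitize.
// It drops // line comments and /* block */ comments while copying
// string literals through untouched.
func stripComments(data []byte) []byte {
	out := make([]byte, 0, len(data))
	for i := 0; i < len(data); i++ {
		c := data[i]
		switch {
		case c == '"':
			// Copy the whole string literal, honoring backslash escapes,
			// so that "//" or "/*" inside a string is never treated as a comment.
			out = append(out, c)
			for i++; i < len(data); i++ {
				out = append(out, data[i])
				if data[i] == '\\' && i+1 < len(data) {
					i++
					out = append(out, data[i])
					continue
				}
				if data[i] == '"' {
					break
				}
			}
		case c == '/' && i+1 < len(data) && data[i+1] == '/':
			// Line comment: skip up to, but not including, the newline.
			for i < len(data) && data[i] != '\n' {
				i++
			}
			i-- // the outer i++ lands back on the newline, which is kept
		case c == '/' && i+1 < len(data) && data[i+1] == '*':
			// Block comment: skip past the closing "*/".
			i += 2
			for i+1 < len(data) && !(data[i] == '*' && data[i+1] == '/') {
				i++
			}
			i++ // now on the closing '/', the outer i++ moves past it
		default:
			out = append(out, c)
		}
	}
	return out
}

func main() {
	in := []byte("{\"url\": \"http://x\" /* c */, // tail\n\"a\": 1}")
	fmt.Printf("%s\n", stripComments(in))
}
```

The URL inside the string survives because the scanner consumes the whole literal before it ever looks for comment markers.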
Unmarshal - Parse JSON with comments into a Go value

Unmarshal replicates the behavior of the standard library's json.Unmarshal function, with the addition of support for comments.

It is optimized to avoid calling Sanitize unless it detects comments in the data. This avoids the overhead of removing comments when they are not present, improving performance on small data sets.

It first checks whether the data contains comment characters such as // or /* using HasCommentRunes. If none are found, it unmarshals the data directly.

Only if comment characters are detected does it call Sanitize to remove them before unmarshaling. Unmarshal thus skips unnecessary work when possible, but it currently cannot rule out false positives such as // or /* inside strings.

Since the comment detection is a simple rune check, Unmarshal is not recommended for large data sets unless you are unsure whether they contain comments. HasCommentRunes must check every byte before it can return false, which may drastically slow down the process.

If you know the data contains comments, it is more efficient to call Sanitize first and then unmarshal the result.
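
The check-then-sanitize strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: mayHaveComments, unmarshalWith and dropMarker are hypothetical names, and dropMarker is a toy sanitizer that only removes the literal text "/* c */" so the example stays self-contained.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// mayHaveComments is a hypothetical stand-in for a check such as
// HasCommentRunes: it only looks for the two comment openers.
func mayHaveComments(data []byte) bool {
	return bytes.Contains(data, []byte("//")) || bytes.Contains(data, []byte("/*"))
}

// unmarshalWith is a hypothetical helper: it calls sanitize only when the
// cheap check fires, then hands the data to encoding/json.
func unmarshalWith(data []byte, v any, sanitize func([]byte) ([]byte, error)) error {
	if mayHaveComments(data) {
		clean, err := sanitize(data)
		if err != nil {
			return err
		}
		data = clean
	}
	return json.Unmarshal(data, v)
}

// dropMarker is a toy sanitizer for this demo only: it removes the literal
// text "/* c */" and nothing else.
func dropMarker(data []byte) ([]byte, error) {
	return bytes.ReplaceAll(data, []byte("/* c */"), nil), nil
}

func main() {
	calls := 0
	counting := func(b []byte) ([]byte, error) {
		calls++
		return dropMarker(b)
	}

	var v struct{ Foo string }
	inputs := []string{
		`{"foo": "bar"}`,         // no comment runes: sanitize is skipped
		`{/* c */"foo": "bar"}`,  // real comment: sanitize runs
		`{"foo": "http://x.io"}`, // false positive: sanitize runs anyway
	}
	for _, in := range inputs {
		if err := unmarshalWith([]byte(in), &v, counting); err != nil {
			panic(err)
		}
		fmt.Printf("%-26s sanitize calls so far: %d\n", in, calls)
	}
}
```

The third input shows the false-positive case: a URL inside a string triggers the sanitize step even though there is nothing to remove.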

Example
package main

import "github.com/marcozac/go-jsonc"

func main() {
    invalidData := []byte(`{
        // a comment
        "foo": "bar"
    }`)

    var v struct{ Foo string }

    // Unmarshal removes the comment before decoding.
    err := jsonc.Unmarshal(invalidData, &v)
    if err != nil {
        panic(err) // handle the error as appropriate
    }
}

Alternative libraries

By default, jsonc uses the standard library's encoding/json to unmarshal JSON data and has no external dependencies.

Build tags select an alternative library in place of the standard library's encoding/json:

| Tag          | Library                     |
|--------------|-----------------------------|
| none or both | standard library            |
| jsoniter     | github.com/json-iterator/go |
| go_json      | github.com/goccy/go-json    |

Benchmarks

This library aims for performance comparable to the standard library's encoding/json. Removing comments is not free, however, and that overhead cannot be avoided when comments are present.

Currently, jsonc is slower than the standard library's encoding/json. On small data sets, it is about 27% slower on data with comments in strings and 16% slower on data without comments. On medium data sets, the gap widens to about 30% on data with comments in strings and narrows to 12% on data without comments.

With one of the alternative libraries, however, it is possible to outperform the standard library's encoding/json even after accounting for the overhead of removing comments.

See benchmarks for the full results.

The benchmarks are run on a MacBook Pro (16-inch, 2021), Apple M1 Max, 32 GB RAM.

Contributing

❤ Contributions are welcome!

Please open an issue or submit a pull request if you would like to contribute.

To submit a pull request:

  • Fork this repository
  • Create a new branch
  • Make changes and commit
  • Push to your fork and submit a pull request

License

This project is licensed under the Apache 2.0 license. See the LICENSE file for details.

Documentation

Constants

This section is empty.

Variables

var ErrInvalidUTF8 = errors.New("jsonc: invalid UTF-8")

ErrInvalidUTF8 is returned by Sanitize if the data is not valid UTF-8.

Functions

func HasCommentRunes added in v0.1.1

func HasCommentRunes(data []byte) bool

HasCommentRunes reports whether the data contains any comment runes. It scans for '/' characters and, for each one, checks whether the previous character is a '/' or the next one is a '/' or a '*'. If no such pair is found, it returns false.

Caveat: if the data contains a string that looks like a comment, such as '{"url": "http://example.com"}', HasCommentRunes returns true.

For example, it returns true for the following data:

{
	// comment
	"key": "value"
}

or

{
	/* comment
	"key": "value"
	*/
	"foo": "bar"
}

But also for:

{ "key": "value // comment" }
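
The neighbor check described above could be written roughly like this. It is a sketch that follows the prose description only, not the package's source; looksLikeComment is a hypothetical name.

```go
package main

import "fmt"

// looksLikeComment is a hypothetical re-implementation of the documented
// check: for every '/', look at the previous and next byte.
func looksLikeComment(data []byte) bool {
	for i, c := range data {
		if c != '/' {
			continue
		}
		// Previous byte is '/': second half of "//".
		if i > 0 && data[i-1] == '/' {
			return true
		}
		// Next byte is '/' or '*': start of "//" or "/*".
		if i+1 < len(data) && (data[i+1] == '/' || data[i+1] == '*') {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{
		`{"key": "value"}`,
		`{"path": "a/b"}`,
		`{ /* comment */ "key": "value" }`,
		`{"url": "http://example.com"}`,
	} {
		fmt.Printf("%-34s %v\n", s, looksLikeComment([]byte(s)))
	}
}
```

A lone '/' inside a string does not trigger the check, but a URL does, because the check never tracks whether it is inside a string literal.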

func Sanitize

func Sanitize(data []byte) ([]byte, error)

Sanitize removes all comments from JSONC data. It returns ErrInvalidUTF8 if the data is not valid UTF-8.

NOTE: it does not check whether the data is valid JSON.
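
Because the result is not validated, a caller that needs a validity guarantee before passing the bytes on can add its own check. The sketch below assumes the input is the output of a comment-removal step; checkSanitized and errNotJSON are hypothetical names, and json.Valid comes from the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errNotJSON is a hypothetical sentinel error for this sketch.
var errNotJSON = errors.New("sanitized data is not valid JSON")

// checkSanitized is a hypothetical helper meant to run on the output of a
// comment-removal step. It reports errNotJSON if the bytes are not
// syntactically valid JSON.
func checkSanitized(out []byte) error {
	if !json.Valid(out) {
		return errNotJSON
	}
	return nil
}

func main() {
	// Comments already removed, JSON intact.
	fmt.Println(checkSanitized([]byte(`{"foo": "bar" }`)))
	// Comments already removed, but the underlying JSON was broken.
	fmt.Println(checkSanitized([]byte(`{"foo": }`)))
}
```

This costs an extra pass over the data, so it is only worth adding when the bytes are stored or forwarded rather than unmarshaled immediately.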

func Unmarshal

func Unmarshal(data []byte, v any) error

Unmarshal parses the JSONC-encoded data and stores the result in the value pointed to by v, removing any comments from the data first.

It uses HasCommentRunes to check whether the data contains any comment. The cost of this check grows with the size of the data: on small data sets it adds only a small overhead to unmarshaling, but on large data sets it may have a significant impact on performance. In such cases, it may be more efficient to call Sanitize and then the standard (or any other) library directly.

If the data contains comment runes, it calls Sanitize to remove them and returns ErrInvalidUTF8 if the data is not valid UTF-8. If no comment runes are found, the data is assumed to be valid JSON and its UTF-8 validity is not checked.

Any error from json.Unmarshal is returned as is.

It uses the standard library for unmarshaling by default, but can be configured to use the jsoniter or go-json library instead by using build tags.

| tag          | library                       |
|--------------|-------------------------------|
| none or both | standard library              |
| jsoniter     | "github.com/json-iterator/go" |
| go_json      | "github.com/goccy/go-json"    |

Example:

data := []byte(`{/* comment */"name": "John", "age": 30}`)
type T struct {
	Name string
	Age  int
}
var t T
err := jsonc.Unmarshal(data, &t)
...
Example
var v interface{}

data := []byte(`{/* comment */"foo": "bar"}`)

err := jsonc.Unmarshal(data, &v)
if err != nil {
	panic(err)
}

fmt.Println(v)
Output:

map[foo:bar]
Example (SanitizeError)
var v interface{}

invalid := []byte(`{/* comment */"foo": "invalid utf8"}`)
invalid = append(invalid, []byte("\xa5")...)

err := jsonc.Unmarshal(invalid, &v)
fmt.Println(err)
Output:

jsonc: invalid UTF-8

Types

This section is empty.

Directories

Path Synopsis
internal
