jsonex

Version: v0.0.1
Published: Sep 2, 2025 License: Apache-2.0 Imports: 8 Imported by: 1

README

jsonex - Go JSON Extractor for Unclean Text

A JSON parser library for Go that can extract valid JSON from noisy data streams while maintaining RFC 8259 compliance.

This library is primarily designed to parse incomplete or mixed JSON responses from generative AI systems, where JSON may be embedded within explanatory text or contain surrounding noise.

Features

  • Parsing from noisy data: Extracts the longest valid JSON object or array from data containing garbage/noise
  • RFC 8259 Compliant: Full compliance with JSON specification including proper escape sequence handling
  • Dual-Mode Operation:
    • Fast path for clean JSON using standard library
    • Fallback path for extracting JSON from noisy data
  • Streaming Support: Decoder for processing multiple JSON objects from streams
  • Unicode Support: Full UTF-8 support including surrogate pairs
  • Configurable: Customizable depth limits and buffer sizes
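
The dual-mode idea can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by jsonex: the helper name extractLongest is hypothetical, and the fallback simply tries every `{` or `[` as a candidate start, which is quadratic in the worst case.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// extractLongest is a hypothetical helper (not part of jsonex) that returns
// the longest valid JSON object or array found in data.
func extractLongest(data []byte) ([]byte, bool) {
	// Fast path: the whole input is already a clean JSON object or array.
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) > 0 && (trimmed[0] == '{' || trimmed[0] == '[') && json.Valid(trimmed) {
		return trimmed, true
	}

	// Fallback path: treat every '{' or '[' as a candidate start and let the
	// standard decoder read exactly one value from that position.
	var best []byte
	for i, c := range data {
		if c != '{' && c != '[' {
			continue
		}
		dec := json.NewDecoder(bytes.NewReader(data[i:]))
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			continue // not a valid value at this position
		}
		if len(raw) > len(best) {
			best = raw
		}
	}
	return best, best != nil
}

func main() {
	out, ok := extractLongest([]byte(`garbage {"name": "John", "age": 30} more noise`))
	fmt.Println(string(out), ok) // {"name": "John", "age": 30} true
}
```

A production parser would track candidates in a single pass instead of re-decoding from each start position; the sketch only shows the selection rule.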

Installation

go get github.com/m-mizutani/jsonex

Quick Start

Basic Usage
package main

import (
    "fmt"
    "github.com/m-mizutani/jsonex"
)

func main() {
    // Extract JSON from noisy data
    data := []byte(`garbage {"name": "John", "age": 30} more noise`)
    
    var result map[string]interface{}
    err := jsonex.Unmarshal(data, &result)
    if err != nil {
        panic(err)
    }
    
    fmt.Println(result["name"]) // Output: John
    fmt.Println(result["age"])  // Output: 30
}
Streaming Decoder
package main

import (
    "fmt"
    "strings"
    "github.com/m-mizutani/jsonex"
)

func main() {
    input := `noise {"first": 1} garbage {"second": 2} end`
    decoder := jsonex.New(strings.NewReader(input))
    
    var obj1, obj2 map[string]interface{}
    
    // Decode first JSON object
    if err := decoder.Decode(&obj1); err != nil {
        panic(err)
    }
    
    // Decode second JSON object  
    if err := decoder.Decode(&obj2); err != nil {
        panic(err)
    }
    
    fmt.Println(obj1["first"])  // Output: 1
    fmt.Println(obj2["second"]) // Output: 2
}
With Options
package main

import (
    "github.com/m-mizutani/jsonex"
)

func main() {
    data := []byte(`{"deeply": {"nested": {"json": "value"}}}`)
    
    var result map[string]interface{}
    err := jsonex.Unmarshal(data, &result,
        jsonex.WithMaxDepth(10),      // Set maximum nesting depth
        jsonex.WithBufferSize(8192),  // Set buffer size
    )
    if err != nil {
        panic(err)
    }
}

API Reference

Functions
Unmarshal(data []byte, v interface{}, opts ...Option) error

Parses JSON-encoded data and stores the result in the value pointed to by v. Unlike the standard library's json.Unmarshal, this function extracts the longest valid JSON object or array from the input data, ignoring any preceding or trailing invalid content.

New(r io.Reader, opts ...Option) *Decoder

Creates a new Decoder that reads from r.

Types
Decoder
type Decoder struct {
    // contains filtered or unexported fields
}

func (d *Decoder) Decode(v interface{}) error
Options
WithMaxDepth(depth int) Option

Sets the maximum nesting depth for JSON parsing (default: 1000).

WithBufferSize(size int) Option

Sets the buffer size for internal operations (default: 4096).

RFC 8259 Compliance

This library is fully compliant with RFC 8259 (The JavaScript Object Notation Data Interchange Format):

  • ✅ Proper JSON grammar support (objects, arrays, strings, numbers, booleans, null)
  • ✅ Complete escape sequence handling (\", \\, \/, \b, \f, \n, \r, \t)
  • ✅ Unicode escape sequences (\uXXXX) including surrogate pairs
  • ✅ UTF-8 character encoding support
  • ✅ Syntax validation and error reporting
  • ✅ Whitespace handling
  • ✅ Number format validation
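
To make the surrogate-pair item concrete, here is a small standard-library sketch of how two \uXXXX escapes combine into one code point. The helper decodeSurrogatePair is hypothetical and is not taken from jsonex; it only demonstrates the arithmetic that RFC 8259 refers to, and compares the result with what encoding/json produces.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"unicode"
	"unicode/utf16"
)

// decodeSurrogatePair is a hypothetical helper that combines the hex digits
// of two \uXXXX escapes (high surrogate, then low surrogate) into one rune.
func decodeSurrogatePair(high, low string) (rune, error) {
	h, err := strconv.ParseUint(high, 16, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid high escape %q: %w", high, err)
	}
	l, err := strconv.ParseUint(low, 16, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid low escape %q: %w", low, err)
	}
	// utf16.DecodeRune returns U+FFFD when the two units are not a valid pair.
	r := utf16.DecodeRune(rune(h), rune(l))
	if r == unicode.ReplacementChar {
		return 0, fmt.Errorf("\\u%s\\u%s is not a valid surrogate pair", high, low)
	}
	return r, nil
}

func main() {
	r, err := decodeSurrogatePair("D83D", "DE00")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%U\n", r) // U+1F600

	// The standard library decodes the same escape sequence to the same rune.
	var s string
	if err := json.Unmarshal([]byte(`"\uD83D\uDE00"`), &s); err != nil {
		panic(err)
	}
	fmt.Println(s == string(r)) // true
}
```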

Performance

Benchmark results:

BenchmarkStdLib_Unmarshal_Small-10      1310816    916.6 ns/op
BenchmarkJsonex_Unmarshal_Small-10      1264669    942.6 ns/op
BenchmarkJsonex_Unmarshal_Robust-10      637006   1874 ns/op

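Note: In these results, jsonex on the small input stays within about 3% of the standard library (942.6 vs. 916.6 ns/op), while the robust parsing mode takes roughly twice as long (1874 ns/op) because of the extra processing required for handling noisy data.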

Error Handling

The library provides detailed error information including:

  • Error type classification (syntax, unicode, escape, EOF, invalid JSON)
  • Position information (line, column, offset)
  • Contextual error messages
if err := jsonex.Unmarshal(data, &result); err != nil {
    // errors.As (from the standard "errors" package) also matches wrapped errors.
    var jsonErr *jsonex.Error
    if errors.As(err, &jsonErr) {
        fmt.Printf("Error at line %d, column %d: %s\n",
            jsonErr.Position.Line,
            jsonErr.Position.Column,
            jsonErr.Message)
    }
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Fuzzing

This library includes fuzzing tests to validate behavior against various inputs.

Running Fuzzing Tests
# Install Task runner first: https://taskfile.dev/installation/

# Run all fuzzing tests
task fuzz

# Run individual fuzzing tests
task fuzz:unmarshal      # Test Unmarshal function
task fuzz:decoder        # Test Decoder function  
task fuzz:unicode        # Test Unicode handling
task fuzz:deep-nesting   # Test deep nesting structures

# Run extended fuzzing (5 minutes each)
task fuzz:long

# Clean fuzzing corpus and cache
task fuzz:clean
Using Go Commands Directly
# Run individual fuzz tests
go test -fuzz=FuzzUnmarshal -fuzztime=30s
go test -fuzz=FuzzDecoder -fuzztime=30s
go test -fuzz=FuzzUnicodeHandling -fuzztime=30s
go test -fuzz=FuzzDeepNesting -fuzztime=30s

# Run for longer duration
go test -fuzz=FuzzUnmarshal -fuzztime=5m
Fuzzing Coverage

The fuzzing tests cover:

  • Unmarshal Function: Various JSON inputs including malformed data, edge cases, and robust parsing scenarios
  • Decoder Function: Streaming JSON processing with multiple objects and incomplete data
  • Unicode Handling: UTF-8 validation, escape sequences, surrogate pairs, and invalid sequences
  • Deep Nesting: Extremely nested structures to test stack limits and memory usage
Fuzzing Corpus

Fuzzing corpus files are stored in testdata/fuzz/ and include automatically generated test cases that increase code coverage. The corpus is managed automatically but can be cleaned with task fuzz:clean.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Documentation

Overview

Package jsonex provides a robust JSON parser that can extract valid JSON objects and arrays from input streams that may contain invalid or extraneous data.

The parser is designed to be RFC 8259 compliant and focuses on extracting structured data (objects and arrays) while ignoring primitive values that might interfere with robust parsing.

Key features:

  • Extracts JSON objects and arrays from any input stream
  • Skips invalid characters and finds JSON start positions
  • Decoder: extracts the first valid JSON (streaming)
  • Unmarshal: extracts the longest valid JSON (batch processing)
  • Configurable options for depth limits and buffer sizes
  • Comprehensive error reporting with position information

Functions

func Unmarshal

func Unmarshal(data []byte, v interface{}, opts ...Option) error

Unmarshal parses the JSON-encoded data and stores the result in the value pointed to by v. Unlike the standard json.Unmarshal, this function extracts the longest valid JSON object or array from the input data, ignoring any preceding or trailing invalid content.

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder reads and decodes JSON values from an input stream

func New

func New(r io.Reader, opts ...Option) *Decoder

New creates a new Decoder that reads from r

func (*Decoder) Buffered

func (d *Decoder) Buffered() io.Reader

Buffered returns a reader of the data remaining in the Decoder's buffer. This can be useful for reading any remaining data after JSON parsing.
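
The method name matches json.Decoder.Buffered in the standard library, so the standard decoder is a reasonable way to see the idea in action. The sketch below uses only encoding/json; whether jsonex returns exactly the same bytes in the same situation is an assumption, and decodeThenRest is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeThenRest decodes one JSON value with the standard library decoder and
// returns the bytes the decoder had already buffered but not consumed.
func decodeThenRest(input string) (map[string]interface{}, string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var v map[string]interface{}
	if err := dec.Decode(&v); err != nil {
		return nil, "", err
	}
	rest, err := io.ReadAll(dec.Buffered())
	if err != nil {
		return nil, "", err
	}
	return v, string(rest), nil
}

func main() {
	v, rest, err := decodeThenRest(`{"a":1} trailing text`)
	if err != nil {
		panic(err)
	}
	fmt.Println(v["a"])
	fmt.Printf("%q\n", rest)
}
```

Buffered only exposes what has already been read into the decoder's buffer. For inputs larger than the buffer, the usual pattern is to chain it with the original reader via io.MultiReader.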

func (*Decoder) Decode

func (d *Decoder) Decode(v interface{}) error

Decode reads the next JSON-encoded value from its input and stores it in the value pointed to by v. The behavior is similar to json.Decoder.Decode, but only objects and arrays are accepted.
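
The "first valid JSON" behavior of a streaming decoder can be sketched with the standard library as follows. This is an illustration under simplifying assumptions (the whole input is in memory, and the hypothetical helpers nextJSON and decodeAll are not part of jsonex), not a description of the library's internals.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// nextJSON is a hypothetical helper that returns the first valid JSON object
// or array in data, together with the unread remainder after it.
func nextJSON(data []byte) (value, rest []byte, ok bool) {
	for i, c := range data {
		if c != '{' && c != '[' {
			continue // skip noise until a possible start position
		}
		dec := json.NewDecoder(bytes.NewReader(data[i:]))
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			continue
		}
		// raw holds exactly the bytes of the value that starts at data[i].
		return raw, data[i+len(raw):], true
	}
	return nil, nil, false
}

// decodeAll repeatedly calls nextJSON until no further value is found.
func decodeAll(data []byte) []string {
	var out []string
	for {
		value, rest, ok := nextJSON(data)
		if !ok {
			return out
		}
		out = append(out, string(value))
		data = rest
	}
}

func main() {
	for _, v := range decodeAll([]byte(`noise {"first": 1} garbage {"second": 2} end`)) {
		fmt.Println(v)
	}
}
```

Unlike the longest-match rule used for batch processing, each call here stops at the first candidate that parses, which is what allows successive calls to walk through a stream.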

func (*Decoder) DisallowUnknownFields

func (d *Decoder) DisallowUnknownFields()

DisallowUnknownFields causes the Decoder to return an error when the destination is a struct and the input contains object keys which do not match any non-ignored, exported fields in the destination
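
The standard library's json.Decoder has a method with the same name and the same documented behavior, so it can be used to see the effect. The sketch below uses only encoding/json; the struct user and the helper decodeStrict are hypothetical, and it is an assumption that jsonex reports the condition in a comparable way.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// user is a hypothetical destination type with a single known field.
type user struct {
	Name string `json:"name"`
}

// decodeStrict decodes input into a user and rejects unknown object keys.
func decodeStrict(input string) error {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.DisallowUnknownFields()
	var u user
	return dec.Decode(&u)
}

func main() {
	fmt.Println(decodeStrict(`{"name": "John"}`))            // <nil>
	fmt.Println(decodeStrict(`{"name": "John", "age": 30}`)) // non-nil: "age" is unknown
}
```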

func (*Decoder) UseNumber

func (d *Decoder) UseNumber()

UseNumber causes the Decoder to unmarshal a number into an interface{} as a Number instead of as a float64
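
The practical reason for this option is precision: float64 cannot represent every large integer exactly. The sketch below demonstrates the effect with the standard library's json.Decoder.UseNumber; it assumes that the Number mentioned above behaves like json.Number, which the source does not state explicitly. The helper idAsString is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// idAsString decodes {"id": ...} and returns the id rendered as a decimal
// string, with or without UseNumber on the standard library decoder.
func idAsString(input string, useNumber bool) (string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	if useNumber {
		dec.UseNumber()
	}
	var v map[string]interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	switch id := v["id"].(type) {
	case json.Number:
		return id.String(), nil // original digits preserved
	case float64:
		return strconv.FormatFloat(id, 'f', -1, 64), nil
	default:
		return "", fmt.Errorf("unexpected type %T", id)
	}
}

func main() {
	const input = `{"id": 9007199254740993}` // 2^53 + 1
	asFloat, _ := idAsString(input, false)
	asNumber, _ := idAsString(input, true)
	fmt.Println(asFloat)  // 9007199254740992 (rounded by float64)
	fmt.Println(asNumber) // 9007199254740993
}
```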

type Error

type Error struct {
	Type     ErrorType
	Message  string
	Position Position
	Context  string
}

Error represents an error that occurred during JSON parsing

func (*Error) Error

func (e *Error) Error() string

Error implements the error interface

type ErrorType

type ErrorType int

ErrorType represents the type of error that occurred during parsing

const (
	ErrSyntax ErrorType = iota
	ErrUnicode
	ErrEscape
	ErrEOF
	ErrInvalidJSON
)

func (ErrorType) String

func (t ErrorType) String() string

String returns the string representation of ErrorType

type Option

type Option func(*options)

Option is a function that modifies options
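
The signature func(*options) is the functional options pattern. A minimal sketch of how such options are typically applied is shown below; the field names maxDepth and bufferSize and the helper newOptions are hypothetical, since the real options struct is unexported. The defaults of 1000 and 4096 are the ones stated in the README.

```go
package main

import "fmt"

// options is a hypothetical stand-in for the unexported configuration struct.
type options struct {
	maxDepth   int
	bufferSize int
}

// Option is a function that modifies options.
type Option func(*options)

// WithMaxDepth sets the maximum nesting depth.
func WithMaxDepth(depth int) Option {
	return func(o *options) { o.maxDepth = depth }
}

// WithBufferSize sets the read buffer size.
func WithBufferSize(size int) Option {
	return func(o *options) { o.bufferSize = size }
}

// newOptions starts from the documented defaults and applies each option in order.
func newOptions(opts ...Option) options {
	o := options{maxDepth: 1000, bufferSize: 4096}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", newOptions())
	fmt.Printf("%+v\n", newOptions(WithMaxDepth(10), WithBufferSize(8192)))
}
```

Because options are applied in order, a later option overrides an earlier one that touches the same field.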

func WithBufferSize

func WithBufferSize(size int) Option

WithBufferSize sets the read buffer size for performance tuning. Larger buffers may improve performance for large JSON files.

func WithMaxDepth

func WithMaxDepth(depth int) Option

WithMaxDepth sets the maximum nesting depth. This helps prevent stack overflow attacks with deeply nested JSON.
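
A depth limit of this kind can be enforced with a simple counter that ignores brackets inside string literals. The following sketch shows the idea; exceedsDepth is a hypothetical helper, and how jsonex actually tracks depth is not documented here.

```go
package main

import "fmt"

// exceedsDepth reports whether the nesting of objects and arrays in data goes
// deeper than maxDepth. Brackets inside string literals are not counted.
func exceedsDepth(data []byte, maxDepth int) bool {
	depth := 0
	inString := false
	escaped := false
	for _, c := range data {
		if inString {
			switch {
			case escaped:
				escaped = false // the escaped byte is consumed as-is
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{', '[':
			depth++
			if depth > maxDepth {
				return true
			}
		case '}', ']':
			if depth > 0 {
				depth--
			}
		}
	}
	return false
}

func main() {
	data := []byte(`{"deeply": {"nested": {"json": "value"}}}`)
	fmt.Println(exceedsDepth(data, 10)) // false: the input nests 3 levels
	fmt.Println(exceedsDepth(data, 2))  // true
}
```

Checking depth iteratively like this keeps memory use constant regardless of how deeply an attacker nests the input.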

type Position

type Position struct {
	Offset int // byte offset
	Line   int // line number (1-based)
	Column int // column number (1-based)
}

Position represents a position in the input stream
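
The relationship between the three fields can be shown with a short sketch that derives a 1-based line and column from a byte offset. The type position and the helper positionAt are hypothetical and mirror the field layout above; the sketch counts columns in bytes, and whether jsonex counts bytes or runes is an assumption.

```go
package main

import "fmt"

// position mirrors the fields of Position for illustration only.
type position struct {
	Offset int // byte offset
	Line   int // line number (1-based)
	Column int // column number (1-based, counted in bytes here)
}

// positionAt computes the line and column of the byte at the given offset.
// Offsets outside the input are clamped to its bounds.
func positionAt(data []byte, offset int) position {
	if offset < 0 {
		offset = 0
	}
	if offset > len(data) {
		offset = len(data)
	}
	line, col := 1, 1
	for _, c := range data[:offset] {
		if c == '\n' {
			line++
			col = 1
		} else {
			col++
		}
	}
	return position{Offset: offset, Line: line, Column: col}
}

func main() {
	data := []byte("{\n  \"a\": ?\n}")
	fmt.Printf("%+v\n", positionAt(data, 9)) // {Offset:9 Line:2 Column:8}
}
```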

func (Position) String

func (p Position) String() string

String returns the string representation of Position
