Documentation ¶
Overview ¶
Package sap provides a Schema-Aligned Parser for extracting typed data from messy LLM-generated JSON.
LLMs frequently produce JSON that is technically invalid: wrapped in markdown code blocks, decorated with explanatory text, or containing syntax errors like unquoted keys, single quotes, trailing commas, and inline comments. Package sap handles all of this automatically. It extracts JSON candidates from free-form text, fixes common formatting problems, coerces mismatched types to fit your target struct, and picks the best parse result via a scoring system.
Quick Start ¶
The simplest way to use sap is the generic Parse function:
type User struct {
Name string `json:"name"`
Age int `json:"age"`
Email string `json:"email"`
}
user, err := sap.Parse[User](llmResponse)
If you need to inspect how much coercion was required, use ParseWithScore. Lower scores indicate cleaner parses:
user, score, err := sap.ParseWithScore[User](llmResponse)
fmt.Printf("parse quality score: %d\n", score.Total())
To fix malformed JSON without parsing it into a struct, use FixJSON directly:
fixed, err := sap.FixJSON(`{name: 'Alice', age: 30,}`)
// fixed is now valid JSON: {"name": "Alice", "age": 30}
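One of the repairs named above, trailing-comma removal, can be sketched with the standard library alone. This is an illustrative helper, not the package's implementation: stripTrailingCommas is a hypothetical name, and the sketch assumes strings are already double-quoted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stripTrailingCommas removes commas that directly precede a closing } or ],
// leaving anything inside double-quoted strings untouched.
// Hypothetical helper for illustration; not part of package sap.
func stripTrailingCommas(s string) string {
	var b strings.Builder
	inString, escaped := false, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if inString {
			b.WriteByte(c)
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		if c == '"' {
			inString = true
		}
		if c == ',' {
			// Look ahead past whitespace for a closing bracket.
			j := i + 1
			for j < len(s) && strings.ContainsRune(" \t\r\n", rune(s[j])) {
				j++
			}
			if j < len(s) && (s[j] == '}' || s[j] == ']') {
				continue // drop the trailing comma
			}
		}
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	fixed := stripTrailingCommas(`{"name": "Alice, Jr.", "tags": ["a", "b",], }`)
	fmt.Println(fixed)
	fmt.Println(json.Valid([]byte(fixed)))
}
```

Tracking string state matters here: the comma inside "Alice, Jr." must survive, which a plain regular-expression replacement would not guarantee.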
JSON Extraction ¶
To locate JSON in the input, the extractor runs through several strategies in order:
- Try parsing the trimmed input as standard JSON.
- Look for JSON inside markdown code blocks (```json ... ```).
- Scan the text for balanced { } and [ ] blocks.
- Attempt to fix any candidate that fails standard parsing by quoting unquoted keys, converting single quotes and backticks to double quotes, removing trailing commas, and stripping comments.
The best candidate is selected by parsing each one against your target type and comparing scores.
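The balanced-block scan from the list above might look roughly like the following. It is a minimal sketch under stated assumptions: candidate and findBalanced are hypothetical names that only mirror the idea of a JSON candidate with a start index, and only double-quoted strings are recognised.

```go
package main

import "fmt"

// candidate pairs an extracted span with its starting offset.
// Illustrative only; it is not the package's JSONCandidate type.
type candidate struct {
	JSON  string
	Index int
}

// findBalanced returns every top-level balanced {...} or [...] span in text.
// Brackets that appear inside double-quoted strings are ignored.
func findBalanced(text string) []candidate {
	var out []candidate
	var stack []byte
	start := -1
	inString, escaped := false, false
	for i := 0; i < len(text); i++ {
		c := text[i]
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			// Quotes only matter once we are inside a candidate.
			if len(stack) > 0 {
				inString = true
			}
		case '{', '[':
			if len(stack) == 0 {
				start = i
			}
			stack = append(stack, c)
		case '}', ']':
			if len(stack) == 0 {
				continue
			}
			open := stack[len(stack)-1]
			if (c == '}' && open != '{') || (c == ']' && open != '[') {
				// Mismatched closer: abandon this candidate.
				stack, start = stack[:0], -1
				continue
			}
			stack = stack[:len(stack)-1]
			if len(stack) == 0 {
				out = append(out, candidate{JSON: text[start : i+1], Index: start})
			}
		}
	}
	return out
}

func main() {
	text := `Here you go: {"name": "Alice", "tags": ["a}", "b"]} and also [1, 2, 3].`
	for _, c := range findBalanced(text) {
		fmt.Printf("%d: %s\n", c.Index, c.JSON)
	}
}
```

Each span found this way would still need to be parsed, and possibly fixed, before it can be scored against the target type.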
Type Coercion ¶
The type coercer transforms JSON values to fit your Go struct fields. Supported coercions include:
- String to int/float: "42" becomes 42, "$1,234.56" becomes 1234.56
- String to bool: "true", "yes", "1", "on" become true; "false", "no", "0", "off" become false
- Fractions: "1/5" becomes 0.2
- Currency: "$1,000" becomes 1000
- Number to bool: 0 is false, nonzero is true
- Bool to int: true is 1, false is 0
- Case-insensitive struct field matching
- Fuzzy enum matching with Unicode normalization
Each coercion adds a penalty to the parse score so you can distinguish a clean parse from one that required significant transformation.
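A few of the string coercions listed above can be approximated with the standard library. The sketch below is an assumption about how such rules could be written, not the package's code; coerceFloat and coerceBool are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coerceFloat converts loosely formatted numeric strings to float64.
// It handles a leading currency symbol, thousands separators, and a/b fractions.
func coerceFloat(s string) (float64, error) {
	s = strings.TrimSpace(s)
	s = strings.TrimLeft(s, "$€£")
	s = strings.ReplaceAll(s, ",", "")
	if num, den, ok := strings.Cut(s, "/"); ok {
		n, err := strconv.ParseFloat(strings.TrimSpace(num), 64)
		if err != nil {
			return 0, err
		}
		d, err := strconv.ParseFloat(strings.TrimSpace(den), 64)
		if err != nil {
			return 0, err
		}
		if d == 0 {
			return 0, fmt.Errorf("division by zero in %q", s)
		}
		return n / d, nil
	}
	return strconv.ParseFloat(s, 64)
}

// coerceBool maps the truthy and falsy words from the list above to bool.
func coerceBool(s string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "true", "yes", "1", "on":
		return true, nil
	case "false", "no", "0", "off":
		return false, nil
	}
	return false, fmt.Errorf("cannot coerce %q to bool", s)
}

func main() {
	for _, in := range []string{"42", "$1,234.56", "1/5"} {
		f, err := coerceFloat(in)
		fmt.Println(in, f, err)
	}
	b, err := coerceBool("Yes")
	fmt.Println(b, err)
}
```

Returning an error for unrecognised input, rather than a zero value, lets a caller decide whether a failed coercion should disqualify a candidate.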
Streaming ¶
For streaming LLM responses, ParsePartial accepts incomplete JSON and reports a CompletionState (Complete, Incomplete, or Pending):
user, state, err := sap.ParsePartial[User](partialResponse)
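This page does not describe how ParsePartial tolerates truncated input. One common approach, shown here purely as an assumption, is to close whatever is still open before parsing. closeTruncated is a hypothetical helper; it only copes with input cut off inside a value or between elements, not in the middle of a key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// closeTruncated balances a truncated JSON prefix by closing an open string
// and appending any missing ] or } in the right order.
// Hypothetical helper; not taken from package sap.
func closeTruncated(prefix string) string {
	var closers []byte
	inString, escaped := false, false
	for i := 0; i < len(prefix); i++ {
		c := prefix[i]
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			closers = append(closers, '}')
		case '[':
			closers = append(closers, ']')
		case '}', ']':
			if len(closers) > 0 {
				closers = closers[:len(closers)-1]
			}
		}
	}
	if escaped {
		// Drop a dangling backslash so it cannot escape the quote we add.
		prefix = prefix[:len(prefix)-1]
	}
	var b strings.Builder
	b.WriteString(prefix)
	if inString {
		b.WriteByte('"')
	}
	for i := len(closers) - 1; i >= 0; i-- {
		b.WriteByte(closers[i])
	}
	return b.String()
}

func main() {
	partial := `{"name": "Alice", "tags": ["go", "ru`
	repaired := closeTruncated(partial)
	fmt.Println(repaired)

	var v map[string]interface{}
	fmt.Println(json.Unmarshal([]byte(repaired), &v) == nil)
}
```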
instructor-go Integration ¶
To use sap as the parser for instructor-go, create an InstructorParser:
parser := sap.NewInstructorParser()
client := instructor.FromOpenAI(openaiClient,
instructor.WithParser(parser),
)
The InstructorParser implements the instructor-go Parser interface, replacing its default JSON unmarshaling with the full sap extraction and coercion pipeline.
Index ¶
- Variables
- func CoerceToEnum(value interface{}, enumType reflect.Type, score *Score) (interface{}, error)
- func FixJSON(input string) (string, error)
- func NewParser() *sapParser
- func Parse[T any](input string) (T, error)
- type Coercer
- type CompletionState
- type EnumCoercer
- type Extractor
- type FixingParser
- type InstructorParser
- type JSONCandidate
- type ParseOptions
- type ParseResult
- type Parser
- type Score
- type ScoreFlag
- type StreamingOptions
- type TypeCoercer
Variables ¶
var DefaultParser = NewParser()
DefaultParser is the default SAP parser instance
Functions ¶
func CoerceToEnum ¶
CoerceToEnum attempts to coerce a value to an enum type
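As a rough illustration of fuzzy enum matching, the sketch below compares values after lower-casing and dropping everything except letters and digits. It is an assumption about the general technique, not CoerceToEnum itself: matchEnum and normalize are hypothetical names, and full Unicode normalization is deliberately left out because it needs a package outside the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize lower-cases s and keeps only letters and digits, so that
// "in-progress", "In Progress" and "IN_PROGRESS" all compare equal.
func normalize(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// matchEnum returns the allowed value that matches value, preferring an
// exact match and falling back to a normalized comparison.
func matchEnum(value string, allowed []string) (string, bool) {
	for _, a := range allowed {
		if a == value {
			return a, true
		}
	}
	n := normalize(value)
	for _, a := range allowed {
		if normalize(a) == n {
			return a, true
		}
	}
	return "", false
}

func main() {
	allowed := []string{"IN_PROGRESS", "DONE"}
	for _, in := range []string{"DONE", "in-progress", " Done ", "cancelled"} {
		got, ok := matchEnum(in, allowed)
		fmt.Printf("%q -> %q %v\n", in, got, ok)
	}
}
```

Checking for an exact match first means a fuzzy match can be reported, and penalised, separately from a clean one.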
func FixJSON ¶
FixJSON attempts to fix malformed JSON
Example ¶
input := `{name: "Alice", age: 30, email: "alice@example.com",}`
fixed, err := FixJSON(input)
if err != nil {
panic(err)
}
// The fixed JSON is now valid, so a strict parser such as encoding/json would accept it too
user, err := Parse[TestUser](fixed)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
Output:
Alice
30
Example (Comments) ¶
input := `{
// Name of the user
"name": "Grace",
/* Age in years */ "age": 31,
"email": "grace@example.com"
}`
fixed, err := FixJSON(input)
if err != nil {
panic(err)
}
// Comments are removed, valid JSON remains
user, err := Parse[TestUser](fixed)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
Output:
Grace
31
Example (SingleQuotes) ¶
input := `{'name': 'David', 'age': 40, 'email': 'david@example.com'}`
fixed, err := FixJSON(input)
if err != nil {
panic(err)
}
user, err := Parse[TestUser](fixed)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
Output:
David
40
func NewParser ¶
func NewParser() *sapParser
NewParser creates a new SAP parser with default options
Example ¶
// Create a custom parser with strict mode enabled.
// Strict mode rejects malformed JSON instead of fixing it.
userType := reflect.TypeOf(TestUser{})
// Valid JSON works fine in strict mode
parser := NewParser().WithStrict(true)
validInput := `{"name": "Alice", "age": 30, "email": "alice@example.com"}`
_, err := parser.Parse(validInput, userType)
// Unquoted keys are rejected in strict mode
invalidInput := `{name: "Alice", age: 30}`
_, err2 := NewParser().WithStrict(true).Parse(invalidInput, userType)
fmt.Printf("valid input error: %v\n", err)
fmt.Printf("invalid input rejected: %v\n", err2 != nil)
Output:
valid input error: <nil>
invalid input rejected: true
func Parse ¶
Parse parses input text into the target type. This is the main public API.
Example ¶
input := `{"name": "Alice", "age": 30, "email": "alice@example.com"}`
user, err := Parse[TestUser](input)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
fmt.Println(user.Email)
Output:
Alice
30
alice@example.com
Example (BooleanCoercion) ¶
// SAP coerces string "yes" to bool true
input := `{"title": "Dev", "experience": ["Go"], "active": "yes"}`
resume, err := Parse[TestResume](input)
if err != nil {
panic(err)
}
fmt.Println(resume.Title)
fmt.Println(resume.Active)
Output:
Dev
true
Example (ChainOfThought) ¶
input := `Let me extract the user info:
The user's name is Alice and they are 30 years old.
Here's the structured data:
{
"name": "Alice",
"age": 30,
"email": "alice@example.com"
}
Hope this helps!`
user, err := Parse[TestUser](input)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
fmt.Println(user.Email)
Output:
Alice
30
alice@example.com
Example (ComplexTypes) ¶
input := `{"title": "Engineer", "experience": ["Go", "Rust", "Python"], "active": true}`
resume, err := Parse[TestResume](input)
if err != nil {
panic(err)
}
fmt.Println(resume.Title)
fmt.Println(len(resume.Experience))
fmt.Println(resume.Experience[0])
fmt.Println(resume.Active)
Output:
Engineer
3
Go
true
Example (Markdown) ¶
input := "Here's the user data:\n```json\n{\"name\": \"Bob\", \"age\": 25, \"email\": \"bob@example.com\"}\n```"
user, err := Parse[TestUser](input)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
Output:
Bob
25
Example (RealWorldLLMOutput) ¶
// This is what LLM output actually looks like — chain-of-thought preamble,
// markdown code block, broken JSON with comments, mixed quotes, and values
// that need coercion. GSAP handles all of it in a single call.
input := "Sure! Here's the candidate information I extracted:\n\n" +
"```json\n" +
"{\n" +
" // Personal details\n" +
" name: 'Jane Doe',\n" +
" 'age': \"29 years\",\n" +
" email: 'jane.doe@example.com',\n" +
"\n" +
" /* Professional info */\n" +
" skills: \"go, python, react, typescript\",\n" +
" experience: \"8 years\",\n" +
" salary: \"$185K\",\n" +
" remote: \"yes\",\n" +
" start_date: \"2025-03-15\",\n" +
"\n" +
" // Optional fields\n" +
" website: \"N/A\",\n" +
" notes: \"TBD\",\n" +
"}\n" +
"```\n\n" +
"Let me know if you need anything else!"
candidate, err := Parse[TestCandidate](input)
if err != nil {
panic(err)
}
fmt.Println(candidate.Name)
fmt.Println(candidate.Age)
fmt.Println(candidate.Email)
fmt.Println(candidate.Skills)
fmt.Println(candidate.Experience)
fmt.Println(candidate.Salary)
fmt.Println(candidate.Remote)
fmt.Println(candidate.StartDate.Format("2006-01-02"))
fmt.Println(candidate.Website)
fmt.Println(candidate.Notes)
Output:
Jane Doe
29
jane.doe@example.com
[go python react typescript]
8
185000
true
2025-03-15
<nil>
<nil>
Example (TypeCoercion) ¶
// SAP coerces string "30" to int 30 automatically
input := `{"name": "Alice", "age": "30", "email": "alice@example.com"}`
user, err := Parse[TestUser](input)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
Output:
Alice
30
Types ¶
type Coercer ¶
type Coercer interface {
// Coerce transforms a value to match the target type
Coerce(value interface{}, targetType reflect.Type) (interface{}, *Score, error)
}
Coercer defines the interface for type coercion
type CompletionState ¶
type CompletionState int
CompletionState represents the parsing completion status
const (
	// Complete means all required fields are present
	Complete CompletionState = iota
	// Incomplete means some required fields are missing (for streaming)
	Incomplete
	// Pending means parsing is ongoing
	Pending
)
func ParsePartial ¶
func ParsePartial[T any](input string) (T, CompletionState, error)
ParsePartial parses input as a partial type (for streaming)
type Extractor ¶
type Extractor struct {
// contains filtered or unexported fields
}
Extractor handles JSON extraction from text
func NewExtractor ¶
func NewExtractor(opts *ParseOptions) *Extractor
NewExtractor creates a new JSON extractor
func (*Extractor) ExtractJSON ¶
func (e *Extractor) ExtractJSON(input string) ([]JSONCandidate, error)
ExtractJSON extracts potential JSON from text. It returns candidates in order of likelihood.
type FixingParser ¶
type FixingParser struct {
// contains filtered or unexported fields
}
FixingParser handles malformed JSON
type InstructorParser ¶
type InstructorParser struct {
// contains filtered or unexported fields
}
InstructorParser wraps SAP for use with instructor-go. This allows you to use SAP as a custom parser for instructor-go.
Example usage:
import (
"github.com/567-labs/instructor-go/pkg/instructor"
sap "github.com/alcarpenter/gsap"
)
// Create a SAP-based parser
parser := sap.NewInstructorParser()
// Use with instructor-go
client := instructor.FromOpenAI(openaiClient,
instructor.WithParser(parser),
)
func NewInstructorParser ¶
func NewInstructorParser() *InstructorParser
NewInstructorParser creates a new instructor-go compatible parser using SAP
Example ¶
// NewInstructorParser creates a parser compatible with instructor-go.
// Use it as a drop-in custom parser for instructor-go clients.
parser := NewInstructorParser()
// The parser implements Unmarshal([]byte, interface{}) error
var user TestUser
err := parser.Unmarshal(
[]byte(`{"name": "Alice", "age": "30", "email": "alice@example.com"}`),
&user,
)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
Output:
Alice
30
func (*InstructorParser) Unmarshal ¶
func (ip *InstructorParser) Unmarshal(data []byte, v interface{}) error
Unmarshal implements the instructor-go Parser interface. It takes the LLM response and parses it into the target type.
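To make the method shape concrete, here is a minimal stand-in with the same Unmarshal signature. It strips a markdown code fence and then defers to encoding/json. It is only a sketch of the idea: fenceParser, fenced, and user are hypothetical names, and the real InstructorParser runs the full extraction and coercion pipeline instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// fence is three backticks, built at run time to keep this listing readable.
var fence = strings.Repeat("`", 3)

// fenced matches the body of the first fenced block, with an optional "json" tag.
var fenced = regexp.MustCompile("(?s)" + fence + "(?:json)?\\s*(.*?)\\s*" + fence)

type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// fenceParser is a hypothetical parser with the Unmarshal method shape
// described above. It is not part of package sap.
type fenceParser struct{}

// Unmarshal unwraps a fenced block if one is present, then decodes strictly.
func (fenceParser) Unmarshal(data []byte, v interface{}) error {
	if m := fenced.FindSubmatch(data); m != nil {
		data = m[1]
	}
	return json.Unmarshal(data, v)
}

func main() {
	reply := "Here's the user data:\n" + fence + "json\n{\"name\": \"Bob\", \"age\": 25}\n" + fence
	var u user
	if err := (fenceParser{}).Unmarshal([]byte(reply), &u); err != nil {
		panic(err)
	}
	fmt.Println(u.Name, u.Age)
}
```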
func (*InstructorParser) WithIncompleteJSON ¶
func (ip *InstructorParser) WithIncompleteJSON(allow bool) *InstructorParser
WithIncompleteJSON allows incomplete JSON for streaming
func (*InstructorParser) WithStrict ¶
func (ip *InstructorParser) WithStrict(strict bool) *InstructorParser
WithStrict creates a new parser in strict mode
type JSONCandidate ¶
type JSONCandidate struct {
JSON string // The JSON string
Index int // Starting position in original text
}
JSONCandidate represents a potential JSON string extracted from text
type ParseOptions ¶
type ParseOptions struct {
Streaming StreamingOptions
Strict bool // If true, only accept exact JSON matches
}
ParseOptions configures parsing behavior
type ParseResult ¶
type ParseResult struct {
Value interface{}
Score *Score
CompletionState CompletionState
RemainingContent string // Text that wasn't part of JSON
}
ParseResult represents a successful parse
type Parser ¶
type Parser interface {
// Parse extracts potential JSON candidates from input text
Parse(input string) ([]JSONCandidate, error)
}
Parser defines the interface for extracting JSON from text
type Score ¶
type Score struct {
// contains filtered or unexported fields
}
Score represents the quality of a parse result. Lower scores are better.
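The idea of a penalty-based score can be sketched as follows. Everything here is hypothetical: the score type, the penalty weights, and best are invented for illustration, since the real Score keeps its fields unexported and this page does not list its weights. Only the flag names are borrowed from the ScoreFlag constants.

```go
package main

import "fmt"

// score is a hypothetical stand-in for a parse score: the flags that were
// raised during parsing plus the sum of their penalties.
type score struct {
	flags []string
	total int
}

// penalties are invented weights, used only to make the example concrete.
var penalties = map[string]int{
	"StringToInt":      1,
	"StringToBool":     1,
	"FuzzyFieldMatch":  2,
	"MarkdownStripped": 1,
}

// add records a flag and accumulates its penalty.
func (s *score) add(flag string) {
	s.flags = append(s.flags, flag)
	s.total += penalties[flag]
}

// best returns the index of the lowest-scoring candidate, or -1 if there
// are none. Ties go to the earliest candidate.
func best(scores []*score) int {
	idx := -1
	for i, s := range scores {
		if idx == -1 || s.total < scores[idx].total {
			idx = i
		}
	}
	return idx
}

func main() {
	clean := &score{}
	coerced := &score{}
	coerced.add("StringToInt")
	coerced.add("FuzzyFieldMatch")

	fmt.Println(clean.total, coerced.total)
	fmt.Println(best([]*score{coerced, clean}))
}
```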
func ParseWithScore ¶
ParseWithScore is like Parse but also returns the parse score
Example ¶
// ParseWithScore returns a quality score alongside the result.
// Lower scores indicate cleaner input (fewer fixes needed).
input := `{"name": "Frank", "age": "50", "email": "frank@example.com"}`
user, score, err := ParseWithScore[TestUser](input)
if err != nil {
panic(err)
}
fmt.Println(user.Name)
fmt.Println(user.Age)
fmt.Printf("score >= 0: %v\n", score.Total() >= 0)
Output:
Frank
50
score >= 0: true
type ScoreFlag ¶
type ScoreFlag = string
ScoreFlag represents a type of coercion or transformation applied during parsing.
const (
	FlagFloatToInt          ScoreFlag = "FloatToInt"
	FlagStringToInt         ScoreFlag = "StringToInt"
	FlagBoolToInt           ScoreFlag = "BoolToInt"
	FlagStringToFloat       ScoreFlag = "StringToFloat"
	FlagStringToBool        ScoreFlag = "StringToBool"
	FlagNumberToBool        ScoreFlag = "NumberToBool"
	FlagFuzzyFieldMatch     ScoreFlag = "FuzzyFieldMatch"
	FlagEnumCaseInsensitive ScoreFlag = "EnumCaseInsensitive"
	FlagEnumFuzzyMatch      ScoreFlag = "EnumFuzzyMatch"
	FlagStringToTime        ScoreFlag = "StringToTime"
	FlagUnixToTime          ScoreFlag = "UnixToTime"
	FlagMarkdownStripped    ScoreFlag = "MarkdownStripped"
	FlagUnitStripped        ScoreFlag = "UnitStripped"
	FlagMultiplierApplied   ScoreFlag = "MultiplierApplied"
	FlagNullStringCoerced   ScoreFlag = "NullStringCoerced"
	FlagCommaSplitToSlice   ScoreFlag = "CommaSplitToSlice"
	FlagEmbeddedStruct      ScoreFlag = "EmbeddedStruct"
)
type StreamingOptions ¶
StreamingOptions configures streaming behavior
type TypeCoercer ¶
type TypeCoercer struct {
// contains filtered or unexported fields
}
TypeCoercer handles type coercion