parsers

package
v0.7.0
Warning

This package is not in the latest version of its module.

Published: Dec 9, 2025 License: Apache-2.0 Imports: 8 Imported by: 0

README

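parsers: Output Parsing System

This module is the LLM output parsing library of the goagent framework. It parses LLM output in several formats, including JSON, ReAct, and structured text.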

Architecture

Parser Hierarchy
graph TB
    subgraph "Parser interface"
        Interface[OutputParser<T>]
    end

    subgraph "Base implementation"
        Base[BaseOutputParser<T>]
    end

    subgraph "Concrete parsers"
        JSON[JSONOutputParser]
        Struct[StructuredOutputParser]
        List[ListOutputParser]
        Enum[EnumOutputParser]
        Bool[BooleanOutputParser]
        Regex[RegexOutputParser]
        ReAct[ReActOutputParser]
        Chain[ChainOutputParser]
    end

    Interface --> Base
    Base --> JSON
    Base --> Struct
    Base --> List
    Base --> Enum
    Base --> Bool
    Base --> Regex
    Base --> ReAct
    Base --> Chain

    style Interface fill:#e1f5ff
    style Chain fill:#e8f5e9
Parsing Flow
sequenceDiagram
    participant LLM as LLM
    participant Parser as Parser
    participant App as Application

    LLM->>Parser: Raw text output
    Parser->>Parser: Extract target format
    Parser->>Parser: Deserialize / parse
    alt Parsing succeeds
        Parser-->>App: Structured data
    else Parsing fails
        Parser-->>App: Error
    end

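Core Interface

OutputParser[T any]

The generic output parser interface:

type OutputParser[T any] interface {
    // Parse parses the raw text output.
    Parse(ctx context.Context, text string) (T, error)

    // ParseWithPrompt parses with the prompt available (used for error recovery).
    ParseWithPrompt(ctx context.Context, text, prompt string) (T, error)

    // GetFormatInstructions returns the format instructions.
    GetFormatInstructions() string

    // GetType returns a description of the output type.
    GetType() string
}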

Parser Types

1. JSONOutputParser[T any]

Locates, extracts, and parses JSON embedded in LLM output:

type JSONOutputParser[T any] struct {
    strict bool  // strict mode
}

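Features

  • Extracts JSON from markdown code blocks
  • Bracket matching
  • Switchable strict and lenient modes
  • Memory optimization for large texts

Extraction Strategy

  1. Try to extract a fenced json code block
  2. Otherwise, find the first { or [
  3. Strict mode: brackets must be balanced
  4. Lenient mode: take everything from the start position to the end of the text
// Create a strict-mode parser
strictParser := NewJSONOutputParser[MyStruct](true)

// Create a lenient-mode parser
lenientParser := NewJSONOutputParser[MyStruct](false)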
2. StructuredOutputParser[T any]

Parses structured text in a custom format:

type FieldSchema struct {
    Name        string // field name
    Type        string // field type
    Description string // field description
    Required    bool   // whether the field is required
}

Supported Formats

  • field_name: value
  • field_name:value
  • **field_name**: value (markdown bold)
schema := map[string]FieldSchema{
    "task": {Name: "task", Type: "string", Required: true},
    "result": {Name: "result", Type: "string", Required: true},
}
parser := NewStructuredOutputParser[MyOutput](schema)
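
To make the three accepted line formats concrete, here is a minimal sketch of collecting name/value pairs from such text. The parseFields helper is hypothetical; how the real parser maps fields onto T and enforces Required is not described in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFields is a hypothetical helper that collects "name: value" pairs,
// accepting "name: value", "name:value", and "**name**: value".
func parseFields(text string) map[string]string {
	fields := make(map[string]string)
	for _, line := range strings.Split(text, "\n") {
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // not a field line
		}
		// Drop surrounding whitespace and markdown bold markers from the name.
		name = strings.TrimSpace(strings.Trim(strings.TrimSpace(name), "*"))
		if name == "" {
			continue
		}
		fields[name] = strings.TrimSpace(value)
	}
	return fields
}

func main() {
	text := "task: summarize\n**result**: done\nnote:no space"
	f := parseFields(text)
	fmt.Println(f["task"], f["result"], f["note"])
}
```

A value that itself contains a colon is preserved, because strings.Cut splits only at the first one.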
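3. ListOutputParser

Parses list-formatted output:

// Create a newline-separated list parser
lineParser := NewListOutputParser("\n")

// Create a comma-separated list parser
commaParser := NewListOutputParser(", ")

Automatic Handling

  • Strips list prefixes: 1., -, *,
  • Drops blank lines automatically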
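
The prefix stripping and blank-line handling described above could look roughly like the following. Both parseList and the prefix pattern are assumptions made for illustration; only the markers named in the list (1., -, *) are handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixRe matches a leading "1.", "-", or "*" marker and the spaces after it.
var prefixRe = regexp.MustCompile(`^\s*(?:\d+\.|[-*])\s+`)

// parseList is a hypothetical helper: split on sep, strip list markers,
// trim whitespace, and drop empty items.
func parseList(text, sep string) []string {
	var items []string
	for _, part := range strings.Split(text, sep) {
		item := strings.TrimSpace(prefixRe.ReplaceAllString(part, ""))
		if item != "" {
			items = append(items, item)
		}
	}
	return items
}

func main() {
	fmt.Println(parseList("1. apples\n- pears\n\n* plums\n", "\n"))
	fmt.Println(parseList("red, green, blue", ", "))
}
```

Requiring whitespace after the marker keeps values such as "-5" or "2.5 liters" intact.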
4. EnumOutputParser

Restricts output to a predefined set of enum values:

parser := NewEnumOutputParser(
    []string{"yes", "no", "maybe"},
    false,  // caseSensitive is false, so matching ignores case
)
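
A minimal sketch of the matching rule, assuming the parser trims whitespace and returns the canonical spelling from the allowed list (neither detail is confirmed by this README). The matchEnum helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchEnum is a hypothetical helper that returns the allowed value matching
// text, comparing case-insensitively unless caseSensitive is set.
func matchEnum(text string, allowed []string, caseSensitive bool) (string, error) {
	candidate := strings.TrimSpace(text)
	for _, v := range allowed {
		if candidate == v || (!caseSensitive && strings.EqualFold(candidate, v)) {
			return v, nil
		}
	}
	return "", fmt.Errorf("%q is not one of %v", candidate, allowed)
}

func main() {
	allowed := []string{"yes", "no", "maybe"}

	v, err := matchEnum(" Maybe\n", allowed, false)
	fmt.Println(v, err) // prints "maybe <nil>"

	_, err = matchEnum("Maybe", allowed, true)
	fmt.Println(err != nil) // prints "true"
}
```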
5. BooleanOutputParser

Parses boolean values such as yes/no and true/false:

parser := NewBooleanOutputParser()

Supported Values

  • True: yes, true, y, 1, , , correct
  • False: no, false, n, 0, , , incorrect
6. RegexOutputParser

Extracts multiple fields using regular expressions:

parser := NewRegexOutputParser(map[string]string{
    "email": `([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`,
    "phone": `(\d{3}-\d{3}-\d{4})`,
})
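
For intuition, the same two patterns can be applied with the standard regexp package. The extractFields helper is hypothetical, and its choice to return the first capture group of the first match is an assumption about how such a parser might behave.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractFields is a hypothetical helper that applies one pattern per field
// and keeps the first capture group of the first match.
func extractFields(text string, patterns map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(patterns))
	for name, pattern := range patterns {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, fmt.Errorf("field %q: %w", name, err)
		}
		m := re.FindStringSubmatch(text)
		if m == nil {
			continue // no match: leave the field out
		}
		if len(m) > 1 {
			out[name] = m[1] // first capture group
		} else {
			out[name] = m[0] // pattern has no group: use the whole match
		}
	}
	return out, nil
}

func main() {
	patterns := map[string]string{
		"email": `([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})`,
		"phone": `(\d{3}-\d{3}-\d{4})`,
	}
	fields, err := extractFields("Contact alice@example.com or call 555-123-4567.", patterns)
	fmt.Println(fields["email"], fields["phone"], err)
}
```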
7. ChainOutputParser[T any]

Tries several parsers in order until one succeeds (fallback strategy):

parser := NewChainOutputParser[MyStruct](
    NewJSONOutputParser[MyStruct](true),   // try strict JSON first
    NewJSONOutputParser[MyStruct](false),  // then lenient JSON
    NewStructuredOutputParser[MyStruct](schema), // finally structured text
)

ReAct Parser

ReActOutput Struct
type ReActOutput struct {
    FinalAnswer string                 // final answer
    Thought     string                 // reasoning
    Action      string                 // tool name
    ActionInput map[string]interface{} // tool input
}
ReActOutputParser

Parses LLM output that follows the ReAct pattern:

parser := NewReActOutputParser()

Supported Format

Thought: <reasoning>
Action: <tool name>
Action Input: <input as JSON>
Observation: <tool result>
... (repeat) ...
Final Answer: <final answer>

Additional Methods

// Parse with retries
output, err := parser.ParseWithRetry(ctx, text, 3)

// Validate the parsed result
err = parser.Validate(output)
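
To show how the markers above translate into a ReActOutput, here is a line-based sketch. The parseReAct function is hypothetical and simplified: when a marker repeats, the later section overwrites the earlier one, and the package's retry and validation rules are not reproduced.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ReActOutput mirrors the struct shown in the README.
type ReActOutput struct {
	FinalAnswer string
	Thought     string
	Action      string
	ActionInput map[string]interface{}
}

var markers = []string{"Thought:", "Action:", "Action Input:", "Observation:", "Final Answer:"}

// parseReAct is a hypothetical, simplified parser for the format above.
func parseReAct(text string) (*ReActOutput, error) {
	sections := make(map[string]string)
	current := ""
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		started := false
		for _, m := range markers {
			if strings.HasPrefix(line, m) {
				current, started = m, true
				// A repeated marker overwrites the earlier section.
				sections[m] = strings.TrimSpace(strings.TrimPrefix(line, m))
				break
			}
		}
		// Lines without a marker continue the current section.
		if !started && current != "" && line != "" {
			sections[current] += "\n" + line
		}
	}

	out := &ReActOutput{
		Thought:     sections["Thought:"],
		Action:      sections["Action:"],
		FinalAnswer: sections["Final Answer:"],
	}
	if raw := sections["Action Input:"]; raw != "" {
		if err := json.Unmarshal([]byte(raw), &out.ActionInput); err != nil {
			return nil, fmt.Errorf("action input: %w", err)
		}
	}
	if out.FinalAnswer == "" && out.Action == "" {
		return nil, errors.New("neither an action nor a final answer was found")
	}
	return out, nil
}

func main() {
	text := "Thought: I need the weather.\nAction: get_weather\nAction Input: {\"city\": \"Paris\"}"
	out, err := parseReAct(text)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(out.Action, out.ActionInput["city"]) // prints "get_weather Paris"
}
```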

Usage

Basic Usage
// 1. Create the parser
parser := NewJSONOutputParser[MyStruct](false)

// 2. Get the format instructions (to include in the prompt)
formatInstructions := parser.GetFormatInstructions()
prompt := "Please return the result as JSON.\n" + formatInstructions

// 3. Call the LLM
output := callLLM(prompt)

// 4. Parse the output
ctx := context.Background()
result, err := parser.Parse(ctx, output)
if err != nil {
    // Retry with ParseWithPrompt
    result, err = parser.ParseWithPrompt(ctx, output, prompt)
}
Chained Fallback
// Multi-level fallback: strict JSON → lenient JSON → structured text
parser := NewChainOutputParser[MyStruct](
    NewJSONOutputParser[MyStruct](true),
    NewJSONOutputParser[MyStruct](false),
    NewStructuredOutputParser[MyStruct](schema),
)

result, err := parser.Parse(ctx, llmOutput)
ReAct Agent Usage
parser := NewReActOutputParser()

// Get the format instructions
instructions := parser.GetFormatInstructions()

// Parse the output
output, err := parser.Parse(ctx, agentOutput)
if err != nil {
    output, err = parser.ParseWithRetry(ctx, agentOutput, 3)
}
if err != nil {
    // Handle the parse error before touching output
    return err
}

// Validate the result
if err = parser.Validate(output); err != nil {
    // Handle the validation error
}

// Use the result
if output.FinalAnswer != "" {
    // The agent is done
    finalResult := output.FinalAnswer
} else {
    // Keep iterating
    toolName := output.Action
    toolInput := output.ActionInput
}

API Reference

Factory Functions
// Base parser
NewBaseOutputParser[T any]() *BaseOutputParser[T]

// JSON parser
NewJSONOutputParser[T any](strict bool) *JSONOutputParser[T]

// Structured parser
NewStructuredOutputParser[T any](schema map[string]FieldSchema) *StructuredOutputParser[T]

// List parser
NewListOutputParser(separator string) *ListOutputParser

// Enum parser
NewEnumOutputParser(allowedValues []string, caseSensitive bool) *EnumOutputParser

// Boolean parser
NewBooleanOutputParser() *BooleanOutputParser

// Regex parser
NewRegexOutputParser(patterns map[string]string) *RegexOutputParser

// Chain parser
NewChainOutputParser[T any](parsers ...OutputParser[T]) *ChainOutputParser[T]

// ReAct parser
NewReActOutputParser() *ReActOutputParser
Common Methods
// Parse text
Parse(ctx context.Context, text string) (T, error)

// Parse with the prompt available
ParseWithPrompt(ctx context.Context, text, prompt string) (T, error)

// Get the format instructions
GetFormatInstructions() string

// Get the type description
GetType() string
ReAct-Specific Methods
// Parse with retries
ParseWithRetry(ctx context.Context, text string, maxRetries int) (*ReActOutput, error)

// Validate the result
Validate(parsed *ReActOutput) error
Error Variables
var (
    ErrParseFailed    = errors.New("failed to parse output")
    ErrInvalidFormat  = errors.New("invalid output format")
    ErrMissingField   = errors.New("missing required field")
    ErrTypeConversion = errors.New("type conversion failed")
)

Code Structure

parsers/
├── output_parser.go       # Core parser interface and implementations
│   ├── OutputParser[T] interface
│   ├── BaseOutputParser[T]
│   ├── JSONOutputParser[T]
│   ├── StructuredOutputParser[T]
│   ├── ListOutputParser
│   ├── EnumOutputParser
│   ├── BooleanOutputParser
│   ├── RegexOutputParser
│   └── ChainOutputParser[T]
├── parser_react.go        # ReAct parser
├── constants.go           # Constant definitions
├── output_parser_test.go  # Unit tests
└── output_parser_bench_test.go # Benchmarks

Constants

ReAct Field Names
const (
    FieldThought     = "thought"
    FieldAction      = "action"
    FieldActionInput = "action_input"
    FieldObservation = "observation"
    FieldFinalAnswer = "final_answer"
)
ReAct Markers
const (
    MarkerThought     = "Thought:"
    MarkerAction      = "Action:"
    MarkerActionInput = "Action Input:"
    MarkerObservation = "Observation:"
    MarkerFinalAnswer = "Final Answer:"
)
Output Format Types
const (
    FormatJSON       = "json"
    FormatText       = "text"
    FormatMarkdown   = "markdown"
    FormatStructured = "structured"
)
Parsing Modes
const (
    ModeStrict  = "strict"
    ModeLenient = "lenient"
    ModeAuto    = "auto"
)

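Performance

  • Memory efficiency: a small JSON value extracted from a large text is cloned automatically, so the result does not pin the large source string in memory
  • Low allocation: a single strings.Index() lookup
  • Concurrency-friendly: all parsers are stateless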

Further Reading

  • llm - LLM integration
  • agents - Agent implementations
  • core - Core execution engine

Documentation

Overview

Package parsers defines constants used for parsing agent outputs, particularly for ReAct (Reasoning and Acting) pattern and other reasoning frameworks.

Constants

const (
	// FieldThought represents the reasoning/thinking step
	FieldThought = "thought"
	// FieldAction represents the action to take
	FieldAction = "action"
	// FieldActionInput represents the input parameters for the action
	FieldActionInput = "action_input"
	// FieldObservation represents the result of the action
	FieldObservation = "observation"
	// FieldFinalAnswer represents the final answer to the query
	FieldFinalAnswer = "final_answer"
	// FieldAnswer represents a general answer field
	FieldAnswer = "answer"
)

ReAct Pattern Field Names define the fields used in ReAct reasoning.

const (
	// MarkerThought is the prefix for thought sections
	MarkerThought = "Thought:"
	// MarkerAction is the prefix for action sections
	MarkerAction = "Action:"
	// MarkerActionInput is the prefix for action input sections
	MarkerActionInput = "Action Input:"
	// MarkerObservation is the prefix for observation sections
	MarkerObservation = "Observation:"
	// MarkerFinalAnswer is the prefix for final answer sections
	MarkerFinalAnswer = "Final Answer:"
)

ReAct Pattern Markers define the text markers used to identify ReAct components.

const (
	// MarkerQuestion represents a question marker
	MarkerQuestion = "Question:"
	// MarkerPlan represents a planning marker
	MarkerPlan = "Plan:"
	// MarkerStep represents a step marker
	MarkerStep = "Step:"
	// MarkerReasoning represents a reasoning marker
	MarkerReasoning = "Reasoning:"
	// MarkerConclusion represents a conclusion marker
	MarkerConclusion = "Conclusion:"
)

Alternative Pattern Markers provide variations commonly seen in outputs.

const (
	// FieldReasoning represents reasoning steps
	FieldReasoning = "reasoning"
	// FieldSteps represents sequential steps
	FieldSteps = "steps"
	// FieldConclusion represents the conclusion
	FieldConclusion = "conclusion"
	// FieldConfidence represents confidence score
	FieldConfidence = "confidence"
)

Chain-of-Thought (CoT) Pattern Constants

const (
	// FieldBranch represents a reasoning branch
	FieldBranch = "branch"
	// FieldScore represents a score/evaluation
	FieldScore = "score"
	// FieldPath represents a solution path
	FieldPath = "path"
	// FieldEvaluation represents an evaluation result
	FieldEvaluation = "evaluation"
)

Tree-of-Thought (ToT) Pattern Constants

const (
	// FieldCritique represents a self-critique
	FieldCritique = "critique"
	// FieldImprovement represents an improvement suggestion
	FieldImprovement = "improvement"
	// FieldRevision represents a revised output
	FieldRevision = "revision"
	// FieldFeedback represents feedback
	FieldFeedback = "feedback"
)

Self-Criticism Pattern Constants

const (
	// ErrTypeInvalidFormat indicates invalid format
	ErrTypeInvalidFormat = "invalid_format"
	// ErrTypeMissingField indicates a required field is missing
	ErrTypeMissingField = "missing_field"
	// ErrTypeInvalidJSON indicates invalid JSON
	ErrTypeInvalidJSON = "invalid_json"
	// ErrTypeUnexpectedStructure indicates unexpected structure
	ErrTypeUnexpectedStructure = "unexpected_structure"
)

Parsing Error Types

const (
	// FormatJSON represents JSON output format
	FormatJSON = "json"
	// FormatText represents plain text format
	FormatText = "text"
	// FormatMarkdown represents markdown format
	FormatMarkdown = "markdown"
	// FormatStructured represents structured format
	FormatStructured = "structured"
)

Output Format Types

const (
	// ModeStrict indicates strict parsing (fail on errors)
	ModeStrict = "strict"
	// ModeLenient indicates lenient parsing (best effort)
	ModeLenient = "lenient"
	// ModeAuto indicates automatic mode detection
	ModeAuto = "auto"
)

Parsing Modes

Variables

var (
	ErrParseFailed    = errors.New("failed to parse output")
	ErrInvalidFormat  = errors.New("invalid output format")
	ErrMissingField   = errors.New("missing required field")
	ErrTypeConversion = errors.New("type conversion failed")
)

Functions

This section is empty.

Types

type BaseOutputParser

type BaseOutputParser[T any] struct {
	// contains filtered or unexported fields
}

BaseOutputParser provides a base implementation of OutputParser.

func NewBaseOutputParser

func NewBaseOutputParser[T any]() *BaseOutputParser[T]

NewBaseOutputParser creates a base output parser.

func (*BaseOutputParser[T]) GetFormatInstructions

func (p *BaseOutputParser[T]) GetFormatInstructions() string

GetFormatInstructions must be implemented by the concrete parser that embeds BaseOutputParser.

func (*BaseOutputParser[T]) GetType

func (p *BaseOutputParser[T]) GetType() string

GetType returns the type name.

func (*BaseOutputParser[T]) Parse

func (p *BaseOutputParser[T]) Parse(ctx context.Context, text string) (T, error)

Parse must be implemented by the concrete parser that embeds BaseOutputParser.

func (*BaseOutputParser[T]) ParseWithPrompt

func (p *BaseOutputParser[T]) ParseWithPrompt(ctx context.Context, text, prompt string) (T, error)

ParseWithPrompt is the default implementation: it ignores prompt and calls Parse directly.

type BooleanOutputParser

type BooleanOutputParser struct {
	*BaseOutputParser[bool]
	// contains filtered or unexported fields
}

BooleanOutputParser is a boolean output parser.

It parses yes/no style output.

func NewBooleanOutputParser

func NewBooleanOutputParser() *BooleanOutputParser

NewBooleanOutputParser creates a boolean output parser.

func (*BooleanOutputParser) GetFormatInstructions

func (p *BooleanOutputParser) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*BooleanOutputParser) Parse

func (p *BooleanOutputParser) Parse(ctx context.Context, text string) (bool, error)

Parse parses boolean output.

type ChainOutputParser

type ChainOutputParser[T any] struct {
	// contains filtered or unexported fields
}

ChainOutputParser is a chained output parser.

It tries several parsers and uses the first one that succeeds.

func NewChainOutputParser

func NewChainOutputParser[T any](parsers ...OutputParser[T]) *ChainOutputParser[T]

NewChainOutputParser creates a chained output parser.

func (*ChainOutputParser[T]) GetFormatInstructions

func (p *ChainOutputParser[T]) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*ChainOutputParser[T]) GetType

func (p *ChainOutputParser[T]) GetType() string

GetType returns the type.

func (*ChainOutputParser[T]) Parse

func (p *ChainOutputParser[T]) Parse(ctx context.Context, text string) (T, error)

Parse tries each parser in turn.

func (*ChainOutputParser[T]) ParseWithPrompt

func (p *ChainOutputParser[T]) ParseWithPrompt(ctx context.Context, text, prompt string) (T, error)

ParseWithPrompt parses with the prompt available.

type EnumOutputParser

type EnumOutputParser struct {
	*BaseOutputParser[string]
	// contains filtered or unexported fields
}

EnumOutputParser is an enum output parser.

It requires the output to be one of a predefined set of enum values.

func NewEnumOutputParser

func NewEnumOutputParser(allowedValues []string, caseSensitive bool) *EnumOutputParser

NewEnumOutputParser creates an enum output parser.

func (*EnumOutputParser) GetFormatInstructions

func (p *EnumOutputParser) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*EnumOutputParser) Parse

func (p *EnumOutputParser) Parse(ctx context.Context, text string) (string, error)

Parse parses enum output.

type FieldSchema

type FieldSchema struct {
	Name        string // field name
	Type        string // field type
	Description string // field description
	Required    bool   // whether the field is required
}

FieldSchema defines the schema of a field.

type JSONOutputParser

type JSONOutputParser[T any] struct {
	*BaseOutputParser[T]
	// contains filtered or unexported fields
}

JSONOutputParser is a JSON output parser.

It parses the JSON content found in LLM output.

func NewJSONOutputParser

func NewJSONOutputParser[T any](strict bool) *JSONOutputParser[T]

NewJSONOutputParser creates a JSON output parser.

func (*JSONOutputParser[T]) GetFormatInstructions

func (p *JSONOutputParser[T]) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*JSONOutputParser[T]) Parse

func (p *JSONOutputParser[T]) Parse(ctx context.Context, text string) (T, error)

Parse parses JSON output.

type ListOutputParser

type ListOutputParser struct {
	*BaseOutputParser[[]string]
	// contains filtered or unexported fields
}

ListOutputParser is a list output parser.

It parses list-formatted output (comma-separated, newline-separated, and so on).

func NewListOutputParser

func NewListOutputParser(separator string) *ListOutputParser

NewListOutputParser creates a list output parser.

func (*ListOutputParser) GetFormatInstructions

func (p *ListOutputParser) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*ListOutputParser) Parse

func (p *ListOutputParser) Parse(ctx context.Context, text string) ([]string, error)

Parse parses list output.

type OutputParser

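type OutputParser[T any] interface {
	// Parse parses text output into structured data.
	Parse(ctx context.Context, text string) (T, error)

	// ParseWithPrompt parses with the prompt available (used for error recovery).
	ParseWithPrompt(ctx context.Context, text, prompt string) (T, error)

	// GetFormatInstructions returns the format instructions.
	// These instructions are added to the prompt to tell the LLM how to format its output.
	GetFormatInstructions() string

	// GetType returns a description of the output type.
	GetType() string
}

OutputParser defines the output parser interface.

Modeled on LangChain's OutputParser design, it provides structured parsing of LLM output. The type parameter T specifies the output type.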

type ReActOutput

type ReActOutput struct {
	FinalAnswer string                 `json:"final_answer,omitempty"`
	Thought     string                 `json:"thought,omitempty"`
	Action      string                 `json:"action,omitempty"`
	ActionInput map[string]interface{} `json:"action_input,omitempty"`
}

ReActOutput is the output structure of the ReAct parser.

type ReActOutputParser

type ReActOutputParser struct {
	*BaseOutputParser[*ReActOutput]
	// contains filtered or unexported fields
}

ReActOutputParser parses the output of a ReAct agent.

Expected format:

Thought: <reasoning>
Action: <tool name>
Action Input: <input as JSON>
or
Final Answer: <final answer>

func NewReActOutputParser

func NewReActOutputParser() *ReActOutputParser

NewReActOutputParser creates a ReAct output parser.

func (*ReActOutputParser) GetFormatInstructions

func (p *ReActOutputParser) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*ReActOutputParser) Parse

func (p *ReActOutputParser) Parse(ctx context.Context, text string) (*ReActOutput, error)

Parse parses ReAct output.

func (*ReActOutputParser) ParseWithRetry

func (p *ReActOutputParser) ParseWithRetry(ctx context.Context, text string, maxRetries int) (*ReActOutput, error)

ParseWithRetry parses with retries.

func (*ReActOutputParser) Validate

func (p *ReActOutputParser) Validate(parsed *ReActOutput) error

Validate validates the parsed result.

type RegexOutputParser

type RegexOutputParser struct {
	*BaseOutputParser[map[string]string]
	// contains filtered or unexported fields
}

RegexOutputParser is a regular-expression output parser.

It extracts output using regular expressions.

func NewRegexOutputParser

func NewRegexOutputParser(patterns map[string]string) *RegexOutputParser

NewRegexOutputParser creates a regular-expression output parser.

func (*RegexOutputParser) GetFormatInstructions

func (p *RegexOutputParser) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*RegexOutputParser) Parse

func (p *RegexOutputParser) Parse(ctx context.Context, text string) (map[string]string, error)

Parse parses the output.

type StructuredOutputParser

type StructuredOutputParser[T any] struct {
	*BaseOutputParser[T]
	// contains filtered or unexported fields
}

StructuredOutputParser is a structured output parser.

It supports structured parsing with custom fields.

func NewStructuredOutputParser

func NewStructuredOutputParser[T any](schema map[string]FieldSchema) *StructuredOutputParser[T]

NewStructuredOutputParser creates a structured output parser.

func (*StructuredOutputParser[T]) GetFormatInstructions

func (p *StructuredOutputParser[T]) GetFormatInstructions() string

GetFormatInstructions returns the format instructions.

func (*StructuredOutputParser[T]) Parse

func (p *StructuredOutputParser[T]) Parse(ctx context.Context, text string) (T, error)

Parse parses structured output.
