parser package

v1.9.3
Published: Mar 8, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

README

SQL Parser Package

Overview

The parser package provides a production-ready, recursive descent SQL parser that converts tokenized SQL into an Abstract Syntax Tree (AST). It supports comprehensive SQL features across multiple dialects with ~80-85% SQL-99 compliance.

Key Features

  • DML Operations: SELECT, INSERT, UPDATE, DELETE with full clause support
  • DDL Operations: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX
  • Advanced SQL: CTEs (WITH), set operations (UNION/EXCEPT/INTERSECT), window functions
  • JOINs: All types (INNER, LEFT, RIGHT, FULL, CROSS, NATURAL) with proper left-associative parsing
  • Window Functions: PARTITION BY, ORDER BY, frame clauses (ROWS/RANGE)
  • SQL-99 F851: NULLS FIRST/LAST support in ORDER BY clauses
  • Object Pooling: Memory-efficient parser instance reuse
  • Context Support: Cancellation and timeout handling

Usage

Basic Parsing
package main

import (
    "github.com/ajitpratap0/GoSQLX/pkg/sql/parser"
    "github.com/ajitpratap0/GoSQLX/pkg/sql/token"
)

func main() {
    // Create parser from pool
    p := parser.NewParser()
    defer p.Release()  // ALWAYS release back to pool

    // Parse tokens into AST
    tokens := []token.Token{ /* your tokens */ }
    astNode, err := p.Parse(tokens)
    if err != nil {
        // Handle parsing error
    }

    // Work with AST
    // ...
}
Context-Aware Parsing
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.NewParser()
defer p.Release()

astNode, err := p.ParseContext(ctx, tokens)
if err != nil {
    if ctx.Err() != nil {
        // Handle timeout/cancellation
    }
    // Handle parse error
}

Architecture

Core Components
  • parser.go (1,628 lines): Main parser with all parsing logic
  • alter.go (368 lines): DDL ALTER statement parsing
  • token_conversion.go (~200 lines): Internal token conversion (unexported)
Parsing Flow
Tokens → Parse() → parseStatement() → Specific statement parser → AST Node
Recursion Protection

Maximum recursion depth: 100 levels

Protects against:

  • Deeply nested CTEs
  • Excessive subquery nesting
  • Stack overflow attacks

Supported SQL Features

Phase 1 (v1.0.0) - Core DML
  • SELECT with FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET
  • All JOIN types with proper precedence
  • INSERT (single/multi-row)
  • UPDATE with SET and WHERE
  • DELETE with WHERE
Phase 2 (v1.2.0) - Advanced Features
  • Common Table Expressions (WITH clause)
  • Recursive CTEs with depth protection
  • Set operations: UNION [ALL], EXCEPT, INTERSECT
  • CTE column specifications
Phase 2.5 (v1.3.0) - Window Functions
  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Analytic: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
  • PARTITION BY and ORDER BY
  • Frame clauses: ROWS/RANGE with bounds
Phase 2.6 (v1.5.0) - NULL Ordering
  • NULLS FIRST/LAST in ORDER BY
  • NULLS FIRST/LAST in window ORDER BY
  • Database portability for NULL ordering

Performance Characteristics

  • Throughput: 1.5M operations/second (peak), 1.38M sustained
  • Memory: Object pooling provides 60-80% reduction vs. new instances
  • Latency: <1μs for complex queries with window functions
  • Thread Safety: All pool operations are race-free

Error Handling

astNode, err := p.Parse(tokens)
if err != nil {
    var parseErr *parser.ParseError
    if errors.As(err, &parseErr) {
        fmt.Printf("Parse error at token '%s': %s\n",
            parseErr.Literal, parseErr.Msg)
    }
}

Testing

Run parser tests:

# All tests
go test -v ./pkg/sql/parser/

# With race detection
go test -race ./pkg/sql/parser/

# Specific features
go test -v -run TestParser_.*Window ./pkg/sql/parser/
go test -v -run TestParser_.*CTE ./pkg/sql/parser/
go test -v -run TestParser_.*Join ./pkg/sql/parser/

# Performance benchmarks
go test -bench=BenchmarkParser -benchmem ./pkg/sql/parser/

Best Practices

1. Always Use Defer
p := parser.NewParser()
defer p.Release()  // Ensures cleanup even on panic
2. Don't Store Pooled Instances
// BAD: Storing pooled object
type MyStruct struct {
    parser *Parser  // DON'T DO THIS
}

// GOOD: Get from pool when needed
func ParseSQL(tokens []token.Token) (*ast.AST, error) {
    p := parser.NewParser()
    defer p.Release()
    return p.Parse(tokens)
}
3. Use Context for Long Operations
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

p := parser.NewParser()
defer p.Release()

astNode, err := p.ParseContext(ctx, tokens)

Common Pitfalls

❌ Forgetting to Release
// BAD: Memory leak
p := parser.NewParser()
astNode, _ := p.Parse(tokens)
// p never returned to pool
✅ Correct Pattern
// GOOD: Automatic cleanup
p := parser.NewParser()
defer p.Release()
astNode, err := p.Parse(tokens)

Related Packages
  • tokenizer: Converts SQL text to tokens (input to parser)
  • ast: AST node definitions (output from parser)
  • token: Token type definitions
  • keywords: SQL keyword classification

Version History

  • v1.5.0: NULLS FIRST/LAST support (SQL-99 F851)
  • v1.4.0: Production validation complete
  • v1.3.0: Window functions (Phase 2.5)
  • v1.2.0: CTEs and set operations (Phase 2)
  • v1.0.0: Core DML and JOINs (Phase 1)

Documentation

Overview

Package parser provides a high-performance, production-ready recursive descent SQL parser that converts tokenized SQL into a comprehensive Abstract Syntax Tree (AST).

The primary entry points are GetParser (pool-based instantiation), ParseFromModelTokens (converts []models.TokenWithSpan from the tokenizer into parser tokens), and ParseWithPositions (produces an *ast.AST with full position information). For dialect-aware parsing, use ParseWithDialect. For concurrent use, always obtain a parser instance via GetParser and return it with PutParser (or defer parser.PutParser(p)).

Overview

The parser implements a predictive recursive descent parser with one-token lookahead, supporting comprehensive SQL features across multiple database dialects including PostgreSQL, MySQL, SQL Server, Oracle, and SQLite. It achieves enterprise-grade performance with 1.38M+ operations/second sustained throughput and 347ns average latency for complex queries.

Architecture

The parser follows a modular architecture with specialized parsing functions for each SQL construct:

  • parser.go: Main parser entry point, statement routing, and core token management
  • select.go: SELECT statement parsing including DISTINCT ON, FETCH, and table operations
  • dml.go: Data Manipulation Language (INSERT, UPDATE, DELETE, MERGE statements)
  • ddl.go: Data Definition Language (CREATE, ALTER, DROP, TRUNCATE statements)
  • expressions.go: Expression parsing with operator precedence and JSON operators
  • window.go: Window function parsing (OVER clause, PARTITION BY, ORDER BY, frame specs)
  • cte.go: Common Table Expression parsing with recursive CTE support
  • grouping.go: GROUPING SETS, ROLLUP, CUBE parsing (SQL-99 T431)
  • alter.go: ALTER TABLE statement parsing

Parsing Flow

The typical parsing flow involves three stages:

  1. Token Conversion: Convert tokenizer output to parser tokens

     tokens := tokenizer.Tokenize(sqlBytes)
     result := parser.ParseFromModelTokens(tokens)

  2. AST Generation: Parse tokens into an Abstract Syntax Tree

     p := parser.GetParser()
     defer parser.PutParser(p)
     astObj, err := p.ParseWithPositions(result)

  3. AST Processing: Traverse and analyze the generated AST

     visitor.Walk(astObj, myVisitor)

Token Management

The parser uses Type-based token matching for optimal performance. Type is an integer enumeration that enables O(1) switch-based dispatch instead of O(n) string comparisons. This optimization provides ~14x performance improvement on hot paths (0.24ns vs 3.4ns per comparison).

Fast path example:

if p.currentToken.Type == models.TokenTypeSelect {
    // O(1) integer comparison
    return p.parseSelectWithSetOperations()
}

The parser maintains backward compatibility with legacy token matching for tests and legacy code that creates tokens without Type.
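The effect of Type-based dispatch can be sketched in isolation. The types and constants below are illustrative stand-ins, not GoSQLX's actual definitions: an integer enumeration lets a switch compile to a jump table, while matching on the Literal string would cost a byte-by-byte comparison per case.

```go
package main

import "fmt"

// Illustrative token-type enumeration (not GoSQLX's actual constants).
type TokenType int

const (
	TokenSelect TokenType = iota
	TokenInsert
	TokenUpdate
)

type Token struct {
	Type    TokenType // integer: O(1) comparison
	Literal string    // string: O(n) comparison
}

// dispatch routes on the integer Type field; the compiler can lower this
// switch to a jump table instead of repeated string comparisons.
func dispatch(t Token) string {
	switch t.Type {
	case TokenSelect:
		return "parseSelect"
	case TokenInsert:
		return "parseInsert"
	case TokenUpdate:
		return "parseUpdate"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(dispatch(Token{Type: TokenSelect, Literal: "SELECT"}))
	// prints "parseSelect"
}
```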

Performance Optimizations

The parser implements several performance optimizations:

  • Object Pooling: All major data structures use sync.Pool for zero-allocation reuse
  • Fast Token Dispatch: O(1) Type switch instead of O(n) string comparisons
  • Pre-allocation: Statement slices pre-allocated based on input size estimation
  • Zero-copy Operations: Direct token access without string allocation
  • Recursion Depth Limiting: MaxRecursionDepth prevents stack overflow (DoS protection)
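The pooling pattern behind GetParser/PutParser can be sketched with a minimal sync.Pool wrapper. The miniParser type here is a hypothetical stand-in for the real parser; the key points are that Get falls back to New when the pool is empty, and that state must be reset before an instance is returned to the pool.

```go
package main

import (
	"fmt"
	"sync"
)

// miniParser stands in for a pooled parser instance (illustrative only).
type miniParser struct {
	tokens []string
	pos    int
}

// Reset clears state so a pooled instance can be safely reused.
func (p *miniParser) Reset() {
	p.tokens = p.tokens[:0]
	p.pos = 0
}

var parserPool = sync.Pool{
	New: func() any { return &miniParser{} },
}

func getParser() *miniParser { return parserPool.Get().(*miniParser) }

func putParser(p *miniParser) {
	p.Reset() // always reset before returning to the pool
	parserPool.Put(p)
}

func main() {
	p := getParser()
	defer putParser(p)
	p.tokens = append(p.tokens, "SELECT", "1")
	fmt.Println(len(p.tokens))
	// prints 2
}
```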

DoS Protection

The parser includes protection against denial-of-service attacks via deeply nested expressions:

const MaxRecursionDepth = 100  // Prevents stack overflow

Expressions deeper than this limit return a RecursionDepthLimitError, preventing both stack exhaustion and excessive parsing time on malicious input.

Error Handling

The parser provides structured error handling with precise position information:

  • Syntax errors include line/column location from the tokenizer
  • Error messages preserve SQL context for debugging
  • Errors use the pkg/errors package with error codes for categorization
  • ParseWithPositions() enables enhanced error reporting with source positions

Example error:

error: expected 'FROM' but got 'WHERE' at line 1, column 15

SQL Feature Support (v1.6.0)

Core DML Operations

  • SELECT: Full SELECT support with DISTINCT, DISTINCT ON, aliases, subqueries
  • INSERT: INSERT INTO with VALUES, column lists, RETURNING clause
  • UPDATE: UPDATE with SET clauses, WHERE conditions, RETURNING clause
  • DELETE: DELETE FROM with WHERE conditions, RETURNING clause
  • MERGE: SQL:2003 MERGE statements with MATCHED/NOT MATCHED clauses

DDL Operations

  • CREATE TABLE: Tables with constraints, partitioning, column definitions
  • CREATE VIEW: Views with OR REPLACE, TEMPORARY, IF NOT EXISTS
  • CREATE MATERIALIZED VIEW: Materialized views with WITH [NO] DATA
  • CREATE INDEX: Indexes with UNIQUE, USING, partial indexes (WHERE clause)
  • ALTER TABLE: ADD/DROP COLUMN, ADD/DROP CONSTRAINT, RENAME operations
  • DROP: Drop tables, views, materialized views, indexes with CASCADE/RESTRICT
  • TRUNCATE: TRUNCATE TABLE with RESTART/CONTINUE IDENTITY, CASCADE/RESTRICT
  • REFRESH MATERIALIZED VIEW: With CONCURRENTLY and WITH [NO] DATA options

Advanced SELECT Features

  • JOINs: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL joins with ON/USING
  • LATERAL JOIN: PostgreSQL correlated subqueries in FROM clause
  • Subqueries: Scalar, EXISTS, IN, ANY, ALL subqueries
  • CTEs: WITH clause, recursive CTEs, multiple CTE definitions
  • Set Operations: UNION, UNION ALL, EXCEPT, INTERSECT with proper associativity
  • DISTINCT ON: PostgreSQL-specific row selection by expression
  • Window Functions: OVER clause with PARTITION BY, ORDER BY, frame specs
  • GROUPING SETS: GROUPING SETS, ROLLUP, CUBE (SQL-99 T431)
  • ORDER BY: With NULLS FIRST/LAST (SQL-99 F851)
  • LIMIT/OFFSET: Standard pagination with ROW/ROWS variants
  • FETCH FIRST/NEXT: SQL-99 FETCH clause with PERCENT, ONLY, WITH TIES

PostgreSQL Extensions (v1.6.0)

  • LATERAL JOIN: Correlated lateral subqueries in FROM/JOIN clauses
  • JSON/JSONB Operators: All 10 operators (->/->>/#>/#>>/@>/<@/?/?|/?&/#-)
  • DISTINCT ON: Row deduplication by expression with ORDER BY
  • FILTER Clause: Conditional aggregation (SQL:2003 T612)
  • RETURNING Clause: Return modified rows from INSERT/UPDATE/DELETE
  • Aggregate ORDER BY: ORDER BY inside STRING_AGG, ARRAY_AGG functions
  • Materialized CTE Hints: AS [NOT] MATERIALIZED in CTE definitions

Expression Support

The parser handles comprehensive expression types with correct operator precedence:

  • Logical: AND, OR, NOT with proper precedence (OR < AND < comparison)
  • Comparison: =, <, >, !=, <=, >=, <> with type-safe evaluation
  • Arithmetic: +, -, *, /, % with standard precedence (* > +)
  • String: || (concatenation) with proper precedence
  • JSON: ->, ->>, #>, #>>, @>, <@, ?, ?|, ?&, #- (PostgreSQL)
  • Pattern Matching: LIKE, ILIKE, NOT LIKE with escape sequences
  • Range: BETWEEN, NOT BETWEEN with inclusive bounds
  • Set Membership: IN, NOT IN with value lists or subqueries
  • NULL Testing: IS NULL, IS NOT NULL with three-valued logic
  • Quantifiers: ANY, ALL with comparison operators
  • Existence: EXISTS, NOT EXISTS with subquery evaluation
  • CASE: Both simple and searched CASE expressions
  • CAST: Type conversion with CAST(expr AS type)
  • Function Calls: Regular functions and aggregate functions
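How recursive descent encodes operator precedence can be shown with a minimal precedence-climbing evaluator over single-digit integers and the + and * operators. This is a self-contained sketch of the technique, not the parser's actual expression code: each recursive call only consumes operators at or above a minimum precedence, so * groups tighter than +.

```go
package main

import "fmt"

// exprParser is an illustrative precedence-climbing evaluator; the real
// parser builds AST nodes the same way instead of computing values.
type exprParser struct {
	input string
	pos   int
}

func prec(op byte) int {
	switch op {
	case '+':
		return 1
	case '*':
		return 2
	}
	return 0
}

// primary consumes a single-digit operand.
func (p *exprParser) primary() int {
	v := int(p.input[p.pos] - '0')
	p.pos++
	return v
}

// parseExpr binds operators at or above minPrec, recursing with a higher
// minimum for the right operand so that * groups tighter than +.
func (p *exprParser) parseExpr(minPrec int) int {
	left := p.primary()
	for p.pos < len(p.input) && prec(p.input[p.pos]) >= minPrec {
		op := p.input[p.pos]
		p.pos++
		right := p.parseExpr(prec(op) + 1)
		if op == '+' {
			left += right
		} else {
			left *= right
		}
	}
	return left
}

func main() {
	p := &exprParser{input: "2+3*4"}
	fmt.Println(p.parseExpr(1)) // * binds tighter, so this is 2+(3*4)
	// prints 14
}
```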

Window Functions (SQL-99)

Complete support for SQL-99 window functions with OVER clause:

  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n)
  • Offset: LAG(expr, offset, default), LEAD(expr, offset, default)
  • Value: FIRST_VALUE(expr), LAST_VALUE(expr), NTH_VALUE(expr, n)
  • PARTITION BY: Partition data into groups for window computation
  • ORDER BY: Order rows within each partition
  • Frame Clause: ROWS/RANGE with PRECEDING/FOLLOWING/CURRENT ROW
  • Frame Bounds: UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING

Example window function query:

SELECT
    dept,
    name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as rank,
    LAG(salary, 1) OVER (ORDER BY hire_date) as prev_salary,
    SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as rolling_sum
FROM employees;

Context and Cancellation

The parser supports context-based cancellation for long-running operations:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.GetParser()
defer parser.PutParser(p)

astObj, err := p.ParseContext(ctx, tokens)
if errors.Is(err, context.DeadlineExceeded) {
    // Handle timeout
}

The parser checks context.Err() at strategic points (statement boundaries, expression starts) to enable fast cancellation without excessive overhead.

Thread Safety

The parser is designed for concurrent use with proper object pooling:

  • GetParser()/PutParser(): Thread-safe parser pooling via sync.Pool
  • Zero race conditions: Validated via comprehensive race detection tests
  • Per-goroutine instances: Each goroutine gets its own parser from pool
  • No shared state: Parser instances maintain no shared mutable state

Memory Management

Critical: Always use defer with pool return functions to prevent resource leaks:

p := parser.GetParser()
defer parser.PutParser(p)  // MANDATORY - prevents memory leaks

The parser integrates with the AST object pool:

astObj := ast.NewAST()
defer ast.ReleaseAST(astObj)  // MANDATORY - returns to pool

Object pooling provides 60-80% memory reduction in production workloads with 95%+ pool hit rates.

Usage Examples

Basic parsing with position tracking:

import (
    "github.com/ajitpratap0/GoSQLX/pkg/sql/ast"
    "github.com/ajitpratap0/GoSQLX/pkg/sql/parser"
    "github.com/ajitpratap0/GoSQLX/pkg/sql/tokenizer"
)

// Tokenize SQL
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte("SELECT * FROM users WHERE active = true"))
if err != nil {
    // Handle tokenization error
}

// Convert tokens
result := parser.ParseFromModelTokens(tokens)

// Parse to AST
p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.ParseWithPositions(result)
if err != nil {
    // Handle parsing error with line/column information
    return
}
defer ast.ReleaseAST(astObj)

// Access parsed statements
for _, stmt := range astObj.Statements {
    // Process each statement
}

Parsing with timeout:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.GetParser()
defer parser.PutParser(p)

astObj, err := p.ParseContext(ctx, tokens)
if err != nil {
    if errors.Is(err, context.DeadlineExceeded) {
        log.Println("Parsing timeout exceeded")
    }
    // Handle other errors
    return
}
defer ast.ReleaseAST(astObj)

Performance Characteristics

Measured performance on production workloads (v1.6.0):

  • Throughput: 1.38M+ operations/second sustained, 1.5M peak
  • Latency: 347ns average for complex queries with window functions
  • Token Processing: 8M tokens/second
  • Memory Efficiency: 60-80% reduction via object pooling
  • Allocation Rate: <100 bytes/op for pooled parsing
  • Cache Efficiency: 95%+ pool hit rate in production

SQL Compliance

The parser provides approximately 80-85% SQL-99 compliance:

  • Core SQL-99: Full support for basic SELECT, INSERT, UPDATE, DELETE
  • SQL-99 Features: Window functions (F611), CTEs (T121), set operations
  • SQL:2003 Features: MERGE statements (F312), XML/JSON operators
  • SQL:2008 Features: TRUNCATE TABLE, enhanced grouping operations
  • Vendor Extensions: PostgreSQL, MySQL, SQL Server, Oracle specific syntax

Empty Statement Handling

By default, the parser silently ignores empty statements between semicolons. For example, ";;; SELECT 1 ;;;" is treated as a single "SELECT 1" statement. This lenient behavior matches common SQL client behavior where extra semicolons are harmless.

To reject empty statements, enable strict mode:

p := parser.NewParser(parser.WithStrictMode())
// or
p := parser.GetParser()
p.ApplyOptions(parser.WithStrictMode())

In strict mode, empty statements (consecutive semicolons or leading/trailing semicolons with no SQL) return an error.
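The lenient-versus-strict behavior can be sketched with a toy statement splitter. This works on raw strings for illustration only; the real parser makes the same decision on the token stream.

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements splits on semicolons. In lenient mode, empty statements
// are silently dropped; in strict mode they are an error. Illustrative
// only — the real parser operates on tokens, not raw SQL text.
func splitStatements(sql string, strict bool) ([]string, error) {
	var stmts []string
	for _, part := range strings.Split(sql, ";") {
		trimmed := strings.TrimSpace(part)
		if trimmed == "" {
			if strict {
				return nil, fmt.Errorf("empty statement")
			}
			continue // lenient: ignore extra semicolons
		}
		stmts = append(stmts, trimmed)
	}
	return stmts, nil
}

func main() {
	lenient, _ := splitStatements(";;; SELECT 1 ;;;", false)
	_, strictErr := splitStatements(";;; SELECT 1 ;;;", true)
	fmt.Println(lenient, strictErr != nil)
	// prints "[SELECT 1] true"
}
```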

Limitations

Current limitations (will be addressed in future releases):

  • Stored procedures: CREATE PROCEDURE/FUNCTION not yet supported
  • Triggers: CREATE TRIGGER parsing not implemented
  • Some vendor-specific extensions may require additional work

Related Packages
  • github.com/ajitpratap0/GoSQLX/pkg/sql/tokenizer: Token generation from SQL text
  • github.com/ajitpratap0/GoSQLX/pkg/sql/ast: AST node definitions and visitor pattern
  • github.com/ajitpratap0/GoSQLX/pkg/models: Token types, spans, locations
  • github.com/ajitpratap0/GoSQLX/pkg/errors: Structured error types with codes
  • github.com/ajitpratap0/GoSQLX/pkg/sql/keywords: Multi-dialect keyword classification

Further Reading

  • docs/USAGE_GUIDE.md: Comprehensive usage guide with examples
  • docs/SQL_COMPATIBILITY.md: SQL dialect compatibility matrix
  • CHANGELOG.md: Version history and feature additions

Validate() implements a fast path for SQL validation without full AST construction. See issue #274.

Index

Constants

View Source
const MaxRecursionDepth = 100

MaxRecursionDepth defines the maximum allowed recursion depth for parsing operations. This prevents stack overflow from deeply nested expressions, CTEs, or other recursive structures.

DoS Protection: This limit protects against denial-of-service attacks via malicious SQL with deeply nested expressions like: (((((...((value))...)))))

Typical Values:

  • MaxRecursionDepth = 100: Protects against stack exhaustion
  • Legitimate queries rarely exceed depth of 10-15
  • Malicious queries can reach thousands without this limit

Error: Exceeding this depth returns goerrors.RecursionDepthLimitError

Variables

View Source
var (
	// ErrUnexpectedStatement indicates a statement type was not expected in context.
	ErrUnexpectedStatement = errors.New("unexpected statement type")
)

Sentinel errors for the parser package.

Functions

func ParseBytes added in v1.9.3

func ParseBytes(input []byte) (*ast.AST, error)

ParseBytes parses SQL from a []byte input without requiring a string conversion. This is especially useful when reading SQL from files via os.ReadFile. See issue #277.

func ParseBytesWithDialect added in v1.9.3

func ParseBytesWithDialect(input []byte, dialect keywords.SQLDialect) (*ast.AST, error)

ParseBytesWithDialect is like ParseWithDialect but accepts []byte.

func ParseBytesWithTokens added in v1.9.3

func ParseBytesWithTokens(input []byte) (*ast.AST, []models.TokenWithSpan, error)

ParseBytesWithTokens is like ParseBytes but also returns the preprocessed token slice ([]models.TokenWithSpan) for callers that need both.

func ParseWithDialect added in v1.9.3

func ParseWithDialect(sql string, dialect keywords.SQLDialect) (*ast.AST, error)

ParseWithDialect parses SQL using the specified dialect for keyword recognition. This is a convenience function combining dialect-aware tokenization and parsing.

func PutParser added in v1.6.0

func PutParser(p *Parser)

PutParser returns a Parser instance to the pool after resetting it. This MUST be called after parsing is complete to enable reuse and prevent memory leaks.

The parser is automatically reset before being returned to the pool, clearing all internal state (tokens, position, depth, context, position mappings).

Performance: O(1), <30ns typical latency

Usage:

p := parser.GetParser()
defer parser.PutParser(p)  // Use defer to ensure cleanup on error paths

Thread Safety: Safe for concurrent calls - operates on independent parser instances.

func Validate added in v1.9.3

func Validate(sql string) error

Validate checks whether the given SQL string is syntactically valid without building a full AST. It tokenizes the input and runs the parser, but the returned AST is immediately released. This is significantly faster than Parse() when you only need to know if the SQL is valid.

func ValidateBytes added in v1.9.3

func ValidateBytes(input []byte) error

ValidateBytes is like Validate but accepts []byte to avoid a string copy.

func ValidateBytesWithDialect added in v1.9.3

func ValidateBytesWithDialect(input []byte, dialect keywords.SQLDialect) error

ValidateBytesWithDialect is like ValidateWithDialect but accepts []byte.

func ValidateWithDialect added in v1.9.3

func ValidateWithDialect(sql string, dialect keywords.SQLDialect) error

ValidateWithDialect checks whether the given SQL string is syntactically valid using the specified SQL dialect for keyword recognition.

Types

type ConversionResult added in v1.4.0

type ConversionResult struct {
	Tokens          []models.TokenWithSpan
	PositionMapping []TokenPosition // Deprecated: always nil — positions are now embedded in TokenWithSpan.Start/End fields
}

ConversionResult holds a preprocessed token stream and optional source-position mappings. Callers typically obtain this via ParseFromModelTokensWithPositions and pass it to ParseWithPositions.

After the token-type unification (#322) the Tokens field holds []models.TokenWithSpan directly; span information is no longer stripped.

type ParseError added in v1.9.3

type ParseError struct {
	Msg       string
	TokenIdx  int
	Line      int
	Column    int
	TokenType string
	Literal   string
	Cause     error // original error, accessible via Unwrap()
}

ParseError represents a parse error with position information. It preserves the original error via Cause for use with errors.Is/As.

func (*ParseError) Error added in v1.9.3

func (e *ParseError) Error() string

Error implements the error interface and returns a human-readable description of the parse error with position information when available.

func (*ParseError) Unwrap added in v1.9.3

func (e *ParseError) Unwrap() error

Unwrap returns the underlying cause, enabling errors.Is and errors.As.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is a recursive-descent SQL parser that converts a token stream into an Abstract Syntax Tree (AST).

Parser instances are not thread-safe. Each goroutine must use its own instance, obtained from the pool via GetParser and returned with PutParser:

p := parser.GetParser()
defer parser.PutParser(p)
tree, err := p.ParseFromModelTokens(tokens)

For dialect-aware parsing or strict mode, use NewParser with options, or call ApplyOptions on a pooled instance before parsing.

func GetParser added in v1.6.0

func GetParser() *Parser

GetParser returns a Parser instance from the pool. The caller MUST call PutParser when done to return it to the pool.

This function is thread-safe and designed for concurrent use. Each goroutine should get its own parser instance from the pool.

Performance: O(1) amortized, <50ns typical latency

Usage:

parser := parser.GetParser()
defer parser.PutParser(parser)  // MANDATORY - prevents resource leaks
ast, err := parser.Parse(tokens)

Thread Safety: Safe for concurrent calls - each goroutine gets its own instance.

func NewParser

func NewParser(opts ...ParserOption) *Parser

NewParser creates a new parser with optional configuration.

func (*Parser) ApplyOptions added in v1.9.3

func (p *Parser) ApplyOptions(opts ...ParserOption)

ApplyOptions applies parser options to configure behavior.

func (*Parser) Dialect added in v1.9.3

func (p *Parser) Dialect() string

Dialect returns the SQL dialect configured for this parser. Returns "postgresql" if no dialect was explicitly set.

func (*Parser) Parse

func (p *Parser) Parse(tokens []token.Token) (*ast.AST, error)

Parse parses a slice of token.Token into an AST.

This API is preserved for backward compatibility. Prefer ParseFromModelTokens which accepts []models.TokenWithSpan directly and preserves span information.

Internally the tokens are wrapped into models.TokenWithSpan (with empty spans) and the preprocessing step is applied before parsing.

Thread Safety: NOT thread-safe - use separate parser instances per goroutine.

func (*Parser) ParseContext added in v1.5.0

func (p *Parser) ParseContext(ctx context.Context, tokens []token.Token) (*ast.AST, error)

ParseContext parses tokens into an AST with context support for cancellation and timeouts.

This method enables graceful cancellation of long-running parsing operations by checking the context at strategic points (statement boundaries and expression starts). The parser checks context.Err() approximately every 10-20 operations, balancing responsiveness with overhead.

Parameters:

  • ctx: Context for cancellation and timeout control
  • tokens: Slice of parser tokens to parse

Returns:

  • *ast.AST: Parsed Abstract Syntax Tree if successful
  • error: Parsing error, context.Canceled, or context.DeadlineExceeded

Context Checking Strategy:

  • Checked before each statement parsing
  • Checked at the start of parseExpression (recursive)
  • Overhead: ~2% vs non-context parsing
  • Cancellation latency: <100μs typical

Use Cases:

  • Long-running parsing operations that need to be cancellable
  • Implementing timeouts for parsing (prevent hanging on malicious input)
  • Graceful shutdown scenarios in server applications
  • User-initiated cancellation in interactive tools

Error Handling:

  • Returns context.Canceled when ctx.Done() is closed
  • Returns context.DeadlineExceeded when timeout expires
  • Cleans up partial AST on cancellation (no memory leaks)

Usage with Timeout:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.GetParser()
defer parser.PutParser(p)

astObj, err := p.ParseContext(ctx, tokens)
if err != nil {
    if errors.Is(err, context.DeadlineExceeded) {
        log.Println("Parsing timeout exceeded")
    } else if errors.Is(err, context.Canceled) {
        log.Println("Parsing was cancelled")
    } else {
        log.Printf("Parse error: %v", err)
    }
    return
}
defer ast.ReleaseAST(astObj)

Usage with Cancellation:

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Cancel from another goroutine based on user action
go func() {
    <-userCancelSignal
    cancel()
}()

p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.ParseContext(ctx, tokens)
// Check for context.Canceled error

Performance Impact:

  • Adds ~2% overhead vs Parse() due to context checking
  • Average: ~354ns for complex queries (vs 347ns for Parse)
  • Negligible impact on modern CPUs with branch prediction

Thread Safety: NOT thread-safe - use separate parser instances per goroutine.

ParseContext accepts a slice of token.Token as a backward-compatibility shim. For new code, prefer ParseContextFromModelTokens.

func (*Parser) ParseContextFromModelTokens added in v1.9.3

func (p *Parser) ParseContextFromModelTokens(ctx context.Context, tokens []models.TokenWithSpan) (*ast.AST, error)

ParseContextFromModelTokens parses tokenizer output with context support for cancellation.

func (*Parser) ParseFromModelTokens added in v1.9.3

func (p *Parser) ParseFromModelTokens(tokens []models.TokenWithSpan) (*ast.AST, error)

ParseFromModelTokens parses tokenizer output ([]models.TokenWithSpan) directly into an AST.

This is the preferred entry point for parsing SQL. It accepts the output of the tokenizer directly without any conversion step. Span information is preserved throughout parsing and is available for error reporting.

Issue #322: token_conversion.go has been removed; preprocessing is now a lightweight normalisation step that works entirely with models.TokenWithSpan.

func (*Parser) ParseFromModelTokensWithPositions added in v1.9.3

func (p *Parser) ParseFromModelTokensWithPositions(tokens []models.TokenWithSpan) (*ast.AST, error)

ParseFromModelTokensWithPositions parses tokenizer output with position tracking for enhanced error reporting. Since models.TokenWithSpan already carries span information, this is now equivalent to ParseFromModelTokens but also populates the ConversionResult position mapping for callers that need it.

func (*Parser) ParseWithPositions added in v1.6.0

func (p *Parser) ParseWithPositions(result *ConversionResult) (*ast.AST, error)

ParseWithPositions parses tokens with position tracking for enhanced error reporting.

ParseWithPositions parses a ConversionResult into an AST. Since models.TokenWithSpan already embeds span/position information, this is now a thin wrapper around parseTokens — no separate conversion step needed.

Thread Safety: NOT thread-safe - use separate parser instances per goroutine.

func (*Parser) ParseWithRecovery added in v1.9.3

func (p *Parser) ParseWithRecovery(tokens []token.Token) ([]ast.Statement, []error)

ParseWithRecovery parses a token stream, recovering from errors to collect multiple errors and return a partial AST with successfully parsed statements.

WARNING: This method mutates the parser's internal state (tokens, currentPos) and is NOT safe for concurrent use on the same Parser instance. For thread-safe usage, prefer ParseMultiWithRecovery() which obtains a parser from the pool.

Callers are responsible for returning the parser to the pool via PutParser when done.

Example:

p := parser.GetParser()
defer parser.PutParser(p)
stmts, errs := p.ParseWithRecovery(tokens)

func (*Parser) ParseWithRecoveryFromModelTokens added in v1.9.3

func (p *Parser) ParseWithRecoveryFromModelTokens(tokens []models.TokenWithSpan) ([]ast.Statement, []error)

ParseWithRecoveryFromModelTokens parses tokenizer output with error recovery.

func (*Parser) Release

func (p *Parser) Release()

Release releases any resources held by the parser.

func (*Parser) Reset added in v1.6.0

func (p *Parser) Reset()

Reset clears the parser state for reuse from the pool.

type ParserOption added in v1.9.3

type ParserOption func(*Parser)

Parser represents a SQL parser that converts a stream of tokens into an Abstract Syntax Tree (AST).

The parser implements a recursive descent algorithm with one-token lookahead, supporting comprehensive SQL features across multiple database dialects.

Architecture:

  • Recursive Descent: Top-down parsing with predictive lookahead
  • Statement Routing: O(1) Type-based dispatch for statement types
  • Expression Precedence: Handles operator precedence via recursive descent levels
  • Error Recovery: Provides detailed syntax error messages with position information

Internal State:

  • tokens: Token stream from the tokenizer (converted to parser tokens)
  • currentPos: Current position in token stream
  • currentToken: Current token being examined
  • depth: Recursion depth counter (DoS protection via MaxRecursionDepth)
  • ctx: Optional context for cancellation support
  • positions: Source position mapping for enhanced error reporting

Thread Safety:

  • NOT thread-safe - each goroutine must use its own parser instance
  • Use GetParser()/PutParser() to obtain thread-local instances from pool
  • Parser instances maintain no shared state between calls

Memory Management:

  • Use GetParser() to obtain from pool
  • Use defer PutParser() to return to pool (MANDATORY)
  • Reset() is called automatically by PutParser()

Performance Characteristics:

  • Throughput: 1.38M+ operations/second sustained
  • Latency: 347ns average for complex queries
  • Token Processing: 8M tokens/second
  • Allocation: <100 bytes/op with object pooling

ParserOption configures optional parser behavior.

func WithDialect added in v1.9.3

func WithDialect(dialect string) ParserOption

WithDialect sets the SQL dialect for dialect-aware parsing. Supported values: "postgresql", "mysql", "sqlserver", "oracle", "sqlite", etc. If not set, defaults to "postgresql" for backward compatibility.

func WithStrictMode added in v1.9.3

func WithStrictMode() ParserOption

WithStrictMode enables strict parsing mode. In strict mode, the parser rejects empty statements (e.g., lone semicolons like ";;; SELECT 1 ;;;" will error instead of silently discarding empty statements between semicolons).

By default, the parser operates in lenient mode where empty statements are silently ignored for backward compatibility.

type RecoveryResult added in v1.9.3

type RecoveryResult struct {
	Statements []ast.Statement
	Errors     []error
	// contains filtered or unexported fields
}

RecoveryResult holds the output of ParseWithRecovery, including parsed statements and any errors encountered during parsing.

Callers MUST call Release() when done to return the parser to the pool.

Example usage:

result := parser.ParseMultiWithRecovery(tokens)
defer result.Release()
for _, stmt := range result.Statements {
    // process statement
}
for _, err := range result.Errors {
    // handle error
}

func ParseMultiWithRecovery added in v1.9.3

func ParseMultiWithRecovery(tokens []token.Token) *RecoveryResult

ParseMultiWithRecovery obtains a parser from the pool and parses a token stream, recovering from errors to collect multiple errors and return a partial AST.

Unlike Parse(), which stops at the first error, this function uses synchronization tokens (semicolons and statement-starting keywords) to skip past errors and continue parsing subsequent statements.

The caller MUST call Release() on the returned RecoveryResult to return the parser to the pool.

Thread Safety: This function is safe for concurrent use — each call obtains its own parser instance from the pool.

Example:

result := parser.ParseMultiWithRecovery(tokens)
defer result.Release()
fmt.Printf("parsed %d statements with %d errors\n", len(result.Statements), len(result.Errors))
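The synchronization-token recovery described above can be sketched over a plain token slice: on a bad statement, skip forward to the next semicolon and resume, collecting every error instead of stopping at the first. Illustrative stand-in only, not the parser's actual recovery code.

```go
package main

import "fmt"

// parseWithRecovery treats ";" as the synchronization token: a bad
// statement is reported, then tokens are skipped until the next ";" so
// later statements can still be parsed.
func parseWithRecovery(tokens []string) (stmts int, errs []error) {
	i := 0
	for i < len(tokens) {
		if tokens[i] == "SELECT" {
			// Stand-in for real statement parsing: consume to ";".
			for i < len(tokens) && tokens[i] != ";" {
				i++
			}
			stmts++
		} else {
			errs = append(errs, fmt.Errorf("unexpected token %q", tokens[i]))
			for i < len(tokens) && tokens[i] != ";" { // sync on semicolon
				i++
			}
		}
		i++ // consume the semicolon
	}
	return stmts, errs
}

func main() {
	toks := []string{"SELECT", "1", ";", "BOGUS", ";", "SELECT", "2"}
	stmts, errs := parseWithRecovery(toks)
	fmt.Printf("parsed %d statements with %d errors\n", stmts, len(errs))
	// prints "parsed 2 statements with 1 errors"
}
```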

func (*RecoveryResult) Release added in v1.9.3

func (r *RecoveryResult) Release()

Release returns the underlying parser to the pool. Must be called when the caller is done with the result. Safe to call multiple times.

type TokenPosition added in v1.4.0

type TokenPosition struct {
	OriginalIndex int
	Start         models.Location
	End           models.Location
	SourceToken   *models.TokenWithSpan
}

TokenPosition maps a parser token back to its original source position.
