parser

package
v1.7.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 12, 2026 License: AGPL-3.0 Imports: 8 Imported by: 0

README

SQL Parser Package

Overview

The parser package provides a production-ready, recursive descent SQL parser that converts tokenized SQL into an Abstract Syntax Tree (AST). It supports comprehensive SQL features across multiple dialects with ~80-85% SQL-99 compliance.

Key Features

  • DML Operations: SELECT, INSERT, UPDATE, DELETE with full clause support
  • DDL Operations: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX
  • Advanced SQL: CTEs (WITH), set operations (UNION/EXCEPT/INTERSECT), window functions
  • JOINs: All types (INNER, LEFT, RIGHT, FULL, CROSS, NATURAL) with proper left-associative parsing
  • Window Functions: PARTITION BY, ORDER BY, frame clauses (ROWS/RANGE)
  • SQL-99 F851: NULLS FIRST/LAST support in ORDER BY clauses
  • Object Pooling: Memory-efficient parser instance reuse
  • Context Support: Cancellation and timeout handling

Usage

Basic Parsing
package main

import (
    "github.com/ajitpratap0/GoSQLX/pkg/sql/parser"
    "github.com/ajitpratap0/GoSQLX/pkg/sql/token"
)

func main() {
    // Create parser from pool
    p := parser.NewParser()
    defer p.Release()  // ALWAYS release back to pool

    // Parse tokens into AST
    tokens := []token.Token{ /* your tokens */ }
    astNode, err := p.Parse(tokens)
    if err != nil {
        // Handle parsing error
    }

    // Work with AST
    // ...
}
Context-Aware Parsing
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.NewParser()
defer p.Release()

astNode, err := p.ParseContext(ctx, tokens)
if err != nil {
    if ctx.Err() != nil {
        // Handle timeout/cancellation
    }
    // Handle parse error
}

Architecture

Core Components
  • parser.go (1,628 lines): Main parser with all parsing logic
  • alter.go (368 lines): DDL ALTER statement parsing
  • token_converter.go (~200 lines): Token type conversion utilities
Parsing Flow
Tokens → Parse() → parseStatement() → Specific statement parser → AST Node
Recursion Protection

Maximum recursion depth: 100 levels

Protects against:

  • Deeply nested CTEs
  • Excessive subquery nesting
  • Stack overflow attacks

Supported SQL Features

Phase 1 (v1.0.0) - Core DML
  • SELECT with FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET
  • All JOIN types with proper precedence
  • INSERT (single/multi-row)
  • UPDATE with SET and WHERE
  • DELETE with WHERE
Phase 2 (v1.2.0) - Advanced Features
  • Common Table Expressions (WITH clause)
  • Recursive CTEs with depth protection
  • Set operations: UNION [ALL], EXCEPT, INTERSECT
  • CTE column specifications
Phase 2.5 (v1.3.0) - Window Functions
  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Analytic: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
  • PARTITION BY and ORDER BY
  • Frame clauses: ROWS/RANGE with bounds
Phase 2.6 (v1.5.0) - NULL Ordering
  • NULLS FIRST/LAST in ORDER BY
  • NULLS FIRST/LAST in window ORDER BY
  • Database portability for NULL ordering

Performance Characteristics

  • Throughput: 1.5M operations/second (peak), 1.38M sustained
  • Memory: Object pooling provides 60-80% reduction vs. new instances
  • Latency: <1μs for complex queries with window functions
  • Thread Safety: All pool operations are race-free

Error Handling

astNode, err := p.Parse(tokens)
if err != nil {
    if parseErr, ok := err.(*parser.ParseError); ok {
        fmt.Printf("Parse error at token '%s': %s\n",
            parseErr.Token.Literal, parseErr.Message)
    }
}

Testing

Run parser tests:

# All tests
go test -v ./pkg/sql/parser/

# With race detection
go test -race ./pkg/sql/parser/

# Specific features
go test -v -run TestParser_.*Window ./pkg/sql/parser/
go test -v -run TestParser_.*CTE ./pkg/sql/parser/
go test -v -run TestParser_.*Join ./pkg/sql/parser/

# Performance benchmarks
go test -bench=BenchmarkParser -benchmem ./pkg/sql/parser/

Best Practices

1. Always Use Defer
p := parser.NewParser()
defer p.Release()  // Ensures cleanup even on panic
2. Don't Store Pooled Instances
// BAD: Storing pooled object
type MyStruct struct {
    parser *Parser  // DON'T DO THIS
}

// GOOD: Get from pool when needed
func ParseSQL(tokens []token.Token) (*ast.AST, error) {
    p := parser.NewParser()
    defer p.Release()
    return p.Parse(tokens)
}
3. Use Context for Long Operations
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

p := parser.NewParser()
defer p.Release()

astNode, err := p.ParseContext(ctx, tokens)

Common Pitfalls

❌ Forgetting to Release
// BAD: Memory leak
p := parser.NewParser()
astNode, _ := p.Parse(tokens)
// p never returned to pool
✅ Correct Pattern
// GOOD: Automatic cleanup
p := parser.NewParser()
defer p.Release()
astNode, err := p.Parse(tokens)

Related Packages
  • tokenizer: Converts SQL text to tokens (input to parser)
  • ast: AST node definitions (output from parser)
  • token: Token type definitions
  • keywords: SQL keyword classification

Version History

  • v1.5.0: NULLS FIRST/LAST support (SQL-99 F851)
  • v1.4.0: Production validation complete
  • v1.3.0: Window functions (Phase 2.5)
  • v1.2.0: CTEs and set operations (Phase 2)
  • v1.0.0: Core DML and JOINs (Phase 1)

Documentation

Overview

Package parser provides a high-performance, production-ready recursive descent SQL parser that converts tokenized SQL into a comprehensive Abstract Syntax Tree (AST).

Overview

The parser implements a predictive recursive descent parser with one-token lookahead, supporting comprehensive SQL features across multiple database dialects including PostgreSQL, MySQL, SQL Server, Oracle, and SQLite. It achieves enterprise-grade performance with 1.38M+ operations/second sustained throughput and 347ns average latency for complex queries.

Architecture

The parser follows a modular architecture with specialized parsing functions for each SQL construct:

  • parser.go: Main parser entry point, statement routing, and core token management
  • select.go: SELECT statement parsing including DISTINCT ON, FETCH, and table operations
  • dml.go: Data Manipulation Language (INSERT, UPDATE, DELETE, MERGE statements)
  • ddl.go: Data Definition Language (CREATE, ALTER, DROP, TRUNCATE statements)
  • expressions.go: Expression parsing with operator precedence and JSON operators
  • window.go: Window function parsing (OVER clause, PARTITION BY, ORDER BY, frame specs)
  • cte.go: Common Table Expression parsing with recursive CTE support
  • grouping.go: GROUPING SETS, ROLLUP, CUBE parsing (SQL-99 T431)
  • alter.go: ALTER TABLE statement parsing

Parsing Flow

The typical parsing flow involves three stages:

  1. Token Conversion: Convert tokenizer output to parser tokens

     tokens, err := tkz.Tokenize(sqlBytes)
     result, err := parser.ConvertTokensWithPositions(tokens)

  2. AST Generation: Parse tokens into an Abstract Syntax Tree

     p := parser.GetParser()
     defer parser.PutParser(p)
     astObj, err := p.ParseWithPositions(result)

  3. AST Processing: Traverse and analyze the generated AST

     visitor.Walk(astObj, myVisitor)

Token Management

The parser uses ModelType-based token matching for optimal performance. ModelType is an integer enumeration that enables O(1) switch-based dispatch instead of O(n) string comparisons. This optimization provides ~14x performance improvement on hot paths (0.24ns vs 3.4ns per comparison).

Fast path example:

if p.currentToken.ModelType == models.TokenTypeSelect {
    // O(1) integer comparison
    return p.parseSelectWithSetOperations()
}

The parser maintains backward compatibility with string-based token matching for tests and legacy code that creates tokens without ModelType.
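The difference between the two dispatch modes can be shown with a standalone sketch (the modelType enumeration and function names below are illustrative, not the package's actual identifiers): the integer switch compiles to constant-cost comparisons, while the string switch pays a byte-by-byte comparison per case.

```go
package main

import "fmt"

// modelType mirrors the idea of an integer token-type enumeration.
type modelType int

const (
	typeSelect modelType = iota
	typeInsert
	typeUpdate
	typeDelete
)

// dispatch routes on the integer type; the cost of each case is a single
// integer comparison, independent of keyword length.
func dispatch(t modelType) string {
	switch t {
	case typeSelect:
		return "parseSelect"
	case typeInsert:
		return "parseInsert"
	case typeUpdate:
		return "parseUpdate"
	case typeDelete:
		return "parseDelete"
	}
	return "parseUnknown"
}

// dispatchByString is the legacy path: each case is an O(n) string comparison.
func dispatchByString(lit string) string {
	switch lit {
	case "SELECT":
		return "parseSelect"
	case "INSERT":
		return "parseInsert"
	case "UPDATE":
		return "parseUpdate"
	case "DELETE":
		return "parseDelete"
	}
	return "parseUnknown"
}

func main() {
	fmt.Println(dispatch(typeSelect))       // parseSelect
	fmt.Println(dispatchByString("UPDATE")) // parseUpdate
}
```

Both paths return the same routing decision, which is what lets the parser keep string-based matching as a fallback for tokens created without a ModelType.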

Performance Optimizations

The parser implements several performance optimizations:

  • Object Pooling: All major data structures use sync.Pool for zero-allocation reuse
  • Fast Token Dispatch: O(1) ModelType switch instead of O(n) string comparisons
  • Pre-allocation: Statement slices pre-allocated based on input size estimation
  • Zero-copy Operations: Direct token access without string allocation
  • Recursion Depth Limiting: MaxRecursionDepth prevents stack overflow (DoS protection)
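The pooling pattern behind the first optimization can be sketched with sync.Pool. This is a generic illustration with a stand-in fakeParser type, not the package's actual implementation; the key points are the reset-before-return discipline and the defer-based cleanup.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeParser stands in for the real Parser: it carries per-parse state
// that must be cleared before the instance goes back to the pool.
type fakeParser struct {
	tokens []string
	pos    int
}

func (p *fakeParser) reset() {
	p.tokens = p.tokens[:0] // keep capacity, drop contents
	p.pos = 0
}

var parserPool = sync.Pool{
	New: func() any { return &fakeParser{} },
}

func getParser() *fakeParser { return parserPool.Get().(*fakeParser) }

func putParser(p *fakeParser) {
	p.reset() // always reset so stale state never leaks into the next parse
	parserPool.Put(p)
}

func main() {
	p := getParser()
	defer putParser(p) // mirrors `defer parser.PutParser(p)` in real code

	p.tokens = append(p.tokens, "SELECT", "*", "FROM", "users")
	fmt.Println(len(p.tokens)) // 4
}
```

Retaining slice capacity across reuses is what drives the allocation rate down: after a few parses the pooled instance's buffers are large enough that steady-state parsing allocates almost nothing.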

DoS Protection

The parser includes protection against denial-of-service attacks via deeply nested expressions:

const MaxRecursionDepth = 100  // Prevents stack overflow

Expressions deeper than this limit return a RecursionDepthLimitError, preventing both stack exhaustion and excessive parsing time on malicious input.

Error Handling

The parser provides structured error handling with precise position information:

  • Syntax errors include line/column location from the tokenizer
  • Error messages preserve SQL context for debugging
  • Errors use the pkg/errors package with error codes for categorization
  • ParseWithPositions() enables enhanced error reporting with source positions

Example error:

error: expected 'FROM' but got 'WHERE' at line 1, column 15

SQL Feature Support (v1.6.0)

Core DML Operations

  • SELECT: Full SELECT support with DISTINCT, DISTINCT ON, aliases, subqueries
  • INSERT: INSERT INTO with VALUES, column lists, RETURNING clause
  • UPDATE: UPDATE with SET clauses, WHERE conditions, RETURNING clause
  • DELETE: DELETE FROM with WHERE conditions, RETURNING clause
  • MERGE: SQL:2003 MERGE statements with MATCHED/NOT MATCHED clauses

DDL Operations

  • CREATE TABLE: Tables with constraints, partitioning, column definitions
  • CREATE VIEW: Views with OR REPLACE, TEMPORARY, IF NOT EXISTS
  • CREATE MATERIALIZED VIEW: Materialized views with WITH [NO] DATA
  • CREATE INDEX: Indexes with UNIQUE, USING, partial indexes (WHERE clause)
  • ALTER TABLE: ADD/DROP COLUMN, ADD/DROP CONSTRAINT, RENAME operations
  • DROP: Drop tables, views, materialized views, indexes with CASCADE/RESTRICT
  • TRUNCATE: TRUNCATE TABLE with RESTART/CONTINUE IDENTITY, CASCADE/RESTRICT
  • REFRESH MATERIALIZED VIEW: With CONCURRENTLY and WITH [NO] DATA options

Advanced SELECT Features

  • JOINs: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL joins with ON/USING
  • LATERAL JOIN: PostgreSQL correlated subqueries in FROM clause
  • Subqueries: Scalar, EXISTS, IN, ANY, ALL subqueries
  • CTEs: WITH clause, recursive CTEs, multiple CTE definitions
  • Set Operations: UNION, UNION ALL, EXCEPT, INTERSECT with proper associativity
  • DISTINCT ON: PostgreSQL-specific row selection by expression
  • Window Functions: OVER clause with PARTITION BY, ORDER BY, frame specs
  • GROUPING SETS: GROUPING SETS, ROLLUP, CUBE (SQL-99 T431)
  • ORDER BY: With NULLS FIRST/LAST (SQL-99 F851)
  • LIMIT/OFFSET: Standard pagination with ROW/ROWS variants
  • FETCH FIRST/NEXT: SQL-99 FETCH clause with PERCENT, ONLY, WITH TIES

PostgreSQL Extensions (v1.6.0)

  • LATERAL JOIN: Correlated lateral subqueries in FROM/JOIN clauses
  • JSON/JSONB Operators: All 10 operators (->/->>/#>/#>>/@>/<@/?/?|/?&/#-)
  • DISTINCT ON: Row deduplication by expression with ORDER BY
  • FILTER Clause: Conditional aggregation (SQL:2003 T612)
  • RETURNING Clause: Return modified rows from INSERT/UPDATE/DELETE
  • Aggregate ORDER BY: ORDER BY inside STRING_AGG, ARRAY_AGG functions
  • Materialized CTE Hints: AS [NOT] MATERIALIZED in CTE definitions

Expression Support

The parser handles comprehensive expression types with correct operator precedence:

  • Logical: AND, OR, NOT with proper precedence (OR < AND < comparison)
  • Comparison: =, <, >, !=, <=, >=, <> with type-safe evaluation
  • Arithmetic: +, -, *, /, % with standard precedence (* > +)
  • String: || (concatenation) with proper precedence
  • JSON: ->, ->>, #>, #>>, @>, <@, ?, ?|, ?&, #- (PostgreSQL)
  • Pattern Matching: LIKE, ILIKE, NOT LIKE with escape sequences
  • Range: BETWEEN, NOT BETWEEN with inclusive bounds
  • Set Membership: IN, NOT IN with value lists or subqueries
  • NULL Testing: IS NULL, IS NOT NULL with three-valued logic
  • Quantifiers: ANY, ALL with comparison operators
  • Existence: EXISTS, NOT EXISTS with subquery evaluation
  • CASE: Both simple and searched CASE expressions
  • CAST: Type conversion with CAST(expr AS type)
  • Function Calls: Regular functions and aggregate functions
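
A minimal precedence-climbing sketch shows how these precedence levels produce the right grouping. The implementation below is illustrative, not the package's actual expression parser: it renders grouping as parenthesized strings and assumes a well-formed, space-separated token stream.

```go
package main

import (
	"fmt"
	"strings"
)

// prec assigns binding power: OR < AND < comparison < additive < multiplicative,
// matching the precedence order described above.
func prec(op string) int {
	switch op {
	case "OR":
		return 1
	case "AND":
		return 2
	case "=", "<", ">", "<=", ">=", "<>", "!=":
		return 3
	case "+", "-":
		return 4
	case "*", "/", "%":
		return 5
	}
	return 0 // not a binary operator
}

type exprParser struct {
	toks []string
	pos  int
}

func (p *exprParser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// parseExpr is classic precedence climbing: consume a primary, then fold in
// operators whose precedence is at least minPrec, recursing with prec(op)+1
// so equal-precedence operators group left-associatively.
func (p *exprParser) parseExpr(minPrec int) string {
	left := p.toks[p.pos] // primary: a bare identifier in this sketch
	p.pos++
	for prec(p.peek()) >= minPrec && prec(p.peek()) > 0 {
		op := p.peek()
		p.pos++
		right := p.parseExpr(prec(op) + 1)
		left = "(" + left + " " + op + " " + right + ")"
	}
	return left
}

func parse(sql string) string {
	p := &exprParser{toks: strings.Fields(sql)}
	return p.parseExpr(1)
}

func main() {
	fmt.Println(parse("a OR b AND c = d + e * f"))
	// (a OR (b AND (c = (d + (e * f)))))
}
```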

Window Functions (SQL-99)

Complete support for SQL-99 window functions with OVER clause:

  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n)
  • Offset: LAG(expr, offset, default), LEAD(expr, offset, default)
  • Value: FIRST_VALUE(expr), LAST_VALUE(expr), NTH_VALUE(expr, n)
  • PARTITION BY: Partition data into groups for window computation
  • ORDER BY: Order rows within each partition
  • Frame Clause: ROWS/RANGE with PRECEDING/FOLLOWING/CURRENT ROW
  • Frame Bounds: UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING

Example window function query:

SELECT
    dept,
    name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as rank,
    LAG(salary, 1) OVER (ORDER BY hire_date) as prev_salary,
    SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as rolling_sum
FROM employees;

Context and Cancellation

The parser supports context-based cancellation for long-running operations:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
p := parser.GetParser()
defer parser.PutParser(p)

astObj, err := p.ParseContext(ctx, tokens)
if errors.Is(err, context.DeadlineExceeded) {
    // Handle timeout
}

The parser checks context.Err() at strategic points (statement boundaries, expression starts) to enable fast cancellation without excessive overhead.

Thread Safety

The parser is designed for concurrent use with proper object pooling:

  • GetParser()/PutParser(): Thread-safe parser pooling via sync.Pool
  • Zero race conditions: Validated via comprehensive race detection tests
  • Per-goroutine instances: Each goroutine gets its own parser from pool
  • No shared state: Parser instances maintain no shared mutable state

Memory Management

Critical: Always use defer with pool return functions to prevent resource leaks:

p := parser.GetParser()
defer parser.PutParser(p)  // MANDATORY - prevents memory leaks

The parser integrates with the AST object pool:

astObj := ast.NewAST()
defer ast.ReleaseAST(astObj)  // MANDATORY - returns to pool

Object pooling provides 60-80% memory reduction in production workloads with 95%+ pool hit rates.

Usage Examples

Basic parsing with position tracking:

import (
    "github.com/ajitpratap0/GoSQLX/pkg/sql/parser"
    "github.com/ajitpratap0/GoSQLX/pkg/sql/tokenizer"
)

// Tokenize SQL
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte("SELECT * FROM users WHERE active = true"))
if err != nil {
    // Handle tokenization error
}

// Convert tokens with position tracking
result, err := parser.ConvertTokensWithPositions(tokens)
if err != nil {
    // Handle conversion error
}

// Parse to AST
p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.ParseWithPositions(result)
if err != nil {
    // Handle parsing error with line/column information
}
defer ast.ReleaseAST(astObj)

// Access parsed statements
for _, stmt := range astObj.Statements {
    // Process each statement
}

Parsing with timeout:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.GetParser()
defer parser.PutParser(p)

astObj, err := p.ParseContext(ctx, tokens)
if err != nil {
    if errors.Is(err, context.DeadlineExceeded) {
        log.Println("Parsing timeout exceeded")
    }
    // Handle other errors
    return
}
defer ast.ReleaseAST(astObj)

Performance Characteristics

Measured performance on production workloads (v1.6.0):

  • Throughput: 1.38M+ operations/second sustained, 1.5M peak
  • Latency: 347ns average for complex queries with window functions
  • Token Processing: 8M tokens/second
  • Memory Efficiency: 60-80% reduction via object pooling
  • Allocation Rate: <100 bytes/op for pooled parsing
  • Cache Efficiency: 95%+ pool hit rate in production

SQL Compliance

The parser provides approximately 80-85% SQL-99 compliance:

  • Core SQL-99: Full support for basic SELECT, INSERT, UPDATE, DELETE
  • SQL-99 Features: Window functions (F611), CTEs (T121), set operations
  • SQL:2003 Features: MERGE statements (F312), XML/JSON operators
  • SQL:2008 Features: TRUNCATE TABLE, enhanced grouping operations
  • Vendor Extensions: PostgreSQL, MySQL, SQL Server, Oracle specific syntax

Limitations

Current limitations (will be addressed in future releases):

  • Stored procedures: CREATE PROCEDURE/FUNCTION not yet supported
  • Triggers: CREATE TRIGGER parsing not implemented
  • Some vendor-specific extensions may require additional work

Related Packages
  • github.com/ajitpratap0/GoSQLX/pkg/sql/tokenizer: Token generation from SQL text
  • github.com/ajitpratap0/GoSQLX/pkg/sql/ast: AST node definitions and visitor pattern
  • github.com/ajitpratap0/GoSQLX/pkg/models: Token types, spans, locations
  • github.com/ajitpratap0/GoSQLX/pkg/errors: Structured error types with codes
  • github.com/ajitpratap0/GoSQLX/pkg/sql/keywords: Multi-dialect keyword classification

Further Reading

  • docs/USAGE_GUIDE.md: Comprehensive usage guide with examples
  • docs/SQL_COMPATIBILITY.md: SQL dialect compatibility matrix
  • CHANGELOG.md: Version history and feature additions

Package parser provides a high-performance recursive descent SQL parser that converts tokenized SQL into a comprehensive Abstract Syntax Tree (AST).

The parser supports enterprise-grade SQL parsing with 1.38M+ ops/sec throughput, comprehensive multi-dialect support (PostgreSQL, MySQL, SQL Server, Oracle, SQLite), and production-ready features including DoS protection, context cancellation, and object pooling for optimal memory efficiency.

Quick Start

// Get parser from pool
p := parser.GetParser()
defer parser.PutParser(p)

// Convert tokens and parse to AST
result, err := parser.ConvertTokensWithPositions(tokens)
if err != nil {
    // Handle conversion error
}
astObj, err := p.ParseWithPositions(result)
if err != nil {
    // Handle parse error
}
defer ast.ReleaseAST(astObj)

v1.6.0 PostgreSQL Extensions

  • LATERAL JOIN: Correlated subqueries in FROM clause
  • JSON/JSONB Operators: All 10 operators (->/->>/#>/#>>/@>/<@/?/?|/?&/#-)
  • DISTINCT ON: PostgreSQL-specific row deduplication
  • FILTER Clause: Conditional aggregation (SQL:2003 T612)
  • RETURNING Clause: Return modified rows from DML statements
  • Aggregate ORDER BY: ORDER BY inside STRING_AGG, ARRAY_AGG

v1.5.0 Features (SQL-99 Compliance)

  • GROUPING SETS, ROLLUP, CUBE: Advanced grouping (SQL-99 T431)
  • MERGE Statements: SQL:2003 MERGE with MATCHED/NOT MATCHED
  • Materialized Views: CREATE/REFRESH/DROP with CONCURRENTLY
  • FETCH Clause: SQL-99 F861/F862 with PERCENT, ONLY, WITH TIES
  • TRUNCATE: Enhanced with RESTART/CONTINUE IDENTITY

v1.3.0 Window Functions (Phase 2.5)

  • Window Functions: OVER clause with PARTITION BY, ORDER BY
  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Analytic: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
  • Frame Clauses: ROWS/RANGE with PRECEDING/FOLLOWING/CURRENT ROW

v1.2.0 CTEs and Set Operations (Phase 2)

  • Common Table Expressions: WITH clause with recursive support
  • Set Operations: UNION, UNION ALL, EXCEPT, INTERSECT
  • Multiple CTEs: Comma-separated CTE definitions in single query
  • CTE Column Lists: Optional column specifications

For comprehensive documentation, see doc.go in this package.

Index

Constants

const MaxRecursionDepth = 100

MaxRecursionDepth defines the maximum allowed recursion depth for parsing operations. This prevents stack overflow from deeply nested expressions, CTEs, or other recursive structures.

DoS Protection: This limit protects against denial-of-service attacks via malicious SQL with deeply nested expressions like: (((((...((value))...)))))

Typical Values:

  • MaxRecursionDepth = 100: Protects against stack exhaustion
  • Legitimate queries rarely exceed depth of 10-15
  • Malicious queries can reach thousands without this limit

Error: Exceeding this depth returns goerrors.RecursionDepthLimitError

Variables

This section is empty.

Functions

func ConvertTokensForParser added in v1.4.0

func ConvertTokensForParser(tokens []models.TokenWithSpan) ([]token.Token, error)

ConvertTokensForParser converts tokenizer output to parser input tokens.

This is a convenience function that creates a TokenConverter and performs the conversion in a single call. It returns only the converted tokens without position mappings, making it suitable for use cases where enhanced error reporting is not required.

For position-aware parsing with enhanced error reporting, use ConvertTokensWithPositions() instead.

Parameters:

  • tokens: Slice of tokenizer output (models.TokenWithSpan)

Returns:

  • []token.Token: Converted parser tokens
  • error: Conversion error if token is invalid

Performance:

  • Throughput: ~10M tokens/second
  • Overhead: ~80ns per token
  • Memory: Allocates new slice for tokens

Usage:

// Tokenize SQL
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte("SELECT * FROM users"))
if err != nil {
    log.Fatal(err)
}

// Convert for parser (basic mode)
parserTokens, err := parser.ConvertTokensForParser(tokens)
if err != nil {
    log.Fatal(err)
}

// Parse
p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.Parse(parserTokens)
if err != nil {
    log.Fatal(err)
}
defer ast.ReleaseAST(astObj)

Backward Compatibility: Maintains compatibility with existing CLI code.

Thread Safety: Safe for concurrent calls - creates new converter instance.

func PutParser added in v1.6.0

func PutParser(p *Parser)

PutParser returns a Parser instance to the pool after resetting it. This MUST be called after parsing is complete to enable reuse and prevent memory leaks.

The parser is automatically reset before being returned to the pool, clearing all internal state (tokens, position, depth, context, position mappings).

Performance: O(1), <30ns typical latency

Usage:

p := parser.GetParser()
defer parser.PutParser(p)  // Use defer to ensure cleanup on error paths

Thread Safety: Safe for concurrent calls - operates on independent parser instances.

Types

type ConversionResult added in v1.4.0

type ConversionResult struct {
	Tokens          []token.Token
	PositionMapping []TokenPosition // Maps parser token index to original position
}

ConversionResult contains the converted tokens and their position mappings for error reporting.

Position mappings enable the parser to report errors with accurate line and column numbers from the original SQL source. Each parser token is mapped back to its corresponding tokenizer token with full position information.

Usage:

result, err := parser.ConvertTokensWithPositions(tokenizerOutput)
if err != nil {
    // Handle conversion error
}

p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.ParseWithPositions(result)
if err != nil {
    // Error includes line/column from original source
    log.Printf("Parse error: %v", err)
}

func ConvertTokensWithPositions added in v1.4.0

func ConvertTokensWithPositions(tokens []models.TokenWithSpan) (*ConversionResult, error)

ConvertTokensWithPositions converts tokenizer output to parser input with position tracking.

This function provides both converted tokens and position mappings for enhanced error reporting. It is the recommended conversion method for production use where detailed error messages with line and column information are important.

The returned ConversionResult can be passed directly to ParseWithPositions() for position-aware parsing.

Parameters:

  • tokens: Slice of tokenizer output (models.TokenWithSpan)

Returns:

  • *ConversionResult: Converted tokens with position mappings
  • error: Conversion error if token is invalid

Performance:

  • Throughput: ~10M tokens/second
  • Overhead: ~80ns per token (same as ConvertTokensForParser)
  • Memory: Allocates slices for tokens and position mappings

Usage (Recommended for Production):

// Tokenize SQL
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte("SELECT * FROM users WHERE id = $1"))
if err != nil {
    log.Fatal(err)
}

// Convert with position tracking
result, err := parser.ConvertTokensWithPositions(tokens)
if err != nil {
    log.Fatal(err)
}

// Parse with position information
p := parser.GetParser()
defer parser.PutParser(p)
astObj, err := p.ParseWithPositions(result)
if err != nil {
    // Error message includes line/column information
    log.Printf("Parse error: %v", err)
    return
}
defer ast.ReleaseAST(astObj)

Position Mapping:

  • Each parser token is mapped back to its tokenizer token
  • Compound tokens (e.g., "GROUPING SETS") map all parts to original position
  • Position information includes line, column, and byte offset

Thread Safety: Safe for concurrent calls - creates new converter instance.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser represents a SQL parser that converts a stream of tokens into an Abstract Syntax Tree (AST).

The parser implements a recursive descent algorithm with one-token lookahead, supporting comprehensive SQL features across multiple database dialects.

Architecture:

  • Recursive Descent: Top-down parsing with predictive lookahead
  • Statement Routing: O(1) ModelType-based dispatch for statement types
  • Expression Precedence: Handles operator precedence via recursive descent levels
  • Error Recovery: Provides detailed syntax error messages with position information

Internal State:

  • tokens: Token stream from the tokenizer (converted to parser tokens)
  • currentPos: Current position in token stream
  • currentToken: Current token being examined
  • depth: Recursion depth counter (DoS protection via MaxRecursionDepth)
  • ctx: Optional context for cancellation support
  • positions: Source position mapping for enhanced error reporting

Thread Safety:

  • NOT thread-safe - each goroutine must use its own parser instance
  • Use GetParser()/PutParser() to obtain thread-local instances from pool
  • Parser instances maintain no shared state between calls

Memory Management:

  • Use GetParser() to obtain from pool
  • Use defer PutParser() to return to pool (MANDATORY)
  • Reset() is called automatically by PutParser()

Performance Characteristics:

  • Throughput: 1.38M+ operations/second sustained
  • Latency: 347ns average for complex queries
  • Token Processing: 8M tokens/second
  • Allocation: <100 bytes/op with object pooling

func GetParser added in v1.6.0

func GetParser() *Parser

GetParser returns a Parser instance from the pool. The caller MUST call PutParser when done to return it to the pool.

This function is thread-safe and designed for concurrent use. Each goroutine should get its own parser instance from the pool.

Performance: O(1) amortized, <50ns typical latency

Usage:

p := parser.GetParser()
defer parser.PutParser(p)  // MANDATORY - prevents resource leaks
astObj, err := p.Parse(tokens)

Thread Safety: Safe for concurrent calls - each goroutine gets its own instance.

func NewParser

func NewParser() *Parser

NewParser creates a new parser

func (*Parser) Parse

func (p *Parser) Parse(tokens []token.Token) (*ast.AST, error)

Parse parses a token stream into an Abstract Syntax Tree (AST).

This is the primary parsing method that converts tokens from the tokenizer into a structured AST representing the SQL statements. It uses fast O(1) ModelType-based dispatch for optimal performance on hot paths.

Parameters:

  • tokens: Slice of parser tokens (use ConvertTokensForParser to convert from tokenizer output)

Returns:

  • *ast.AST: Parsed Abstract Syntax Tree containing one or more statements
  • error: Syntax error with basic error information (no position tracking)

Performance:

  • Average: 347ns for complex queries with window functions
  • Throughput: 1.38M+ operations/second sustained
  • Memory: <100 bytes/op with object pooling

Error Handling:

  • Returns syntax errors without position information
  • Use ParseWithPositions() for enhanced error reporting with line/column
  • Cleans up AST on error (no memory leaks)

Usage:

p := parser.GetParser()
defer parser.PutParser(p)

// Convert tokenizer output to parser tokens
tokens, err := parser.ConvertTokensForParser(tokenizerOutput)
if err != nil {
    log.Printf("Conversion error: %v", err)
    return
}

// Parse tokens
astObj, err := p.Parse(tokens)
if err != nil {
    log.Printf("Parse error: %v", err)
    return
}
defer ast.ReleaseAST(astObj)

For position-aware error reporting, use ParseWithPositions() instead.

Thread Safety: NOT thread-safe - use separate parser instances per goroutine.

func (*Parser) ParseContext added in v1.5.0

func (p *Parser) ParseContext(ctx context.Context, tokens []token.Token) (*ast.AST, error)

ParseContext parses tokens into an AST with context support for cancellation and timeouts.

This method enables graceful cancellation of long-running parsing operations by checking the context at strategic points (statement boundaries and expression starts). The parser checks context.Err() approximately every 10-20 operations, balancing responsiveness with overhead.

Parameters:

  • ctx: Context for cancellation and timeout control
  • tokens: Slice of parser tokens to parse

Returns:

  • *ast.AST: Parsed Abstract Syntax Tree if successful
  • error: Parsing error, context.Canceled, or context.DeadlineExceeded

Context Checking Strategy:

  • Checked before each statement parsing
  • Checked at the start of parseExpression (recursive)
  • Overhead: ~2% vs non-context parsing
  • Cancellation latency: <100μs typical

Use Cases:

  • Long-running parsing operations that need to be cancellable
  • Implementing timeouts for parsing (prevent hanging on malicious input)
  • Graceful shutdown scenarios in server applications
  • User-initiated cancellation in interactive tools

Error Handling:

  • Returns context.Canceled when ctx.Done() is closed
  • Returns context.DeadlineExceeded when timeout expires
  • Cleans up partial AST on cancellation (no memory leaks)

Usage with Timeout:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.GetParser()
defer parser.PutParser(p)

astObj, err := p.ParseContext(ctx, tokens)
if err != nil {
    if errors.Is(err, context.DeadlineExceeded) {
        log.Println("Parsing timeout exceeded")
    } else if errors.Is(err, context.Canceled) {
        log.Println("Parsing was cancelled")
    } else {
        log.Printf("Parse error: %v", err)
    }
    return
}
defer ast.ReleaseAST(astObj)

Usage with Cancellation:

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Cancel from another goroutine based on user action
go func() {
    <-userCancelSignal
    cancel()
}()

astObj, err := p.ParseContext(ctx, tokens)
// Check for context.Canceled error

Performance Impact:

  • Adds ~2% overhead vs Parse() due to context checking
  • Average: ~354ns for complex queries (vs 347ns for Parse)
  • Negligible impact on modern CPUs with branch prediction

Thread Safety: NOT thread-safe - use separate parser instances per goroutine.

func (*Parser) ParseWithPositions added in v1.6.0

func (p *Parser) ParseWithPositions(result *ConversionResult) (*ast.AST, error)

ParseWithPositions parses tokens with position tracking for enhanced error reporting.

This method accepts a ConversionResult from ConvertTokensWithPositions(), which includes both the converted tokens and their original source positions from the tokenizer. Syntax errors will include accurate line and column information for debugging.

Parameters:

  • result: ConversionResult from ConvertTokensWithPositions containing tokens and position mapping

Returns:

  • *ast.AST: Parsed Abstract Syntax Tree containing one or more statements
  • error: Syntax error with line/column position information

Performance:

  • Slightly slower than Parse() due to position tracking overhead (~5%)
  • Average: ~365ns for complex queries (vs 347ns for Parse)
  • Recommended for production use where error reporting is important

Error Reporting Enhancement:

  • Includes line and column numbers in error messages
  • Example: "expected 'FROM' but got 'WHERE' at line 1, column 15"
  • Position information extracted from tokenizer output

Usage:

p := parser.GetParser()
defer parser.PutParser(p)

// Convert tokenizer output with position tracking
result, err := parser.ConvertTokensWithPositions(tokenizerOutput)
if err != nil {
    log.Printf("Conversion error: %v", err)
    return
}

// Parse with position information
astObj, err := p.ParseWithPositions(result)
if err != nil {
    // Error message includes line/column information
    log.Printf("Parse error: %v", err)
    return
}
defer ast.ReleaseAST(astObj)

This is the recommended parsing method for production use where detailed error reporting is important for debugging and user feedback.

Thread Safety: NOT thread-safe - use separate parser instances per goroutine.

func (*Parser) Release

func (p *Parser) Release()

Release releases any resources held by the parser

func (*Parser) Reset added in v1.6.0

func (p *Parser) Reset()

Reset clears the parser state for reuse from the pool.

type TokenConverter added in v1.4.0

type TokenConverter struct {
	// contains filtered or unexported fields
}

TokenConverter provides centralized, optimized token conversion from tokenizer output (models.TokenWithSpan) to parser input (token.Token).

The converter performs the following transformations:

  • Converts tokenizer TokenType to parser token.Type
  • Splits compound tokens (e.g., "GROUPING SETS" -> ["GROUPING", "SETS"])
  • Preserves source position information for error reporting
  • Uses object pooling for temporary buffers to reduce allocations

Performance:

  • Throughput: ~10M tokens/second conversion rate
  • Memory: Zero allocations for keyword conversion via sync.Pool
  • Overhead: ~80ns per token (including position tracking)

Thread Safety: NOT thread-safe - create separate instances per goroutine.
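The compound-token splitting step can be sketched as follows; simpleToken and splitCompound are illustrative stand-ins for the converter's internal types, showing how each split part is mapped back to the same original token index for error reporting.

```go
package main

import (
	"fmt"
	"strings"
)

// simpleToken stands in for the real parser token plus its position mapping.
type simpleToken struct {
	Literal   string
	OrigIndex int // every split part maps back to the same source token
}

// splitCompound expands compound keywords such as "GROUPING SETS" into
// separate parser tokens while preserving the original index, so a syntax
// error on either part still points at the right place in the source SQL.
func splitCompound(literals []string) []simpleToken {
	out := make([]simpleToken, 0, len(literals))
	for i, lit := range literals {
		for _, part := range strings.Fields(lit) {
			out = append(out, simpleToken{Literal: part, OrigIndex: i})
		}
	}
	return out
}

func main() {
	toks := splitCompound([]string{"SELECT", "GROUPING SETS", "("})
	for _, t := range toks {
		fmt.Printf("%q -> source token %d\n", t.Literal, t.OrigIndex)
	}
}
```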

func NewTokenConverter added in v1.4.0

func NewTokenConverter() *TokenConverter

NewTokenConverter creates an optimized token converter

func (*TokenConverter) Convert added in v1.4.0

func (tc *TokenConverter) Convert(tokens []models.TokenWithSpan) (*ConversionResult, error)

Convert converts tokenizer tokens to parser tokens with position tracking

type TokenPosition added in v1.4.0

type TokenPosition struct {
	OriginalIndex int                   // Index in original token slice
	Start         models.Location       // Original start position
	End           models.Location       // Original end position
	SourceToken   *models.TokenWithSpan // Reference to original token for error reporting
}

TokenPosition maps a parser token back to its original source position.

This structure enables precise error reporting by maintaining the connection between parser tokens and their original source locations in the SQL text.

Fields:

  • OriginalIndex: Index in the original tokenizer output slice
  • Start: Starting position (line, column, offset) in source SQL
  • End: Ending position (line, column, offset) in source SQL
  • SourceToken: Reference to original tokenizer token for full context
