models

package
v1.7.0
Warning

This package is not in the latest version of its module.

Published: Feb 12, 2026 License: AGPL-3.0 Imports: 0 Imported by: 0

Documentation

Overview

Package models provides core data structures for SQL tokenization and parsing in GoSQLX v1.6.0.

This package contains the fundamental types used throughout the GoSQLX library for representing SQL tokens, their locations in source code, and tokenization errors. All types are designed with zero-copy operations and object pooling in mind for optimal performance.

Core Components

The package is organized into several key areas:

  • Token Types: Token, TokenType, Word, Keyword for representing lexical units
  • Location Tracking: Location, Span for precise error reporting with line/column information
  • Token Wrappers: TokenWithSpan for tokens with position information
  • Error Types: TokenizerError for tokenization failures
  • Helper Functions: Factory functions for creating tokens efficiently

Performance Characteristics

GoSQLX v1.6.0 achieves exceptional performance metrics:

  • Tokenization: 1.38M+ operations/second sustained, 1.5M peak throughput
  • Memory Efficiency: 60-80% reduction via object pooling
  • Zero-Copy: Direct byte slice operations without string allocation
  • Thread-Safe: All operations are race-free and goroutine-safe
  • Test Coverage: 100% code coverage with comprehensive test suite

Token Type System

The TokenType system supports v1.6.0 features including:

  • PostgreSQL Extensions: JSON/JSONB operators (->/->>/#>/#>>/@>/<@/?/?|/?&/#-), LATERAL, RETURNING
  • SQL-99 Standards: Window functions, CTEs, GROUPING SETS, ROLLUP, CUBE
  • SQL:2003 Features: MERGE statements, FILTER clause, FETCH FIRST/NEXT
  • Multi-Dialect: PostgreSQL, MySQL, SQL Server, Oracle, SQLite keywords

Token types are organized into ranges for efficient categorization:

  • Basic tokens (10-29): WORD, NUMBER, IDENTIFIER, PLACEHOLDER
  • String literals (30-49): Single/double quoted, dollar quoted, hex strings
  • Operators (50-149): Arithmetic, comparison, JSON/JSONB operators
  • Keywords (200-499): SQL keywords organized by category
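The range layout above makes categorization a simple bounds check. The following standalone sketch mirrors the documented ranges; the local TokenType and constants are illustrative copies, not the package's own definitions:

```go
package main

import "fmt"

// TokenType is a local stand-in for models.TokenType (an int).
type TokenType int

// Bounds copied from the documented ranges; illustrative only.
const (
	basicStart    TokenType = 10
	basicEnd      TokenType = 30
	stringStart   TokenType = 30
	stringEnd     TokenType = 50
	operatorStart TokenType = 50
	operatorEnd   TokenType = 150
	keywordStart  TokenType = 200
	keywordEnd    TokenType = 500
)

// categorize classifies a token type with O(1) range comparisons,
// the same idea behind the package's Is* methods.
func categorize(t TokenType) string {
	switch {
	case t >= basicStart && t < basicEnd:
		return "basic"
	case t >= stringStart && t < stringEnd:
		return "string"
	case t >= operatorStart && t < operatorEnd:
		return "operator"
	case t >= keywordStart && t < keywordEnd:
		return "keyword"
	default:
		return "special"
	}
}

func main() {
	fmt.Println(categorize(11))  // NUMBER -> "basic"
	fmt.Println(categorize(201)) // SELECT -> "keyword"
}
```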

Location Tracking

Location and Span provide precise position information for error reporting:

  • 1-based indexing: line and column numbers both start at 1, matching SQL standards and editor conventions
  • Spans represent ranges from start to end locations
  • Used extensively in error messages and IDE integration
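A common use of a 1-based position is rendering it in the conventional line:column form for error messages. A minimal sketch (the local Location mirrors the struct documented below):

```go
package main

import "fmt"

// Location mirrors models.Location for illustration (1-based line/column).
type Location struct {
	Line   int
	Column int
}

// String renders the position in the conventional line:column form,
// so the type satisfies fmt.Stringer.
func (l Location) String() string {
	return fmt.Sprintf("%d:%d", l.Line, l.Column)
}

func main() {
	loc := Location{Line: 2, Column: 15}
	fmt.Printf("syntax error at %s\n", loc) // syntax error at 2:15
}
```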

Usage Examples

Creating tokens with location information:

loc := models.Location{Line: 1, Column: 5}
token := models.NewTokenWithSpan(
    models.TokenTypeSelect,
    "SELECT",
    loc,
    models.Location{Line: 1, Column: 11},
)

Working with token types:

if tokenType.IsKeyword() {
    // Handle SQL keyword
}
if tokenType.IsOperator() {
    // Handle operator
}
if tokenType.IsDMLKeyword() {
    // Handle SELECT, INSERT, UPDATE, DELETE
}

Checking for specific token categories:

// Check for window function keywords
if tokenType.IsWindowKeyword() {
    // Handle OVER, PARTITION BY, ROWS, RANGE, etc.
}

// Check for PostgreSQL JSON operators
switch tokenType {
case models.TokenTypeArrow,        // ->
	models.TokenTypeLongArrow,     // ->>
	models.TokenTypeHashArrow,     // #>
	models.TokenTypeHashLongArrow: // #>>
	// Handle JSON field access
}

Creating error locations:

err := models.TokenizerError{
    Message:  "unexpected character '@'",
    Location: models.Location{Line: 2, Column: 15},
}

PostgreSQL v1.6.0 Features

New token types for PostgreSQL extensions:

  • TokenTypeLateral: LATERAL JOIN support for correlated subqueries
  • TokenTypeReturning: RETURNING clause for INSERT/UPDATE/DELETE
  • TokenTypeArrow, TokenTypeLongArrow: -> and ->> JSON operators
  • TokenTypeHashArrow, TokenTypeHashLongArrow: #> and #>> path operators
  • TokenTypeAtArrow, TokenTypeArrowAt: @> contains and <@ is-contained-by
  • TokenTypeHashMinus: #- delete at path operator
  • TokenTypeAtQuestion: @? JSON path query
  • TokenTypeQuestionAnd, TokenTypeQuestionPipe: ?& and ?| key existence

SQL Standards Support

SQL-99 (Core + Extensions):

  • Window Functions: OVER, PARTITION BY, ROWS, RANGE, frame clauses
  • CTEs: WITH, RECURSIVE for common table expressions
  • Set Operations: UNION, INTERSECT, EXCEPT with ALL modifier
  • GROUPING SETS: ROLLUP, CUBE for multi-dimensional aggregation
  • Analytic Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD

SQL:2003 Features:

  • MERGE Statements: MERGE INTO with MATCHED/NOT MATCHED
  • FILTER Clause: Conditional aggregation in window functions
  • FETCH FIRST/NEXT: Standard limit syntax with TIES support
  • Materialized Views: CREATE MATERIALIZED VIEW, REFRESH

Thread Safety

All types in this package are immutable value types and safe for concurrent use:

  • Token, TokenType, Location, Span are all value types
  • No shared mutable state
  • Safe to pass between goroutines
  • Used extensively with object pooling (sync.Pool)

Integration with Parser

The models package integrates seamlessly with the parser:

// Tokenize SQL
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte(sql))
if err != nil {
    if tokErr, ok := err.(models.TokenizerError); ok {
        // Access error location: tokErr.Location.Line, tokErr.Location.Column
    }
}

// Parse tokens
ast, parseErr := parser.Parse(tokens)
if parseErr != nil {
    // Parser errors include location information
}

Design Philosophy

The models package follows GoSQLX design principles:

  • Zero Dependencies: Only depends on Go standard library
  • Value Types: Immutable structs for safety and performance
  • Explicit Ranges: Token type ranges for O(1) categorization
  • 1-Based Indexing: Matches SQL and editor conventions
  • Clear Semantics: Descriptive names and comprehensive documentation

Testing and Quality

The package maintains exceptional quality standards:

  • 100% Test Coverage: All code paths tested
  • Race Detection: No race conditions (go test -race)
  • Benchmarks: Performance validation for all operations
  • Property Testing: Extensive edge case validation
  • Real-World SQL: Validated against 115+ production queries

For complete examples and advanced usage, see:

  • docs/GETTING_STARTED.md - Quick start guide
  • docs/USAGE_GUIDE.md - Comprehensive usage documentation
  • examples/ directory - Production-ready examples

Package models provides core data structures for SQL tokenization and parsing, including tokens, spans, locations, and error types.

This package is the foundation of GoSQLX v1.6.0, providing high-performance, zero-copy token types with comprehensive PostgreSQL and SQL standard support.

See doc.go for complete package documentation and examples.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Keyword

type Keyword struct {
	Word     string // The actual keyword text
	Reserved bool   // Whether this is a reserved keyword
}

Keyword represents a lexical keyword with its properties.

Keywords are SQL reserved words or dialect-specific keywords that have special meaning in SQL syntax. GoSQLX supports keywords from multiple SQL dialects: PostgreSQL, MySQL, SQL Server, Oracle, and SQLite.

Fields:

  • Word: The keyword text in uppercase (canonical form)
  • Reserved: True if this is a reserved keyword that cannot be used as an identifier

Example:

// Reserved keyword
kw := &models.Keyword{Word: "SELECT", Reserved: true}

// Non-reserved keyword
kw = &models.Keyword{Word: "RETURNING", Reserved: false}

v1.6.0 adds support for PostgreSQL-specific keywords:

  • LATERAL: Correlated subqueries in FROM clause
  • RETURNING: Return modified rows from INSERT/UPDATE/DELETE
  • FILTER: Conditional aggregation in window functions

type Location

type Location struct {
	Line   int // Line number (1-based)
	Column int // Column number (1-based)
}

Location represents a position in the source code using 1-based indexing.

Location is used throughout GoSQLX for precise error reporting and IDE integration. Both Line and Column use 1-based indexing to match SQL standards and editor conventions.

Fields:

  • Line: Line number in source code (starts at 1)
  • Column: Column number within the line (starts at 1)

Example:

loc := models.Location{Line: 5, Column: 20}
// Represents position: line 5, column 20 (5th line, 20th character)

Usage in error reporting:

err := errors.NewError(
    errors.ErrCodeUnexpectedToken,
    "unexpected token",
    models.Location{Line: 1, Column: 15},
)

Integration with LSP (Language Server Protocol):

// Convert to LSP Position (0-based)
lspPos := lsp.Position{
    Line:      location.Line - 1,      // Convert to 0-based
    Character: location.Column - 1,    // Convert to 0-based
}

Performance: Location is a lightweight value type (2 ints) that is stack-allocated and has no memory overhead.

type Span

type Span struct {
	Start Location // Start of the span (inclusive)
	End   Location // End of the span (exclusive)
}

Span represents a range in the source code.

Span defines a contiguous region of source code from a Start location to an End location. Used for highlighting ranges in error messages, LSP diagnostics, and code formatting.

Fields:

  • Start: Beginning location of the span (inclusive)
  • End: Ending location of the span (exclusive)

Example:

span := models.Span{
    Start: models.Location{Line: 1, Column: 1},
    End:   models.Location{Line: 1, Column: 7},
}
// Represents "SELECT" token spanning columns 1-6 on line 1

Usage with TokenWithSpan:

token := models.TokenWithSpan{
    Token: models.Token{Type: models.TokenTypeSelect, Value: "SELECT"},
    Start: models.Location{Line: 1, Column: 1},
    End:   models.Location{Line: 1, Column: 7},
}

Helper functions:

span := models.NewSpan(startLoc, endLoc)  // Create new span
emptySpan := models.EmptySpan()            // Create empty span

func EmptySpan

func EmptySpan() Span

EmptySpan returns an empty span with zero values.

Used as a default/placeholder when span information is not available.

Example:

span := models.EmptySpan()
// Equivalent to: Span{Start: Location{}, End: Location{}}

func NewSpan

func NewSpan(start, end Location) Span

NewSpan creates a new span from start to end locations.

Parameters:

  • start: Beginning location (inclusive)
  • end: Ending location (exclusive)

Returns a Span covering the range [start, end).

Example:

start := models.Location{Line: 1, Column: 1}
end := models.Location{Line: 1, Column: 7}
span := models.NewSpan(start, end)

type Token

type Token struct {
	Type  TokenType
	Value string
	Word  *Word // For TokenTypeWord
	Long  bool  // For TokenTypeNumber to indicate if it's a long number
	Quote rune  // For quoted strings and identifiers
}

Token represents a SQL token with its value and metadata.

Token is the fundamental unit of lexical analysis in GoSQLX. Each token represents a meaningful element in SQL source code: keywords, identifiers, operators, literals, or punctuation.

Tokens are lightweight value types designed for use with object pooling and zero-copy operations. They are immutable and safe for concurrent use.

Fields:

  • Type: The token category (keyword, operator, literal, etc.)
  • Value: The string representation of the token
  • Word: Optional Word struct for keyword/identifier tokens
  • Long: Flag for numeric tokens indicating long integer (int64)
  • Quote: Quote character used for quoted strings/identifiers (' or ")

Example usage:

token := models.Token{
    Type:  models.TokenTypeSelect,
    Value: "SELECT",
}

// Check token category
if token.Type.IsKeyword() {
    fmt.Println("Found SQL keyword:", token.Value)
}

Performance: Tokens are stack-allocated value types with minimal memory overhead. Used extensively with sync.Pool for zero-allocation parsing in hot paths.
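The pooling pattern mentioned above can be sketched with sync.Pool. The names and the trimmed Token type below are illustrative, not GoSQLX's actual internals; pooling a *[]Token (rather than the slice value) avoids an extra allocation on each Put:

```go
package main

import (
	"fmt"
	"sync"
)

// Token mirrors models.Token, trimmed to two fields for illustration.
type Token struct {
	Type  int
	Value string
}

// tokenSlicePool recycles token slices between runs, the general
// pattern behind zero-allocation tokenization in hot paths.
var tokenSlicePool = sync.Pool{
	New: func() any {
		s := make([]Token, 0, 64)
		return &s
	},
}

func main() {
	bufp := tokenSlicePool.Get().(*[]Token)
	buf := (*bufp)[:0] // reuse capacity, reset length

	buf = append(buf, Token{Type: 201, Value: "SELECT"})
	fmt.Println(len(buf)) // 1

	// Return the (reset) slice to the pool for the next caller.
	*bufp = buf[:0]
	tokenSlicePool.Put(bufp)
}
```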

func NewToken

func NewToken(tokenType TokenType, value string) Token

NewToken creates a new Token with the given type and value.

Factory function for creating tokens without location information. Useful for testing, manual token construction, or scenarios where position tracking is not needed.

Parameters:

  • tokenType: The TokenType classification
  • value: The string representation of the token

Returns a Token with the specified type and value.

Example:

token := models.NewToken(models.TokenTypeSelect, "SELECT")
// token.Type = TokenTypeSelect, token.Value = "SELECT"

numToken := models.NewToken(models.TokenTypeNumber, "42")
// numToken.Type = TokenTypeNumber, numToken.Value = "42"

type TokenType

type TokenType int

TokenType represents the type of a SQL token.

TokenType is the core classification system for all lexical units in SQL. GoSQLX v1.6.0 supports 500+ distinct token types organized into logical ranges for efficient categorization and type checking.

Token Type Organization:

  • Special (0-9): EOF, UNKNOWN
  • Basic (10-29): WORD, NUMBER, IDENTIFIER, PLACEHOLDER
  • Strings (30-49): Various string literal formats
  • Operators (50-149): Arithmetic, comparison, JSON/JSONB operators
  • Keywords (200-499): SQL keywords by category
  • Data Types (430-449): SQL data type keywords

v1.6.0 PostgreSQL Extensions:

  • JSON/JSONB Operators: ->, ->>, #>, #>>, @>, <@, #-, @?, @@, ?&, ?|
  • LATERAL: Correlated subqueries in FROM clause
  • RETURNING: Return modified rows from DML statements
  • FILTER: Conditional aggregation in window functions
  • DISTINCT ON: PostgreSQL-specific row selection

Performance: TokenType is an int with O(1) lookup via range checking. All Is* methods use constant-time comparisons.

Example usage:

// Check token category
if tokenType.IsKeyword() {
    // Handle SQL keyword
}
if tokenType.IsOperator() {
    // Handle operator (+, -, *, /, ->, etc.)
}

// Check specific categories
if tokenType.IsWindowKeyword() {
    // Handle OVER, PARTITION BY, ROWS, RANGE
}
if tokenType.IsDMLKeyword() {
    // Handle SELECT, INSERT, UPDATE, DELETE
}

// PostgreSQL JSON operators
switch tokenType {
case TokenTypeArrow,    // -> (JSON field access)
	TokenTypeLongArrow: // ->> (JSON field as text)
	// Handle JSON operations
}
const (
	// TokenRangeBasicStart marks the beginning of basic token types
	TokenRangeBasicStart TokenType = 10
	// TokenRangeBasicEnd marks the end of basic token types (exclusive)
	TokenRangeBasicEnd TokenType = 30

	// TokenRangeStringStart marks the beginning of string literal types
	TokenRangeStringStart TokenType = 30
	// TokenRangeStringEnd marks the end of string literal types (exclusive)
	TokenRangeStringEnd TokenType = 50

	// TokenRangeOperatorStart marks the beginning of operator types
	TokenRangeOperatorStart TokenType = 50
	// TokenRangeOperatorEnd marks the end of operator types (exclusive)
	TokenRangeOperatorEnd TokenType = 150

	// TokenRangeKeywordStart marks the beginning of SQL keyword types
	TokenRangeKeywordStart TokenType = 200
	// TokenRangeKeywordEnd marks the end of SQL keyword types (exclusive)
	TokenRangeKeywordEnd TokenType = 500

	// TokenRangeDataTypeStart marks the beginning of data type keywords
	TokenRangeDataTypeStart TokenType = 430
	// TokenRangeDataTypeEnd marks the end of data type keywords (exclusive)
	TokenRangeDataTypeEnd TokenType = 450
)

Token range constants for maintainability and clarity. These define the boundaries for each category of tokens.
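One consequence of this layout worth noting: the data-type range (430-449) nests inside the keyword range (200-499), so a pure range check classifies a data-type token such as VARCHAR as both. The sketch below uses local copies of the bounds; whether the package's own IsKeyword excludes data types is not specified here:

```go
package main

import "fmt"

type TokenType int

// Bounds copied from the constants above; illustrative only.
const (
	keywordStart  TokenType = 200
	keywordEnd    TokenType = 500
	dataTypeStart TokenType = 430
	dataTypeEnd   TokenType = 450
)

// inRange performs the half-open range check used for categorization.
func inRange(t, start, end TokenType) bool {
	return t >= start && t < end
}

func main() {
	varchar := TokenType(438) // TokenTypeVarchar
	// The data-type range nests inside the keyword range, so both
	// checks report true for VARCHAR.
	fmt.Println(inRange(varchar, keywordStart, keywordEnd))   // true
	fmt.Println(inRange(varchar, dataTypeStart, dataTypeEnd)) // true
}
```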

const (
	// Special tokens
	TokenTypeEOF     TokenType = 0
	TokenTypeUnknown TokenType = 1

	// Basic token types (10-29)
	TokenTypeWord        TokenType = 10
	TokenTypeNumber      TokenType = 11
	TokenTypeChar        TokenType = 12
	TokenTypeWhitespace  TokenType = 13
	TokenTypeIdentifier  TokenType = 14
	TokenTypePlaceholder TokenType = 15

	// String literals (30-49)
	TokenTypeString                   TokenType = 30 // Generic string type
	TokenTypeSingleQuotedString       TokenType = 31
	TokenTypeDoubleQuotedString       TokenType = 32
	TokenTypeTripleSingleQuotedString TokenType = 33
	TokenTypeTripleDoubleQuotedString TokenType = 34
	TokenTypeDollarQuotedString       TokenType = 35
	TokenTypeByteStringLiteral        TokenType = 36
	TokenTypeNationalStringLiteral    TokenType = 37
	TokenTypeEscapedStringLiteral     TokenType = 38
	TokenTypeUnicodeStringLiteral     TokenType = 39
	TokenTypeHexStringLiteral         TokenType = 40

	// Operators and punctuation (50-99)
	TokenTypeOperator        TokenType = 50 // Generic operator
	TokenTypeComma           TokenType = 51
	TokenTypeEq              TokenType = 52
	TokenTypeDoubleEq        TokenType = 53
	TokenTypeNeq             TokenType = 54
	TokenTypeLt              TokenType = 55
	TokenTypeGt              TokenType = 56
	TokenTypeLtEq            TokenType = 57
	TokenTypeGtEq            TokenType = 58
	TokenTypeSpaceship       TokenType = 59
	TokenTypePlus            TokenType = 60
	TokenTypeMinus           TokenType = 61
	TokenTypeMul             TokenType = 62
	TokenTypeDiv             TokenType = 63
	TokenTypeDuckIntDiv      TokenType = 64
	TokenTypeMod             TokenType = 65
	TokenTypeStringConcat    TokenType = 66
	TokenTypeLParen          TokenType = 67
	TokenTypeLeftParen       TokenType = 67 // Alias for compatibility
	TokenTypeRParen          TokenType = 68
	TokenTypeRightParen      TokenType = 68 // Alias for compatibility
	TokenTypePeriod          TokenType = 69
	TokenTypeDot             TokenType = 69 // Alias for compatibility
	TokenTypeColon           TokenType = 70
	TokenTypeDoubleColon     TokenType = 71
	TokenTypeAssignment      TokenType = 72
	TokenTypeSemicolon       TokenType = 73
	TokenTypeBackslash       TokenType = 74
	TokenTypeLBracket        TokenType = 75
	TokenTypeRBracket        TokenType = 76
	TokenTypeAmpersand       TokenType = 77
	TokenTypePipe            TokenType = 78
	TokenTypeCaret           TokenType = 79
	TokenTypeLBrace          TokenType = 80
	TokenTypeRBrace          TokenType = 81
	TokenTypeRArrow          TokenType = 82
	TokenTypeSharp           TokenType = 83
	TokenTypeTilde           TokenType = 84
	TokenTypeExclamationMark TokenType = 85
	TokenTypeAtSign          TokenType = 86
	TokenTypeQuestion        TokenType = 87

	// Compound operators (100-149)
	TokenTypeTildeAsterisk                      TokenType = 100
	TokenTypeExclamationMarkTilde               TokenType = 101
	TokenTypeExclamationMarkTildeAsterisk       TokenType = 102
	TokenTypeDoubleTilde                        TokenType = 103
	TokenTypeDoubleTildeAsterisk                TokenType = 104
	TokenTypeExclamationMarkDoubleTilde         TokenType = 105
	TokenTypeExclamationMarkDoubleTildeAsterisk TokenType = 106
	TokenTypeShiftLeft                          TokenType = 107
	TokenTypeShiftRight                         TokenType = 108
	TokenTypeOverlap                            TokenType = 109
	TokenTypeDoubleExclamationMark              TokenType = 110
	TokenTypeCaretAt                            TokenType = 111
	TokenTypePGSquareRoot                       TokenType = 112
	TokenTypePGCubeRoot                         TokenType = 113
	// JSON/JSONB operators (PostgreSQL)
	TokenTypeArrow                TokenType = 114 // -> JSON field access (returns JSON)
	TokenTypeLongArrow            TokenType = 115 // ->> JSON field access (returns text)
	TokenTypeHashArrow            TokenType = 116 // #> JSON path access (returns JSON)
	TokenTypeHashLongArrow        TokenType = 117 // #>> JSON path access (returns text)
	TokenTypeAtArrow              TokenType = 118 // @> JSON contains
	TokenTypeArrowAt              TokenType = 119 // <@ JSON is contained by
	TokenTypeHashMinus            TokenType = 120 // #- Delete at JSON path
	TokenTypeAtQuestion           TokenType = 121 // @? JSON path query
	TokenTypeAtAt                 TokenType = 122 // @@ Full text search
	TokenTypeQuestionAnd          TokenType = 123 // ?& JSON key exists all
	TokenTypeQuestionPipe         TokenType = 124 // ?| JSON key exists any
	TokenTypeCustomBinaryOperator TokenType = 125

	// SQL Keywords (200-399)
	TokenTypeKeyword TokenType = 200 // Generic keyword
	TokenTypeSelect  TokenType = 201
	TokenTypeFrom    TokenType = 202
	TokenTypeWhere   TokenType = 203
	TokenTypeJoin    TokenType = 204
	TokenTypeInner   TokenType = 205
	TokenTypeLeft    TokenType = 206
	TokenTypeRight   TokenType = 207
	TokenTypeOuter   TokenType = 208
	TokenTypeOn      TokenType = 209
	TokenTypeAs      TokenType = 210
	TokenTypeAnd     TokenType = 211
	TokenTypeOr      TokenType = 212
	TokenTypeNot     TokenType = 213
	TokenTypeIn      TokenType = 214
	TokenTypeLike    TokenType = 215
	TokenTypeBetween TokenType = 216
	TokenTypeIs      TokenType = 217
	TokenTypeNull    TokenType = 218
	TokenTypeTrue    TokenType = 219
	TokenTypeFalse   TokenType = 220
	TokenTypeCase    TokenType = 221
	TokenTypeWhen    TokenType = 222
	TokenTypeThen    TokenType = 223
	TokenTypeElse    TokenType = 224
	TokenTypeEnd     TokenType = 225
	TokenTypeGroup   TokenType = 226
	TokenTypeBy      TokenType = 227
	TokenTypeHaving  TokenType = 228
	TokenTypeOrder   TokenType = 229
	TokenTypeAsc     TokenType = 230
	TokenTypeDesc    TokenType = 231
	TokenTypeLimit   TokenType = 232
	TokenTypeOffset  TokenType = 233

	// DML Keywords (234-239)
	TokenTypeInsert TokenType = 234
	TokenTypeUpdate TokenType = 235
	TokenTypeDelete TokenType = 236
	TokenTypeInto   TokenType = 237
	TokenTypeValues TokenType = 238
	TokenTypeSet    TokenType = 239

	// DDL Keywords (240-249)
	TokenTypeCreate   TokenType = 240
	TokenTypeAlter    TokenType = 241
	TokenTypeDrop     TokenType = 242
	TokenTypeTable    TokenType = 243
	TokenTypeIndex    TokenType = 244
	TokenTypeView     TokenType = 245
	TokenTypeColumn   TokenType = 246
	TokenTypeDatabase TokenType = 247
	TokenTypeSchema   TokenType = 248
	TokenTypeTrigger  TokenType = 249

	// Aggregate functions (250-269)
	TokenTypeCount TokenType = 250
	TokenTypeSum   TokenType = 251
	TokenTypeAvg   TokenType = 252
	TokenTypeMin   TokenType = 253
	TokenTypeMax   TokenType = 254

	// Compound keywords (270-279)
	TokenTypeGroupBy   TokenType = 270
	TokenTypeOrderBy   TokenType = 271
	TokenTypeLeftJoin  TokenType = 272
	TokenTypeRightJoin TokenType = 273
	TokenTypeInnerJoin TokenType = 274
	TokenTypeOuterJoin TokenType = 275
	TokenTypeFullJoin  TokenType = 276
	TokenTypeCrossJoin TokenType = 277

	// CTE and Set Operations (280-299)
	TokenTypeWith      TokenType = 280
	TokenTypeRecursive TokenType = 281
	TokenTypeUnion     TokenType = 282
	TokenTypeExcept    TokenType = 283
	TokenTypeIntersect TokenType = 284
	TokenTypeAll       TokenType = 285

	// Window Function Keywords (300-319)
	TokenTypeOver      TokenType = 300
	TokenTypePartition TokenType = 301
	TokenTypeRows      TokenType = 302
	TokenTypeRange     TokenType = 303
	TokenTypeUnbounded TokenType = 304
	TokenTypePreceding TokenType = 305
	TokenTypeFollowing TokenType = 306
	TokenTypeCurrent   TokenType = 307
	TokenTypeRow       TokenType = 308
	TokenTypeGroups    TokenType = 309
	TokenTypeFilter    TokenType = 310
	TokenTypeExclude   TokenType = 311

	// Additional Join Keywords (320-329)
	TokenTypeCross   TokenType = 320
	TokenTypeNatural TokenType = 321
	TokenTypeFull    TokenType = 322
	TokenTypeUsing   TokenType = 323
	TokenTypeLateral TokenType = 324 // LATERAL keyword for correlated subqueries in FROM clause

	// Constraint Keywords (330-349)
	TokenTypePrimary       TokenType = 330
	TokenTypeKey           TokenType = 331
	TokenTypeForeign       TokenType = 332
	TokenTypeReferences    TokenType = 333
	TokenTypeUnique        TokenType = 334
	TokenTypeCheck         TokenType = 335
	TokenTypeDefault       TokenType = 336
	TokenTypeAutoIncrement TokenType = 337
	TokenTypeConstraint    TokenType = 338
	TokenTypeNotNull       TokenType = 339
	TokenTypeNullable      TokenType = 340

	// Additional SQL Keywords (350-399)
	TokenTypeDistinct TokenType = 350
	TokenTypeExists   TokenType = 351
	TokenTypeAny      TokenType = 352
	TokenTypeSome     TokenType = 353
	TokenTypeCast     TokenType = 354
	TokenTypeConvert  TokenType = 355
	TokenTypeCollate  TokenType = 356
	TokenTypeCascade  TokenType = 357
	TokenTypeRestrict TokenType = 358
	TokenTypeReplace  TokenType = 359
	TokenTypeRename   TokenType = 360
	TokenTypeTo       TokenType = 361
	TokenTypeIf       TokenType = 362
	TokenTypeOnly     TokenType = 363
	TokenTypeFor      TokenType = 364
	TokenTypeNulls    TokenType = 365
	TokenTypeFirst    TokenType = 366
	TokenTypeLast     TokenType = 367
	TokenTypeFetch    TokenType = 368 // FETCH keyword for FETCH FIRST/NEXT clause
	TokenTypeNext     TokenType = 369 // NEXT keyword for FETCH NEXT clause

	// MERGE Statement Keywords (370-379)
	TokenTypeMerge   TokenType = 370
	TokenTypeMatched TokenType = 371
	TokenTypeTarget  TokenType = 372
	TokenTypeSource  TokenType = 373

	// Materialized View Keywords (374-379)
	TokenTypeMaterialized TokenType = 374
	TokenTypeRefresh      TokenType = 375
	TokenTypeTies         TokenType = 376 // TIES keyword for WITH TIES in FETCH clause
	TokenTypePercent      TokenType = 377 // PERCENT keyword for FETCH ... PERCENT ROWS
	TokenTypeTruncate     TokenType = 378 // TRUNCATE keyword for TRUNCATE TABLE statement
	TokenTypeReturning    TokenType = 379 // RETURNING keyword for PostgreSQL RETURNING clause

	// Row Locking Keywords (380-389)
	TokenTypeShare  TokenType = 380 // SHARE keyword for FOR SHARE row locking
	TokenTypeNoWait TokenType = 381 // NOWAIT keyword for FOR UPDATE/SHARE NOWAIT
	TokenTypeSkip   TokenType = 382 // SKIP keyword for FOR UPDATE SKIP LOCKED
	TokenTypeLocked TokenType = 383 // LOCKED keyword for SKIP LOCKED
	TokenTypeOf     TokenType = 384 // OF keyword for FOR UPDATE OF table_name

	// Grouping Set Keywords (390-399)
	TokenTypeGroupingSets TokenType = 390
	TokenTypeRollup       TokenType = 391
	TokenTypeCube         TokenType = 392
	TokenTypeGrouping     TokenType = 393
	TokenTypeSets         TokenType = 394 // SETS keyword for GROUPING SETS
	TokenTypeArray        TokenType = 395 // ARRAY keyword for PostgreSQL array constructor
	TokenTypeWithin       TokenType = 396 // WITHIN keyword for WITHIN GROUP clause

	// Role/Permission Keywords (400-419)
	TokenTypeRole       TokenType = 400
	TokenTypeUser       TokenType = 401
	TokenTypeGrant      TokenType = 402
	TokenTypeRevoke     TokenType = 403
	TokenTypePrivilege  TokenType = 404
	TokenTypePassword   TokenType = 405
	TokenTypeLogin      TokenType = 406
	TokenTypeSuperuser  TokenType = 407
	TokenTypeCreateDB   TokenType = 408
	TokenTypeCreateRole TokenType = 409

	// Transaction Keywords (420-429)
	TokenTypeBegin     TokenType = 420
	TokenTypeCommit    TokenType = 421
	TokenTypeRollback  TokenType = 422
	TokenTypeSavepoint TokenType = 423

	// Data Type Keywords (430-449)
	TokenTypeInt          TokenType = 430
	TokenTypeInteger      TokenType = 431
	TokenTypeBigInt       TokenType = 432
	TokenTypeSmallInt     TokenType = 433
	TokenTypeFloat        TokenType = 434
	TokenTypeDouble       TokenType = 435
	TokenTypeDecimal      TokenType = 436
	TokenTypeNumeric      TokenType = 437
	TokenTypeVarchar      TokenType = 438
	TokenTypeCharDataType TokenType = 439 // Char as data type (TokenTypeChar=12 is for single char token)
	TokenTypeText         TokenType = 440
	TokenTypeBoolean      TokenType = 441
	TokenTypeDate         TokenType = 442
	TokenTypeTime         TokenType = 443
	TokenTypeTimestamp    TokenType = 444
	TokenTypeInterval     TokenType = 445
	TokenTypeBlob         TokenType = 446
	TokenTypeClob         TokenType = 447
	TokenTypeJson         TokenType = 448
	TokenTypeUuid         TokenType = 449

	// Special Token Types (500-509)
	TokenTypeIllegal    TokenType = 500 // For parser compatibility with token.ILLEGAL
	TokenTypeAsterisk   TokenType = 501 // Explicit asterisk token type
	TokenTypeDoublePipe TokenType = 502 // || concatenation operator
)

Token type constants with explicit values to avoid collisions.

func (TokenType) IsAggregateFunction added in v1.6.0

func (t TokenType) IsAggregateFunction() bool

IsAggregateFunction returns true if the token type is an aggregate function

func (TokenType) IsConstraint added in v1.6.0

func (t TokenType) IsConstraint() bool

IsConstraint returns true if the token type is a constraint keyword

func (TokenType) IsDDLKeyword added in v1.6.0

func (t TokenType) IsDDLKeyword() bool

IsDDLKeyword returns true if the token type is a DDL keyword

func (TokenType) IsDMLKeyword added in v1.6.0

func (t TokenType) IsDMLKeyword() bool

IsDMLKeyword returns true if the token type is a DML keyword

func (TokenType) IsDataType added in v1.6.0

func (t TokenType) IsDataType() bool

IsDataType returns true if the token type is a SQL data type. Uses range-based checking for O(1) performance.

Example:

if token.ModelType.IsDataType() {
    // Handle data type token (INT, VARCHAR, BOOLEAN, etc.)
}

func (TokenType) IsJoinKeyword added in v1.6.0

func (t TokenType) IsJoinKeyword() bool

IsJoinKeyword returns true if the token type is a JOIN-related keyword

func (TokenType) IsKeyword added in v1.6.0

func (t TokenType) IsKeyword() bool

IsKeyword returns true if the token type is a SQL keyword. Uses range-based checking for O(1) performance (~0.24ns/op).

Example:

if token.ModelType.IsKeyword() {
    // Handle SQL keyword token
}

func (TokenType) IsLiteral added in v1.6.0

func (t TokenType) IsLiteral() bool

IsLiteral returns true if the token type is a literal value. Includes identifiers, numbers, strings, and boolean/null literals.

Example:

if token.ModelType.IsLiteral() {
    // Handle literal value (identifier, number, string, true/false/null)
}

func (TokenType) IsOperator added in v1.6.0

func (t TokenType) IsOperator() bool

IsOperator returns true if the token type is an operator. Uses range-based checking for O(1) performance.

Example:

if token.ModelType.IsOperator() {
    // Handle operator token (e.g., +, -, *, /, etc.)
}

func (TokenType) IsSetOperation added in v1.6.0

func (t TokenType) IsSetOperation() bool

IsSetOperation returns true if the token type is a set operation

func (TokenType) IsWindowKeyword added in v1.6.0

func (t TokenType) IsWindowKeyword() bool

IsWindowKeyword returns true if the token type is a window function keyword

func (TokenType) String added in v1.0.1

func (t TokenType) String() string

String returns a string representation of the token type.

Provides human-readable names for debugging, error messages, and logging. Uses O(1) map lookup for fast conversion.

Example:

tokenType := models.TokenTypeSelect
fmt.Println(tokenType.String()) // Output: "SELECT"

tokenType = models.TokenTypeLongArrow
fmt.Println(tokenType.String()) // Output: "LONG_ARROW"

type TokenWithSpan

type TokenWithSpan struct {
	Token Token    // The token with type and value
	Start Location // Start position (inclusive)
	End   Location // End position (exclusive)
}

TokenWithSpan represents a token with its location in the source code.

TokenWithSpan combines a Token with precise position information (Start and End locations). This is the primary representation used by the tokenizer output and consumed by the parser.

Fields:

  • Token: The token itself (type, value, metadata)
  • Start: Beginning location of the token in source (inclusive)
  • End: Ending location of the token in source (exclusive)

Example:

// Token for "SELECT" at line 1, columns 1-6 (End is exclusive)
tokenWithSpan := models.TokenWithSpan{
    Token: models.Token{Type: models.TokenTypeSelect, Value: "SELECT"},
    Start: models.Location{Line: 1, Column: 1},
    End:   models.Location{Line: 1, Column: 7},
}

Usage with tokenizer:

tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte(sql))
// tokens is []TokenWithSpan with location information
for _, t := range tokens {
    fmt.Printf("Token %s at line %d, column %d\n",
        t.Token.Value, t.Start.Line, t.Start.Column)
}

Used for error reporting:

// Create error at token location
err := errors.NewError(
    errors.ErrCodeUnexpectedToken,
    "unexpected token",
    tokenWithSpan.Start,
)

Performance: TokenWithSpan is a value type designed for zero-copy operations. The tokenizer returns slices of TokenWithSpan without heap allocations.

func NewEOFToken

func NewEOFToken(pos Location) TokenWithSpan

NewEOFToken creates a new EOF token with span.

Factory function for creating End-Of-File tokens. EOF tokens mark the end of the input stream and are essential for parser termination.

Parameters:

  • pos: The location where EOF was encountered

Returns a TokenWithSpan with type TokenTypeEOF and empty value. Both Start and End are set to the same position.

Example:

eofToken := models.NewEOFToken(models.Location{Line: 10, Column: 1})
// eofToken.Token.Type = TokenTypeEOF
// eofToken.Token.Value = ""
// eofToken.Start = eofToken.End = {Line: 10, Column: 1}

Used by tokenizer at end of input:

tokens = append(tokens, models.NewEOFToken(currentLocation))

func NewTokenWithSpan

func NewTokenWithSpan(tokenType TokenType, value string, start, end Location) TokenWithSpan

NewTokenWithSpan creates a new TokenWithSpan with the given type, value, and location.

Factory function for creating tokens with precise position information. This is the primary way to create tokens during tokenization.

Parameters:

  • tokenType: The TokenType classification
  • value: The string representation of the token
  • start: Beginning location in source (inclusive)
  • end: Ending location in source (exclusive)

Returns a TokenWithSpan with all fields populated.

Example:

token := models.NewTokenWithSpan(
    models.TokenTypeSelect,
    "SELECT",
    models.Location{Line: 1, Column: 1},
    models.Location{Line: 1, Column: 7},
)
// Represents "SELECT" spanning columns 1-6 on line 1

Used by tokenizer:

tokens = append(tokens, models.NewTokenWithSpan(
    tokenType, value, startLoc, endLoc,
))

func TokenAtLocation

func TokenAtLocation(token Token, start, end Location) TokenWithSpan

TokenAtLocation creates a new TokenWithSpan from a Token and location.

Convenience function for adding location information to an existing Token. Useful when a token is created first and its location is determined later.

Parameters:

  • token: The Token to wrap with location
  • start: Beginning location in source (inclusive)
  • end: Ending location in source (exclusive)

Returns a TokenWithSpan combining the token and location.

Example:

token := models.NewToken(models.TokenTypeSelect, "SELECT")
start := models.Location{Line: 1, Column: 1}
end := models.Location{Line: 1, Column: 7}
tokenWithSpan := models.TokenAtLocation(token, start, end)

func WrapToken

func WrapToken(token Token) TokenWithSpan

WrapToken wraps a token with an empty location.

Creates a TokenWithSpan from a Token when location information is not available or not needed. The Start and End locations are set to zero values.

Example:

token := models.Token{Type: models.TokenTypeSelect, Value: "SELECT"}
wrapped := models.WrapToken(token)
// wrapped.Start and wrapped.End are both Location{Line: 0, Column: 0}

Use case: Testing or scenarios where location tracking is not required.

type TokenizerError

type TokenizerError struct {
	Message  string   // Error description
	Location Location // Where the error occurred
}

TokenizerError represents an error during tokenization.

TokenizerError is a simple error type for lexical analysis failures. It includes the error message and the precise location where the error occurred.

For more sophisticated error handling with hints, suggestions, and context, use the errors package (pkg/errors) which provides structured errors with:

  • Error codes (E1xxx for tokenizer errors)
  • SQL context extraction and highlighting
  • Intelligent suggestions and typo detection
  • Help URLs for documentation

Fields:

  • Message: Human-readable error description
  • Location: Precise position in source where error occurred (line/column)

Example:

err := models.TokenizerError{
    Message:  "unexpected character '@'",
    Location: models.Location{Line: 2, Column: 15},
}
fmt.Println(err.Error()) // "unexpected character '@'"

Upgrading to structured errors:

// Instead of TokenizerError, use errors package:
err := errors.UnexpectedCharError('@', location, sqlSource)
// Provides: error code, context, hints, help URL

Common tokenizer errors:

  • Unexpected characters in input
  • Unterminated string literals
  • Invalid numeric formats
  • Invalid identifier syntax
  • Input size limits exceeded (DoS protection)

Performance: TokenizerError is a lightweight value type with minimal overhead.

func (TokenizerError) Error

func (e TokenizerError) Error() string

Error implements the error interface.

Returns the error message. For full context and location information, use the errors package which provides FormatErrorWithContext.

Example:

err := models.TokenizerError{Message: "invalid token", Location: loc}
fmt.Println(err.Error()) // Output: "invalid token"

type Whitespace

type Whitespace struct {
	Type    WhitespaceType
	Content string // For comments
	Prefix  string // For single line comments
}

Whitespace represents different types of whitespace tokens.

Whitespace tokens are typically ignored during parsing but can be preserved for formatting tools, SQL formatters, or LSP servers that need to maintain original source formatting and comments.

Fields:

  • Type: The specific type of whitespace (space, newline, tab, comment)
  • Content: The actual content (used for comments to preserve text)
  • Prefix: Comment prefix for single-line comments (-- or # in MySQL)

Example:

// Single-line comment
ws := models.Whitespace{
    Type:    models.WhitespaceTypeSingleLineComment,
    Content: "This is a comment",
    Prefix:  "--",
}

// Multi-line comment
ws := models.Whitespace{
    Type:    models.WhitespaceTypeMultiLineComment,
    Content: "/* Block comment */",
}

type WhitespaceType

type WhitespaceType int

WhitespaceType represents the type of whitespace.

Used to distinguish between different whitespace and comment types in SQL source code for accurate formatting and comment preservation.

const (
	WhitespaceTypeSpace             WhitespaceType = iota // Regular space character
	WhitespaceTypeNewline                                 // Line break (\n or \r\n)
	WhitespaceTypeTab                                     // Tab character (\t)
	WhitespaceTypeSingleLineComment                       // Single-line comment (-- or #)
	WhitespaceTypeMultiLineComment                        // Multi-line comment (/* ... */)
)
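A formatter or LSP server that preserves comments can switch on WhitespaceType to re-emit each whitespace token in its original form. The sketch below re-declares the enum and struct locally so it runs standalone; the reconstruct helper is hypothetical, and it follows the field documentation above (multi-line comment delimiters are stored in Content).

```go
package main

import "fmt"

// Local re-declarations of the documented types; in real code, use
// models.WhitespaceType and models.Whitespace.
type WhitespaceType int

const (
	WhitespaceTypeSpace WhitespaceType = iota
	WhitespaceTypeNewline
	WhitespaceTypeTab
	WhitespaceTypeSingleLineComment
	WhitespaceTypeMultiLineComment
)

type Whitespace struct {
	Type    WhitespaceType
	Content string
	Prefix  string
}

// reconstruct is a hypothetical helper that renders a whitespace token
// back to source text, preserving comment content and prefixes.
func reconstruct(ws Whitespace) string {
	switch ws.Type {
	case WhitespaceTypeSpace:
		return " "
	case WhitespaceTypeNewline:
		return "\n"
	case WhitespaceTypeTab:
		return "\t"
	case WhitespaceTypeSingleLineComment:
		return ws.Prefix + " " + ws.Content
	case WhitespaceTypeMultiLineComment:
		// Content already includes the /* ... */ delimiters.
		return ws.Content
	}
	return ""
}

func main() {
	ws := Whitespace{
		Type:    WhitespaceTypeSingleLineComment,
		Content: "This is a comment",
		Prefix:  "--",
	}
	fmt.Println(reconstruct(ws)) // -- This is a comment
}
```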

type Word

type Word struct {
	Value      string   // The actual text value
	QuoteStyle rune     // The quote character used (if quoted)
	Keyword    *Keyword // If this word is a keyword
}

Word represents a keyword or identifier with its properties.

Word is used to distinguish between different types of word tokens: SQL keywords (SELECT, FROM, WHERE), identifiers (table/column names), and quoted identifiers ("column name" or [column name]).

Fields:

  • Value: The actual text of the word (case-preserved)
  • QuoteStyle: The quote character if this is a quoted identifier (", `, [, etc.)
  • Keyword: Pointer to Keyword struct if this word is a SQL keyword (nil for identifiers)

Example:

// SQL keyword
word := &models.Word{
    Value:   "SELECT",
    Keyword: &models.Keyword{Word: "SELECT", Reserved: true},
}

// Quoted identifier
word := &models.Word{
    Value:      "column name",
    QuoteStyle: '"',
}
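As the field documentation above implies, a nil check on the Keyword pointer distinguishes keywords from plain identifiers, and a non-zero QuoteStyle marks a quoted identifier. The sketch below re-declares the structs locally so it runs standalone; the classify helper is hypothetical.

```go
package main

import "fmt"

// Local re-declarations of the documented structs; in real code, use
// models.Word and models.Keyword.
type Keyword struct {
	Word     string
	Reserved bool
}

type Word struct {
	Value      string
	QuoteStyle rune
	Keyword    *Keyword
}

// classify shows how a parser might treat a Word: a nil Keyword pointer
// means the word is an identifier, and QuoteStyle records quoting.
func classify(w Word) string {
	switch {
	case w.Keyword != nil && w.Keyword.Reserved:
		return "reserved keyword"
	case w.Keyword != nil:
		return "non-reserved keyword"
	case w.QuoteStyle != 0:
		return "quoted identifier"
	default:
		return "identifier"
	}
}

func main() {
	sel := Word{Value: "SELECT", Keyword: &Keyword{Word: "SELECT", Reserved: true}}
	col := Word{Value: "column name", QuoteStyle: '"'}
	fmt.Println(classify(sel)) // reserved keyword
	fmt.Println(classify(col)) // quoted identifier
}
```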
