Documentation ¶
Overview ¶
Package models provides core data structures for SQL tokenization and parsing in GoSQLX v1.6.0.
This package contains the fundamental types used throughout the GoSQLX library for representing SQL tokens, their locations in source code, and tokenization errors. All types are designed with zero-copy operations and object pooling in mind for optimal performance.
Core Components ¶
The package is organized into several key areas:
- Token Types: Token, TokenType, Word, Keyword for representing lexical units
- Location Tracking: Location, Span for precise error reporting with line/column information
- Token Wrappers: TokenWithSpan for tokens with position information
- Error Types: TokenizerError for tokenization failures
- Helper Functions: Factory functions for creating tokens efficiently
Performance Characteristics ¶
GoSQLX v1.6.0 achieves exceptional performance metrics:
- Tokenization: 1.38M+ operations/second sustained, 1.5M peak throughput
- Memory Efficiency: 60-80% reduction via object pooling
- Zero-Copy: Direct byte slice operations without string allocation
- Thread-Safe: All operations are race-free and goroutine-safe
- Test Coverage: 100% code coverage with comprehensive test suite
Token Type System ¶
The TokenType system supports v1.6.0 features including:
- PostgreSQL Extensions: JSON/JSONB operators (->/->>/#>/#>>/@>/<@/?/?|/?&/#-), LATERAL, RETURNING
- SQL-99 Standards: Window functions, CTEs, GROUPING SETS, ROLLUP, CUBE
- SQL:2003 Features: MERGE statements, FILTER clause, FETCH FIRST/NEXT
- Multi-Dialect: PostgreSQL, MySQL, SQL Server, Oracle, SQLite keywords
Token types are organized into ranges for efficient categorization:
- Basic tokens (10-29): WORD, NUMBER, IDENTIFIER, PLACEHOLDER
- String literals (30-49): Single/double quoted, dollar quoted, hex strings
- Operators (50-149): Arithmetic, comparison, JSON/JSONB operators
- Keywords (200-499): SQL keywords organized by category
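Because each category occupies a contiguous integer range, a category check is just two integer comparisons. A minimal sketch of how such a range check works (the range constants mirror those documented in this package; the helper names isKeyword/isOperator are illustrative, not the package's API):

```go
package main

import "fmt"

// TokenType mirrors the package's int-based token classification.
type TokenType int

// Range boundaries as documented for the models package.
const (
	TokenRangeOperatorStart TokenType = 50
	TokenRangeOperatorEnd   TokenType = 150
	TokenRangeKeywordStart  TokenType = 200
	TokenRangeKeywordEnd    TokenType = 500
)

// isKeyword performs the O(1) range check: two integer comparisons,
// no map lookup or switch over hundreds of constants.
func isKeyword(t TokenType) bool {
	return t >= TokenRangeKeywordStart && t < TokenRangeKeywordEnd
}

func isOperator(t TokenType) bool {
	return t >= TokenRangeOperatorStart && t < TokenRangeOperatorEnd
}

func main() {
	const tokenTypeSelect TokenType = 201 // SELECT keyword
	const tokenTypePlus TokenType = 60    // + operator
	fmt.Println(isKeyword(tokenTypeSelect), isOperator(tokenTypePlus)) // true true
}
```

This is why the documented Is* predicates run in constant time regardless of how many token types exist.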
Location Tracking ¶
Location and Span provide precise position information for error reporting:
- 1-based indexing for both line and column numbers, matching SQL standards and editor conventions
- Spans represent ranges from start to end locations
- Used extensively in error messages and IDE integration
Usage Examples ¶
Creating tokens with location information:
loc := models.Location{Line: 1, Column: 5}
token := models.NewTokenWithSpan(
	models.TokenTypeSelect,
	"SELECT",
	loc,
	models.Location{Line: 1, Column: 11},
)
Working with token types:
if tokenType.IsKeyword() {
	// Handle SQL keyword
}
if tokenType.IsOperator() {
	// Handle operator
}
if tokenType.IsDMLKeyword() {
	// Handle SELECT, INSERT, UPDATE, DELETE
}
Checking for specific token categories:
// Check for window function keywords
if tokenType.IsWindowKeyword() {
	// Handle OVER, PARTITION BY, ROWS, RANGE, etc.
}

// Check for PostgreSQL JSON operators
switch tokenType {
case models.TokenTypeArrow: // ->
case models.TokenTypeLongArrow: // ->>
case models.TokenTypeHashArrow: // #>
case models.TokenTypeHashLongArrow: // #>>
	// Handle JSON field access
}
Creating error locations:
err := models.TokenizerError{
	Message:  "unexpected character '@'",
	Location: models.Location{Line: 2, Column: 15},
}
PostgreSQL v1.6.0 Features ¶
New token types for PostgreSQL extensions:
- TokenTypeLateral: LATERAL JOIN support for correlated subqueries
- TokenTypeReturning: RETURNING clause for INSERT/UPDATE/DELETE
- TokenTypeArrow, TokenTypeLongArrow: -> and ->> JSON operators
- TokenTypeHashArrow, TokenTypeHashLongArrow: #> and #>> path operators
- TokenTypeAtArrow, TokenTypeArrowAt: @> contains and <@ is-contained-by
- TokenTypeHashMinus: #- delete at path operator
- TokenTypeAtQuestion: @? JSON path query
- TokenTypeQuestionAnd, TokenTypeQuestionPipe: ?& and ?| key existence
SQL Standards Support ¶
SQL-99 (Core + Extensions):
- Window Functions: OVER, PARTITION BY, ROWS, RANGE, frame clauses
- CTEs: WITH, RECURSIVE for common table expressions
- Set Operations: UNION, INTERSECT, EXCEPT with ALL modifier
- GROUPING SETS: ROLLUP, CUBE for multi-dimensional aggregation
- Analytic Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
SQL:2003 Features:
- MERGE Statements: MERGE INTO with MATCHED/NOT MATCHED
- FILTER Clause: Conditional aggregation in window functions
- FETCH FIRST/NEXT: Standard limit syntax with TIES support
- Materialized Views: CREATE MATERIALIZED VIEW, REFRESH
Thread Safety ¶
All types in this package are immutable value types and safe for concurrent use:
- Token, TokenType, Location, Span are all value types
- No shared mutable state
- Safe to pass between goroutines
- Used extensively with object pooling (sync.Pool)
Integration with Parser ¶
The models package integrates seamlessly with the parser:
// Tokenize SQL
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)

tokens, err := tkz.Tokenize([]byte(sql))
if err != nil {
	if tokErr, ok := err.(models.TokenizerError); ok {
		// Access error location: tokErr.Location.Line, tokErr.Location.Column
	}
}

// Parse tokens
ast, parseErr := parser.Parse(tokens)
if parseErr != nil {
	// Parser errors include location information
}
Design Philosophy ¶
The models package follows GoSQLX design principles:
- Zero Dependencies: Only depends on Go standard library
- Value Types: Immutable structs for safety and performance
- Explicit Ranges: Token type ranges for O(1) categorization
- 1-Based Indexing: Matches SQL and editor conventions
- Clear Semantics: Descriptive names and comprehensive documentation
Testing and Quality ¶
The package maintains exceptional quality standards:
- 100% Test Coverage: All code paths tested
- Race Detection: No race conditions (go test -race)
- Benchmarks: Performance validation for all operations
- Property Testing: Extensive edge case validation
- Real-World SQL: Validated against 115+ production queries
For complete examples and advanced usage, see:
- docs/GETTING_STARTED.md - Quick start guide
- docs/USAGE_GUIDE.md - Comprehensive usage documentation
- examples/ directory - Production-ready examples
Package models provides core data structures for SQL tokenization and parsing, including tokens, spans, locations, and error types.
This package is the foundation of GoSQLX v1.6.0, providing high-performance, zero-copy token types with comprehensive PostgreSQL and SQL standard support.
See doc.go for complete package documentation and examples.
Index ¶
- type Keyword
- type Location
- type Span
- type Token
- type TokenType
- func (t TokenType) IsAggregateFunction() bool
- func (t TokenType) IsConstraint() bool
- func (t TokenType) IsDDLKeyword() bool
- func (t TokenType) IsDMLKeyword() bool
- func (t TokenType) IsDataType() bool
- func (t TokenType) IsJoinKeyword() bool
- func (t TokenType) IsKeyword() bool
- func (t TokenType) IsLiteral() bool
- func (t TokenType) IsOperator() bool
- func (t TokenType) IsSetOperation() bool
- func (t TokenType) IsWindowKeyword() bool
- func (t TokenType) String() string
- type TokenWithSpan
- type TokenizerError
- type Whitespace
- type WhitespaceType
- type Word
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Keyword ¶
type Keyword struct {
	Word     string // The actual keyword text
	Reserved bool   // Whether this is a reserved keyword
}
Keyword represents a lexical keyword with its properties.
Keywords are SQL reserved words or dialect-specific keywords that have special meaning in SQL syntax. GoSQLX supports keywords from multiple SQL dialects: PostgreSQL, MySQL, SQL Server, Oracle, and SQLite.
Fields:
- Word: The keyword text in uppercase (canonical form)
- Reserved: True if this is a reserved keyword that cannot be used as an identifier
Example:
// Reserved keyword
kw := &models.Keyword{Word: "SELECT", Reserved: true}
// Non-reserved keyword
kw := &models.Keyword{Word: "RETURNING", Reserved: false}
v1.6.0 adds support for PostgreSQL-specific keywords:
- LATERAL: Correlated subqueries in FROM clause
- RETURNING: Return modified rows from INSERT/UPDATE/DELETE
- FILTER: Conditional aggregation in window functions
type Location ¶
type Location struct {
	Line   int // Line number in source code (starts at 1)
	Column int // Column number within the line (starts at 1)
}
Location represents a position in the source code using 1-based indexing.
Location is used throughout GoSQLX for precise error reporting and IDE integration. Both Line and Column use 1-based indexing to match SQL standards and editor conventions.
Fields:
- Line: Line number in source code (starts at 1)
- Column: Column number within the line (starts at 1)
Example:
loc := models.Location{Line: 5, Column: 20}
// Represents position: line 5, column 20 (5th line, 20th character)
Usage in error reporting:
err := errors.NewError(
	errors.ErrCodeUnexpectedToken,
	"unexpected token",
	models.Location{Line: 1, Column: 15},
)
Integration with LSP (Language Server Protocol):
// Convert to LSP Position (0-based)
lspPos := lsp.Position{
	Line:      location.Line - 1,   // Convert to 0-based
	Character: location.Column - 1, // Convert to 0-based
}
Performance: Location is a lightweight value type (2 ints) that is stack-allocated and has no memory overhead.
type Span ¶
type Span struct {
	Start Location // Start of the span (inclusive)
	End   Location // End of the span (exclusive)
}
Span represents a range in the source code.
Span defines a contiguous region of source code from a Start location to an End location. Used for highlighting ranges in error messages, LSP diagnostics, and code formatting.
Fields:
- Start: Beginning location of the span (inclusive)
- End: Ending location of the span (exclusive)
Example:
span := models.Span{
	Start: models.Location{Line: 1, Column: 1},
	End:   models.Location{Line: 1, Column: 7},
}
// Represents "SELECT" token spanning columns 1-6 on line 1
Usage with TokenWithSpan:
token := models.TokenWithSpan{
	Token: models.Token{Type: models.TokenTypeSelect, Value: "SELECT"},
	Start: models.Location{Line: 1, Column: 1},
	End:   models.Location{Line: 1, Column: 7},
}
Helper functions:
span := models.NewSpan(startLoc, endLoc) // Create new span
emptySpan := models.EmptySpan()          // Create empty span
func EmptySpan ¶
func EmptySpan() Span
EmptySpan returns an empty span with zero values.
Used as a default/placeholder when span information is not available.
Example:
span := models.EmptySpan()
// Equivalent to: Span{Start: Location{}, End: Location{}}
func NewSpan ¶
func NewSpan(start, end Location) Span
NewSpan creates a new span from start to end locations.
Parameters:
- start: Beginning location (inclusive)
- end: Ending location (exclusive)
Returns a Span covering the range [start, end).
Example:
start := models.Location{Line: 1, Column: 1}
end := models.Location{Line: 1, Column: 7}
span := models.NewSpan(start, end)
type Token ¶
type Token struct {
	Type  TokenType
	Value string
	Word  *Word // For TokenTypeWord
	Long  bool  // For TokenTypeNumber to indicate if it's a long number
	Quote rune  // For quoted strings and identifiers
}
Token represents a SQL token with its value and metadata.
Token is the fundamental unit of lexical analysis in GoSQLX. Each token represents a meaningful element in SQL source code: keywords, identifiers, operators, literals, or punctuation.
Tokens are lightweight value types designed for use with object pooling and zero-copy operations. They are immutable and safe for concurrent use.
Fields:
- Type: The token category (keyword, operator, literal, etc.)
- Value: The string representation of the token
- Word: Optional Word struct for keyword/identifier tokens
- Long: Flag for numeric tokens indicating long integer (int64)
- Quote: Quote character used for quoted strings/identifiers (' or ")
Example usage:
token := models.Token{
	Type:  models.TokenTypeSelect,
	Value: "SELECT",
}

// Check token category
if token.Type.IsKeyword() {
	fmt.Println("Found SQL keyword:", token.Value)
}
Performance: Tokens are stack-allocated value types with minimal memory overhead. Used extensively with sync.Pool for zero-allocation parsing in hot paths.
func NewToken ¶
func NewToken(tokenType TokenType, value string) Token
NewToken creates a new Token with the given type and value.
Factory function for creating tokens without location information. Useful for testing, manual token construction, or scenarios where position tracking is not needed.
Parameters:
- tokenType: The TokenType classification
- value: The string representation of the token
Returns a Token with the specified type and value.
Example:
token := models.NewToken(models.TokenTypeSelect, "SELECT")
// token.Type = TokenTypeSelect, token.Value = "SELECT"

numToken := models.NewToken(models.TokenTypeNumber, "42")
// numToken.Type = TokenTypeNumber, numToken.Value = "42"
type TokenType ¶
type TokenType int
TokenType represents the type of a SQL token.
TokenType is the core classification system for all lexical units in SQL. GoSQLX v1.6.0 supports 500+ distinct token types organized into logical ranges for efficient categorization and type checking.
Token Type Organization:
- Special (0-9): EOF, UNKNOWN
- Basic (10-29): WORD, NUMBER, IDENTIFIER, PLACEHOLDER
- Strings (30-49): Various string literal formats
- Operators (50-149): Arithmetic, comparison, JSON/JSONB operators
- Keywords (200-499): SQL keywords by category
- Data Types (430-449): SQL data type keywords
v1.6.0 PostgreSQL Extensions:
- JSON/JSONB Operators: ->, ->>, #>, #>>, @>, <@, #-, @?, @@, ?&, ?|
- LATERAL: Correlated subqueries in FROM clause
- RETURNING: Return modified rows from DML statements
- FILTER: Conditional aggregation in window functions
- DISTINCT ON: PostgreSQL-specific row selection
Performance: TokenType is an int with O(1) lookup via range checking. All Is* methods use constant-time comparisons.
Example usage:
// Check token category
if tokenType.IsKeyword() {
	// Handle SQL keyword
}
if tokenType.IsOperator() {
	// Handle operator (+, -, *, /, ->, etc.)
}

// Check specific categories
if tokenType.IsWindowKeyword() {
	// Handle OVER, PARTITION BY, ROWS, RANGE
}
if tokenType.IsDMLKeyword() {
	// Handle SELECT, INSERT, UPDATE, DELETE
}

// PostgreSQL JSON operators
switch tokenType {
case TokenTypeArrow: // -> (JSON field access)
case TokenTypeLongArrow: // ->> (JSON field as text)
	// Handle JSON operations
}
const (
	// TokenRangeBasicStart marks the beginning of basic token types
	TokenRangeBasicStart TokenType = 10
	// TokenRangeBasicEnd marks the end of basic token types (exclusive)
	TokenRangeBasicEnd TokenType = 30
	// TokenRangeStringStart marks the beginning of string literal types
	TokenRangeStringStart TokenType = 30
	// TokenRangeStringEnd marks the end of string literal types (exclusive)
	TokenRangeStringEnd TokenType = 50
	// TokenRangeOperatorStart marks the beginning of operator types
	TokenRangeOperatorStart TokenType = 50
	// TokenRangeOperatorEnd marks the end of operator types (exclusive)
	TokenRangeOperatorEnd TokenType = 150
	// TokenRangeKeywordStart marks the beginning of SQL keyword types
	TokenRangeKeywordStart TokenType = 200
	// TokenRangeKeywordEnd marks the end of SQL keyword types (exclusive)
	TokenRangeKeywordEnd TokenType = 500
	// TokenRangeDataTypeStart marks the beginning of data type keywords
	TokenRangeDataTypeStart TokenType = 430
	// TokenRangeDataTypeEnd marks the end of data type keywords (exclusive)
	TokenRangeDataTypeEnd TokenType = 450
)
Token range constants for maintainability and clarity. These define the boundaries for each category of tokens.
const (
	// Special tokens
	TokenTypeEOF     TokenType = 0
	TokenTypeUnknown TokenType = 1

	// Basic token types (10-29)
	TokenTypeWord        TokenType = 10
	TokenTypeNumber      TokenType = 11
	TokenTypeChar        TokenType = 12
	TokenTypeWhitespace  TokenType = 13
	TokenTypeIdentifier  TokenType = 14
	TokenTypePlaceholder TokenType = 15

	// String literals (30-49)
	TokenTypeString                   TokenType = 30 // Generic string type
	TokenTypeSingleQuotedString       TokenType = 31
	TokenTypeDoubleQuotedString       TokenType = 32
	TokenTypeTripleSingleQuotedString TokenType = 33
	TokenTypeTripleDoubleQuotedString TokenType = 34
	TokenTypeDollarQuotedString       TokenType = 35
	TokenTypeByteStringLiteral        TokenType = 36
	TokenTypeNationalStringLiteral    TokenType = 37
	TokenTypeEscapedStringLiteral     TokenType = 38
	TokenTypeUnicodeStringLiteral     TokenType = 39
	TokenTypeHexStringLiteral         TokenType = 40

	// Operators and punctuation (50-99)
	TokenTypeOperator     TokenType = 50 // Generic operator
	TokenTypeComma        TokenType = 51
	TokenTypeEq           TokenType = 52
	TokenTypeDoubleEq     TokenType = 53
	TokenTypeNeq          TokenType = 54
	TokenTypeLt           TokenType = 55
	TokenTypeGt           TokenType = 56
	TokenTypeLtEq         TokenType = 57
	TokenTypeGtEq         TokenType = 58
	TokenTypeSpaceship    TokenType = 59
	TokenTypePlus         TokenType = 60
	TokenTypeMinus        TokenType = 61
	TokenTypeMul          TokenType = 62
	TokenTypeDiv          TokenType = 63
	TokenTypeDuckIntDiv   TokenType = 64
	TokenTypeMod          TokenType = 65
	TokenTypeStringConcat TokenType = 66
	TokenTypeLParen       TokenType = 67
	TokenTypeLeftParen    TokenType = 67 // Alias for compatibility
	TokenTypeRParen       TokenType = 68
	TokenTypeRightParen   TokenType = 68 // Alias for compatibility
	TokenTypePeriod       TokenType = 69
	TokenTypeDot          TokenType = 69 // Alias for compatibility
	TokenTypeColon        TokenType = 70
	TokenTypeDoubleColon  TokenType = 71
	TokenTypeAssignment   TokenType = 72
	TokenTypeSemicolon    TokenType = 73
	TokenTypeBackslash    TokenType = 74
	TokenTypeLBracket     TokenType = 75
	TokenTypeRBracket     TokenType = 76
	TokenTypeAmpersand    TokenType = 77
	TokenTypePipe         TokenType = 78
	TokenTypeCaret        TokenType = 79
	TokenTypeLBrace       TokenType = 80
	TokenTypeRBrace       TokenType = 81
	TokenTypeRArrow       TokenType = 82
	TokenTypeSharp        TokenType = 83
	TokenTypeTilde        TokenType = 84
	TokenTypeExclamationMark TokenType = 85
	TokenTypeAtSign       TokenType = 86
	TokenTypeQuestion     TokenType = 87

	// Compound operators (100-149)
	TokenTypeTildeAsterisk                      TokenType = 100
	TokenTypeExclamationMarkTilde               TokenType = 101
	TokenTypeExclamationMarkTildeAsterisk       TokenType = 102
	TokenTypeDoubleTilde                        TokenType = 103
	TokenTypeDoubleTildeAsterisk                TokenType = 104
	TokenTypeExclamationMarkDoubleTilde         TokenType = 105
	TokenTypeExclamationMarkDoubleTildeAsterisk TokenType = 106
	TokenTypeShiftLeft                          TokenType = 107
	TokenTypeShiftRight                         TokenType = 108
	TokenTypeOverlap                            TokenType = 109
	TokenTypeDoubleExclamationMark              TokenType = 110
	TokenTypeCaretAt                            TokenType = 111
	TokenTypePGSquareRoot                       TokenType = 112
	TokenTypePGCubeRoot                         TokenType = 113

	// JSON/JSONB operators (PostgreSQL)
	TokenTypeArrow                TokenType = 114 // -> JSON field access (returns JSON)
	TokenTypeLongArrow            TokenType = 115 // ->> JSON field access (returns text)
	TokenTypeHashArrow            TokenType = 116 // #> JSON path access (returns JSON)
	TokenTypeHashLongArrow        TokenType = 117 // #>> JSON path access (returns text)
	TokenTypeAtArrow              TokenType = 118 // @> JSON contains
	TokenTypeArrowAt              TokenType = 119 // <@ JSON is contained by
	TokenTypeHashMinus            TokenType = 120 // #- Delete at JSON path
	TokenTypeAtQuestion           TokenType = 121 // @? JSON path query
	TokenTypeAtAt                 TokenType = 122 // @@ Full text search
	TokenTypeQuestionAnd          TokenType = 123 // ?& JSON key exists all
	TokenTypeQuestionPipe         TokenType = 124 // ?| JSON key exists any
	TokenTypeCustomBinaryOperator TokenType = 125

	// SQL Keywords (200-399)
	TokenTypeKeyword TokenType = 200 // Generic keyword
	TokenTypeSelect  TokenType = 201
	TokenTypeFrom    TokenType = 202
	TokenTypeWhere   TokenType = 203
	TokenTypeJoin    TokenType = 204
	TokenTypeInner   TokenType = 205
	TokenTypeLeft    TokenType = 206
	TokenTypeRight   TokenType = 207
	TokenTypeOuter   TokenType = 208
	TokenTypeOn      TokenType = 209
	TokenTypeAs      TokenType = 210
	TokenTypeAnd     TokenType = 211
	TokenTypeOr      TokenType = 212
	TokenTypeNot     TokenType = 213
	TokenTypeIn      TokenType = 214
	TokenTypeLike    TokenType = 215
	TokenTypeBetween TokenType = 216
	TokenTypeIs      TokenType = 217
	TokenTypeNull    TokenType = 218
	TokenTypeTrue    TokenType = 219
	TokenTypeFalse   TokenType = 220
	TokenTypeCase    TokenType = 221
	TokenTypeWhen    TokenType = 222
	TokenTypeThen    TokenType = 223
	TokenTypeElse    TokenType = 224
	TokenTypeEnd     TokenType = 225
	TokenTypeGroup   TokenType = 226
	TokenTypeBy      TokenType = 227
	TokenTypeHaving  TokenType = 228
	TokenTypeOrder   TokenType = 229
	TokenTypeAsc     TokenType = 230
	TokenTypeDesc    TokenType = 231
	TokenTypeLimit   TokenType = 232
	TokenTypeOffset  TokenType = 233

	// DML Keywords (234-239)
	TokenTypeInsert TokenType = 234
	TokenTypeUpdate TokenType = 235
	TokenTypeDelete TokenType = 236
	TokenTypeInto   TokenType = 237
	TokenTypeValues TokenType = 238
	TokenTypeSet    TokenType = 239

	// DDL Keywords (240-249)
	TokenTypeCreate   TokenType = 240
	TokenTypeAlter    TokenType = 241
	TokenTypeDrop     TokenType = 242
	TokenTypeTable    TokenType = 243
	TokenTypeIndex    TokenType = 244
	TokenTypeView     TokenType = 245
	TokenTypeColumn   TokenType = 246
	TokenTypeDatabase TokenType = 247
	TokenTypeSchema   TokenType = 248
	TokenTypeTrigger  TokenType = 249

	// Aggregate functions (250-269)
	TokenTypeCount TokenType = 250
	TokenTypeSum   TokenType = 251
	TokenTypeAvg   TokenType = 252
	TokenTypeMin   TokenType = 253
	TokenTypeMax   TokenType = 254

	// Compound keywords (270-279)
	TokenTypeGroupBy   TokenType = 270
	TokenTypeOrderBy   TokenType = 271
	TokenTypeLeftJoin  TokenType = 272
	TokenTypeRightJoin TokenType = 273
	TokenTypeInnerJoin TokenType = 274
	TokenTypeOuterJoin TokenType = 275
	TokenTypeFullJoin  TokenType = 276
	TokenTypeCrossJoin TokenType = 277

	// CTE and Set Operations (280-299)
	TokenTypeWith      TokenType = 280
	TokenTypeRecursive TokenType = 281
	TokenTypeUnion     TokenType = 282
	TokenTypeExcept    TokenType = 283
	TokenTypeIntersect TokenType = 284
	TokenTypeAll       TokenType = 285

	// Window Function Keywords (300-319)
	TokenTypeOver      TokenType = 300
	TokenTypePartition TokenType = 301
	TokenTypeRows      TokenType = 302
	TokenTypeRange     TokenType = 303
	TokenTypeUnbounded TokenType = 304
	TokenTypePreceding TokenType = 305
	TokenTypeFollowing TokenType = 306
	TokenTypeCurrent   TokenType = 307
	TokenTypeRow       TokenType = 308
	TokenTypeGroups    TokenType = 309
	TokenTypeFilter    TokenType = 310
	TokenTypeExclude   TokenType = 311

	// Additional Join Keywords (320-329)
	TokenTypeCross   TokenType = 320
	TokenTypeNatural TokenType = 321
	TokenTypeFull    TokenType = 322
	TokenTypeUsing   TokenType = 323
	TokenTypeLateral TokenType = 324 // LATERAL keyword for correlated subqueries in FROM clause

	// Constraint Keywords (330-349)
	TokenTypePrimary       TokenType = 330
	TokenTypeKey           TokenType = 331
	TokenTypeForeign       TokenType = 332
	TokenTypeReferences    TokenType = 333
	TokenTypeUnique        TokenType = 334
	TokenTypeCheck         TokenType = 335
	TokenTypeDefault       TokenType = 336
	TokenTypeAutoIncrement TokenType = 337
	TokenTypeConstraint    TokenType = 338
	TokenTypeNotNull       TokenType = 339
	TokenTypeNullable      TokenType = 340

	// Additional SQL Keywords (350-399)
	TokenTypeDistinct TokenType = 350
	TokenTypeExists   TokenType = 351
	TokenTypeAny      TokenType = 352
	TokenTypeSome     TokenType = 353
	TokenTypeCast     TokenType = 354
	TokenTypeConvert  TokenType = 355
	TokenTypeCollate  TokenType = 356
	TokenTypeCascade  TokenType = 357
	TokenTypeRestrict TokenType = 358
	TokenTypeReplace  TokenType = 359
	TokenTypeRename   TokenType = 360
	TokenTypeTo       TokenType = 361
	TokenTypeIf       TokenType = 362
	TokenTypeOnly     TokenType = 363
	TokenTypeFor      TokenType = 364
	TokenTypeNulls    TokenType = 365
	TokenTypeFirst    TokenType = 366
	TokenTypeLast     TokenType = 367
	TokenTypeFetch    TokenType = 368 // FETCH keyword for FETCH FIRST/NEXT clause
	TokenTypeNext     TokenType = 369 // NEXT keyword for FETCH NEXT clause

	// MERGE Statement Keywords (370-379)
	TokenTypeMerge   TokenType = 370
	TokenTypeMatched TokenType = 371
	TokenTypeTarget  TokenType = 372
	TokenTypeSource  TokenType = 373

	// Materialized View Keywords (374-379)
	TokenTypeMaterialized TokenType = 374
	TokenTypeRefresh      TokenType = 375
	TokenTypeTies         TokenType = 376 // TIES keyword for WITH TIES in FETCH clause
	TokenTypePercent      TokenType = 377 // PERCENT keyword for FETCH ... PERCENT ROWS
	TokenTypeTruncate     TokenType = 378 // TRUNCATE keyword for TRUNCATE TABLE statement
	TokenTypeReturning    TokenType = 379 // RETURNING keyword for PostgreSQL RETURNING clause

	// Row Locking Keywords (380-389)
	TokenTypeNoWait TokenType = 381 // NOWAIT keyword for FOR UPDATE/SHARE NOWAIT
	TokenTypeSkip   TokenType = 382 // SKIP keyword for FOR UPDATE SKIP LOCKED
	TokenTypeLocked TokenType = 383 // LOCKED keyword for SKIP LOCKED
	TokenTypeOf     TokenType = 384 // OF keyword for FOR UPDATE OF table_name

	// Grouping Set Keywords (390-399)
	TokenTypeGroupingSets TokenType = 390
	TokenTypeRollup       TokenType = 391
	TokenTypeCube         TokenType = 392
	TokenTypeGrouping     TokenType = 393
	TokenTypeSets         TokenType = 394 // SETS keyword for GROUPING SETS
	TokenTypeArray        TokenType = 395 // ARRAY keyword for PostgreSQL array constructor
	TokenTypeWithin       TokenType = 396 // WITHIN keyword for WITHIN GROUP clause

	// Role/Permission Keywords (400-419)
	TokenTypeRole       TokenType = 400
	TokenTypeUser       TokenType = 401
	TokenTypeGrant      TokenType = 402
	TokenTypeRevoke     TokenType = 403
	TokenTypePrivilege  TokenType = 404
	TokenTypePassword   TokenType = 405
	TokenTypeLogin      TokenType = 406
	TokenTypeSuperuser  TokenType = 407
	TokenTypeCreateDB   TokenType = 408
	TokenTypeCreateRole TokenType = 409

	// Transaction Keywords (420-429)
	TokenTypeBegin     TokenType = 420
	TokenTypeCommit    TokenType = 421
	TokenTypeRollback  TokenType = 422
	TokenTypeSavepoint TokenType = 423

	// Data Type Keywords (430-449)
	TokenTypeInt          TokenType = 430
	TokenTypeInteger      TokenType = 431
	TokenTypeBigInt       TokenType = 432
	TokenTypeSmallInt     TokenType = 433
	TokenTypeFloat        TokenType = 434
	TokenTypeDouble       TokenType = 435
	TokenTypeDecimal      TokenType = 436
	TokenTypeNumeric      TokenType = 437
	TokenTypeVarchar      TokenType = 438
	TokenTypeCharDataType TokenType = 439 // Char as data type (TokenTypeChar=12 is for single char token)
	TokenTypeText         TokenType = 440
	TokenTypeBoolean      TokenType = 441
	TokenTypeDate         TokenType = 442
	TokenTypeTime         TokenType = 443
	TokenTypeTimestamp    TokenType = 444
	TokenTypeInterval     TokenType = 445
	TokenTypeBlob         TokenType = 446
	TokenTypeClob         TokenType = 447
	TokenTypeJson         TokenType = 448
	TokenTypeUuid         TokenType = 449

	// Special Token Types (500-509)
	TokenTypeIllegal    TokenType = 500 // For parser compatibility with token.ILLEGAL
	TokenTypeAsterisk   TokenType = 501 // Explicit asterisk token type
	TokenTypeDoublePipe TokenType = 502 // || concatenation operator
)
Token type constants with explicit values to avoid collisions
func (TokenType) IsAggregateFunction ¶ added in v1.6.0
IsAggregateFunction returns true if the token type is an aggregate function
func (TokenType) IsConstraint ¶ added in v1.6.0
IsConstraint returns true if the token type is a constraint keyword
func (TokenType) IsDDLKeyword ¶ added in v1.6.0
IsDDLKeyword returns true if the token type is a DDL keyword
func (TokenType) IsDMLKeyword ¶ added in v1.6.0
IsDMLKeyword returns true if the token type is a DML keyword
func (TokenType) IsDataType ¶ added in v1.6.0
IsDataType returns true if the token type is a SQL data type. Uses range-based checking for O(1) performance.
Example:
if token.ModelType.IsDataType() {
	// Handle data type token (INT, VARCHAR, BOOLEAN, etc.)
}
func (TokenType) IsJoinKeyword ¶ added in v1.6.0
IsJoinKeyword returns true if the token type is a JOIN-related keyword
func (TokenType) IsKeyword ¶ added in v1.6.0
IsKeyword returns true if the token type is a SQL keyword. Uses range-based checking for O(1) performance (~0.24ns/op).
Example:
if token.ModelType.IsKeyword() {
	// Handle SQL keyword token
}
func (TokenType) IsLiteral ¶ added in v1.6.0
IsLiteral returns true if the token type is a literal value. Includes identifiers, numbers, strings, and boolean/null literals.
Example:
if token.ModelType.IsLiteral() {
	// Handle literal value (identifier, number, string, true/false/null)
}
func (TokenType) IsOperator ¶ added in v1.6.0
IsOperator returns true if the token type is an operator. Uses range-based checking for O(1) performance.
Example:
if token.ModelType.IsOperator() {
	// Handle operator token (e.g., +, -, *, /, etc.)
}
func (TokenType) IsSetOperation ¶ added in v1.6.0
IsSetOperation returns true if the token type is a set operation
func (TokenType) IsWindowKeyword ¶ added in v1.6.0
IsWindowKeyword returns true if the token type is a window function keyword
func (TokenType) String ¶ added in v1.0.1
String returns a string representation of the token type.
Provides human-readable names for debugging, error messages, and logging. Uses O(1) map lookup for fast conversion.
Example:
tokenType := models.TokenTypeSelect
fmt.Println(tokenType.String()) // Output: "SELECT"

tokenType = models.TokenTypeLongArrow
fmt.Println(tokenType.String()) // Output: "LONG_ARROW"
type TokenWithSpan ¶
type TokenWithSpan struct {
	Token Token    // The token with type and value
	Start Location // Start position (inclusive)
	End   Location // End position (exclusive)
}
TokenWithSpan represents a token with its location in the source code.
TokenWithSpan combines a Token with precise position information (Start and End locations). This is the primary representation used by the tokenizer output and consumed by the parser.
Fields:
- Token: The token itself (type, value, metadata)
- Start: Beginning location of the token in source (inclusive)
- End: Ending location of the token in source (exclusive)
Example:
// Token for "SELECT" at line 1, columns 1-7
tokenWithSpan := models.TokenWithSpan{
	Token: models.Token{Type: models.TokenTypeSelect, Value: "SELECT"},
	Start: models.Location{Line: 1, Column: 1},
	End:   models.Location{Line: 1, Column: 7},
}
Usage with tokenizer:
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)
tokens, err := tkz.Tokenize([]byte(sql))
// tokens is []TokenWithSpan with location information
for _, t := range tokens {
	fmt.Printf("Token %s at line %d, column %d\n",
		t.Token.Value, t.Start.Line, t.Start.Column)
}
Used for error reporting:
// Create error at token location
err := errors.NewError(
	errors.ErrCodeUnexpectedToken,
	"unexpected token",
	tokenWithSpan.Start,
)
Performance: TokenWithSpan is a value type designed for zero-copy operations. The tokenizer returns slices of TokenWithSpan without heap allocations.
func NewEOFToken ¶
func NewEOFToken(pos Location) TokenWithSpan
NewEOFToken creates a new EOF token with span.
Factory function for creating End-Of-File tokens. EOF tokens mark the end of the input stream and are essential for parser termination.
Parameters:
- pos: The location where EOF was encountered
Returns a TokenWithSpan with type TokenTypeEOF and empty value. Both Start and End are set to the same position.
Example:
eofToken := models.NewEOFToken(models.Location{Line: 10, Column: 1})
// eofToken.Token.Type = TokenTypeEOF
// eofToken.Token.Value = ""
// eofToken.Start = eofToken.End = {Line: 10, Column: 1}
Used by tokenizer at end of input:
tokens = append(tokens, models.NewEOFToken(currentLocation))
func NewTokenWithSpan ¶
func NewTokenWithSpan(tokenType TokenType, value string, start, end Location) TokenWithSpan
NewTokenWithSpan creates a new TokenWithSpan with the given type, value, and location.
Factory function for creating tokens with precise position information. This is the primary way to create tokens during tokenization.
Parameters:
- tokenType: The TokenType classification
- value: The string representation of the token
- start: Beginning location in source (inclusive)
- end: Ending location in source (exclusive)
Returns a TokenWithSpan with all fields populated.
Example:
token := models.NewTokenWithSpan(
	models.TokenTypeSelect,
	"SELECT",
	models.Location{Line: 1, Column: 1},
	models.Location{Line: 1, Column: 7},
)
// Represents "SELECT" spanning columns 1-6 on line 1
Used by tokenizer:
tokens = append(tokens, models.NewTokenWithSpan(
	tokenType, value, startLoc, endLoc,
))
func TokenAtLocation ¶
func TokenAtLocation(token Token, start, end Location) TokenWithSpan
TokenAtLocation creates a new TokenWithSpan from a Token and location.
Convenience function for adding location information to an existing Token. Useful when token is created first and location is determined later.
Parameters:
- token: The Token to wrap with location
- start: Beginning location in source (inclusive)
- end: Ending location in source (exclusive)
Returns a TokenWithSpan combining the token and location.
Example:
token := models.NewToken(models.TokenTypeSelect, "SELECT")
start := models.Location{Line: 1, Column: 1}
end := models.Location{Line: 1, Column: 7}
tokenWithSpan := models.TokenAtLocation(token, start, end)
func WrapToken ¶
func WrapToken(token Token) TokenWithSpan
WrapToken wraps a token with an empty location.
Creates a TokenWithSpan from a Token when location information is not available or not needed. The Start and End locations are set to zero values.
Example:
token := models.Token{Type: models.TokenTypeSelect, Value: "SELECT"}
wrapped := models.WrapToken(token)
// wrapped.Start and wrapped.End are both Location{Line: 0, Column: 0}
Use case: Testing or scenarios where location tracking is not required.
type TokenizerError ¶
type TokenizerError struct {
Message string // Error description
Location Location // Where the error occurred
}
TokenizerError represents an error during tokenization.
TokenizerError is a simple error type for lexical analysis failures. It includes the error message and the precise location where the error occurred.
For more sophisticated error handling with hints, suggestions, and context, use the errors package (pkg/errors) which provides structured errors with:
- Error codes (E1xxx for tokenizer errors)
- SQL context extraction and highlighting
- Intelligent suggestions and typo detection
- Help URLs for documentation
Fields:
- Message: Human-readable error description
- Location: Precise position in source where error occurred (line/column)
Example:
err := models.TokenizerError{
	Message:  "unexpected character '@'",
	Location: models.Location{Line: 2, Column: 15},
}
fmt.Println(err.Error()) // "unexpected character '@'"
Upgrading to structured errors:
// Instead of TokenizerError, use errors package:
err := errors.UnexpectedCharError('@', location, sqlSource)
// Provides: error code, context, hints, help URL
Common tokenizer errors:
- Unexpected characters in input
- Unterminated string literals
- Invalid numeric formats
- Invalid identifier syntax
- Input size limits exceeded (DoS protection)
Performance: TokenizerError is a lightweight value type with minimal overhead.
func (TokenizerError) Error ¶
func (e TokenizerError) Error() string
Error implements the error interface.
Returns the error message. For full context and location information, use the errors package which provides FormatErrorWithContext.
Example:
err := models.TokenizerError{Message: "invalid token", Location: loc}
fmt.Println(err.Error()) // Output: "invalid token"
type Whitespace ¶
type Whitespace struct {
Type WhitespaceType
Content string // For comments
Prefix string // For single line comments
}
Whitespace represents different types of whitespace tokens.
Whitespace tokens are typically ignored during parsing but can be preserved for formatting tools, SQL formatters, or LSP servers that need to maintain original source formatting and comments.
Fields:
- Type: The specific type of whitespace (space, newline, tab, comment)
- Content: The actual content (used for comments to preserve text)
- Prefix: Comment prefix for single-line comments (-- or # in MySQL)
Example:
// Single-line comment
ws := models.Whitespace{
Type: models.WhitespaceTypeSingleLineComment,
Content: "This is a comment",
Prefix: "--",
}
// Multi-line comment
wsMulti := models.Whitespace{
	Type:    models.WhitespaceTypeMultiLineComment,
	Content: "/* Block comment */",
}
type WhitespaceType ¶
type WhitespaceType int
WhitespaceType represents the type of whitespace.
Used to distinguish between different whitespace and comment types in SQL source code for accurate formatting and comment preservation.
const (
	WhitespaceTypeSpace             WhitespaceType = iota // Regular space character
	WhitespaceTypeNewline                                 // Line break (\n or \r\n)
	WhitespaceTypeTab                                     // Tab character (\t)
	WhitespaceTypeSingleLineComment                       // Single-line comment (-- or #)
	WhitespaceTypeMultiLineComment                        // Multi-line comment (/* ... */)
)
type Word ¶
type Word struct {
Value string // The actual text value
QuoteStyle rune // The quote character used (if quoted)
Keyword *Keyword // If this word is a keyword
}
Word represents a keyword or identifier with its properties.
Word is used to distinguish between different types of word tokens: SQL keywords (SELECT, FROM, WHERE), identifiers (table/column names), and quoted identifiers ("column name" or [column name]).
Fields:
- Value: The actual text of the word (case-preserved)
- QuoteStyle: The quote character if this is a quoted identifier (", `, [, etc.)
- Keyword: Pointer to Keyword struct if this word is a SQL keyword (nil for identifiers)
Example:
// SQL keyword
word := &models.Word{
Value: "SELECT",
Keyword: &models.Keyword{Word: "SELECT", Reserved: true},
}
// Quoted identifier
quoted := &models.Word{
	Value:      "column name",
	QuoteStyle: '"',
}
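The field conventions above (a non-nil Keyword marks a SQL keyword, a non-zero QuoteStyle marks a quoted identifier) can be sketched in a small classifier. The Word and Keyword declarations are local mirrors of the documented types, and describe is a hypothetical helper, not part of the models API:

```go
package main

import "fmt"

// Local mirrors of the documented models types; in real code,
// import them from the models package instead.
type Keyword struct {
	Word     string
	Reserved bool
}

type Word struct {
	Value      string
	QuoteStyle rune
	Keyword    *Keyword
}

// describe classifies a Word using the documented field conventions:
// Keyword != nil for SQL keywords, QuoteStyle != 0 for quoted identifiers.
// For simplicity, it closes the quote with the same rune it opened with.
func describe(w *Word) string {
	switch {
	case w.Keyword != nil:
		return "keyword " + w.Value
	case w.QuoteStyle != 0:
		return fmt.Sprintf("quoted identifier %c%s%c",
			w.QuoteStyle, w.Value, w.QuoteStyle)
	default:
		return "identifier " + w.Value
	}
}

func main() {
	fmt.Println(describe(&Word{
		Value:   "SELECT",
		Keyword: &Keyword{Word: "SELECT", Reserved: true},
	})) // keyword SELECT
	fmt.Println(describe(&Word{Value: "column name", QuoteStyle: '"'}))
	fmt.Println(describe(&Word{Value: "users"})) // identifier users
}
```

Note that a production classifier would also map paired quote styles such as `[` to `]` for SQL Server identifiers; the sketch above only echoes the opening rune.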