Documentation
Overview ¶
Package tokenize is the Go port of cpython/Python/Python-tokenize.c. The C file is the Python-visible wrapper around the parser's lexer; it exposes the TokenizerIter class that `tokenize.tokenize()` in the stdlib delegates to.
This file is a skeleton. The lexer state machine (indent tracking, f-string handling, encoding detection) lives in cpython/Parser/tokenizer/* and lands with the parser port. The public surface declared here is the stable, Go-idiomatic shape that consumers can program against today; the implementation behind Iter will be filled in once the lexer port lands.
CPython: Python/Python-tokenize.c
Package tokenize declares the token kind constants and the public iterator surface.
Numeric values for the constants live in types_gen.go, generated from cpython/Grammar/Tokens via tools/tokens_go. Keeping the type itself in a hand-written file lets other files in the package depend on `Type` without requiring the generator to have run yet.
CPython: Include/internal/pycore_token.h Token enum
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Iter ¶
type Iter struct {
// State will hold the parser-side lexer once the parser port
// lands. Until then Next returns io.EOF immediately. Exported so
// the unused-field linter doesn't flag the placeholder.
State any
}
Iter is the Go-side TokenizerIter equivalent. Next advances the underlying lexer state by one token; EOF is reported as io.EOF.
CPython: Python/Python-tokenize.c tokenizeriterobject
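A minimal sketch of the consumer loop implied by the io.EOF contract. The page does not show the signature of Next, so the `nexter` interface, the `token` struct, and both iterator types below are hypothetical stand-ins rather than this package's API; only the "stop on io.EOF" convention comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// token is a hypothetical stand-in for the package's Token type.
type token struct {
	Kind int
	Text string
}

// nexter is the ASSUMED shape of Iter's Next method.
type nexter interface {
	Next() (token, error)
}

// emptyIter mimics the placeholder behaviour described above:
// Next reports io.EOF immediately.
type emptyIter struct{}

func (emptyIter) Next() (token, error) { return token{}, io.EOF }

// sliceIter replays a fixed token list, then reports io.EOF.
type sliceIter struct {
	toks []token
	i    int
}

func (s *sliceIter) Next() (token, error) {
	if s.i >= len(s.toks) {
		return token{}, io.EOF
	}
	t := s.toks[s.i]
	s.i++
	return t, nil
}

// drain collects tokens until io.EOF. io.EOF is treated as a clean
// end of input; any other error is returned to the caller.
func drain(it nexter) ([]token, error) {
	var out []token
	for {
		t, err := it.Next()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, t)
	}
}

func main() {
	toks, err := drain(emptyIter{})
	fmt.Println(len(toks), err)

	toks, err = drain(&sliceIter{toks: []token{{1, "x"}, {22, "="}, {2, "1"}}})
	fmt.Println(len(toks), err)
}
```

Code written against a loop of this shape should keep working once the real lexer lands: the skeleton simply yields zero tokens before the clean end of input.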
func New ¶
New constructs an Iter over a source string. extraTokens enables the COMMENT / NL / ENCODING / NEWLINE-at-EOF tokens that the stdlib filters out by default.
CPython: Python/Python-tokenize.c tokenizeriter_new (source path)
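As an illustration of what the extra kinds look like to a consumer, the sketch below drops COMMENT, NL, and ENCODING records from a token slice. The numeric values are the ones listed under Type; the `Token` struct with `Kind` and `Text` fields and the `dropExtras` helper are hypothetical, and the NEWLINE-at-EOF case is not modelled here.

```go
package main

import "fmt"

// Type mirrors the package's token kind; values below are copied
// from the constant list on this page.
type Type int

const (
	NAME     Type = 1
	NUMBER   Type = 2
	NEWLINE  Type = 4
	OP       Type = 55
	COMMENT  Type = 65
	NL       Type = 66
	ENCODING Type = 68
)

// Token is a hypothetical, trimmed-down record used only for this sketch.
type Token struct {
	Kind Type
	Text string
}

// isExtra reports whether k is one of the kinds that only appear
// when extra tokens are requested (NEWLINE-at-EOF is not covered).
func isExtra(k Type) bool {
	switch k {
	case COMMENT, NL, ENCODING:
		return true
	}
	return false
}

// dropExtras returns a copy of in without the extra kinds.
func dropExtras(in []Token) []Token {
	out := make([]Token, 0, len(in))
	for _, t := range in {
		if !isExtra(t.Kind) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	stream := []Token{
		{ENCODING, "utf-8"},
		{COMMENT, "# set x"},
		{NL, "\n"},
		{NAME, "x"},
		{OP, "="},
		{NUMBER, "1"},
		{NEWLINE, "\n"},
	}
	for _, t := range dropExtras(stream) {
		fmt.Printf("%d %q\n", t.Kind, t.Text)
	}
}
```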
func NewReadline ¶
NewReadline constructs an Iter that pulls source lines from a readline-shaped callable, the same shape io.TextIO.readline has on the Python side.
CPython: Python/Python-tokenize.c tokenizeriter_new (readline path)
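One way a readline-shaped callable could look on the Go side is sketched below. The page does not show the parameter type NewReadline accepts, so `readlineFunc` (a `func() (string, error)` that signals exhaustion with io.EOF) is an assumption made for illustration; Python's readline signals the same condition by returning an empty string.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readlineFunc is a HYPOTHETICAL callable shape: each call returns the
// next source line, including its trailing newline when present, and
// io.EOF once the input is exhausted.
type readlineFunc func() (string, error)

// readlineFrom adapts any io.Reader to the readline shape.
func readlineFrom(r io.Reader) readlineFunc {
	br := bufio.NewReader(r)
	return func() (string, error) {
		line, err := br.ReadString('\n')
		if err == io.EOF && line != "" {
			// Final line without a trailing newline: deliver it now;
			// the following call reports io.EOF.
			return line, nil
		}
		return line, err
	}
}

// collectLines calls rl until it returns an error and gathers the lines.
func collectLines(rl readlineFunc) []string {
	var lines []string
	for {
		line, err := rl()
		if err != nil {
			return lines
		}
		lines = append(lines, line)
	}
}

// linesOf is a convenience wrapper for string sources.
func linesOf(src string) []string {
	return collectLines(readlineFrom(strings.NewReader(src)))
}

func main() {
	for _, l := range linesOf("a = 1\nb = 2") {
		fmt.Printf("%q\n", l)
	}
}
```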
type Pos ¶
Pos is the (line, column) source position of a token boundary. Lines are 1-based and columns are 0-based, matching CPython's tokenize.TokenInfo.
CPython: Python/Python-tokenize.c tokenizeriter_next
type Token ¶
Token is one record emitted by the iterator. Mirrors the 5-tuple (type, string, start, end, line) the C wrapper returns.
CPython: Python/Python-tokenize.c tokenizeriter_next
type Type ¶
type Type int
Type is the token kind. Numeric values match CPython 3.14 Grammar/Tokens one for one. The full constant set is in types_gen.go.
const (
	ENDMARKER Type = 0
	NAME Type = 1
	NUMBER Type = 2
	STRING Type = 3
	NEWLINE Type = 4
	INDENT Type = 5
	DEDENT Type = 6
	LPAR Type = 7
	RPAR Type = 8
	LSQB Type = 9
	RSQB Type = 10
	COLON Type = 11
	COMMA Type = 12
	SEMI Type = 13
	PLUS Type = 14
	MINUS Type = 15
	STAR Type = 16
	SLASH Type = 17
	VBAR Type = 18
	AMPER Type = 19
	LESS Type = 20
	GREATER Type = 21
	EQUAL Type = 22
	DOT Type = 23
	PERCENT Type = 24
	LBRACE Type = 25
	RBRACE Type = 26
	EQEQUAL Type = 27
	NOTEQUAL Type = 28
	LESSEQUAL Type = 29
	GREATEREQUAL Type = 30
	TILDE Type = 31
	CIRCUMFLEX Type = 32
	LEFTSHIFT Type = 33
	RIGHTSHIFT Type = 34
	DOUBLESTAR Type = 35
	PLUSEQUAL Type = 36
	MINEQUAL Type = 37
	STAREQUAL Type = 38
	SLASHEQUAL Type = 39
	PERCENTEQUAL Type = 40
	AMPEREQUAL Type = 41
	VBAREQUAL Type = 42
	CIRCUMFLEXEQUAL Type = 43
	LEFTSHIFTEQUAL Type = 44
	RIGHTSHIFTEQUAL Type = 45
	DOUBLESTAREQUAL Type = 46
	DOUBLESLASH Type = 47
	DOUBLESLASHEQUAL Type = 48
	AT Type = 49
	ATEQUAL Type = 50
	RARROW Type = 51
	ELLIPSIS Type = 52
	COLONEQUAL Type = 53
	EXCLAMATION Type = 54
	OP Type = 55
	TYPE_IGNORE Type = 56
	TYPE_COMMENT Type = 57
	SOFT_KEYWORD Type = 58
	FSTRING_START Type = 59
	FSTRING_MIDDLE Type = 60
	FSTRING_END Type = 61
	TSTRING_START Type = 62
	TSTRING_MIDDLE Type = 63
	TSTRING_END Type = 64
	COMMENT Type = 65
	NL Type = 66
	ERRORTOKEN Type = 67
	ENCODING Type = 68
	NTokens Type = 69
)
Token kinds, with numeric values pinned to CPython's Include/internal/pycore_token.h. The ALL_CAPS spellings preserve parity with `token.tok_name` so fixture comparisons line up byte-for-byte.
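To show why matching `token.tok_name` spellings helps with fixtures, here is a minimal sketch of a name lookup over a handful of the kinds. The page does not document a String method or a name table, so `tokNames` and the `String` method are hypothetical; the sketch redeclares a small subset of the constants so that it stands alone.

```go
package main

import "fmt"

// Type mirrors the package's token kind; only a subset of the
// constants is redeclared here to keep the sketch self-contained.
type Type int

const (
	ENDMARKER Type = 0
	NAME      Type = 1
	NUMBER    Type = 2
	STRING    Type = 3
	NEWLINE   Type = 4
	OP        Type = 55
	COMMENT   Type = 65
	NL        Type = 66
)

// tokNames is a hypothetical analogue of Python's token.tok_name,
// covering only the subset above.
var tokNames = map[Type]string{
	ENDMARKER: "ENDMARKER",
	NAME:      "NAME",
	NUMBER:    "NUMBER",
	STRING:    "STRING",
	NEWLINE:   "NEWLINE",
	OP:        "OP",
	COMMENT:   "COMMENT",
	NL:        "NL",
}

// String returns the ALL_CAPS name, or a numeric fallback for kinds
// missing from the table.
func (t Type) String() string {
	if name, ok := tokNames[t]; ok {
		return name
	}
	return fmt.Sprintf("Type(%d)", int(t))
}

func main() {
	fmt.Println(NAME, OP, Type(999)) // prints "NAME OP Type(999)"
}
```

Because the names are spelled exactly as on the Python side, a fixture dumped from Python can be compared against Go output as plain strings, with no translation table in between.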