tokenize

package
v0.12.3 Latest
Published: May 15, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package tokenize is the Go port of cpython/Python/Python-tokenize.c. The C file is the Python-visible wrapper around the parser's lexer; it exposes the TokenizerIter class that `tokenize.tokenize()` in the stdlib delegates to.

The token kind constants live in the sibling token package, mirroring CPython's split between Include/internal/pycore_token.h (consumed by the C tokenizer) and Lib/token.py (re-exported by Lib/tokenize.py).

CPython: Python/Python-tokenize.c

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Iter

type Iter struct {
	// contains filtered or unexported fields
}

Iter is the Go equivalent of CPython's TokenizerIter. Each call to Next advances the underlying lexer state by one token; end of input is reported as io.EOF.

CPython: Python/Python-tokenize.c tokenizeriterobject

func New

func New(src string, extraTokens bool) *Iter

New constructs an Iter over a source string. extraTokens enables the COMMENT / NL / ENCODING / NEWLINE-at-EOF tokens that the stdlib filters out by default.

CPython: Python/Python-tokenize.c tokenizeriter_new (source path)

func NewReadline

func NewReadline(rl func() (string, error), extraTokens bool) *Iter

NewReadline constructs an Iter that pulls source lines from a readline-shaped callable, the same shape io.TextIOBase.readline has on the Python side. The callback returns one line of source (including any trailing newline), or io.EOF at end of stream.

CPython: Python/Python-tokenize.c tokenizeriter_new (readline path)

func (*Iter) Next

func (it *Iter) Next() (Token, error)

Next returns the next token. It returns io.EOF once the lexer's ENDMARKER has been delivered, mirroring how the Python iterator protocol signals exhaustion with StopIteration.

CPython: Python/Python-tokenize.c tokenizeriter_next
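
The io.EOF contract suggests the usual Go drain loop. The sketch below is self-contained, so it uses stand-ins rather than the real package: Token is trimmed to one field, nexter is a hypothetical interface with the same method shape as (*Iter).Next, and fakeIter replays a fixed list. Only the loop in collect is the point.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Token is a trimmed stand-in for the package's Token type.
type Token struct {
	Value string
}

// nexter is a hypothetical interface; the documented (*Iter).Next has this shape.
type nexter interface {
	Next() (Token, error)
}

// fakeIter replays a fixed token list, then reports io.EOF.
type fakeIter struct {
	toks []Token
	i    int
}

func (f *fakeIter) Next() (Token, error) {
	if f.i >= len(f.toks) {
		return Token{}, io.EOF
	}
	t := f.toks[f.i]
	f.i++
	return t, nil
}

// collect drains it until io.EOF. Any other error stops the loop and is
// returned together with the tokens read so far.
func collect(it nexter) ([]Token, error) {
	var out []Token
	for {
		tok, err := it.Next()
		if errors.Is(err, io.EOF) {
			return out, nil // normal end of input
		}
		if err != nil {
			return out, err
		}
		out = append(out, tok)
	}
}

func main() {
	it := &fakeIter{toks: []Token{{Value: "x"}, {Value: "="}, {Value: "1"}}}
	toks, err := collect(it)
	fmt.Println(len(toks), err)
}
```

Treating io.EOF as a normal stop rather than a failure keeps lexer errors distinguishable from exhaustion.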

type Pos

type Pos struct {
	Line int
	Col  int
}

Pos is the (line, column) source position of a token boundary. Line is 1-based and Col is 0-based, matching CPython's tokenize.TokenInfo.

CPython: Python/Python-tokenize.c tokenizeriter_next
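
Because Line is 1-based and Col is 0-based, a position usually needs its column shifted by one before being shown in an editor-style line:column form. A small sketch follows; the Pos type is copied from the declaration above so the example compiles on its own, and editorString is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// Pos copies the documented struct so this sketch is self-contained.
type Pos struct {
	Line int
	Col  int
}

// editorString renders p with both line and column 1-based, the convention
// most editors and compilers use in diagnostics.
func editorString(p Pos) string {
	return fmt.Sprintf("%d:%d", p.Line, p.Col+1)
}

func main() {
	// The first character of the first line is Pos{Line: 1, Col: 0}.
	fmt.Println(editorString(Pos{Line: 1, Col: 0}))
}
```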

type Token

type Token struct {
	Type  token.Type
	Value string
	Start Pos
	End   Pos
	Line  string
}

Token is one record emitted by the iterator. It mirrors the 5-tuple (type, string, start, end, line) that the C wrapper returns.

CPython: Python/Python-tokenize.c tokenizeriter_next
