string

package
v0.10.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 6, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package string ports cpython/Parser/string_parser.c. The parser hands a string token (with surrounding quotes and any b/r/u prefix still attached) to ParseString, which returns the decoded payload plus the prefix flags the AST builder needs.

Escape decoding mirrors Objects/unicodeobject.c _PyUnicode_DecodeUnicodeEscapeInternal and Objects/bytesobject.c _PyBytes_DecodeEscape. The full escape table is implemented here, rather than re-exported from the runtime, so that the parser stays self-contained.

CPython: Parser/string_parser.c
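For orientation, here is a reduced sketch of what the str-side escape table does. decodeEscapes and its warning text are hypothetical, not this package's API, and it covers only single-character escapes, backslash-newline, \xhh and octal escapes; \N{...}, \u and \U are left out. As in CPython, an unknown escape such as \d is kept verbatim and flagged rather than rejected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// simpleEscapes maps single-character escapes to their values.
var simpleEscapes = map[byte]byte{
	'\\': '\\', '\'': '\'', '"': '"', 'a': '\a', 'b': '\b',
	'f': '\f', 'n': '\n', 'r': '\r', 't': '\t', 'v': '\v',
}

// decodeEscapes is a hypothetical, reduced str-literal decoder.
// Unknown escapes are kept verbatim (backslash included) and reported
// as warnings; \N{...}, \u and \U are not handled.
func decodeEscapes(body string) (string, []string, error) {
	var sb strings.Builder
	var warnings []string
	for i := 0; i < len(body); i++ {
		if body[i] != '\\' {
			sb.WriteByte(body[i]) // copy UTF-8 bytes through unchanged
			continue
		}
		i++ // step past the backslash
		if i == len(body) {
			return "", nil, fmt.Errorf("trailing backslash")
		}
		c := body[i]
		if v, ok := simpleEscapes[c]; ok {
			sb.WriteByte(v)
			continue
		}
		switch {
		case c == '\n':
			// Line continuation: emit nothing.
		case c == 'x':
			if i+2 >= len(body) {
				return "", nil, fmt.Errorf("truncated \\xXX escape")
			}
			v, err := strconv.ParseUint(body[i+1:i+3], 16, 8)
			if err != nil {
				return "", nil, fmt.Errorf("truncated \\xXX escape")
			}
			sb.WriteRune(rune(v)) // in a str literal \xhh means U+00hh
			i += 2
		case c >= '0' && c <= '7':
			// Up to three octal digits.
			j := i
			for j < len(body) && j < i+3 && body[j] >= '0' && body[j] <= '7' {
				j++
			}
			v, _ := strconv.ParseUint(body[i:j], 8, 32) // at most 0o777, cannot fail
			sb.WriteRune(rune(v))
			i = j - 1
		default:
			// Unknown escape: keep it as written and record a warning.
			r, size := utf8.DecodeRuneInString(body[i:])
			sb.WriteByte('\\')
			sb.WriteString(body[i : i+size])
			warnings = append(warnings, fmt.Sprintf("invalid escape sequence '\\%c'", r))
			i += size - 1
		}
	}
	return sb.String(), warnings, nil
}

func main() {
	text, warns, err := decodeEscapes(`tab:\t hex:\x41 oct:\101 odd:\d`)
	fmt.Printf("%q %q %v\n", text, warns, err)
}
```

Keeping the warnings as a separate slice, instead of printing them during decoding, matches the Result.Warnings contract described under Types.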

Constants

This section is empty.

Variables

This section is empty.

Functions

func CharByName

func CharByName(name string) (rune, error)

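CharByName returns the rune for one of the common Unicode character names the test suite exercises; it does not cover the full Unicode name database. Names are expected in canonical form: upper case, with hyphens and spaces exactly as in the official name.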

CPython: Modules/unicodedata.c ucd_lookup
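The contract is a plain table lookup. The sketch below assumes a small map keyed by canonical names; charByName, namedChars and the error text are hypothetical, and the four entries are only examples of real Unicode names, not the package's actual table.

```go
package main

import "fmt"

// namedChars is a hypothetical, deliberately tiny table keyed by
// canonical Unicode character names.
var namedChars = map[string]rune{
	"LATIN SMALL LETTER A": 'a',
	"NO-BREAK SPACE":       '\u00a0',
	"BULLET":               '\u2022',
	"SNOWMAN":              '\u2603',
}

// charByName follows the documented shape: a canonical name in,
// a rune or an error out.
func charByName(name string) (rune, error) {
	if r, ok := namedChars[name]; ok {
		return r, nil
	}
	return 0, fmt.Errorf("unknown Unicode character name %q", name)
}

func main() {
	for _, n := range []string{"SNOWMAN", "NO-BREAK SPACE", "NOT A NAME"} {
		r, err := charByName(n)
		fmt.Printf("%-15s %U %v\n", n, r, err)
	}
}
```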

Types

type Result

type Result struct {
	Text    string
	Bytes   []byte
	IsBytes bool
	IsRaw   bool
	// IsFString is set on the per-segment result the f-string
	// scanner emits, and on the folded result when at least one
	// part was an f-string. IsTString is the matching flag for
	// t-strings. Concat uses these to enforce the
	// no-implicit-mixing rule in CPython 3.14.
	IsFString bool
	IsTString bool
	// Warnings carries SyntaxWarning text (one per unknown escape)
	// that the caller should surface separately. Empty when the
	// literal contained no flagged escapes.
	//
	// CPython: Parser/string_parser.c warn_invalid_escape_sequence
	// emits PyExc_SyntaxWarning for escapes flagged by
	// _PyUnicode_DecodeUnicodeEscapeInternal.
	Warnings []string
}

Result is the decoded payload of a single string token, or of several adjacent tokens once Concat has folded them. Bytes is set when IsBytes is true; otherwise Text holds the decoded Unicode body.

CPython: Parser/string_parser.h:11 result type (folded into one shape on the Go side)
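A caller reads exactly one side of a Result and reports its warnings separately. The sketch below uses a local copy of the struct so it compiles on its own; payload and the "SyntaxWarning:" output prefix are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// result is a local copy of the documented Result fields.
type result struct {
	Text      string
	Bytes     []byte
	IsBytes   bool
	IsRaw     bool
	IsFString bool
	IsTString bool
	Warnings  []string
}

// payload picks the populated side: Bytes when IsBytes is set,
// Text otherwise.
func payload(r result) any {
	if r.IsBytes {
		return r.Bytes
	}
	return r.Text
}

func main() {
	r := result{Text: `\d`, Warnings: []string{`invalid escape sequence '\d'`}}
	fmt.Printf("%T %q\n", payload(r), payload(r))
	// Warnings travel beside the value and are surfaced by the caller.
	for _, w := range r.Warnings {
		fmt.Fprintln(os.Stderr, "SyntaxWarning:", w)
	}
}
```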

func Concat

func Concat(parts []Result) (Result, error)

Concat folds a non-empty slice of literal results emitted by adjacent string tokens. Bytes and str literals cannot be mixed; if they are, Concat returns an error carrying CPython's SyntaxError text, "cannot mix bytes and nonbytes literals".

CPython: Parser/string_parser.c:21 _PyPegen_concatenate_strings
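A simplified sketch of the fold, assuming a local copy of Result. concat here enforces only the bytes/str rule quoted above, ORs the f-string and t-string flags and keeps every part's warnings; it does not implement the t-string mixing rule, and it simply appends Text, whereas a real fold of f-string parts would keep expression segments separate. The empty-input message is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// result is a local copy of the documented Result fields.
type result struct {
	Text      string
	Bytes     []byte
	IsBytes   bool
	IsRaw     bool
	IsFString bool
	IsTString bool
	Warnings  []string
}

// concat folds adjacent literal results. IsRaw is left unset because
// the documentation does not say how it folds.
func concat(parts []result) (result, error) {
	if len(parts) == 0 {
		return result{}, errors.New("concat: no parts") // hypothetical message
	}
	out := result{IsBytes: parts[0].IsBytes}
	var text strings.Builder
	for _, p := range parts {
		if p.IsBytes != out.IsBytes {
			return result{}, errors.New("cannot mix bytes and nonbytes literals")
		}
		if p.IsBytes {
			out.Bytes = append(out.Bytes, p.Bytes...)
		} else {
			text.WriteString(p.Text)
		}
		out.IsFString = out.IsFString || p.IsFString
		out.IsTString = out.IsTString || p.IsTString
		out.Warnings = append(out.Warnings, p.Warnings...)
	}
	out.Text = text.String()
	return out, nil
}

func main() {
	r, err := concat([]result{{Text: "spam "}, {Text: "eggs"}})
	fmt.Printf("%q %v\n", r.Text, err)
	_, err = concat([]result{{Text: "a"}, {Bytes: []byte("b"), IsBytes: true}})
	fmt.Println(err)
}
```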

func ParseString

func ParseString(tok []byte) (Result, error)

ParseString decodes a single string-literal token. tok holds the raw bytes the lexer emitted, including the prefix and quotes.

CPython: Parser/string_parser.c:253 _PyPegen_parse_string
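Before any decoding, a token such as rb'\d' has to be split into prefix, quote style and body. The sketch below shows that first step; splitLiteral and its error texts are hypothetical, and it only checks that each prefix letter is one of b, r, u, f, t (in either case), not whether the combination is legal.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitLiteral separates a raw token into its prefix letters and body,
// and reports whether the quotes were tripled.
func splitLiteral(tok []byte) (prefix, body string, triple bool, err error) {
	s := string(tok)
	i := strings.IndexAny(s, `'"`)
	if i < 0 {
		return "", "", false, errors.New("no opening quote")
	}
	prefix = s[:i]
	for _, c := range strings.ToLower(prefix) {
		if !strings.ContainsRune("brutf", c) {
			return "", "", false, fmt.Errorf("invalid string prefix %q", prefix)
		}
	}
	rest := s[i:]
	q := rest[:1]
	// Triple quotes need at least six characters: three to open, three to close.
	if len(rest) >= 6 && strings.HasPrefix(rest, strings.Repeat(q, 3)) {
		q = strings.Repeat(q, 3)
		triple = true
	}
	if len(rest) < 2*len(q) || !strings.HasSuffix(rest, q) {
		return "", "", false, errors.New("unterminated string literal")
	}
	return prefix, rest[len(q) : len(rest)-len(q)], triple, nil
}

func main() {
	for _, tok := range []string{`rb'\d'`, `"""x"""`, `F"{x}"`, `q'x'`} {
		p, b, tr, err := splitLiteral([]byte(tok))
		fmt.Printf("%-8s prefix=%q body=%q triple=%v err=%v\n", tok, p, b, tr, err)
	}
}
```

The prefix letters then decide which decoder runs on the body: raw bodies are used as-is, bytes bodies go through the bytes escape table, and f/t bodies go to the brace scanners.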

type SegKind

type SegKind int

SegKind tags a Segment.

const (
	SegLiteral SegKind = iota
	SegExpr
)

Segment kinds.

type Segment

type Segment struct {
	Kind       SegKind
	Literal    string
	ExprText   string
	Conversion byte // 0 when absent
	FormatSpec string
	IsDebug    bool
}

Segment is one piece of an f-string body. Literal segments carry the decoded plain text; expression segments carry the raw text inside the braces (without the braces themselves) plus the optional conversion character and format-spec body.
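To make the Segment shape concrete, the sketch below renders segments back into f-string body text. render is a hypothetical inverse of the scanner, not part of this package; it assumes ExprText excludes the debug '=' and that literal text needs no re-escaping beyond doubling its braces. The example slice is presumably what the documented body a{x!r:0.2f} scans to.

```go
package main

import (
	"fmt"
	"strings"
)

// SegKind and Segment are local copies of the documented types.
type SegKind int

const (
	SegLiteral SegKind = iota
	SegExpr
)

type Segment struct {
	Kind       SegKind
	Literal    string
	ExprText   string
	Conversion byte // 0 when absent
	FormatSpec string
	IsDebug    bool
}

// braceEscaper re-doubles literal braces.
var braceEscaper = strings.NewReplacer("{", "{{", "}", "}}")

// render rebuilds body text: literals with doubled braces, expressions
// as {expr=!c:spec} with each optional part emitted only when present.
func render(segs []Segment) string {
	var sb strings.Builder
	for _, s := range segs {
		switch s.Kind {
		case SegLiteral:
			sb.WriteString(braceEscaper.Replace(s.Literal))
		case SegExpr:
			sb.WriteByte('{')
			sb.WriteString(s.ExprText)
			if s.IsDebug {
				sb.WriteByte('=')
			}
			if s.Conversion != 0 {
				sb.WriteByte('!')
				sb.WriteByte(s.Conversion)
			}
			if s.FormatSpec != "" {
				sb.WriteByte(':')
				sb.WriteString(s.FormatSpec)
			}
			sb.WriteByte('}')
		}
	}
	return sb.String()
}

func main() {
	segs := []Segment{
		{Kind: SegLiteral, Literal: "a"},
		{Kind: SegExpr, ExprText: "x", Conversion: 'r', FormatSpec: "0.2f"},
	}
	fmt.Println(render(segs))
}
```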

func ScanFString

func ScanFString(body string) ([]Segment, error)

ScanFString walks body and returns its Segments in order. body is the inner text with the prefix and quotes removed: for `f'a{x!r:0.2f}'` the input is `a{x!r:0.2f}`.

CPython: Parser/string_parser.c:455 fstring_find_literal_and_field
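A heavily reduced sketch of the brace-scanning step. It assumes no nested replacement fields in the format spec and no strings, '!=' or ':' inside the expression itself, all of which the real scanner must handle. scan and parseField are hypothetical names; the error texts are modelled on CPython's f-string messages. Each field is cut at the first '}', then ':spec', '!conv' and a trailing debug '=' are peeled off in that order.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SegKind and Segment are local copies of the documented types.
type SegKind int

const (
	SegLiteral SegKind = iota
	SegExpr
)

type Segment struct {
	Kind       SegKind
	Literal    string
	ExprText   string
	Conversion byte // 0 when absent
	FormatSpec string
	IsDebug    bool
}

// scan splits body into literal and expression segments.
func scan(body string) ([]Segment, error) {
	var segs []Segment
	var lit strings.Builder
	flush := func() { // emit pending literal text, if any
		if lit.Len() > 0 {
			segs = append(segs, Segment{Kind: SegLiteral, Literal: lit.String()})
			lit.Reset()
		}
	}
	for i := 0; i < len(body); i++ {
		c := body[i]
		switch {
		case c == '{' && i+1 < len(body) && body[i+1] == '{':
			lit.WriteByte('{') // doubled brace is a literal '{'
			i++
		case c == '}' && i+1 < len(body) && body[i+1] == '}':
			lit.WriteByte('}') // doubled brace is a literal '}'
			i++
		case c == '}':
			return nil, errors.New("f-string: single '}' is not allowed")
		case c == '{':
			end := strings.IndexByte(body[i:], '}')
			if end < 0 {
				return nil, errors.New("f-string: expecting '}'")
			}
			flush()
			segs = append(segs, parseField(body[i+1:i+end]))
			i += end
		default:
			lit.WriteByte(c)
		}
	}
	flush()
	return segs, nil
}

// parseField splits "expr=!c:spec" into its parts.
func parseField(f string) Segment {
	seg := Segment{Kind: SegExpr}
	if j := strings.IndexByte(f, ':'); j >= 0 {
		seg.FormatSpec, f = f[j+1:], f[:j]
	}
	if j := strings.IndexByte(f, '!'); j >= 0 && j+1 < len(f) {
		seg.Conversion, f = f[j+1], f[:j]
	}
	if strings.HasSuffix(f, "=") {
		seg.IsDebug, f = true, strings.TrimSuffix(f, "=")
	}
	seg.ExprText = f
	return seg
}

func main() {
	segs, err := scan("a{x!r:0.2f}")
	fmt.Printf("%+v %v\n", segs, err)
}
```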

func ScanTString

func ScanTString(body string) ([]Segment, error)

ScanTString is the t-string brace scanner. Its scanning rules and segment shape are identical to those of ScanFString; the only difference is the AST node the parser builds afterward.

CPython: Parser/string_parser.c (PEP 750 t-string handling)
