Documentation ¶
Overview ¶
Package string ports cpython/Parser/string_parser.c. The parser hands a string token (with surrounding quotes and any b/r/u prefix still attached) to ParseString, which returns the decoded payload plus the prefix flags the AST builder needs.
Escape decoding mirrors _PyUnicode_DecodeUnicodeEscapeInternal in Objects/unicodeobject.c and _PyBytes_DecodeEscape in Objects/bytesobject.c. The full table is implemented here rather than re-exported from the runtime so the parser stays self-contained.
CPython: Parser/string_parser.c
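As a rough illustration of the escape-decoding step described above, the following sketch decodes a small subset of escapes in a str body and collects one warning per unknown escape. The helper name decodeEscapes, its return shape, and the warning and error wording are assumptions made for this example, not the package's API; the real table also covers octal, \u, \U and \N{...} escapes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// simpleEscapes covers only a few single-character escapes; the full
// table is larger (octal, \u, \U, \N{...}, and so on).
var simpleEscapes = map[byte]byte{
	'\\': '\\', '\'': '\'', '"': '"',
	'n': '\n', 'r': '\r', 't': '\t',
}

// decodeEscapes is a hypothetical helper, not the package's API.
// It returns the decoded text plus one warning per unknown escape.
func decodeEscapes(body string) (string, []string, error) {
	var out strings.Builder
	var warnings []string
	for i := 0; i < len(body); i++ {
		c := body[i]
		if c != '\\' {
			out.WriteByte(c) // ordinary byte: copy through unchanged
			continue
		}
		if i+1 == len(body) {
			return "", nil, fmt.Errorf("trailing backslash at offset %d", i)
		}
		i++
		e := body[i]
		if v, ok := simpleEscapes[e]; ok {
			out.WriteByte(v)
			continue
		}
		if e == 'x' {
			// \xHH needs exactly two hex digits; anything else is an error.
			if i+2 >= len(body) {
				return "", nil, fmt.Errorf("truncated \\x escape at offset %d", i-1)
			}
			n, err := strconv.ParseUint(body[i+1:i+3], 16, 8)
			if err != nil {
				return "", nil, fmt.Errorf("bad \\x escape at offset %d", i-1)
			}
			out.WriteRune(rune(n))
			i += 2
			continue
		}
		// Unknown escape: keep the backslash and the character, and
		// record a warning (wording here is illustrative only).
		out.WriteByte('\\')
		out.WriteByte(e)
		warnings = append(warnings, fmt.Sprintf("invalid escape sequence '\\%c'", e))
	}
	return out.String(), warnings, nil
}

func main() {
	text, warnings, err := decodeEscapes(`tab\there \x41 \d`)
	fmt.Printf("%q %q %v\n", text, warnings, err)
}
```

Returning warnings separately from errors matches the split the Result type makes: an unknown escape still decodes, while a malformed \x escape does not.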
Functions ¶
func CharByName ¶
CharByName returns the rune for one of the common Unicode names the test suite exercises. Names are upper-case and keep the hyphens and spaces of the canonical form.
CPython: Modules/_unicodedata.c ucd_lookup
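A table-driven lookup of this kind might look like the sketch below. The function name charByName, its (rune, bool) return shape, the entries in the table, and the case folding of the input are all assumptions for illustration; the package's actual signature and name set are not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// names is a hypothetical, hand-picked table keyed by canonical
// upper-case Unicode names (hyphens and spaces preserved).
var names = map[string]rune{
	"LATIN SMALL LETTER A":     'a',
	"GREEK SMALL LETTER ALPHA": '\u03b1',
	"HYPHEN-MINUS":             '-',
	"SNOWMAN":                  '\u2603',
}

// charByName upper-cases the input before the lookup. That folding is
// an assumption of this sketch, not documented package behaviour.
func charByName(name string) (rune, bool) {
	r, ok := names[strings.ToUpper(name)]
	return r, ok
}

func main() {
	for _, n := range []string{"snowman", "HYPHEN-MINUS", "NO SUCH NAME"} {
		r, ok := charByName(n)
		fmt.Printf("%q -> %q found=%v\n", n, r, ok)
	}
}
```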
Types ¶
type Result ¶
type Result struct {
	Text    string
	Bytes   []byte
	IsBytes bool
	IsRaw   bool
	// IsFString is set on the per-segment result the f-string
	// scanner emits, and on the folded result when at least one
	// part was an f-string. IsTString is the matching flag for
	// t-strings. Concat uses these to enforce the
	// no-implicit-mixing rule in CPython 3.14.
	IsFString bool
	IsTString bool
	// Warnings carries SyntaxWarning text (one per unknown escape)
	// that the caller should surface separately. Empty when the
	// literal contained no flagged escapes.
	//
	// CPython: Objects/unicodeobject.c emits PyExc_SyntaxWarning
	// from _PyUnicode_DecodeUnicodeEscapeInternal.
	Warnings []string
}
Result is the decoded payload of a single string token. Bytes is set when IsBytes is true; otherwise Text holds the decoded Unicode body.
CPython: Parser/string_parser.h:11 result type (folded into one shape on the Go side)
func Concat ¶
Concat folds a non-empty slice of literal results emitted by adjacent string tokens into a single result. Bytes and str literals cannot be mixed; if they are, Concat returns the CPython SyntaxError text "cannot mix bytes and nonbytes literals".
CPython: Parser/string_parser.c:21 _PyPegen_concatenate_strings
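The folding rule can be sketched as follows. The result type is a trimmed stand-in for Result, and concat with its (result, error) return shape is hypothetical; only the error text is taken from the description above. The f-string and t-string mixing rule is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// result is a trimmed, hypothetical stand-in for the package's Result.
type result struct {
	Text    string
	Bytes   []byte
	IsBytes bool
}

// concat folds adjacent literal parts. The first part decides whether
// the folded value is bytes or str; any disagreement is an error.
func concat(parts []result) (result, error) {
	if len(parts) == 0 {
		return result{}, errors.New("concat: empty input")
	}
	out := result{IsBytes: parts[0].IsBytes}
	for _, p := range parts {
		if p.IsBytes != out.IsBytes {
			return result{}, errors.New("cannot mix bytes and nonbytes literals")
		}
		if p.IsBytes {
			out.Bytes = append(out.Bytes, p.Bytes...)
		} else {
			out.Text += p.Text
		}
	}
	return out, nil
}

func main() {
	ok, err := concat([]result{{Text: "ab"}, {Text: "cd"}})
	fmt.Printf("%q %v\n", ok.Text, err)

	_, err = concat([]result{{Text: "ab"}, {Bytes: []byte("cd"), IsBytes: true}})
	fmt.Println(err)
}
```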
func ParseString ¶
ParseString decodes a single string-literal token. tok holds the raw bytes the lexer emitted, including the prefix and quotes.
CPython: Parser/string_parser.c:253 _PyPegen_parse_string
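Before any escape decoding can happen, the prefix and quotes have to be peeled off the token. A minimal sketch of that step is shown below, assuming a hypothetical splitToken helper and flags type; it handles only the b/r/u prefixes mentioned in the overview and does not reject illegal prefix combinations.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// flags is a hypothetical subset of the prefix flags on Result.
type flags struct{ IsBytes, IsRaw bool }

// splitToken separates prefix flags from the quoted body. It is a
// sketch: combinations such as "ub" are not validated here.
func splitToken(tok string) (flags, string, error) {
	var f flags
	i := 0
	for ; i < len(tok) && tok[i] != '\'' && tok[i] != '"'; i++ {
		switch tok[i] {
		case 'b', 'B':
			f.IsBytes = true
		case 'r', 'R':
			f.IsRaw = true
		case 'u', 'U':
			// Legacy unicode prefix: nothing to record.
		default:
			return f, "", fmt.Errorf("unexpected prefix character %q", tok[i])
		}
	}
	rest := tok[i:]
	if len(rest) < 2 {
		return f, "", errors.New("missing quotes")
	}
	// Prefer a triple quote when the token is long enough to hold
	// both the opening and the closing delimiter.
	q := rest[:1]
	if len(rest) >= 6 && strings.HasPrefix(rest, strings.Repeat(q, 3)) {
		q = rest[:3]
	}
	if !strings.HasSuffix(rest, q) {
		return f, "", errors.New("unterminated string literal")
	}
	return f, rest[len(q) : len(rest)-len(q)], nil
}

func main() {
	for _, tok := range []string{`rb'\d'`, `"""doc"""`, `u"x"`} {
		f, body, err := splitToken(tok)
		fmt.Printf("%s -> %+v %q %v\n", tok, f, body, err)
	}
}
```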
type Segment ¶
type Segment struct {
	Kind       SegKind
	Literal    string
	ExprText   string
	Conversion byte // 0 when absent
	FormatSpec string
	IsDebug    bool
}
Segment is one piece of an f-string body. Literal segments carry the decoded plain text; expression segments carry the raw text inside the braces (without the braces themselves) plus the optional conversion character and format-spec body.
func ScanFString ¶
ScanFString walks body and emits Segments. body is the post-prefix inner text, e.g. for `f'a{x!r:0.2f}'` the input is `a{x!r:0.2f}`.
CPython: Parser/string_parser.c:455 fstring_find_literal_and_field
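To make the segment shape concrete, here is a deliberately simplified brace scanner. The names scanFString, parseField, segment and segKind are hypothetical, and the sketch ignores cases a real scanner must handle: nested braces in format specs, colons or braces inside the expression (slices, dict literals, string literals), and the debug `=` form. The two error strings are modeled on CPython's f-string messages.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type segKind int

const (
	segLiteral segKind = iota
	segExpr
)

// segment is a trimmed, hypothetical stand-in for Segment.
type segment struct {
	Kind       segKind
	Literal    string
	ExprText   string
	Conversion byte // 0 when absent
	FormatSpec string
}

// parseField splits "expr!c:spec" into its parts (simplified).
func parseField(field string) segment {
	seg := segment{Kind: segExpr}
	if j := strings.IndexByte(field, ':'); j >= 0 {
		seg.FormatSpec = field[j+1:]
		field = field[:j]
	}
	if n := len(field); n >= 2 && field[n-2] == '!' {
		seg.Conversion = field[n-1]
		field = field[:n-2]
	}
	seg.ExprText = field
	return seg
}

// scanFString walks body and emits literal and expression segments.
func scanFString(body string) ([]segment, error) {
	var segs []segment
	var lit strings.Builder
	flush := func() {
		if lit.Len() > 0 {
			segs = append(segs, segment{Kind: segLiteral, Literal: lit.String()})
			lit.Reset()
		}
	}
	for i := 0; i < len(body); {
		c := body[i]
		switch {
		case c == '{' && i+1 < len(body) && body[i+1] == '{':
			lit.WriteByte('{') // "{{" is an escaped brace
			i += 2
		case c == '}' && i+1 < len(body) && body[i+1] == '}':
			lit.WriteByte('}') // "}}" is an escaped brace
			i += 2
		case c == '}':
			return nil, errors.New("f-string: single '}' is not allowed")
		case c == '{':
			end := strings.IndexByte(body[i:], '}')
			if end < 0 {
				return nil, errors.New("f-string: expecting '}'")
			}
			flush()
			segs = append(segs, parseField(body[i+1:i+end]))
			i += end + 1
		default:
			lit.WriteByte(c)
			i++
		}
	}
	flush()
	return segs, nil
}

func main() {
	segs, err := scanFString("a{x!r:0.2f}")
	fmt.Printf("%+v %v\n", segs, err)
}
```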
func ScanTString ¶
ScanTString is the t-string brace scanner. The lexer rules and segment shape are identical to f-strings; the difference is in the AST node the parser builds afterward.
CPython: Parser/string_parser.c (PEP 750 t-string handling)