re

Package version: v0.2.1
Published: Jun 13, 2026 License: MIT Imports: 7 Imported by: 0

README

re

re provides regular-expression functions for Starlark — a small subset of Python's re module, built on Go's RE2 syntax. It is pure (no filesystem, network, process, or log side effects).

Capability profile: Pure. Status: legacy (frozen). This module is superseded by the regex module, which returns Python-shaped list results and supports more of the API; new code should use regex. re is kept only for backward compatibility, and no new features are added here.

Functions

| function | description |
| --- | --- |
| compile(pattern, flags=0) -> regexp | Compile pattern into a reusable regexp object exposing match/search/findall/split/sub methods. |
| search(pattern, string, flags=0) -> list \| None | Find the first match anywhere in string; return its [start, end] byte-index pair, or None if there is no match. |
| match(pattern, string, flags=0) -> list | Match only at the beginning of string; return [(full_match, group1, ...)] on success, or [] on no match. |
| split(pattern, string, maxsplit=0, flags=0) -> tuple | Split string on pattern. Capture-group text is not included (unlike Python). |
| findall(pattern, string, flags=0) -> tuple | Return all non-overlapping matches; full match text with no groups, the single group's text with one group, or a tuple of group texts with several. |
| sub(pattern, repl, string, count=0, flags=0) -> string | Replace non-overlapping matches of pattern in string with the Go-template repl. |

Constants

This module exposes no constants. In particular, no re.* flag constants exist — the flags parameter must be 0 (see Notes).

Types

regexp

A compiled pattern returned by compile. Its Type() string is regexp. It is immutable (frozen) and hashable. Each method mirrors the module-level function of the same name but drops the leading pattern argument.

| method | signature | description |
| --- | --- | --- |
| search | r.search(string, flags=0) -> list \| None | As search, against the compiled pattern. |
| match | r.match(string, flags=0) -> list | As match, anchored at the start. |
| findall | r.findall(string, flags=0) -> tuple | As findall. |
| split | r.split(string, maxsplit=0, flags=0) -> tuple | As split. |
| sub | r.sub(repl, string, count=0, flags=0) -> string | As sub. |

dir() on a compiled object lists exactly ["findall", "match", "search", "split", "sub"].

Details & examples

compile(pattern, flags=0) -> regexp

Compile a pattern once and reuse it. Errors if pattern is not a string, if it is not valid RE2, or if flags is non-zero.

load('re', 'compile')
foo_r = compile("foo")
print(foo_r.findall("foo bar baz"))
# Output:
# ("foo",)

search(pattern, string, flags=0) -> list | None

Scans the whole string for the first match and returns its [start, end] byte indices, or None when nothing matches. Errors if pattern/string is not a string, if pattern is invalid RE2, or if flags is non-zero. Note this returns an index pair, not match text (use findall for text).
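To make the byte-index result concrete, here is a minimal Go sketch built on the standard regexp package, the engine this module relies on. searchSpan is a hypothetical helper written for illustration and is not this module's code; it only shows that the offsets count bytes, so a multi-byte UTF-8 character before the match shifts the indices by more than one.

```go
package main

import (
	"fmt"
	"regexp"
)

// searchSpan is a hypothetical helper: it returns the [start, end] byte
// offsets of the first match of pattern in s, or nil when nothing matches.
func searchSpan(pattern, s string) ([]int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err // invalid RE2 syntax
	}
	return re.FindStringIndex(s), nil
}

func main() {
	for _, s := range []string{"abc", "\u00e9b", "xyz"} {
		span, err := searchSpan("b", s)
		fmt.Println(span, err)
	}
}
```

In "abc" the match sits at bytes 1 to 2, while in "\u00e9b" it sits at bytes 2 to 3 because the accented letter occupies two bytes. The Starlark usage of search follows.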

load('re', 'compile')
b = compile('b')
print(b.search('abc'))
print(b.search('xyz'))
# Output:
# [1, 2]
# None

match(pattern, string, flags=0) -> list

Matches only at the beginning of the string (Python re.match semantics — a match elsewhere does not count). On success returns a one-element list holding a tuple of the full match followed by each capture group; on failure returns an empty list [] (which is falsy, so if match(...) works as ported from Python). Same error conditions as search.
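One way to get start-anchored behavior out of Go's regexp package is to wrap the pattern in \A(?:...). The sketch below assumes that approach; matchAtStart is a hypothetical helper, and the module itself may anchor differently. The non-capturing wrapper leaves capture-group numbering unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchAtStart is a hypothetical helper: it anchors pattern at the start of
// the text with \A and returns the full match followed by each capture
// group, or nil when the text does not begin with a match.
func matchAtStart(pattern, s string) ([]string, error) {
	re, err := regexp.Compile(`\A(?:` + pattern + `)`)
	if err != nil {
		return nil, err
	}
	return re.FindStringSubmatch(s), nil
}

func main() {
	for _, p := range []string{"world", "hello", "(h)(e)"} {
		groups, err := matchAtStart(p, "hello world")
		fmt.Printf("%q %v\n", groups, err)
	}
}
```

The first pattern yields no groups because "world" does not occur at offset 0; the other two return the full match plus any groups. The Starlark usage of match follows.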

load('re', 'match')
print(match('world', 'hello world'))
print(match('hello', 'hello world'))
print(match('(h)(e)', 'hello'))
# Output:
# []
# [("hello",)]
# [("he", "h", "e")]

split(pattern, string, maxsplit=0, flags=0) -> tuple

Splits on pattern. maxsplit=0 (default) splits everywhere; a positive maxsplit performs at most that many splits and keeps the remainder as the final element; a negative maxsplit performs no split at all. An astronomically large maxsplit is treated as "no limit". Capture-group text is not inserted into the result (a deliberate difference from Python). Same error conditions as search.
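The maxsplit rules can be sketched on top of Go's regexp.Split, whose n argument limits the number of returned substrings rather than the number of splits. splitMax is a hypothetical helper and only an assumption about how such a mapping could look, including the overflow guard for very large values.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitMax is a hypothetical helper translating the documented maxsplit
// rules onto regexp.Split. It panics on an invalid pattern (MustCompile).
func splitMax(pattern, s string, maxsplit int) []string {
	re := regexp.MustCompile(pattern)
	switch {
	case maxsplit < 0:
		return []string{s} // negative: no split at all
	case maxsplit == 0:
		return re.Split(s, -1) // zero: split everywhere
	default:
		n := maxsplit + 1 // k splits yield at most k+1 pieces
		if n <= 0 {
			n = -1 // maxsplit+1 wrapped around: treat as no limit
		}
		return re.Split(s, n)
	}
}

func main() {
	for _, m := range []int{0, 1, 2, -1} {
		fmt.Printf("maxsplit=%d -> %q\n", m, splitMax(",", "a,b,c", m))
	}
}
```

Like the module, regexp.Split never inserts capture-group text into its result. The Starlark usage of split follows.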

load('re', 'split')
print(split(',', 'a,b,c', 1))
print(split(',', 'a,b,c', maxsplit=2))
print(split(',', 'a,b,c', -1))
# Output:
# ("a", "b,c")
# ("a", "b", "c")
# ("a,b,c",)

findall(pattern, string, flags=0) -> tuple

Returns all non-overlapping matches, left to right. With no capture groups each element is the full match text; with one group it is that group's text; with several groups it is a tuple of the group texts. Empty matches are included. Returns an empty tuple when nothing matches. Same error conditions as search.
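The three result shapes can be illustrated with Go's FindAllStringSubmatch, which always returns the full match followed by every group. The sketch below reshapes that output according to the rule described above; findAll is a hypothetical helper, and using interface{} for the mixed element type is a choice made only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll is a hypothetical helper. Each element of the result is a string
// (no groups or exactly one group) or a []string (several groups).
// It panics on an invalid pattern (MustCompile).
func findAll(pattern, s string) []interface{} {
	re := regexp.MustCompile(pattern)
	var out []interface{}
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		switch len(m) {
		case 1:
			out = append(out, m[0]) // no groups: full match text
		case 2:
			out = append(out, m[1]) // one group: that group's text
		default:
			out = append(out, m[1:]) // several groups: all group texts
		}
	}
	return out
}

func main() {
	fmt.Println(findAll(`(\w)(\d)`, "a1 b2"))
	fmt.Println(findAll(`\w(\d)`, "a1 b2"))
	fmt.Println(findAll("foo", "bar baz"))
}
```

The Starlark usage of findall follows.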

load('re', 'findall')
print(findall(r'(\w)(\d)', 'a1 b2'))
print(findall(r'\w(\d)', 'a1 b2'))
print(findall('foo', 'bar baz'))
# Output:
# (("a", "1"), ("b", "2"))
# ("1", "2")
# ()

sub(pattern, repl, string, count=0, flags=0) -> string

Replaces non-overlapping matches with repl. count=0 (default) replaces all; a positive count replaces only the first count matches; a negative count replaces nothing (returns the input unchanged). repl uses the replacement-template syntax of Go's regexp package: $1 or ${name} refers to a capture group and $$ is a literal $. Python backslash references like \1 are not interpreted, and a function repl is not supported. Errors if any of pattern/repl/string is not a string, if pattern is invalid RE2, or if flags is non-zero.
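Go's regexp package has no count-limited replace, so the count rules have to be layered on top of it. The following is a minimal sketch of one way to do that with FindAllStringSubmatchIndex and ExpandString; subN is a hypothetical helper, not the module's implementation, and its handling of empty matches may differ from the module's.

```go
package main

import (
	"fmt"
	"regexp"
)

// subN is a hypothetical helper applying the documented count rules.
// It panics on an invalid pattern (MustCompile).
func subN(pattern, repl, s string, count int) string {
	re := regexp.MustCompile(pattern)
	if count < 0 {
		return s // negative count: nothing is replaced
	}
	if count == 0 {
		return re.ReplaceAllString(s, repl) // zero: replace every match
	}
	var out []byte
	last := 0
	// FindAllStringSubmatchIndex returns at most count matches.
	for _, loc := range re.FindAllStringSubmatchIndex(s, count) {
		out = append(out, s[last:loc[0]]...)     // text before the match
		out = re.ExpandString(out, repl, s, loc) // expands $1, ${name}, $$
		last = loc[1]
	}
	return string(append(out, s[last:]...)) // untouched remainder
}

func main() {
	fmt.Println(subN("a", "X", "aaa", 1))
	fmt.Println(subN("(a)(b)", "${2}${1}", "ab ab", 0))
	fmt.Println(subN("a", "$$5", "a", 0))
}
```

One pitfall of this template syntax is worth knowing: Go reads $1x as a reference to a group named 1x, not as group 1 followed by x, so the braced form ${1}x is the safe spelling when a reference is followed by a letter, digit, or underscore. The Starlark usage of sub follows.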

load('re', 'sub')
print(sub('a', 'X', 'aaa', 1))
print(sub('a', 'X', 'aaa', count=2))
print(sub('a', 'X', 'aaa', -1))
print(sub('(a)(b)', '${2}${1}', 'ab ab'))
print(sub('a', '$$5', 'a'))
# Output:
# Xaa
# XXa
# aaa
# ba ba
# $5

Notes / boundaries

  • Engine: Go's regexp (RE2). RE2 guarantees linear-time matching but has no backreferences or lookaround; patterns needing those fail to compile with an error rather than being silently approximated.
  • flags must be 0. This module exports no flag constants. Any non-zero flags value (e.g. a ported re.IGNORECASE == 2) is rejected with an error. Historically such values were silently ignored, so matching proceeded with default behavior, which was wrong. Use inline pattern flags like (?i), (?m), (?s) instead.
  • Return shapes differ from CPython by design: there is no Match object; match returns a list of tuples, search returns a [start, end] index pair, and findall/split return tuples. The successor regex module returns Python-shaped lists — prefer it for new code.
  • split drops capture-group text, unlike Python's re.split.
  • sub templates are Go-style ($1, ${name}, $$), not Python-style (\1); repl must be a string.
  • Determinism: matching is deterministic; results depend only on the pattern and input string.
  • Not implemented from Python's re: fullmatch, finditer, subn, escape, and numeric flag constants.
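
The engine and flags notes above can be checked directly against Go's standard regexp package. This is a small illustrative sketch, with compiles and matches as hypothetical helpers: patterns with backreferences or lookahead fail to compile, and inline flags provide the case-insensitive, multi-line, and dot-matches-newline behavior that numeric flags would select in Python.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is accepted by Go's RE2-based engine.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// matches reports whether pattern matches anywhere in s.
// It panics on an invalid pattern (MustCompile).
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Backreferences and lookaround are not part of RE2.
	fmt.Println(compiles(`(a)\1`), compiles(`a(?=b)`))

	// Inline flags take the place of numeric flag constants.
	fmt.Println(matches(`hello`, "HELLO"), matches(`(?i)hello`, "HELLO"))
	fmt.Println(matches(`^b$`, "a\nb"), matches(`(?m)^b$`, "a\nb"))
	fmt.Println(matches(`a.b`, "a\nb"), matches(`(?s)a.b`, "a\nb"))
}
```

In each of the last three lines the first call fails to match and the second, with the inline flag, succeeds.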

Documentation

Overview

Package re defines regular expression functions for Starlark. It is intended as a subset of Python's re module; return shapes differ from CPython, as described in the README notes.

Migrated from: https://github.com/qri-io/starlib/tree/master/re

TODO:
1) fullmatch, finditer, subn, escape, search
2) Match as a type
3) Support flags as constants

Constants

const ModuleName = "re"

ModuleName defines the expected name for this module when used in Starlark's load() function, e.g. load('re', 'match').

Functions

func LoadModule

func LoadModule() (starlark.StringDict, error)

LoadModule loads the re module. It is concurrency-safe and idempotent.

Types

type Regex

type Regex struct {
	// contains filtered or unexported fields
}

Regex is a Starlark representation of a compiled regular expression.

func (*Regex) Attr

func (r *Regex) Attr(name string) (starlark.Value, error)

Attr gets a value for a string attribute, implementing dot expression support in Starlark. Required by the starlark.HasAttrs interface.

func (*Regex) AttrNames

func (r *Regex) AttrNames() []string

AttrNames lists the available dot expression strings for Regex. Required by the starlark.HasAttrs interface.

func (*Regex) Freeze

func (r *Regex) Freeze()

Freeze renders the Regex immutable. Required by the starlark.Value interface. The interface Regex presents to the Starlark runtime already makes it immutable, so this is a no-op.

func (*Regex) Hash

func (r *Regex) Hash() (uint32, error)

Hash returns a function of x such that Equals(x, y) => Hash(x) == Hash(y). Required by the starlark.Value interface.

func (*Regex) String

func (r *Regex) String() string

String implements the Stringer interface

func (*Regex) Truth

func (r *Regex) Truth() starlark.Bool

Truth returns the truth value of the object. Required by the starlark.Value interface. Any non-empty regexp is considered truthy.

func (*Regex) Type

func (r *Regex) Type() string

Type returns a short string describing the value's type.
