parsan

package
v0.0.0-...-b5b6610 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 3, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package parsan provides a string parser and sanitizer.

The goal of this package is to parse and sanitize strings according to different rules.

Currently, the package can parse and sanitize strings according to the subdomain definition of RFC 1035.

Constants

const Unlimited = -1

Unlimited is a sentinel value indicating no upper bound on repetitions or length constraints. Used with Seq for unbounded repetition and with maxLength to indicate no length limit.

Variables

var RFC1035LowerSubdomain = Named("rfc1035-lower-subdomain",
	Alternative(
		RFC1035LowerLabel(subdomainSuggestFn),
		Concat(
			RFC1035LowerLabel(subdomainSuggestFn),
			Terminal("."),
			Ref("rfc1035-lower-subdomain"),
		),
	)).WithMaxLength(253)

RFC1035LowerSubdomain is a Rule identical to RFC1035Subdomain but enforces lowercase letters throughout all labels. This produces case-normalized subdomain strings suitable for systems that perform case-insensitive DNS comparisons via exact string matching.

var RFC1035LowerSubdomainRelaxed = Named("rfc1035-lower-subdomain-relaxed",
	Alternative(
		RFC1035LowerLabelRelaxed(subdomainSuggestFn),
		Concat(
			RFC1035LowerLabelRelaxed(subdomainSuggestFn),
			Terminal("."),
			Ref("rfc1035-lower-subdomain-relaxed"),
		),
	)).WithMaxLength(253)

RFC1035LowerSubdomainRelaxed is a Rule that combines relaxed starting character requirements with lowercase letter enforcement. Labels may start with a lowercase letter or digit, and all letters are restricted to lowercase. This is the most permissive variant while still enforcing case normalization.

var RFC1035Subdomain = Named("rfc1035-subdomain",
	Alternative(
		RFC1035Label(subdomainSuggestFn),
		Concat(
			RFC1035Label(subdomainSuggestFn),
			Terminal("."),
			Ref("rfc1035-subdomain"),
		),
	)).WithMaxLength(253)

RFC1035Subdomain is a Rule that validates and sanitizes DNS subdomains according to RFC 1035 Section 2.3.1.

The grammar for a subdomain is:

<subdomain> ::= <label> | <subdomain> "." <label>

A subdomain consists of one or more dot-separated labels, where each label conforms to RFC1035Label. The total length must not exceed 253 characters.

Invalid characters within labels are sanitized as follows:

  • '@' is replaced with "-at-" or "-" (both options explored)
  • Other invalid characters are replaced with '-'
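
The grammar and the 253-character limit described above can be cross-checked without the package, for example to verify sanitized output in a test. The following is a minimal standard-library sketch, not parsan's implementation; labelRE and isRFC1035Subdomain are hypothetical names, and the check only validates (it does not sanitize).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelRE encodes <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] together
// with the 63-character label limit: one letter, then optionally up to 61
// letter/digit/hyphen characters followed by a final letter or digit.
var labelRE = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$`)

// isRFC1035Subdomain reports whether s is one or more dot-separated labels
// with a total length of at most 253 bytes.
func isRFC1035Subdomain(s string) bool {
	if len(s) == 0 || len(s) > 253 {
		return false
	}
	for _, label := range strings.Split(s, ".") {
		if !labelRE.MatchString(label) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"example.com", "9lives.example", "trailing-.example"} {
		fmt.Printf("%-18s %v\n", s, isRFC1035Subdomain(s))
	}
}
```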
var RFC1035SubdomainRelaxed = Named("rfc1035-subdomain-relaxed",
	Alternative(
		RFC1035LabelRelaxed(subdomainSuggestFn),
		Concat(
			RFC1035LabelRelaxed(subdomainSuggestFn),
			Terminal("."),
			Ref("rfc1035-subdomain-relaxed"),
		),
	)).WithMaxLength(253)

RFC1035SubdomainRelaxed is a Rule similar to RFC1035Subdomain but uses RFC1035LabelRelaxed for each label, allowing labels to start with digits. This accommodates the common practice of using digit-prefixed labels in DNS names.

Functions

func GenerateUniqueName

func GenerateUniqueName() string

GenerateUniqueName creates a unique name that does not exist in the global rule registry. This is used internally by Seq to create named rules for unbounded repetition patterns.

The function generates names by combining the reserved prefix with a random offset plus an incrementing counter. It panics after 50,000 failed attempts, which would indicate severe registry congestion (an unlikely scenario in normal usage).
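
The naming scheme described above (prefix, random offset, incrementing counter, bounded retries) might look roughly like the sketch below. The prefix value, the nameGenerator type, and the map-based registry are all assumptions made for illustration; the package's actual prefix and registry are not shown in this documentation, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	reservedPrefix = "__generated-" // hypothetical; the real prefix is not documented here
	maxAttempts    = 50000          // attempt limit taken from the description above
)

// nameGenerator is a hypothetical stand-in for the package's internal state.
type nameGenerator struct {
	registry map[string]bool // names already taken
	offset   int             // random starting point
	counter  int             // incremented on every attempt
}

func newNameGenerator(registry map[string]bool) *nameGenerator {
	return &nameGenerator{registry: registry, offset: rand.Intn(1 << 20)}
}

// next returns the first candidate name that is not in the registry and
// panics once maxAttempts candidates in a row were already taken.
func (g *nameGenerator) next() string {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		name := fmt.Sprintf("%s%d", reservedPrefix, g.offset+g.counter)
		g.counter++
		if !g.registry[name] {
			return name
		}
	}
	panic("no unique name found after 50000 attempts")
}

func main() {
	g := newNameGenerator(map[string]bool{})
	fmt.Println(g.next())
	fmt.Println(g.next())
}
```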

func ParseAndSanitize

func ParseAndSanitize(input string, rule Rule) []string

ParseAndSanitize validates the input string against the given rule and returns all valid interpretations as sanitized strings.

The function performs the following steps:

  1. Truncates the input if the rule specifies a maximum length constraint
  2. Executes the rule's match function to collect all possible parse results
  3. Filters results to include only those that fully consume the input (empty rest)
  4. Deduplicates identical results
  5. Sorts results by length in descending order, with alphabetical ordering as tiebreaker

Returns an empty slice if no valid complete parse exists for the input.
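
The truncation, filtering, deduplication, and ordering steps listed above can be sketched with the standard library alone. The candidate type and the truncate and rankResults helpers are hypothetical names, truncation by bytes is an assumption, and the rule matching of step 2 is replaced by a hard-coded candidate list.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical parse result: the sanitized text produced so
// far and the part of the input that was not consumed.
type candidate struct {
	sanitized string
	rest      string
}

// truncate cuts input to maxLength bytes; a negative maxLength means no limit.
func truncate(input string, maxLength int) string {
	if maxLength >= 0 && len(input) > maxLength {
		return input[:maxLength]
	}
	return input
}

// rankResults keeps candidates that consumed the whole input, removes
// duplicates, and sorts longest first with alphabetical order as tiebreaker.
func rankResults(candidates []candidate) []string {
	seen := make(map[string]bool)
	out := []string{}
	for _, c := range candidates {
		if c.rest != "" || seen[c.sanitized] {
			continue
		}
		seen[c.sanitized] = true
		out = append(out, c.sanitized)
	}
	sort.Slice(out, func(i, j int) bool {
		if len(out[i]) != len(out[j]) {
			return len(out[i]) > len(out[j])
		}
		return out[i] < out[j]
	})
	return out
}

func main() {
	fmt.Println(truncate("abcdef", 3))
	fmt.Println(rankResults([]candidate{
		{"a-c", ""}, {"a-b", ""}, {"a", "@b"}, {"a-at-b", ""}, {"a-b", ""},
	}))
}
```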

Types

type Rule

type Rule interface {

	// WithSuggestionFunc attaches a fallback suggestion generator that is
	// invoked when the rule fails to match the input. Not all rule types
	// support suggestions; calling this on an unsupported type will panic.
	WithSuggestionFunc(SuggestionFunc) Rule

	// WithMaxLength sets the maximum allowed length for matched content.
	// Results exceeding this length will be truncated or filtered.
	// Returns the modified rule for method chaining.
	WithMaxLength(maxLength int) Rule
	// contains filtered or unexported methods
}

Rule defines the interface for all grammar rule types in the parser. Rules are composable building blocks that can be combined to construct complex grammars for input validation and sanitization.

The parser uses a recursive descent approach with backtracking, where each rule can produce multiple possible parse results through a channel. This design supports ambiguous grammars where multiple valid interpretations may exist for a single input.

Built-in rule implementations include:

  • Terminal: matches exact string literals
  • Range: matches single characters within Unicode code point bounds
  • Concat: matches a sequence of rules in order
  • Alternative: matches any one of several possible rules
  • Named/Ref: enables recursive grammars and forward references
  • Seq: matches repeated occurrences of a rule (with min/max bounds)
  • Opt: matches zero or one occurrence of a rule
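
To illustrate how backtracking combinators can yield several results for one input, here is a small self-contained sketch. It is not the package's code: the result and rule types and the lowercase constructors are hypothetical, and results are collected in slices for brevity, whereas the description above says the package delivers them through a channel.

```go
package main

import (
	"fmt"
	"strings"
)

// result is one way of matching a prefix of the input.
type result struct{ matched, rest string }

// rule returns every possible match at the start of the input.
type rule func(input string) []result

func terminal(s string) rule {
	return func(in string) []result {
		if strings.HasPrefix(in, s) {
			return []result{{s, in[len(s):]}}
		}
		return nil
	}
}

// alternative collects the results of all rules, not just the first match.
func alternative(rules ...rule) rule {
	return func(in string) []result {
		var out []result
		for _, r := range rules {
			out = append(out, r(in)...)
		}
		return out
	}
}

// concat tries b on the remainder of every result of a (backtracking).
func concat(a, b rule) rule {
	return func(in string) []result {
		var out []result
		for _, ra := range a(in) {
			for _, rb := range b(ra.rest) {
				out = append(out, result{ra.matched + rb.matched, rb.rest})
			}
		}
		return out
	}
}

func main() {
	// "abc" can be read as "a"+"bc" or as "ab"+"c": an ambiguous grammar.
	r := concat(
		alternative(terminal("a"), terminal("ab")),
		alternative(terminal("bc"), terminal("c")),
	)
	for _, res := range r("abc") {
		fmt.Printf("matched=%q rest=%q\n", res.matched, res.rest)
	}
}
```

Both interpretations consume the whole input and produce the same text, which is why a deduplication step is useful after matching.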

func Alternative

func Alternative(types ...Rule) Rule

Alternative creates a rule that matches if any of the provided rules match. All matching alternatives are explored and their results are yielded, enabling the parser to handle ambiguous grammars with multiple valid interpretations of the same input.

Use WithSuggestionFunc to provide fallback suggestions when none of the alternatives match.

Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Alternative(
		parsan.Terminal("either"),
		parsan.Terminal("or"),
	)
	suggestions := parsan.ParseAndSanitize("either", rule)
	fmt.Print(suggestions[0])
}
Output:

either
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Alternative(
		parsan.Terminal("either"),
		parsan.Terminal("or"),
	)
	suggestions := parsan.ParseAndSanitize("none", rule)
	fmt.Print(len(suggestions))
}
Output:

0

func Concat

func Concat(rules ...Rule) Rule

Concat creates a rule that matches multiple rules in sequential order. The input is partitioned into consecutive segments, where each segment matches the corresponding rule in order.

The function handles special cases:

  • Zero arguments: returns a rule that matches only empty strings (nil concat)
  • One argument: returns that rule unchanged (no wrapping needed)
  • Multiple arguments: combines rules right-to-left into nested concat pairs

WithSuggestionFunc is not supported for Concat; apply suggestions to the individual component rules instead.

Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Concat(parsan.Range('0', '9'), parsan.Range('a', 'z'))
	suggestions := parsan.ParseAndSanitize("2b", rule)
	fmt.Print(suggestions[0])
}
Output:

2b
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Concat(parsan.Range('0', '9'), parsan.Range('a', 'z'))
	suggestions := parsan.ParseAndSanitize("b2", rule)
	fmt.Print(len(suggestions))
}
Output:

0

func Digit

func Digit(suggestFn SuggestionFunc) Rule

Digit returns a rule that matches a single ASCII digit character (0-9). If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func LDHStr

func LDHStr(suggestFn SuggestionFunc) Rule

LDHStr returns a rule that matches one or more consecutive LDH (Letter-Digit-Hyphen) characters. This corresponds to the "ldh-str" production in RFC 1035 for DNS domain name labels.

The rule is implemented recursively using a unique named reference to handle strings of arbitrary length.

If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	suggestions := parsan.ParseAndSanitize("this-is-1-valid-LDHStr", parsan.LDHStr(nil))
	fmt.Print(suggestions[0])
}
Output:

this-is-1-valid-LDHStr
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	suggestions := parsan.ParseAndSanitize("inva!id", parsan.LDHStr(nil))
	fmt.Print(len(suggestions))
}
Output:

0
Example (Sanitize)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	suggestions := parsan.ParseAndSanitize("inva!id", parsan.LDHStr(parsan.SuggestConstRune('-')))
	fmt.Print(suggestions[0])
}
Output:

inva-id

func LetDig

func LetDig(suggestFn SuggestionFunc) Rule

LetDig returns a rule that matches a single alphanumeric ASCII character. Valid characters include digits (0-9), lowercase letters (a-z), and uppercase letters (A-Z). If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func LetDigHyp

func LetDigHyp(suggestFn SuggestionFunc) Rule

LetDigHyp returns a rule that matches a single LDH (Letter-Digit-Hyphen) character. Valid characters include alphanumeric characters (0-9, a-z, A-Z) and the hyphen character ('-').

This rule implements the character set used in DNS domain name labels as defined in RFC 1035.

If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func Letter

func Letter(suggestFn SuggestionFunc) Rule

Letter returns a rule that matches a single ASCII letter character. Both lowercase (a-z) and uppercase (A-Z) letters are accepted. If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func LowerLDHStr

func LowerLDHStr(suggestFn SuggestionFunc) Rule

LowerLDHStr returns a rule that matches one or more consecutive lowercase LDH (Letter-Digit-Hyphen) characters. Valid characters include digits (0-9), lowercase letters (a-z), and hyphens ('-').

The rule is implemented recursively using a unique named reference to handle strings of arbitrary length.

If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func LowerLetDig

func LowerLetDig(suggestFn SuggestionFunc) Rule

LowerLetDig returns a rule that matches a single lowercase alphanumeric ASCII character. Valid characters include digits (0-9) and lowercase letters (a-z). Uppercase letters are not matched directly but may be suggested as their lowercase equivalents via the LowerLetter rule. If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func LowerLetDigHyp

func LowerLetDigHyp(suggestFn SuggestionFunc) Rule

LowerLetDigHyp returns a rule that matches a single lowercase LDH (Letter-Digit-Hyphen) character. Valid characters include digits (0-9), lowercase letters (a-z), and the hyphen character ('-').

This is useful for parsing domain name labels in a case-normalized form.

If suggestFn is provided, it will be called to generate suggestions when the rule fails to match the input.

func LowerLetter

func LowerLetter(suggestFn SuggestionFunc) Rule

LowerLetter returns a rule that matches a single lowercase ASCII letter (a-z).

When matching fails, the rule uses a two-stage suggestion strategy:

  1. First, suggestLowerLetter attempts to convert an uppercase letter to lowercase.
  2. If that fails (e.g., the character is not a letter), suggestFn is invoked.

If suggestFn is nil, only the uppercase-to-lowercase conversion is suggested.

func Named

func Named(name string, rule Rule) Rule

Named registers a rule in the global registry under the specified name and returns the rule unchanged. This enables other rules to reference this rule by name using Ref, supporting:

  • Forward references: use a rule before defining it in the source code
  • Recursive grammars: a rule that references itself directly or indirectly
  • Reusability: define a rule once and reference it from multiple places
Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	// rule := a | a <rule>
	rule := parsan.Named("rule",
		parsan.Alternative(
			parsan.Terminal("a"),
			parsan.Concat(
				parsan.Terminal("a"),
				parsan.Ref("rule"),
			),
		),
	)
	suggestions := parsan.ParseAndSanitize("aaaaaaaaa", rule)
	fmt.Print(suggestions[0])
}
Output:

aaaaaaaaa

func Opt

func Opt(rule Rule) Rule

Opt creates a rule that matches zero or one occurrence of the given rule. This is a convenience function equivalent to Seq(0, 1, rule).

The rule always succeeds: it yields an empty match (consuming nothing) and, if the input matches the rule, also yields the full match. This makes the wrapped rule optional in the grammar.

func RFC1035Label

func RFC1035Label(suggestFn SuggestionFunc) Rule

RFC1035Label returns a Rule that validates and sanitizes DNS labels according to RFC 1035 Section 2.3.1.

The grammar for a label is:

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

Valid labels must:

  • Start with a letter (a-z, A-Z)
  • End with a letter or digit (if longer than one character)
  • Contain only letters, digits, or hyphens in between
  • Be at most 63 characters long

Sanitization strategies:

  • Invalid first character: prepended or replaced with 'x'
  • Invalid last character: replaced with 'x'
  • Invalid middle characters: handled by the provided suggestFn
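
The sanitization strategies above can be approximated in a few lines. This is a simplified, ASCII-only sketch with hypothetical names: it always picks the "prepend" option for an invalid first character, whereas the package is described as exploring both prepending and replacing, and it ignores the 63-character limit.

```go
package main

import "fmt"

func isLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func isLetDig(c byte) bool {
	return isLetter(c) || (c >= '0' && c <= '9')
}

// sanitizeLabel prepends 'x' when the first byte is not a letter, replaces an
// invalid last byte with 'x', and replaces invalid middle bytes with middle.
func sanitizeLabel(s string, middle byte) string {
	if s == "" {
		return ""
	}
	b := []byte(s)
	if !isLetter(b[0]) {
		b = append([]byte{'x'}, b...)
	}
	last := len(b) - 1
	if last > 0 && !isLetDig(b[last]) {
		b[last] = 'x'
	}
	for i := 1; i < last; i++ {
		if !isLetDig(b[i]) && b[i] != '-' {
			b[i] = middle
		}
	}
	return string(b)
}

func main() {
	fmt.Println(sanitizeLabel("0this-is-an-!nvalid-RFC1035-label#", 'A'))
}
```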
Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	suggestions := parsan.ParseAndSanitize("this-is-a-valid-RFC1035-label", parsan.RFC1035Label(nil))
	fmt.Print(suggestions[0])
}
Output:

this-is-a-valid-RFC1035-label
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	suggestions := parsan.ParseAndSanitize("0this-is-an-!nvalid-RFC1035-label#", parsan.RFC1035Label(nil))
	fmt.Print(len(suggestions))
}
Output:

0
Example (Sanitize)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	suggestions := parsan.ParseAndSanitize("0this-is-an-!nvalid-RFC1035-label#",
		parsan.RFC1035Label(parsan.SuggestConstRune('A')))
	fmt.Print(suggestions[0])
}
Output:

x0this-is-an-Anvalid-RFC1035-labelx

func RFC1035LabelRelaxed

func RFC1035LabelRelaxed(suggestFn SuggestionFunc) Rule

RFC1035LabelRelaxed returns a Rule similar to RFC1035Label but allows labels to start with a digit in addition to letters. This relaxation is common in practice, as many systems accept labels beginning with digits despite the strict RFC 1035 grammar.

func RFC1035LowerLabel

func RFC1035LowerLabel(suggestFn SuggestionFunc) Rule

RFC1035LowerLabel returns a Rule identical to RFC1035Label but restricts letters to lowercase only (a-z). This is useful when case-normalized labels are required, enabling case-insensitive comparisons via exact string matching.

func RFC1035LowerLabelRelaxed

func RFC1035LowerLabelRelaxed(suggestFn SuggestionFunc) Rule

RFC1035LowerLabelRelaxed returns a Rule that combines the relaxed starting character requirement (allowing digits) with lowercase letter enforcement. Labels may start with a lowercase letter or digit and contain only lowercase letters, digits, and hyphens.

func Range

func Range(start, end rune) Rule

Range creates a rule that matches a single character whose Unicode code point is between start and end, inclusive. If end is less than start, the range is normalized to match only the start character.
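
As a small illustration of the bounds normalization just described, the following sketch checks a rune against an inclusive range. matchRange is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// matchRange reports whether r lies in [start, end]. If end is smaller than
// start, the range collapses to the single code point start.
func matchRange(start, end, r rune) bool {
	if end < start {
		end = start
	}
	return r >= start && r <= end
}

func main() {
	fmt.Println(matchRange('a', 'z', 'e'))
	fmt.Println(matchRange('a', 'z', 'A'))
	fmt.Println(matchRange('z', 'a', 'z'))
}
```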

Use WithSuggestionFunc to provide alternative suggestions when the input character falls outside the valid range.

Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Range('a', 'z')
	suggestions := parsan.ParseAndSanitize("e", rule)
	fmt.Print(suggestions[0])
}
Output:

e
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Range('a', 'z')
	suggestions := parsan.ParseAndSanitize("A", rule)
	fmt.Print(len(suggestions))
}
Output:

0
Example (With_suggestions)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Range('a', 'z').WithSuggestionFunc(parsan.ReplaceFirstRuneWithStrings("x"))
	suggestions := parsan.ParseAndSanitize("A", rule)
	fmt.Print(suggestions[0])
}
Output:

x

func Ref

func Ref(name string) Rule

Ref creates a lazy reference to a rule registered via Named. The actual rule lookup is deferred until match time, which allows:

  • Forward references: reference a rule before it is registered
  • Recursive definitions: a rule can reference itself through a Ref

If the referenced rule is not registered when match is called, the Ref yields no results (returns emptyResultChan).
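
Deferring the lookup until match time is what makes forward references work. The sketch below shows the idea with a plain map as registry; matcher, named, and ref are hypothetical stand-ins, and a matcher here returns only the possible remainders of the input.

```go
package main

import "fmt"

// matcher returns the remainder of the input for every possible match.
type matcher func(input string) []string

// registry is a hypothetical stand-in for the global rule registry.
var registry = map[string]matcher{}

// named stores m under name and returns it unchanged.
func named(name string, m matcher) matcher {
	registry[name] = m
	return m
}

// ref looks the name up only when the returned matcher is called.
func ref(name string) matcher {
	return func(input string) []string {
		m, ok := registry[name]
		if !ok {
			return nil // unregistered target: no results
		}
		return m(input)
	}
}

func main() {
	digit := func(in string) []string {
		if in != "" && in[0] >= '0' && in[0] <= '9' {
			return []string{in[1:]}
		}
		return nil
	}
	forward := ref("digit") // created before "digit" is registered
	fmt.Println(len(forward("7")))
	named("digit", digit)
	fmt.Println(len(forward("7")))
}
```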

WithSuggestionFunc is not supported for Ref; apply suggestions to the target rule instead.

func Seq

func Seq(min, max int, rule Rule) Rule

Seq creates a rule that matches between min and max consecutive occurrences of the given rule. This is the general-purpose repetition combinator.

Parameters:

  • min: minimum number of repetitions required (negative values are treated as 0)
  • max: maximum number of repetitions allowed (use Unlimited for no upper bound; values less than min are normalized to min)
  • rule: the rule to be repeated

Implementation details:

  • For fixed counts (min == max): creates a simple concatenation of that many copies
  • For unbounded max: creates a recursive grammar using Named/Ref
  • For bounded ranges: creates an Alternative of all valid repetition counts

Common patterns:

  • Seq(0, 0, r): matches only empty string
  • Seq(1, 1, r): equivalent to just r
  • Seq(0, 1, r): equivalent to Opt(r), matches zero or one
  • Seq(2, 5, r): matches 2, 3, 4, or 5 consecutive occurrences
  • Seq(1, Unlimited, r): Kleene plus, matches one or more
  • Seq(0, Unlimited, r): Kleene star, matches zero or more
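
The parameter normalization above can be made concrete with a helper that lists which repetition counts a given (min, max) pair allows. repetitionCounts and the local unlimited constant are hypothetical and only mirror the documented rules; they say nothing about how the package builds the resulting rule.

```go
package main

import "fmt"

const unlimited = -1 // mirrors the documented Unlimited sentinel

// repetitionCounts returns the allowed repetition counts. For an unbounded
// maximum it returns only the minimum and sets unbounded to true.
func repetitionCounts(min, max int) (counts []int, unbounded bool) {
	if min < 0 {
		min = 0 // negative minimum is treated as 0
	}
	if max == unlimited {
		return []int{min}, true
	}
	if max < min {
		max = min // maximum below minimum is raised to the minimum
	}
	for n := min; n <= max; n++ {
		counts = append(counts, n)
	}
	return counts, false
}

func main() {
	fmt.Println(repetitionCounts(2, 5))
	fmt.Println(repetitionCounts(-3, 1))
	fmt.Println(repetitionCounts(3, 1))
	fmt.Println(repetitionCounts(1, unlimited))
}
```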
Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Seq(2, 4, parsan.Terminal("a"))
	suggestions := parsan.ParseAndSanitize("aaa", rule)
	fmt.Print(suggestions[0])
}
Output:

aaa
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Seq(2, 4, parsan.Terminal("a"))
	suggestions := parsan.ParseAndSanitize("a", rule)
	fmt.Print(len(suggestions))
}
Output:

0

func Terminal

func Terminal(s string) Rule

Terminal creates a rule that matches the exact string s at the start of the input. On successful match, it consumes exactly len(s) characters and leaves the remainder for subsequent rules to process.

Example
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Terminal("match")
	suggestions := parsan.ParseAndSanitize("match", rule)
	fmt.Print(suggestions[0])
}
Output:

match
Example (No_match)
package main

import (
	"fmt"

	"github.com/SAP/xp-clifford/parsan"
)

func main() {
	rule := parsan.Terminal("match")
	suggestions := parsan.ParseAndSanitize("invalid", rule)
	fmt.Print(len(suggestions))
}
Output:

0

type SuggestionFunc

type SuggestionFunc func(string) []*parseResult

SuggestionFunc defines a function type that generates parsing suggestions. It takes an input string and returns a slice of result pointers representing alternative parsing interpretations. This type is typically passed to WithSuggestionFunc methods on Rule types to customize suggestion behavior.

func MergeSuggestionFuncs

func MergeSuggestionFuncs(fns ...SuggestionFunc) SuggestionFunc

MergeSuggestionFuncs combines multiple SuggestionFunc functions into a single SuggestionFunc. The returned function invokes each provided function in order and concatenates all their results into a single slice. This is useful for aggregating suggestions from multiple independent sources.
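
The merge behavior can be pictured with a simplified model. Since parseResult is unexported, the sketch uses a hypothetical exported-style suggestion struct and a local suggestFn type; merge is likewise a stand-in for illustration only.

```go
package main

import "fmt"

// suggestion is a hypothetical stand-in for the unexported parseResult.
type suggestion struct{ sanitized, rest string }

type suggestFn func(input string) []suggestion

// merge calls every function in order and concatenates the results.
func merge(fns ...suggestFn) suggestFn {
	return func(in string) []suggestion {
		var out []suggestion
		for _, fn := range fns {
			out = append(out, fn(in)...)
		}
		return out
	}
}

// replaceFirstByte suggests s in place of the first byte of the input.
func replaceFirstByte(s string) suggestFn {
	return func(in string) []suggestion {
		if in == "" {
			return nil
		}
		return []suggestion{{s, in[1:]}}
	}
}

func main() {
	// Two independent suggestions for '@', as in the subdomain rules.
	fn := merge(replaceFirstByte("-at-"), replaceFirstByte("-"))
	fmt.Printf("%q\n", fn("@b"))
}
```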

func PrependOrReplaceFirstRuneWithStrings

func PrependOrReplaceFirstRuneWithStrings(ss ...string) SuggestionFunc

PrependOrReplaceFirstRuneWithStrings creates a SuggestionFunc that generates two types of suggestions for each provided string:

  1. Prepending the string to the entire input (insertion before input)
  2. Replacing the first rune of the input with the string (substitution)

For an input "abc" and string "X", this produces suggestions for parsing "X" with remainder "abc", and "X" with remainder "bc". Returns up to 2*len(ss) results, with the replacement variant omitted for empty input.
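
The "abc" and "X" case above can be reproduced with a simplified model. The suggestion struct and prependOrReplace are hypothetical names, and the order of the two variants within the result slice is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// suggestion is a hypothetical stand-in for the unexported parseResult.
type suggestion struct{ sanitized, rest string }

type suggestFn func(input string) []suggestion

// prependOrReplace yields, for each string, one suggestion that keeps the
// whole input as remainder and one that drops the first rune of the input.
func prependOrReplace(ss ...string) suggestFn {
	return func(in string) []suggestion {
		var out []suggestion
		for _, s := range ss {
			out = append(out, suggestion{s, in}) // prepend
			if in != "" {
				_, size := utf8.DecodeRuneInString(in)
				out = append(out, suggestion{s, in[size:]}) // replace
			}
		}
		return out
	}
}

func main() {
	fmt.Printf("%q\n", prependOrReplace("X")("abc"))
	fmt.Printf("%q\n", prependOrReplace("X")(""))
}
```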

func ReplaceFirstRuneWithStrings

func ReplaceFirstRuneWithStrings(ss ...string) SuggestionFunc

ReplaceFirstRuneWithStrings creates a SuggestionFunc that generates suggestions by substituting the first byte of the input with each of the provided strings. Each result contains the replacement string as the sanitized portion and the remaining input (after the first byte) as the portion still to be parsed. Returns nil if the input is empty, as there is no character to replace.

func SuggestConstRune

func SuggestConstRune(r rune) SuggestionFunc

SuggestConstRune creates a SuggestionFunc that suggests replacing the first rune of the input with the specified rune r. This is a convenience wrapper around ReplaceFirstRuneWithStrings for single-rune replacements.

func UnlessSuggestionFunc

func UnlessSuggestionFunc(unlessFn, thenFn SuggestionFunc) SuggestionFunc

UnlessSuggestionFunc creates a conditional SuggestionFunc that first attempts to generate suggestions using unlessFn. If unlessFn returns no results, it falls back to thenFn. This allows for prioritized suggestion strategies where one approach is preferred but another serves as a fallback.
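
A simplified model of this fallback, using hypothetical local types in place of the unexported parseResult, is sketched below. The example prefers lowercasing an uppercase letter and falls back to a dash, which is similar in shape to the two-stage strategy described for LowerLetter.

```go
package main

import "fmt"

// suggestion is a hypothetical stand-in for the unexported parseResult.
type suggestion struct{ sanitized, rest string }

type suggestFn func(input string) []suggestion

// unless returns the results of unlessFn if there are any, else thenFn's.
func unless(unlessFn, thenFn suggestFn) suggestFn {
	return func(in string) []suggestion {
		if res := unlessFn(in); len(res) > 0 {
			return res
		}
		return thenFn(in)
	}
}

// toLower suggests the lowercase form of a leading ASCII uppercase letter.
func toLower(in string) []suggestion {
	if in != "" && in[0] >= 'A' && in[0] <= 'Z' {
		return []suggestion{{string(rune(in[0] + 'a' - 'A')), in[1:]}}
	}
	return nil
}

// dash suggests "-" in place of the first byte.
func dash(in string) []suggestion {
	if in == "" {
		return nil
	}
	return []suggestion{{"-", in[1:]}}
}

func main() {
	fn := unless(toLower, dash)
	fmt.Printf("%q\n", fn("Q1"))
	fmt.Printf("%q\n", fn("!1"))
}
```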
