analyzer

package
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 8, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Fingerprint

func Fingerprint(sql string) string

Fingerprint returns a stable, PII-free identity for sql: it is Redact followed by whitespace collapsing and IN/VALUES-list folding ("(?, ?, ?)" -> "(?)") so that queries differing only in literal values or list length share one fingerprint. A trailing ";" is trimmed.

The result is safe to use as a low-cardinality metric label or log key — it is the canonical query identity the runtime, the N+1 tracker, and any metrics/observability adapter group on.

func IsMultiStatement

func IsMultiStatement(sql string) bool

IsMultiStatement reports whether sql contains more than one SQL statement, i.e. a ";" statement separator followed by further non-whitespace content. Comments and string-literal bodies are removed first (reusing the same comment/literal-aware lexer the parser uses), so the check cannot be defeated by a ";" hidden in a -- / /* */ comment or inside a string literal — the evasion the brittle strings.Contains(query, ";") check allowed. A single trailing ";" is not multi-statement.

func ParseIgnoreComment

func ParseIgnoreComment(text string) (all bool, rules map[string]bool, found bool)

ParseIgnoreComment parses the text of a single comment for a sqlguard:ignore directive. It is used by the static scanner to honor `// sqlguard:ignore` / `// sqlguard:ignore:rule-a,rule-b` annotations in Go source. found reports whether a directive was present; all is true for a bare directive (suppress every rule); rules holds the named rules otherwise.

func Redact

func Redact(sql string) string

Redact returns sql with comments stripped and every single-quoted string literal and numeric literal replaced by a single "?" placeholder. Query structure, keywords, and identifiers (including double-quoted and backtick-quoted identifiers) are preserved, so the result stays readable and analyzable but carries no literal values — no emails, tokens, or other PII reach a log sink.

It is a zero-dependency lexical pass, not a full parser: it is intentionally conservative (e.g. it does not special-case hex/scientific forms beyond a simple exponent) and never errors. Use it whenever a query is about to leave the process.

func Register

func Register(spec RuleSpec)

Register adds a rule to the global registry. Built-in rules call this from init(); third-party rules may call it too. A duplicate name overwrites the previous spec, so a custom rule can replace a built-in one by name.

func RuleNames

func RuleNames() []string

RuleNames returns all registered rule names, sorted. Used by the config loader to validate rule references and by tooling to list rules.

Types

type Analyzer

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer holds a set of rules and a Parser, and runs the rules against SQL queries. Configuration (disabled rules, severity overrides, per-rule settings) is resolved once at construction into the bound rule set and the severity map; the per-query Analyze path does no config work.

func Default

func Default() *Analyzer

Default creates an Analyzer with all registered built-in rules and the fallback parser, using each rule's default settings and severity.

func DefaultWithProfile

func DefaultWithProfile(p Profile) *Analyzer

DefaultWithProfile builds an Analyzer from the rule registry with the given Profile applied: disabled/whitelisted rules are filtered, per-rule settings are passed to each rule's factory, and severity overrides are precomputed. The config package uses this to turn a .sqlguard.yml into an Analyzer without analyzer ever importing config or YAML.

func New

func New(rules ...Rule) *Analyzer

New creates an Analyzer with the given anonymous rules, using the zero-dependency FallbackParser. Use WithParser to supply a real dialect parser. Rules added this way are not subject to profile overrides (they have no registry name); use Default/DefaultWithProfile for configurable built-in rules.

func (*Analyzer) Analyze

func (a *Analyzer) Analyze(query string) []Result

Analyze parses the query once and runs all rules against it. If the configured parser returns an error, it degrades to the FallbackParser so analysis never breaks the caller's query path. Findings for rules named in an in-SQL `sqlguard:ignore` directive are suppressed, and severity overrides from the active Profile are applied.

func (*Analyzer) PrepareQuery

func (a *Analyzer) PrepareQuery(raw string) (display, fingerprint string)

PrepareQuery returns the query field and fingerprint for a Result built outside the rule path (e.g. the runtime slow-query and N+1 findings), applying the same redaction policy as Analyze so every emitted Result is consistent. display is redacted unless the Analyzer was built WithRawQuery; fingerprint is always the PII-free identity.

func (*Analyzer) WithParser

func (a *Analyzer) WithParser(p Parser) *Analyzer

WithParser returns a copy of the Analyzer that uses the given Parser. Passing nil resets it to the FallbackParser.

func (*Analyzer) WithRawQuery

func (a *Analyzer) WithRawQuery() *Analyzer

WithRawQuery returns a copy of the Analyzer that leaves Result.Query unredacted (the raw SQL, literals and all). Redaction is on by default so literal values never reach a log sink; opt out only for local debugging where the query text is trusted. Fingerprint is always populated either way.

type FallbackParser

type FallbackParser struct{}

FallbackParser is the zero-dependency Parser. It removes SQL comments and string-literal contents before pattern matching, so keywords inside comments or strings (and identifiers like update_at) no longer cause false positives. It is best-effort and never returns an error: SQL it cannot fully understand still yields a usable Statement with Exact=false.

func NewFallbackParser

func NewFallbackParser() *FallbackParser

NewFallbackParser returns the default zero-dependency parser.

func (*FallbackParser) Parse

func (p *FallbackParser) Parse(sql string) (*Statement, error)

Parse implements Parser. It always returns a non-nil Statement and a nil error.

type Parser

type Parser interface {
	Parse(sql string) (*Statement, error)
}

Parser turns a raw SQL string into sqlguard's normalized Statement.

Implementations:

  • FallbackParser (this package): zero-dependency, best-effort, never returns an error.
  • parsers/pgparser, parsers/mysqlparser (optional modules): real dialect ASTs, exact analysis, fall back to FallbackParser on parse failure.

A Parser used on the runtime query path MUST NOT panic and SHOULD avoid returning an error for SQL it merely doesn't understand — degrade to a best-effort Statement instead, so analysis never breaks db.Query.

type Profile

type Profile struct {
	// Disabled rules are not constructed or run.
	Disabled map[string]bool
	// Only, when non-empty, is a whitelist: only these rules run.
	Only map[string]bool
	// Severity overrides a rule's reported severity.
	Severity map[string]Severity
	// Settings holds per-rule tunables.
	Settings map[string]Settings
	// RawQuery, when true, disables Result.Query redaction (literals are
	// left in the reported SQL). Default (false) redacts — see
	// Analyzer.WithRawQuery.
	RawQuery bool
}

Profile is the resolved, parser-independent view of configuration applied to an Analyzer at construction time. The config package builds it from .sqlguard.yml; analyzer never imports config or YAML. All maps are keyed by rule name. Resolution happens once here, never on the per-query path.

type Result

type Result struct {
	RuleName string
	Severity Severity
	// Query is the offending SQL as surfaced to reporters. By default it is
	// redacted (string/numeric literals replaced with "?") so literal values
	// never reach a log sink; an Analyzer built WithRawQuery leaves it raw.
	Query string
	// Fingerprint is the redacted, whitespace-collapsed, list-folded query
	// identity (see analyzer.Fingerprint). It is always set, never carries
	// PII, and is safe as a metric label or log key.
	Fingerprint string
	Message     string
	Suggestion  string
	File        string // populated only in static analysis mode
	Line        int    // populated only in static analysis mode
}

Result represents a single finding from query analysis.

func CheckAddNotNullWithoutDefault

func CheckAddNotNullWithoutDefault(s *Statement) (Result, bool)

CheckAddNotNullWithoutDefault detects an ALTER TABLE that adds a NOT NULL column with no DEFAULT, which errors or forces a full table rewrite on a populated table.

func CheckCartesianJoin

func CheckCartesianJoin(s *Statement) (Result, bool)

CheckCartesianJoin detects a multi-table FROM with no join condition and no WHERE filter — an unconditioned cartesian product (incl. CROSS JOIN).

func CheckDeleteWithoutWhere

func CheckDeleteWithoutWhere(s *Statement) (Result, bool)

CheckDeleteWithoutWhere detects DELETE statements without a WHERE clause.

func CheckImplicitJoin

func CheckImplicitJoin(s *Statement) (Result, bool)

CheckImplicitJoin detects a FROM clause that joins tables with commas (FROM a, b) instead of explicit JOIN syntax — error-prone because a forgotten join condition silently yields a cartesian product.

func CheckInListTooLarge

func CheckInListTooLarge(s *Statement) (Result, bool)

CheckInListTooLarge detects an IN (...) value list with more elements than the default threshold (100). Use the registry / config to tune max-length.

func CheckInsertWithoutColumns

func CheckInsertWithoutColumns(s *Statement) (Result, bool)

CheckInsertWithoutColumns detects INSERT statements without an explicit column list.

func CheckLargeOffset

func CheckLargeOffset(s *Statement) (Result, bool)

CheckLargeOffset detects a literal OFFSET larger than the default threshold (1000). Use the registry / config to tune threshold.

func CheckLeadingWildcard

func CheckLeadingWildcard(s *Statement) (Result, bool)

CheckLeadingWildcard detects LIKE patterns with leading wildcards, using the rule's default settings (no minimum term length).

func CheckNonSargablePredicate

func CheckNonSargablePredicate(s *Statement) (Result, bool)

CheckNonSargablePredicate detects a function or cast applied to a column on the column side of a WHERE comparison (e.g. WHERE LOWER(email) = ...), which prevents an ordinary index on that column from being used.

func CheckOrderByWithoutLimit

func CheckOrderByWithoutLimit(s *Statement) (Result, bool)

CheckOrderByWithoutLimit detects ORDER BY without LIMIT, which sorts the entire result set.

func CheckSelectDistinct

func CheckSelectDistinct(s *Statement) (Result, bool)

CheckSelectDistinct detects a select-level DISTINCT, which is often added to hide duplicate rows produced by an unintended join fan-out rather than to express a genuine need for distinct results. INFO by default.

func CheckSelectStar

func CheckSelectStar(s *Statement) (Result, bool)

CheckSelectStar detects SELECT * usage.

func CheckSelectWithoutLimit

func CheckSelectWithoutLimit(s *Statement) (Result, bool)

CheckSelectWithoutLimit detects SELECT statements without a LIMIT clause. Only flags queries that have a FROM clause (to skip SELECT 1, SELECT version(), etc.) and don't have WHERE, to reduce noise.

func CheckUpdateWithoutWhere

func CheckUpdateWithoutWhere(s *Statement) (Result, bool)

CheckUpdateWithoutWhere detects UPDATE statements without a WHERE clause.

type Rule

type Rule func(s *Statement) (Result, bool)

Rule checks a normalized Statement and returns a Result if an issue is found. It returns the result and true if an issue was detected, or a zero Result and false otherwise.

Rules operate on the parsed Statement, not the raw SQL string, so a query is parsed once per Analyze call and every rule sees the same dialect- agnostic view.

type RuleSpec

type RuleSpec struct {
	Name            string
	DefaultSeverity Severity
	Factory         func(Settings) Rule
}

RuleSpec describes a built-in rule: its stable name (used in config, suppressions and reports), its default severity, and a factory that builds the rule from its settings. Keeping construction behind a factory is what makes per-rule settings work uniformly for every present and future rule.

type Settings

type Settings map[string]any

Settings holds rule-specific configuration as a generic key/value map so new tunables can be added without changing this type or the config schema. Accessors are nil-safe and fall back to the provided default, so a rule can always be constructed even with no settings supplied.

func (Settings) Bool

func (s Settings) Bool(key string, def bool) bool

Bool returns the setting as a bool, or def if missing or not a bool.

func (Settings) Duration

func (s Settings) Duration(key string, def time.Duration) time.Duration

Duration returns the setting parsed as a time.Duration. It accepts a duration string ("200ms") or a number interpreted as milliseconds. Returns def if missing or unparseable.

func (Settings) Int

func (s Settings) Int(key string, def int) int

Int returns the setting as an int, or def if missing or not numeric. YAML decodes integers as int and JSON as float64, so both are accepted.

func (Settings) String

func (s Settings) String(key, def string) string

String returns the setting as a string, or def if missing or not a string.

type Severity

type Severity int

Severity represents the importance level of an analysis finding.

const (
	// SeverityInfo is an advisory finding worth noting but not necessarily acting on.
	SeverityInfo Severity = iota
	// SeverityWarning is a likely problem that should be reviewed.
	SeverityWarning
	// SeverityCritical is a serious problem likely to cause incorrect or destructive behavior.
	SeverityCritical
)

func (Severity) String

func (s Severity) String() string

type Statement

type Statement struct {
	// Raw is the original, untouched SQL string. Reported back to users.
	Raw string

	// Kind is the statement's top-level kind.
	Kind StmtKind

	// HasWhere reports whether the statement has a WHERE clause.
	HasWhere bool

	// HasLimit reports whether the statement has a LIMIT clause.
	HasLimit bool

	// HasOrderBy reports whether the statement has an ORDER BY clause.
	HasOrderBy bool

	// HasFrom reports whether a SELECT has a FROM clause. Distinguishes
	// "SELECT * FROM t" from "SELECT 1" / "SELECT version()".
	HasFrom bool

	// SelectStar reports an unqualified "SELECT *" / "SELECT t.*" of columns.
	// It is false for aggregate forms like COUNT(*).
	SelectStar bool

	// SelectDistinct reports a select-level DISTINCT (SELECT DISTINCT ...,
	// incl. Postgres DISTINCT ON and MySQL DISTINCTROW). It is false for an
	// aggregate-level DISTINCT such as COUNT(DISTINCT col), which is unrelated.
	// The dialect parsers compute it from the AST; the fallback approximates it
	// lexically.
	SelectDistinct bool

	// InsertColumnsListed reports whether an INSERT names its target columns
	// explicitly: INSERT INTO t (a, b) VALUES (...). Only meaningful when
	// Kind == StmtInsert.
	InsertColumnsListed bool

	// LeadingWildcardLike reports a LIKE pattern beginning with a wildcard
	// (e.g. LIKE '%foo'), which prevents index use.
	LeadingWildcardLike bool

	// NonSargablePredicate reports a function or cast applied to a column on
	// the column side of a WHERE comparison (e.g. WHERE LOWER(email) = ...),
	// which prevents the use of an ordinary index on that column. Like the
	// LIKE fields, this is a literal/text-level heuristic the real parsers'
	// ASTs discard, so it is computed by the fallback lexer and preserved by
	// the dialect parsers rather than recomputed structurally.
	NonSargablePredicate bool

	// AddNotNullNoDefault reports an ALTER TABLE that adds a NOT NULL column
	// with no DEFAULT (e.g. ALTER TABLE t ADD COLUMN c int NOT NULL), which
	// fails or forces a table rewrite on a populated table. Like the other
	// text-level fields above, it is computed by the fallback lexer and
	// preserved by the dialect parsers.
	AddNotNullNoDefault bool

	// ImplicitCommaJoin reports a FROM clause that lists multiple tables
	// separated by top-level commas (FROM a, b) instead of explicit JOIN
	// syntax — the old-style join that silently produces a cartesian product
	// when its join condition is forgotten. Computed by the fallback lexer and
	// preserved (not recomputed from the AST) by the dialect parsers, so it
	// stays a best-effort heuristic even when Exact is true.
	ImplicitCommaJoin bool

	// CartesianJoin reports a multi-table FROM (comma join, CROSS JOIN, or a
	// bare JOIN) with no join condition (ON/USING/NATURAL) and no top-level
	// WHERE filter — an unconditioned cartesian product. It is the high-
	// confidence subset of ImplicitCommaJoin and also covers CROSS/bare JOIN.
	// Like ImplicitCommaJoin, it is a fallback-lexer heuristic preserved by the
	// dialect parsers, so it stays best-effort even when Exact is true.
	CartesianJoin bool

	// MaxInListLen is the largest element count among the statement's IN (...)
	// value lists (IN (SELECT ...) subqueries are excluded). It powers the
	// in-list-too-large rule's max-length threshold. Zero means no value-list
	// IN was found. Like the other counts, rules read it, never raw SQL. It is a
	// fallback-lexer heuristic preserved by the dialect parsers (the AST discards
	// the literal list it counts), so it stays best-effort even when Exact is true.
	MaxInListLen int

	// OffsetValue is the largest literal OFFSET seen (standard OFFSET n or
	// MySQL's LIMIT offset, count), powering the large-offset rule. Zero means
	// no offset, OFFSET 0, or a parameterized offset (OFFSET $1 / ?), which
	// cannot be evaluated statically and is therefore never flagged. The dialect
	// parsers read it from the AST's limit clause; the fallback scans for it.
	OffsetValue int

	// LeadingWildcardTermLen is the length of the longest searchable term
	// (the literal with surrounding % wildcards trimmed) across all
	// leading-wildcard LIKE patterns in the statement. It powers the
	// leading-wildcard rule's min-length setting. Zero means "unknown"
	// (e.g. produced by a real parser that did not compute it); rules must
	// treat zero as unknown and not as "short", to avoid false negatives.
	LeadingWildcardTermLen int

	// Exact is true when the Statement was produced by a real SQL parser
	// (structural analysis), false when produced by the regex fallback
	// (best-effort). Rules may use this to suppress lower-confidence findings.
	//
	// "Exact" covers the structural facts the dialect parsers derive from the
	// AST: Kind, HasWhere/HasLimit/HasOrderBy/HasFrom, SelectStar,
	// SelectDistinct, OffsetValue, and InsertColumnsListed. A few facts stay
	// lexical heuristics even when Exact is true — MaxInListLen,
	// ImplicitCommaJoin, CartesianJoin, and the literal/text-level fields
	// (LeadingWildcard*, NonSargablePredicate, AddNotNullNoDefault) — because
	// they read literal values the AST discards or are intentionally text-level.
	// Each such field documents this.
	Exact bool
}

Statement is sqlguard's normalized, dialect-agnostic view of a single SQL statement. It carries only the semantic facts the rules need — not a full AST. Every Parser (the zero-dependency fallback and the optional real dialect parsers) populates this same struct, so rules never depend on a particular parser or dialect.

Boolean fields are best-effort: a fallback-produced Statement may leave a field false when it genuinely cannot tell. Rules must treat "false" as "not detected", never as "proven absent", to avoid false positives.

type StmtKind

type StmtKind int

StmtKind is the top-level kind of a SQL statement.

const (
	// StmtUnknown means the parser could not determine the statement kind.
	StmtUnknown StmtKind = iota
	// StmtSelect is a SELECT (or WITH ... SELECT) query.
	StmtSelect
	// StmtInsert is an INSERT statement.
	StmtInsert
	// StmtUpdate is an UPDATE statement.
	StmtUpdate
	// StmtDelete is a DELETE statement.
	StmtDelete
	// StmtOther is a recognized statement that none of the rules target
	// (DDL, transaction control, etc.).
	StmtOther
)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL