linkgraph

package
v0.24.0
Published: May 24, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package linkgraph extracts Markdown links and heading anchors so the link-validity rule (MDS027) and the `backlinks` subcommand share one implementation of the link walk, anchor slug rules, and target parsing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CollectAnchors

func CollectAnchors(f *lint.File) map[string]bool

CollectAnchors returns the set of heading anchors defined in f, with GitHub-compatible disambiguation suffixes (-1, -2, …) when slugs would otherwise collide. Uniqueness is enforced against the running set of produced anchors so a sequence like "Intro" / "Intro" / "Intro-1" yields three distinct keys (`intro`, `intro-1`, `intro-1-1`) rather than two distinct ones with a collision. The set keys are the slugified anchor names; values are always true so callers can use map-lookup.
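
The disambiguation rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `slugify` is a simplified stand-in for the real slug rules, and `collectAnchors` takes a plain list of heading texts instead of a `*lint.File`.

```go
package main

import (
	"fmt"
	"strings"
)

// slugify is a simplified stand-in for the package's slug rules (assumption):
// lower-case, spaces become hyphens, other punctuation is dropped.
func slugify(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(heading)) {
		switch {
		case r == ' ':
			b.WriteRune('-')
		case r == '-' || r == '_' || (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'):
			b.WriteRune(r)
		}
	}
	return b.String()
}

// collectAnchors assigns each heading a unique anchor. Every candidate is
// checked against the running set of anchors already produced, so a suffixed
// candidate can never collide with an earlier literal slug.
func collectAnchors(headings []string) (map[string]bool, []string) {
	seen := make(map[string]bool)
	var order []string
	for _, h := range headings {
		base := slugify(h)
		candidate := base
		for n := 1; seen[candidate]; n++ {
			candidate = fmt.Sprintf("%s-%d", base, n)
		}
		seen[candidate] = true
		order = append(order, candidate)
	}
	return seen, order
}

func main() {
	_, order := collectAnchors([]string{"Intro", "Intro", "Intro-1"})
	fmt.Println(order) // [intro intro-1 intro-1-1]
}
```

The third heading slugifies to `intro-1`, which is already taken by the second, so it receives its own suffix and becomes `intro-1-1`.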

func DecodeAnchor added in v0.15.0

func DecodeAnchor(raw string) string

DecodeAnchor URL-decodes raw and returns the decoded form. On decode failure (e.g. a stray `%` not followed by hex) the input is returned unchanged.

Use NormalizeAnchor when comparing against CollectAnchors output — NormalizeAnchor combines DecodeAnchor with Slugify so callers see one normalised form. DecodeAnchor is exposed for code paths that store the decoded anchor as a distinct field from the slugified one (the LSP locator), where the slugify step happens later.

func ExpandCatalog added in v0.15.0

func ExpandCatalog(globs, files []string) []string

ExpandCatalog returns the subset of files that match any of the given glob patterns. Patterns prefixed with `!` are exclusion patterns — see globpath.MatchAny for the precise semantics.

The function does not walk the filesystem; the caller is responsible for supplying the candidate file list (typically the workspace-relative paths the discovery layer produced). Order in the returned slice matches the order in files.
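
A rough sketch of this filtering behaviour is shown below. It assumes that a file is kept when it matches at least one positive pattern and no `!` pattern, and it uses the standard library's `path.Match` as a stand-in for `globpath.MatchAny`, whose precise semantics may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandCatalog keeps the files that match at least one positive pattern and
// no `!`-prefixed exclusion pattern. path.Match is a stand-in for
// globpath.MatchAny (assumption). The result preserves the order of files.
func expandCatalog(globs, files []string) []string {
	var out []string
	for _, f := range files {
		included, excluded := false, false
		for _, g := range globs {
			negate := strings.HasPrefix(g, "!")
			ok, err := path.Match(strings.TrimPrefix(g, "!"), f)
			if err != nil || !ok {
				continue // malformed or non-matching pattern
			}
			if negate {
				excluded = true
			} else {
				included = true
			}
		}
		if included && !excluded {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []string{"docs/b.md", "docs/draft.md", "src/x.md", "docs/a.md"}
	fmt.Println(expandCatalog([]string{"docs/*.md", "!docs/draft.md"}, files))
	// [docs/b.md docs/a.md]
}
```

Note that the output follows the order of `files`, not the order of the patterns and not alphabetical order.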

func NormalizeAnchor

func NormalizeAnchor(raw string) string

NormalizeAnchor URL-decodes raw and slugifies it so the result can be compared against CollectAnchors output.
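
As an illustration of the decode-then-slugify pipeline, here is a minimal sketch. `url.PathUnescape` is the standard-library decoder; `slugify` is again a simplified, hypothetical stand-in for the package's Slugify, so the exact slugs may differ from the real output.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// decodeAnchor percent-decodes raw and falls back to the input unchanged
// when decoding fails (for example on a stray `%`).
func decodeAnchor(raw string) string {
	decoded, err := url.PathUnescape(raw)
	if err != nil {
		return raw
	}
	return decoded
}

// slugify is a simplified stand-in for the package's slug rules (assumption).
func slugify(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(s)) {
		switch {
		case r == ' ':
			b.WriteRune('-')
		case r == '-' || r == '_' || (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'):
			b.WriteRune(r)
		}
	}
	return b.String()
}

// normalizeAnchor produces the form that is compared against the anchor set.
func normalizeAnchor(raw string) string {
	return slugify(decodeAnchor(raw))
}

func main() {
	anchors := map[string]bool{"getting-started": true}
	fmt.Println(anchors[normalizeAnchor("Getting%20Started")]) // true
	fmt.Println(decodeAnchor("100%"))                          // 100%
}
```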

func ResolveRelTarget added in v0.15.0

func ResolveRelTarget(srcFile, linkPath string) string

ResolveRelTarget joins srcFile's directory with linkPath and returns the workspace-relative result. Absolute paths and ones that escape the workspace root after normalization return the empty string — callers must treat "" as "no in-workspace target" rather than as a valid path.

The function is strict about its inputs:

  • srcFile must already be workspace-relative (no leading `/`, no drive letter, no UNC `\\` prefix). Callers that hold absolute paths must convert them first; otherwise a `../../etc/passwd`-style linkPath could escape via path.Join's absolute-path semantics.
  • linkPath has every `\` translated to `/` before joining, so a Windows-authored `sub\x.md` resolves the same way on Linux. (filepath.ToSlash is OS-dependent and a no-op on POSIX hosts; this helper translates explicitly via strings.ReplaceAll.)
  • Absolute inputs are rejected up-front; path.Join of two relative paths never produces an absolute result, so the only escape vector is a leading `../` in the cleaned output (caught below).
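
The input rules above can be sketched roughly as follows. This is an illustrative reimplementation under the stated rules, not the package's source; `isAbsolute` is a hypothetical helper introduced for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isAbsolute reports whether p looks absolute on any platform: a leading
// slash, a UNC `\\` prefix, or a Windows drive letter such as `C:`.
func isAbsolute(p string) bool {
	if strings.HasPrefix(p, "/") || strings.HasPrefix(p, `\\`) {
		return true
	}
	return len(p) >= 2 && p[1] == ':' &&
		((p[0] >= 'a' && p[0] <= 'z') || (p[0] >= 'A' && p[0] <= 'Z'))
}

// resolveRelTarget joins srcFile's directory with linkPath and returns ""
// when either input is absolute or the cleaned result leaves the workspace.
func resolveRelTarget(srcFile, linkPath string) string {
	// Translate backslashes explicitly; filepath.ToSlash would be a no-op
	// on POSIX hosts.
	linkPath = strings.ReplaceAll(linkPath, `\`, "/")
	if isAbsolute(srcFile) || isAbsolute(linkPath) {
		return ""
	}
	joined := path.Join(path.Dir(srcFile), linkPath)
	if joined == ".." || strings.HasPrefix(joined, "../") {
		return "" // escapes the workspace root
	}
	return joined
}

func main() {
	fmt.Println(resolveRelTarget("docs/guide/a.md", "../b.md"))         // docs/b.md
	fmt.Println(resolveRelTarget("docs/a.md", `sub\x.md`))              // docs/sub/x.md
	fmt.Println(resolveRelTarget("docs/a.md", "../../etc/passwd") == "") // true
}
```

Because `path.Join` cleans its result, any escape shows up as a leading `..` segment, which is why a single prefix check after the join is enough once absolute inputs have been rejected.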

func ResolveWikiLink

func ResolveWikiLink(root fs.FS, from, target string) (string, bool)

ResolveWikiLink resolves an Obsidian-style wikilink target against root, returning the workspace-relative path of the resolved file.

Resolution rules:

  • When target has no extension or ends in `.md`/`.markdown`, the search matches files whose stem (filename minus extension) equals target, case-insensitive. The target itself is also considered a stem when it lacks an extension.
  • When target has any other extension (an embed like `image.png`), the search matches files by exact filename, case-insensitive.
  • Ties are broken by the shortest path (fewest separators); then alphabetically. Two matches at the same depth never both win.
  • The walk is sandboxed to root: paths that would escape via `..` are rejected before the walk starts.

from is the workspace-relative path of the source file. It is reserved for future per-directory resolution preference; today it only blocks empty targets the same way `ParseTarget` does for regular links.
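
The resolution rules can be illustrated with a small standalone sketch over an in-memory filesystem. It is a simplification under several assumptions: the `from` parameter is omitted, stem matching is limited to `.md`/`.markdown` files, and `demoFS` is a hypothetical fixture used only for this example.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"sort"
	"strings"
	"testing/fstest"
)

// resolveWikiLink walks root and returns the best match for target.
func resolveWikiLink(root fs.FS, target string) (string, bool) {
	if target == "" {
		return "", false
	}
	for _, seg := range strings.Split(target, "/") {
		if seg == ".." {
			return "", false // reject escapes before walking
		}
	}
	want := strings.ToLower(target)
	ext := path.Ext(want)
	byStem := ext == "" || ext == ".md" || ext == ".markdown"
	if byStem {
		want = strings.TrimSuffix(want, ext)
	}
	var matches []string
	err := fs.WalkDir(root, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		name := strings.ToLower(d.Name())
		if byStem {
			e := path.Ext(name)
			if (e == ".md" || e == ".markdown") && strings.TrimSuffix(name, e) == want {
				matches = append(matches, p)
			}
		} else if name == want {
			matches = append(matches, p)
		}
		return nil
	})
	if err != nil || len(matches) == 0 {
		return "", false
	}
	// Fewest separators first, then alphabetical.
	sort.Slice(matches, func(i, j int) bool {
		di, dj := strings.Count(matches[i], "/"), strings.Count(matches[j], "/")
		if di != dj {
			return di < dj
		}
		return matches[i] < matches[j]
	})
	return matches[0], true
}

// demoFS is a hypothetical fixture for the example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"notes/deep/Page.md": {},
		"docs/page.md":       {},
		"img/Image.PNG":      {},
	}
}

func main() {
	fmt.Println(resolveWikiLink(demoFS(), "Page"))      // docs/page.md true
	fmt.Println(resolveWikiLink(demoFS(), "image.png")) // img/Image.PNG true
}
```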

Types

type DirectiveEdge added in v0.15.0

type DirectiveEdge struct {
	Line  int
	Col   int
	Kind  DirectiveKind
	Path  string
	Globs []string
}

DirectiveEdge is one directive's parsed target.

Line and Col are body-relative (post front-matter strip) — same convention as Link.Line/Column. Callers needing file-relative coordinates must add f.LineOffset themselves.

For DirectiveInclude and DirectiveBuild, Path carries the raw directive value (file: for include, source: for build) verbatim from the directive body. Path is the un-resolved string — callers resolve it against the host file's directory using ResolveRelTarget.

For DirectiveCatalog, Globs carries the raw glob pattern list. Path is empty. The IsUnresolved method returns true for catalog edges so reverse-edge queries skip them generically — see the index layer for the corresponding Unresolved flag.

func ExtractDirectives added in v0.15.0

func ExtractDirectives(f *lint.File) []DirectiveEdge

ExtractDirectives walks f.AST top-level for processing-instruction nodes whose name is "include", "build", or "catalog", parses each one's YAML body, and returns one DirectiveEdge per directive that carries a usable target. Directives with malformed YAML or empty required parameters are skipped silently — the dedicated lint rules surface those as diagnostics; this extractor only contributes to the link graph.

Like ExtractLinks, ExtractDirectives is pure given its input: it does no file reads, no workspace traversal, and no global state mutation, so callers can invoke it concurrently across files.

func (DirectiveEdge) IsUnresolved added in v0.15.0

func (d DirectiveEdge) IsUnresolved() bool

IsUnresolved reports whether this directive points at glob patterns that need workspace-list expansion before they identify concrete files. True for DirectiveCatalog, false otherwise.
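
To show how a caller might use IsUnresolved, the sketch below mirrors the documented types with local lower-case names (so it compiles on its own) and collects only the edges that already name a single file. `directPaths` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

type directiveKind int

const (
	directiveInclude directiveKind = iota
	directiveBuild
	directiveCatalog
)

// directiveEdge mirrors the fields of DirectiveEdge used in this sketch.
type directiveEdge struct {
	Kind  directiveKind
	Path  string
	Globs []string
}

// isUnresolved is true only for catalog edges, whose globs still need
// expansion against a workspace file list.
func (d directiveEdge) isUnresolved() bool {
	return d.Kind == directiveCatalog
}

// directPaths returns the raw Path of every edge that names one file,
// skipping unresolved (catalog) edges.
func directPaths(edges []directiveEdge) []string {
	var out []string
	for _, e := range edges {
		if e.isUnresolved() {
			continue
		}
		out = append(out, e.Path)
	}
	return out
}

func main() {
	edges := []directiveEdge{
		{Kind: directiveInclude, Path: "parts/intro.md"},
		{Kind: directiveCatalog, Globs: []string{"docs/*.md"}},
		{Kind: directiveBuild, Path: "src/spec.md"},
	}
	fmt.Println(directPaths(edges)) // [parts/intro.md src/spec.md]
}
```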

type DirectiveKind added in v0.15.0

type DirectiveKind int

DirectiveKind enumerates the directive shapes ExtractDirectives recognises.

const (
	// DirectiveInclude is a `<?include file: …?>` directive.
	DirectiveInclude DirectiveKind = iota
	// DirectiveBuild is a `<?build source: …?>` directive.
	DirectiveBuild
	// DirectiveCatalog is a `<?catalog glob: …?>` directive. Catalog
	// targets are glob patterns; concrete files are produced by
	// ExpandCatalog against a workspace file list.
	DirectiveCatalog
)

type Link

type Link struct {
	Line   int
	Column int
	Text   string
	Target Target
}

Link is one parsed Markdown link occurrence in a source file.

Reference-style links (`[text][label]`) are intentionally omitted from ExtractLinks results because their destinations resolve through the link-reference map rather than a URL; the link-graph builder only sees direct destinations.

Line is body-relative — counted from the start of the parsed body, not the original file. Lint rules return body-relative diagnostics because the engine applies f.LineOffset for front-matter adjustment. CLI callers (like `mdsmith list backlinks`) that want file-relative line numbers must add f.LineOffset themselves.
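
The conversion a CLI caller needs is a single addition. The sketch below assumes that f.LineOffset holds the number of front-matter lines stripped before parsing; the `link` type mirrors only the fields it needs.

```go
package main

import "fmt"

// link mirrors the position fields of Link.
type link struct {
	Line   int
	Column int
}

// fileLine converts a body-relative line number to a file-relative one.
// lineOffset is assumed to be the count of stripped front-matter lines.
func fileLine(l link, lineOffset int) int {
	return l.Line + lineOffset
}

func main() {
	// A link on body line 3 of a file with 4 lines of front matter.
	fmt.Println(fileLine(link{Line: 3, Column: 1}, 4)) // 7
}
```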

func ExtractImages added in v0.21.0

func ExtractImages(f *lint.File) []Link

ExtractImages walks f.AST and returns every Markdown image in document order. Both inline (Reference == nil) and reference-style (Reference != nil) images are included when their destination can be parsed as a local target. Lines are body-relative — same convention as Link.

func ExtractLinks

func ExtractLinks(f *lint.File) []Link

ExtractLinks walks f.AST and returns every regular Markdown link in document order. Lines are body-relative (post front-matter strip); see the Link doc for why.

func ExtractRefLinkTargets added in v0.21.0

func ExtractRefLinkTargets(f *lint.File) []Link

ExtractRefLinkTargets walks f.AST and returns every reference-style link whose definition has been resolved by the parser, as Link values with the resolved destination ready for the same file-existence resolver that ExtractLinks feeds. Images are not included — those come from ExtractImages. Lines are body-relative — same convention as Link.

type RefLink

type RefLink struct {
	Line   int
	Column int
	Text   string
	// Label is the link-reference label, normalised via
	// util.ToLinkReference (lower-cased, internal whitespace
	// collapsed). Use this when keying into the parser-context ref
	// table or matching against a `[label]: url` definition.
	Label string
}

RefLink is one reference-style link use (`[text][label]`, `[text][]`, or `[label]`).

ExtractLinks skips these because reference-style destinations resolve through the link reference map at render time rather than via a URL, so callers that need to map "what file does this link point at" handle them separately (e.g. via the link-ref definition table in parser.Context).

Line and Column are body-relative — same convention as Link.
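
The label normalisation described in the Label comment (lower-casing and collapsing internal whitespace) can be approximated with the standard library. This is a stand-in for util.ToLinkReference, not a copy of it, and `definitions` is a hypothetical label-to-destination table.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLabel lower-cases the label and collapses runs of whitespace
// into single spaces, trimming both ends.
func normalizeLabel(label string) string {
	return strings.ToLower(strings.Join(strings.Fields(label), " "))
}

func main() {
	// Hypothetical definition table keyed by normalised label.
	definitions := map[string]string{"api guide": "docs/api.md"}

	dest, ok := definitions[normalizeLabel("  API \t Guide ")]
	fmt.Println(dest, ok) // docs/api.md true
}
```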

func ExtractRefLinks

func ExtractRefLinks(f *lint.File) []RefLink

ExtractRefLinks walks f.AST and returns every reference-style link in document order. Inline links (`[text](url)`) are intentionally excluded — those come from ExtractLinks.

type Target

type Target struct {
	Raw         string
	Path        string
	Anchor      string
	LocalAnchor bool
}

Target is the parsed shape of a link destination URL.

Raw is the original destination string as it appeared in the source. Path and Anchor are the decoded path and fragment components — both are populated from url.URL, which percent-decodes them on parse. LocalAnchor is true when the destination was an anchor-only reference (e.g. `#section`).

Anchor matching against CollectAnchors output must still go through NormalizeAnchor: that runs Slugify (and a defensive PathUnescape) to produce the same form CollectAnchors stores.

func ParseTarget

func ParseTarget(dest string) (Target, bool)

ParseTarget parses a Markdown link destination into a Target. Returns ok=false when the destination is empty, has a scheme or host (treated as external), or has neither a path nor a fragment.
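
A minimal sketch of these rules using `net/url` is shown below. It follows the documented behaviour (external destinations and empty results are rejected, Path and Anchor arrive percent-decoded) but is an approximation; the real function may handle additional edge cases.

```go
package main

import (
	"fmt"
	"net/url"
)

// target mirrors the documented Target fields.
type target struct {
	Raw         string
	Path        string
	Anchor      string
	LocalAnchor bool
}

// parseTarget returns ok=false for empty, external, or content-free
// destinations.
func parseTarget(dest string) (target, bool) {
	if dest == "" {
		return target{}, false
	}
	u, err := url.Parse(dest)
	if err != nil || u.Scheme != "" || u.Host != "" {
		return target{}, false // unparseable or external
	}
	if u.Path == "" && u.Fragment == "" {
		return target{}, false // e.g. a bare query string
	}
	return target{
		Raw:         dest,
		Path:        u.Path,     // percent-decoded by url.Parse
		Anchor:      u.Fragment, // percent-decoded by url.Parse
		LocalAnchor: u.Path == "" && u.Fragment != "",
	}, true
}

func main() {
	got, ok := parseTarget("docs/a%20b.md#My%20Section")
	fmt.Printf("%q %q %v\n", got.Path, got.Anchor, ok) // "docs/a b.md" "My Section" true

	_, ok = parseTarget("https://example.com/x")
	fmt.Println(ok) // false
}
```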

type WikiLink

type WikiLink struct {
	Target string
	Anchor string
	Alias  string
	Embed  bool
	Line   int
	Column int
}

WikiLink is one parsed Obsidian-style wikilink occurrence.

Target is the destination filename or stem (without alias or anchor). Anchor and Alias are the optional fragment and display label. Embed reports whether the source used `![[...]]` rather than `[[...]]`.

Line and Column are body-relative — same convention as Link.

func ExtractWikiLinks

func ExtractWikiLinks(f *lint.File) []WikiLink

ExtractWikiLinks scans f.Source for Obsidian-style wikilinks (`[[Page]]`, `[[Page#anchor]]`, `[[Page|alias]]`, `![[file.png]]`). Matches inside fenced/indented code blocks, code spans, and `<?...?>` processing-instruction blocks are skipped — the same guards MDS054 applies to its bracket scanner.

Lines are body-relative (post front-matter strip). Returns nil for files without a parsed AST (struct-literal *lint.File instances): the code-block / code-span guards below walk the tree, so a missing AST would otherwise panic.
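
The four wikilink shapes can be parsed with one regular expression, as in the sketch below. It is deliberately incomplete: it scans a single string, does not track line or column, and has none of the code-block, code-span, or processing-instruction guards described above. The pattern itself is an assumption for illustration, not the package's scanner.

```go
package main

import (
	"fmt"
	"regexp"
)

// wikiLink mirrors the non-positional fields of WikiLink.
type wikiLink struct {
	Target string
	Anchor string
	Alias  string
	Embed  bool
}

// wikiRe captures: optional `!`, target, optional `#anchor`, optional `|alias`.
var wikiRe = regexp.MustCompile(`(!?)\[\[([^\[\]|#]+)(?:#([^\[\]|]*))?(?:\|([^\[\]]*))?\]\]`)

// parseWikiLinks returns every wikilink found in src, in order.
func parseWikiLinks(src string) []wikiLink {
	var out []wikiLink
	for _, m := range wikiRe.FindAllStringSubmatch(src, -1) {
		out = append(out, wikiLink{
			Embed:  m[1] == "!",
			Target: m[2],
			Anchor: m[3],
			Alias:  m[4],
		})
	}
	return out
}

func main() {
	for _, l := range parseWikiLinks("See [[Page#intro|the intro]] and ![[file.png]]") {
		fmt.Printf("%+v\n", l)
	}
	// {Target:Page Anchor:intro Alias:the intro Embed:false}
	// {Target:file.png Anchor: Alias: Embed:true}
}
```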

type WikilinkIndex added in v0.24.0

type WikilinkIndex struct {
	// contains filtered or unexported fields
}

WikilinkIndex is a pre-built directory of every file under one workspace root, keyed for the two lookup shapes ResolveWikiLink uses: stem (.md/.markdown filename minus extension) and exact filename. Each key holds the matching paths in shortest-then-alphabetical order, the same order ResolveWikiLink would otherwise sort on every call.

Build the index once per (run, root) — e.g. via `lint.RunCache.Wikilinks` — and call Resolve for each wikilink target. Lookups are then O(stems + matches) instead of O(files in workspace) per target.
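
One way to picture such an index is a pair of maps whose values are sorted once at build time. The sketch below builds from a plain file list rather than an `fs.FS` and uses hypothetical field names (`byStem`, `byName`); the real type's fields are unexported and may be organised differently.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

type wikilinkIndex struct {
	byStem map[string][]string // lower-cased stem of .md/.markdown files
	byName map[string][]string // lower-cased exact filename
}

// newIndex groups files under both keys and sorts each group once:
// fewest separators first, then alphabetically.
func newIndex(files []string) *wikilinkIndex {
	idx := &wikilinkIndex{byStem: map[string][]string{}, byName: map[string][]string{}}
	for _, f := range files {
		name := strings.ToLower(path.Base(f))
		idx.byName[name] = append(idx.byName[name], f)
		if ext := path.Ext(name); ext == ".md" || ext == ".markdown" {
			stem := strings.TrimSuffix(name, ext)
			idx.byStem[stem] = append(idx.byStem[stem], f)
		}
	}
	for _, table := range []map[string][]string{idx.byStem, idx.byName} {
		for _, paths := range table {
			sort.Slice(paths, func(i, j int) bool {
				di, dj := strings.Count(paths[i], "/"), strings.Count(paths[j], "/")
				if di != dj {
					return di < dj
				}
				return paths[i] < paths[j]
			})
		}
	}
	return idx
}

// resolve picks the table by target extension and returns the first entry.
func (idx *wikilinkIndex) resolve(target string) (string, bool) {
	key := strings.ToLower(target)
	table := idx.byName
	if ext := path.Ext(key); ext == "" || ext == ".md" || ext == ".markdown" {
		key = strings.TrimSuffix(key, ext)
		table = idx.byStem
	}
	if paths := table[key]; len(paths) > 0 {
		return paths[0], true
	}
	return "", false
}

func main() {
	idx := newIndex([]string{"z/deep/note.md", "b/Note.md", "a/note.md", "img/pic.png"})
	fmt.Println(idx.resolve("Note"))    // a/note.md true
	fmt.Println(idx.resolve("pic.png")) // img/pic.png true
}
```

Sorting at build time means each lookup only reads the first element of a pre-ordered slice instead of re-sorting matches per call.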

func NewWikilinkIndex added in v0.24.0

func NewWikilinkIndex(root fs.FS) *WikilinkIndex

NewWikilinkIndex walks root once and returns a lookup table that future ResolveWikiLink-style queries can serve from memory. Returns nil when root is nil or the workspace walk itself fails (e.g. Open(".") on root returns an error). A nil return lets the caller fall back to per-call walks via ResolveWikiLink rather than serving an empty index that would silently report every target as "not found".

func WikilinkIndexFor added in v0.24.0

func WikilinkIndexFor(cache *lint.RunCache, rootKey string, root fs.FS) *WikilinkIndex

WikilinkIndexFor returns a *WikilinkIndex for root, memoized on cache under rootKey when cache is non-nil. With cache=nil the helper falls through to a direct NewWikilinkIndex call — callers without a long-lived cache (one-shot CLI commands) still share the same API.
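
The memoization pattern described here can be sketched generically. Everything in the block is hypothetical: `runCache` stands in for `lint.RunCache` (whose API is not shown on this page), `index` stands in for `*WikilinkIndex`, and the builder is passed as a function so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// index is a placeholder for *WikilinkIndex.
type index struct{ root string }

// runCache is a hypothetical stand-in for lint.RunCache.
type runCache struct {
	mu      sync.Mutex
	indexes map[string]*index
}

// indexFor returns the index memoized under rootKey, building it on first
// use. A nil cache falls through to a direct build.
func indexFor(cache *runCache, rootKey string, build func() *index) *index {
	if cache == nil {
		return build()
	}
	cache.mu.Lock()
	defer cache.mu.Unlock()
	if idx, ok := cache.indexes[rootKey]; ok {
		return idx
	}
	idx := build()
	if cache.indexes == nil {
		cache.indexes = make(map[string]*index)
	}
	cache.indexes[rootKey] = idx
	return idx
}

func main() {
	builds := 0
	build := func() *index { builds++; return &index{root: "docs"} }

	cache := &runCache{}
	a := indexFor(cache, "docs", build)
	b := indexFor(cache, "docs", build)
	fmt.Println(a == b, builds) // true 1

	indexFor(nil, "docs", build) // no cache: builds again
	fmt.Println(builds)          // 2
}
```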

This is the canonical entry point that both MDS027 and `mdsmith list backlinks` route through; sharing it keeps the workspace walk semantics in one place.

func (*WikilinkIndex) Resolve added in v0.24.0

func (idx *WikilinkIndex) Resolve(target string) (string, bool)

Resolve answers the same question as ResolveWikiLink but serves it from the prebuilt index — no filesystem walk per call.
