robots

Version: v1.0.0-rc.1 (latest) | Published: May 1, 2026 | License: Apache-2.0 | Imports: 2 | Imported by: 1

Repository

github.com/MontFerret/contrib

README

WEB::ROBOTS Module

github.com/MontFerret/contrib/modules/web/robots registers robots.txt parsing and policy helpers under the WEB::ROBOTS namespace for Ferret hosts.

The module exposes these functions:

WEB::ROBOTS::PARSE
WEB::ROBOTS::ALLOWS
WEB::ROBOTS::MATCH
WEB::ROBOTS::SITEMAPS

Install

go get github.com/MontFerret/contrib/modules/web/robots

Register The Module

package main

import (
	"github.com/MontFerret/ferret/v2"

	robotsmodule "github.com/MontFerret/contrib/modules/web/robots"
)

func main() {
	// New has no error result: the package index declares func New() module.Module.
	robotsMod := robotsmodule.New()

	engine, err := ferret.New(
		ferret.WithModules(robotsMod),
	)
	if err != nil {
		panic(err)
	}

	_ = engine
}

Function Reference

Function	Signature	Returns	Notes
`WEB::ROBOTS::PARSE`	`WEB::ROBOTS::PARSE(text)`	`Object`	Parses raw robots.txt text into a plain Ferret object.
`WEB::ROBOTS::ALLOWS`	`WEB::ROBOTS::ALLOWS(robots, path, userAgent?)`	`Boolean`	Returns whether the path is allowed for the effective user-agent group.
`WEB::ROBOTS::MATCH`	`WEB::ROBOTS::MATCH(robots, path, userAgent?)`	`Object`	Returns rule-match details for debugging and inspection.
`WEB::ROBOTS::SITEMAPS`	`WEB::ROBOTS::SITEMAPS(robots)`	`String[]`	Returns top-level sitemap declarations from the robots document.

Return Shapes

WEB::ROBOTS::PARSE returns an object in this shape:

{
  "groups": [
    {
      "userAgents": ["*"],
      "allow": ["/public"],
      "disallow": ["/admin"],
      "crawlDelay": 5
    }
  ],
  "sitemaps": [
    "https://example.com/sitemap.xml"
  ],
  "host": null
}
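
If a host serializes this object to JSON and hands it to Go code, a pair of structs mirroring the shape can make it easier to work with. The sketch below is only an illustration: the type names RobotsDoc and Group and the helper DecodeRobots are hypothetical and not part of the module, and treating crawlDelay and host as nullable is an assumption based on the example above.

```go
package main

import "encoding/json"

// Group mirrors one entry of "groups" in the PARSE result.
// Hypothetical type for illustration; not part of the module's API.
type Group struct {
	UserAgents []string `json:"userAgents"`
	Allow      []string `json:"allow"`
	Disallow   []string `json:"disallow"`
	// Pointer so that a missing or null crawlDelay stays
	// distinguishable from an explicit 0 (assumption).
	CrawlDelay *float64 `json:"crawlDelay"`
}

// RobotsDoc mirrors the top-level PARSE result.
type RobotsDoc struct {
	Groups   []Group  `json:"groups"`
	Sitemaps []string `json:"sitemaps"`
	Host     *string  `json:"host"` // JSON null decodes to nil
}

// DecodeRobots decodes a JSON-serialized PARSE result into RobotsDoc.
func DecodeRobots(data []byte) (RobotsDoc, error) {
	var doc RobotsDoc
	err := json.Unmarshal(data, &doc)
	return doc, err
}
```

Calling DecodeRobots on the JSON shown above would yield one group with user agent "*", a crawl delay of 5, one sitemap URL, and a nil Host.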

WEB::ROBOTS::MATCH returns an object in this shape:

{
  "allowed": true,
  "directive": "allow",
  "pattern": "/products/",
  "userAgent": "FerretBot"
}

userAgent reports the normalized debug token used for evaluation. For exact group matches, for default allows where no rule matches, and for the implicit /robots.txt allow, it is the normalized requested token. When evaluation falls back to wildcard groups, it is "*". The field never reflects the casing of the User-agent: line in the robots file. When access is allowed by default because no rule matches, directive and pattern are null.
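
The way the reported userAgent follows from group selection can be sketched in Go. This is a minimal illustration, not the module's implementation: the agentGroup type and the SelectGroups and hasAgent helpers are hypothetical, and normalization is assumed to be nothing more than trimming whitespace. Comparison is case-insensitive, as described in the behavior notes.

```go
package main

import "strings"

// agentGroup is a hypothetical, trimmed-down view of one parsed group.
type agentGroup struct {
	UserAgents []string
}

// hasAgent reports whether the group names the token, ignoring case.
func hasAgent(g agentGroup, token string) bool {
	for _, ua := range g.UserAgents {
		if strings.EqualFold(ua, token) {
			return true
		}
	}
	return false
}

// SelectGroups returns the indexes of the groups that apply to the
// requested token, plus the token that would be reported for debugging.
func SelectGroups(groups []agentGroup, requested string) ([]int, string) {
	// Assumption: normalization only trims surrounding whitespace.
	token := strings.TrimSpace(requested)

	var exact, wildcard []int
	for i, g := range groups {
		if hasAgent(g, token) {
			exact = append(exact, i)
		} else if hasAgent(g, "*") {
			wildcard = append(wildcard, i)
		}
	}

	if len(exact) > 0 {
		return exact, token // exact match: requested token is reported
	}
	if len(wildcard) > 0 {
		return wildcard, "*" // fallback to wildcard groups
	}
	return nil, token // no applicable group: requested token is reported
}
```

With one group for "ferretbot" and one for "*", a request for "FerretBot" selects the first group and reports "FerretBot", while a request for an unlisted crawler selects the wildcard group and reports "*".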

Examples

Parse A robots.txt Document

LET robots = WEB::ROBOTS::PARSE($text)
RETURN robots.sitemaps

Check Path Access

LET robots = WEB::ROBOTS::PARSE($text)
RETURN WEB::ROBOTS::ALLOWS(robots, "/admin/users", "FerretBot")

Inspect The Matching Rule

LET robots = WEB::ROBOTS::PARSE($text)
RETURN WEB::ROBOTS::MATCH(robots, "/catalog/item/1", "FerretBot")

Return Declared Sitemap URLs

LET robots = WEB::ROBOTS::PARSE($text)
FOR sitemap IN WEB::ROBOTS::SITEMAPS(robots)
  RETURN sitemap

Behavior Notes

User-agent matching is case-insensitive and exact against the supplied crawler product token.
If no exact user-agent group matches, * groups are used when present.
Matching supports * wildcards and trailing $ end anchors.
The most specific rule wins; when an Allow rule and a Disallow rule of equal specificity both match, Allow wins.
If no rule matches, the path is allowed.
/robots.txt is always allowed.
PARSE preserves group order and per-directive rule order.
host is exposed for transparency only and does not affect matching.
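
The matching and precedence notes above can be sketched in Go. This is an illustration under stated assumptions, not the module's implementation: the Rules and Result types and the matchPattern and Evaluate helpers are hypothetical, patterns are assumed to match as path prefixes, "most specific" is assumed to mean the longest pattern (the convention in RFC 9309), and empty patterns are assumed to be ignored.

```go
package main

import "strings"

// Rules is a hypothetical view of the rules of the effective group.
type Rules struct {
	Allow    []string
	Disallow []string
}

// Result is a hypothetical outcome; Directive and Pattern are empty
// when no rule matched.
type Result struct {
	Allowed   bool
	Directive string
	Pattern   string
}

// matchPattern reports whether path matches pattern, where "*" matches
// any run of characters and a trailing "$" anchors the end of the path.
// Without "$", the pattern only has to match a prefix (assumption).
func matchPattern(pattern, path string) bool {
	anchored := strings.HasSuffix(pattern, "$")
	pattern = strings.TrimSuffix(pattern, "$")
	parts := strings.Split(pattern, "*")

	// The first literal segment must start the path.
	if !strings.HasPrefix(path, parts[0]) {
		return false
	}
	rest := path[len(parts[0]):]
	if len(parts) == 1 {
		return !anchored || rest == ""
	}
	// Middle segments are consumed left to right.
	for _, part := range parts[1 : len(parts)-1] {
		i := strings.Index(rest, part)
		if i < 0 {
			return false
		}
		rest = rest[i+len(part):]
	}
	last := parts[len(parts)-1]
	if anchored {
		return strings.HasSuffix(rest, last)
	}
	return strings.Contains(rest, last)
}

// Evaluate applies the precedence rules: /robots.txt is always allowed,
// the longest matching pattern wins, Allow wins ties, default is allow.
func Evaluate(r Rules, path string) Result {
	best := Result{Allowed: true}
	if path == "/robots.txt" {
		return best
	}
	bestLen := -1
	for _, p := range r.Disallow {
		if p != "" && len(p) > bestLen && matchPattern(p, path) {
			best = Result{Allowed: false, Directive: "disallow", Pattern: p}
			bestLen = len(p)
		}
	}
	for _, p := range r.Allow {
		// ">=" lets Allow win against an equally long Disallow.
		if p != "" && len(p) >= bestLen && matchPattern(p, path) {
			best = Result{Allowed: true, Directive: "allow", Pattern: p}
			bestLen = len(p)
		}
	}
	return best
}
```

Under these assumptions, a group with Disallow rules "/admin" and "/*.php$" and an Allow rule "/admin/public" would block "/admin/users" and "/index.php", allow "/admin/public/x" through the longer Allow rule, and allow "/index.php?x=1" by default because the "$" anchor prevents a match.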

Documentation

Overview

Package robots contains WEB::ROBOTS helpers and internals.

Index

func New() module.Module

Functions

func New

func New() module.Module

New returns the WEB::ROBOTS module, which registers the WEB::ROBOTS namespace functions on a Ferret host during bootstrap.

Directories

Path	Synopsis
core	Package core implements WEB::ROBOTS parsing and matching logic.
lib	Package lib exposes the Ferret-facing WEB::ROBOTS functions.