webfetch

v0.2.0 Latest
Published: Jun 16, 2026 License: Apache-2.0 Imports: 20 Imported by: 0

README

webfetch

Fetch web pages and extract their text — with optional LLM-based indirect prompt-injection analysis — for the gollem LLM agent framework.

github.com/gollem-dev/tools/webfetch

Tools

Name       Description
web_fetch  Fetch a web page and extract its text, optionally with LLM-based indirect prompt-injection screening.

When an LLM client is supplied via WithLLMClient, fetched content is screened for indirect prompt-injection attempts before being returned to the agent. Without it, the page text is returned as-is.

SSRF guard

Because web_fetch may be handed URLs from untrusted sources (LLM output, chat messages, case data), the default HTTP client enforces an SSRF guard. A net.Dialer.Control hook inspects the already-resolved destination IP and rejects anything that is not a public, global-unicast address: loopback, RFC 1918/ULA private ranges, CGNAT (100.64.0.0/10), link-local (including the 169.254.169.254 metadata endpoint), unspecified, and multicast. Because the check runs on the resolved IP at dial time rather than on the hostname, it defeats DNS rebinding and applies to every redirect hop.

The guard is enabled by default. Disable it with WithAllowPrivateIP(true) (e.g. to reach a loopback test server). When you inject your own client via WithHTTPClient, the guard is not installed — that client's transport is used as-is, so add your own dial control if needed.
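The dial-time check described above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the names isPublic, ssrfControl, and newGuardedClient, and the timeout values, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"syscall"
	"time"
)

// cgnat is the carrier-grade NAT range (RFC 6598). netip.Addr.IsPrivate
// does not cover it, so it is checked explicitly.
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// isPublic reports whether ip is a public, global-unicast address.
// Hypothetical helper for this sketch.
func isPublic(ip netip.Addr) bool {
	ip = ip.Unmap() // treat ::ffff:a.b.c.d as the IPv4 address it wraps
	// IsGlobalUnicast is false for loopback, link-local, multicast,
	// unspecified, broadcast, and the zero Addr.
	if !ip.IsGlobalUnicast() {
		return false
	}
	// IsPrivate covers RFC 1918 and IPv6 ULA (fc00::/7).
	return !ip.IsPrivate() && !cgnat.Contains(ip)
}

// ssrfControl has the net.Dialer.Control signature. When the dialer calls
// it, address is the resolved "ip:port" literal about to be connected to.
func ssrfControl(_, address string, _ syscall.RawConn) error {
	ap, err := netip.ParseAddrPort(address)
	if err != nil {
		return fmt.Errorf("ssrf guard: cannot parse %q: %w", address, err)
	}
	if !isPublic(ap.Addr()) {
		return fmt.Errorf("ssrf guard: %s is not a public address", ap.Addr())
	}
	return nil
}

// newGuardedClient wires the hook into an http.Client. Every connection the
// transport opens, including those for redirect targets, goes through it.
func newGuardedClient() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second, Control: ssrfControl}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
		Timeout:   30 * time.Second,
	}
}

func main() {
	_ = newGuardedClient()
	for _, a := range []string{"127.0.0.1:80", "169.254.169.254:80", "100.64.0.1:443", "93.184.216.34:443"} {
		fmt.Printf("%-22s blocked=%v\n", a, ssrfControl("tcp", a, nil) != nil)
	}
}
```

Checking in Control rather than parsing the URL up front means the decision is made on the address the socket will actually connect to, which is why a hostname that resolves differently on a second lookup cannot slip past it.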

Usage

// llm is a gollem.LLMClient and ctx a context.Context, both created by the caller.
ts, err := webfetch.New(
	webfetch.WithLLMClient(llm), // optional: enables injection screening
)
if err != nil {
	return err
}
if err := ts.Ping(ctx); err != nil { // optional preflight
	return err
}

Options

Option                           Required  Default
WithLLMClient(gollem.LLMClient)  no        none (screening disabled)
WithMaxContentBytes(int64)       no        10 MiB
WithAllowPrivateIP(bool)         no        false (SSRF guard enabled)
WithHTTPClient(*http.Client)     no        guarded built-in client
WithLogger(*slog.Logger)         no        slog.Default()

Testing

Mock tests run unconditionally. The live-service test runs only when TEST_WEBFETCH_URL is set; the injection-screening path additionally needs TEST_GEMINI_PROJECT_ID and TEST_GEMINI_LOCATION:

TEST_WEBFETCH_URL=https://example.com \
	TEST_GEMINI_PROJECT_ID=... TEST_GEMINI_LOCATION=us-central1 go test ./...
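
The gating rule above can be sketched as a small decision function. This is an illustration only: the testPlan type and planFrom function are hypothetical names, and treating an empty value as unset is an assumption, not something the package documents. In a real test the result would typically drive a t.Skip call.

```go
package main

import (
	"fmt"
	"os"
)

// testPlan records which optional tests may run. Hypothetical type.
type testPlan struct {
	Fetch     bool // live fetch test
	Screening bool // injection-screening path
}

// planFrom derives the plan from an environment lookup function with the
// same shape as os.LookupEnv. Empty values count as unset (an assumption).
func planFrom(lookup func(string) (string, bool)) testPlan {
	has := func(key string) bool {
		v, ok := lookup(key)
		return ok && v != ""
	}
	fetch := has("TEST_WEBFETCH_URL")
	return testPlan{
		Fetch: fetch,
		// Screening needs the URL plus both Gemini settings.
		Screening: fetch && has("TEST_GEMINI_PROJECT_ID") && has("TEST_GEMINI_LOCATION"),
	}
}

func main() {
	fmt.Printf("%+v\n", planFrom(os.LookupEnv))
}
```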

Documentation

Overview

Package webfetch provides a gollem.ToolSet that fetches a web page, extracts its text content, and optionally screens it for indirect prompt injection via an injected LLM client.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Option

type Option func(*ToolSet)

Option configures a ToolSet.
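
For readers new to the functional-options pattern behind this type, here is a minimal sketch. The lowercase toolSet struct and its two fields are hypothetical stand-ins, since the real ToolSet fields are unexported. The defaults and the "ignore nil or non-positive values" behaviour mirror what the option docs on this page describe.

```go
package main

import (
	"fmt"
	"log/slog"
)

// toolSet is a hypothetical stand-in for the real ToolSet.
type toolSet struct {
	maxContentBytes int64
	logger          *slog.Logger
}

// option mirrors the shape of `type Option func(*ToolSet)`.
type option func(*toolSet)

// withMaxContentBytes ignores values <= 0, keeping the default.
func withMaxContentBytes(n int64) option {
	return func(t *toolSet) {
		if n > 0 {
			t.maxContentBytes = n
		}
	}
}

// withLogger ignores nil, keeping the default logger.
func withLogger(l *slog.Logger) option {
	return func(t *toolSet) {
		if l != nil {
			t.logger = l
		}
	}
}

// newToolSet sets the defaults first, then applies options in order.
func newToolSet(opts ...option) *toolSet {
	t := &toolSet{maxContentBytes: 10 << 20, logger: slog.Default()}
	for _, opt := range opts {
		opt(t)
	}
	return t
}

func main() {
	ts := newToolSet(withMaxContentBytes(1<<20), withMaxContentBytes(0), withLogger(nil))
	fmt.Println(ts.maxContentBytes, ts.logger != nil) // prints "1048576 true"
}
```

Because each option is a closure over the struct pointer, new settings can be added later without changing the constructor's signature.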

func WithAllowPrivateIP added in v0.2.0

func WithAllowPrivateIP(allow bool) Option

WithAllowPrivateIP controls the SSRF guard on the default HTTP client. The guard is enabled by default (allow == false): connections to non-public IPs (loopback, RFC 1918/ULA private ranges, CGNAT, link-local addresses such as cloud metadata endpoints, etc.) are rejected at dial time on every redirect hop. Pass true to disable it, e.g. to reach a loopback test server. This option has no effect when a client is injected via WithHTTPClient.

func WithHTTPClient

func WithHTTPClient(client *http.Client) Option

WithHTTPClient overrides the HTTP client used for requests. A nil value is ignored and the default client is kept.

An injected client carries its own transport, so the built-in SSRF guard (see WithAllowPrivateIP) is NOT installed on it. Supplying a client is the documented escape hatch for callers that need full control over dialing.
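
As one illustration of that escape hatch, the sketch below builds a client with its own dial control that permits a single internal subnet and refuses everything else. The subnet 10.20.0.0/16 and the names allowedNets, allowlistControl, and newAllowlistClient are assumptions made for this example. The resulting client would then be passed to WithHTTPClient.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"syscall"
	"time"
)

// allowedNets is a hypothetical allowlist containing one internal subnet.
var allowedNets = []netip.Prefix{netip.MustParsePrefix("10.20.0.0/16")}

// allowlistControl has the net.Dialer.Control signature and permits a
// connection only when the resolved IP falls inside allowedNets.
func allowlistControl(_, address string, _ syscall.RawConn) error {
	ap, err := netip.ParseAddrPort(address)
	if err != nil {
		return fmt.Errorf("dial control: cannot parse %q: %w", address, err)
	}
	ip := ap.Addr().Unmap() // normalize IPv4-mapped IPv6 addresses
	for _, p := range allowedNets {
		if p.Contains(ip) {
			return nil
		}
	}
	return fmt.Errorf("dial control: %s is not in the allowlist", ip)
}

// newAllowlistClient returns a client whose every connection is checked
// by allowlistControl.
func newAllowlistClient() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second, Control: allowlistControl}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
		Timeout:   30 * time.Second,
	}
}

func main() {
	_ = newAllowlistClient() // pass this to webfetch.WithHTTPClient
	for _, a := range []string{"10.20.3.4:8080", "10.21.0.1:80", "93.184.216.34:443"} {
		fmt.Printf("%-20s allowed=%v\n", a, allowlistControl("tcp", a, nil) == nil)
	}
}
```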

func WithLLMClient

func WithLLMClient(client gollem.LLMClient) Option

WithLLMClient injects an LLM client for the analyze step. When set, each web_fetch call passes the extracted text through an indirect-prompt-injection analysis session and returns the cleaned Markdown on success. When nil or not provided, the analyze step is disabled and the raw extracted text is returned verbatim.

func WithLogger

func WithLogger(logger *slog.Logger) Option

WithLogger sets the logger. A nil argument keeps the default (slog.Default()).

func WithMaxContentBytes

func WithMaxContentBytes(n int64) Option

WithMaxContentBytes sets the maximum number of bytes that will be read from an HTTP response body. Responses longer than this limit are truncated before extraction. The default is 10 MiB. A value <= 0 is ignored.
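
A common way to enforce such a cap is io.LimitReader. The sketch below is an assumption about how the truncation could work, not the package's code; readCapped and capText are hypothetical names. It reads one byte past the limit so that truncation can be detected.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readCapped reads at most limit bytes from r and reports whether the
// input was longer than that. limit is assumed to be positive.
func readCapped(r io.Reader, limit int64) ([]byte, bool, error) {
	// Ask for one extra byte: if it arrives, the source exceeded the limit.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, false, err
	}
	if int64(len(data)) > limit {
		return data[:limit], true, nil
	}
	return data, false, nil
}

// capText applies readCapped to an in-memory string to keep the demo short.
// A strings.Reader never returns a read error, so the error is dropped.
func capText(s string, limit int64) (string, bool) {
	data, truncated, _ := readCapped(strings.NewReader(s), limit)
	return string(data), truncated
}

func main() {
	text, truncated := capText("hello, world", 5)
	fmt.Printf("%q truncated=%v\n", text, truncated) // prints "hello" truncated=true
}
```

With an HTTP response, the same function would be called with resp.Body as the reader, so an oversized page never occupies more than limit+1 bytes of memory.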

type ToolSet

type ToolSet struct {
	// contains filtered or unexported fields
}

ToolSet implements gollem.ToolSet for web-page fetching with optional LLM-based indirect prompt-injection analysis. All fields are unexported; configure via Option.

func New

func New(opts ...Option) (*ToolSet, error)

New constructs a ToolSet. It performs only static validation and no network I/O. Use Ping to verify connectivity.

func (*ToolSet) Ping

func (t *ToolSet) Ping(ctx context.Context) error

Ping checks whether the configured dependencies are reachable. If an LLM client is set, it creates a session and performs a trivial generate call to confirm the client is operational; if no client is set, Ping always returns nil (the tool still works in fetch-only mode).

func (*ToolSet) Run

func (t *ToolSet) Run(ctx context.Context, name string, args map[string]any) (map[string]any, error)

Run dispatches tool calls. Only "web_fetch" is supported.

func (*ToolSet) Specs

func (t *ToolSet) Specs(_ context.Context) ([]gollem.ToolSpec, error)

Specs returns the tool specifications exposed by this ToolSet.
