webfetch

package
v0.23.1
Warning

This package is not in the latest version of its module.

Published: Apr 24, 2026 License: MIT Imports: 25 Imported by: 0

Documentation

Overview

Package webfetch provides web fetch and search capabilities via multiple providers.

Constants

const PerplexityRateLimitInterval = 6 * time.Second

PerplexityRateLimitInterval is the minimum time between Perplexity API calls (10 req/min).
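The arithmetic behind the constant can be sketched as follows. This is a minimal illustration and not the package's limiter: `minInterval` mirrors the documented value, while `remainingWait` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"time"
)

// minInterval mirrors the documented constant: 6 seconds between calls.
const minInterval = 6 * time.Second

// remainingWait is a hypothetical helper (not part of webfetch). Given the
// time elapsed since the previous call, it reports how much longer a caller
// would need to wait to respect minInterval.
func remainingWait(elapsed time.Duration) time.Duration {
	if elapsed >= minInterval {
		return 0
	}
	return minInterval - elapsed
}

func main() {
	// One call every 6 seconds works out to 10 calls per minute.
	fmt.Println(int(time.Minute / minInterval))

	// 2 seconds after the last call, 4 more seconds are needed.
	fmt.Println(remainingWait(2 * time.Second))

	// 10 seconds after the last call, no wait is needed.
	fmt.Println(remainingWait(10 * time.Second))
}
```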

Variables

This section is empty.

Functions

func CollyCollectorWithLimits

func CollyCollectorWithLimits(domains string, delay, randomDelay time.Duration, parallelism int) *colly.Collector

CollyCollectorWithLimits returns a collector with rate limiting for batch use. It is not used in the single-fetch provider path but is exported for callers doing multi-page crawls.

func FormatGitHubResult

func FormatGitHubResult(result *GitHubResult) string

FormatGitHubResult renders a GitHubResult as markdown.

func FormatResults

func FormatResults(results []SearchResult) string

FormatResults renders a slice of SearchResults as a markdown list.
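The exact markdown layout is not documented here. As a rough sketch of what rendering results as a markdown list could look like, the example below uses a local `searchResult` type and an assumed one-bullet-per-result layout; both are hypothetical and may differ from what FormatResults actually emits.

```go
package main

import (
	"fmt"
	"strings"
)

// searchResult is a local stand-in with the same exported fields as
// webfetch.SearchResult.
type searchResult struct {
	Title, URL, Description string
}

// formatResults renders one bullet per result. The "- [title](url): text"
// layout is an assumption made for illustration only.
func formatResults(results []searchResult) string {
	var b strings.Builder
	for _, r := range results {
		fmt.Fprintf(&b, "- [%s](%s): %s\n", r.Title, r.URL, r.Description)
	}
	return b.String()
}

func main() {
	fmt.Print(formatResults([]searchResult{
		{Title: "Go", URL: "https://go.dev", Description: "The Go programming language"},
	}))
}
```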

func Register

func Register(e *sdk.Extension)

Register registers the webfetch extension's tools and prompt section.

Types

type AgentBrowserProvider

type AgentBrowserProvider struct{}

AgentBrowserProvider fetches pages by shelling out to the agent-browser CLI. It handles Cloudflare, complex JavaScript, and interactive pages, and is the heaviest option.

func NewAgentBrowserProvider

func NewAgentBrowserProvider() *AgentBrowserProvider

func (*AgentBrowserProvider) Fetch

func (p *AgentBrowserProvider) Fetch(ctx context.Context, rawURL string) (string, error)

func (*AgentBrowserProvider) Name

func (p *AgentBrowserProvider) Name() string

type BraveConfig

type BraveConfig struct {
	SearchURL string `yaml:"search_url"`
}

BraveConfig holds Brave API endpoint settings.

func DefaultBraveConfig

func DefaultBraveConfig() BraveConfig

DefaultBraveConfig returns the default Brave configuration.

type BraveProvider

type BraveProvider struct {
	// contains filtered or unexported fields
}

BraveProvider implements SearchProvider using the Brave Search API. Brave is search-only — no fetch/reader capability.

func NewBraveProvider

func NewBraveProvider(apiKey string, cfg BraveConfig) *BraveProvider

NewBraveProvider creates a BraveProvider with the given API key and endpoint config. Returns nil if apiKey is empty.
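Because the constructor returns a nil pointer when the key is empty, callers building a provider list may want to check the concrete pointer before storing it in an interface slice: in Go, a nil pointer wrapped in an interface value is itself non-nil. The sketch below illustrates the pattern with local stand-in types (`searchProvider`, `braveLike`, and `buildProviders` are hypothetical names, not part of the package).

```go
package main

import "fmt"

// searchProvider is a trimmed local stand-in for the SearchProvider interface.
type searchProvider interface{ Name() string }

// braveLike is a hypothetical provider used only for this example.
type braveLike struct{ apiKey string }

func (b *braveLike) Name() string { return "brave" }

// newBraveLike follows the documented convention: nil when apiKey is empty.
func newBraveLike(apiKey string) *braveLike {
	if apiKey == "" {
		return nil
	}
	return &braveLike{apiKey: apiKey}
}

// buildProviders keeps only the providers that were actually constructed.
func buildProviders(keys ...string) []searchProvider {
	var out []searchProvider
	for _, k := range keys {
		// Check the concrete pointer before converting it to the interface:
		// a nil *braveLike stored in a searchProvider is a non-nil interface.
		if p := newBraveLike(k); p != nil {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Two of the three keys are empty, so only one provider is kept.
	fmt.Println(len(buildProviders("", "key-1", "")))
}
```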

func (*BraveProvider) Name

func (b *BraveProvider) Name() string

func (*BraveProvider) Search

func (b *BraveProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client performs web fetch and search operations with fallback support. The zero value is not usable; use New, NewWithConfig, Default, or NewForTest.

func Default

func Default() *Client

Default returns a Client with only the Jina provider (no API keys required).

func New

func New(fetchProviders []FetchProvider, searchProviders []SearchProvider, github *GitHubClient) *Client

New creates a Client with custom provider lists (used for testing).

func NewForTest

func NewForTest(readerBase, searchBase string) *Client

NewForTest creates a Client with mock Jina providers for testing.

func NewWithConfig

func NewWithConfig(cfg *Config) *Client

NewWithConfig creates a Client with providers based on the given config. Fetch priority: Colly → Jina → Rod → agent-browser → Gemini → Perplexity. Search priority: Brave → Exa → Gemini → Perplexity → Jina → DuckDuckGo.
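A priority-ordered fallback of this kind can be sketched as a loop over providers that returns the first success. This is an illustration under assumptions, not the Client's actual logic: `fetchProvider` copies the documented interface, while `stubProvider`, `fetchWithFallback`, and `demo` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fetchProvider copies the documented FetchProvider interface.
type fetchProvider interface {
	Name() string
	Fetch(ctx context.Context, rawURL string) (string, error)
}

// stubProvider is a hypothetical provider that returns a canned answer.
type stubProvider struct {
	name    string
	content string
	err     error
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Fetch(ctx context.Context, rawURL string) (string, error) {
	return s.content, s.err
}

// fetchWithFallback tries providers in order and returns the first success
// together with the name of the provider that produced it.
func fetchWithFallback(ctx context.Context, providers []fetchProvider, rawURL string) (string, string, error) {
	var errs []error
	for _, p := range providers {
		if err := ctx.Err(); err != nil {
			return "", "", err // stop early if the caller cancelled
		}
		content, err := p.Fetch(ctx, rawURL)
		if err == nil {
			return content, p.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	if len(errs) == 0 {
		return "", "", errors.New("no providers configured")
	}
	return "", "", errors.Join(errs...)
}

// demo runs the chain with a failing first provider and a working second one.
func demo() (string, string, error) {
	providers := []fetchProvider{
		stubProvider{name: "colly", err: errors.New("content too short")},
		stubProvider{name: "jina", content: "# page"},
	}
	return fetchWithFallback(context.Background(), providers, "https://example.com")
}

func main() {
	content, name, err := demo()
	fmt.Println(content, name, err)
}
```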

func (*Client) Fetch

func (c *Client) Fetch(ctx context.Context, rawURL string, raw bool) (string, error)

Fetch retrieves content from the given URL with provider fallback. If raw is false, content is fetched via reader providers (returns clean markdown). If raw is true, the URL is fetched directly.

func (*Client) GetStorage

func (c *Client) GetStorage() *Storage

GetStorage returns the session storage for cached results.

func (*Client) Search

func (c *Client) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)

Search queries providers and returns up to limit results with fallback.

type CollyProvider

type CollyProvider struct{}

CollyProvider fetches static HTML pages locally using Colly. Fast and free — no API key, no external service. Falls through if the page content is too short (likely JS-rendered).

func NewCollyProvider

func NewCollyProvider() *CollyProvider

func (*CollyProvider) Fetch

func (p *CollyProvider) Fetch(ctx context.Context, rawURL string) (string, error)

func (*CollyProvider) Name

func (p *CollyProvider) Name() string

type Config

type Config struct {
	JinaAPIKey       string           `yaml:"jina_api_key"`
	PerplexityAPIKey string           `yaml:"perplexity_api_key"`
	GeminiAPIKey     string           `yaml:"gemini_api_key"`
	BraveAPIKey      string           `yaml:"brave_api_key"`
	ExaAPIKey        string           `yaml:"exa_api_key"`
	GitHub           GitHubConfig     `yaml:"github"`
	Gemini           GeminiConfig     `yaml:"gemini_config"`
	Perplexity       PerplexityConfig `yaml:"perplexity_config"`
	Exa              ExaConfig        `yaml:"exa_config"`
	Jina             JinaConfig       `yaml:"jina_config"`
	Brave            BraveConfig      `yaml:"brave_config"`
	DuckDuckGo       DuckDuckGoConfig `yaml:"duckduckgo_config"`
}

Config holds configuration for webfetch providers.

func LoadConfig

func LoadConfig() (*Config, error)

LoadConfig reads configuration from the namespaced extension directory (~/.config/piglet/extensions/webfetch/webfetch.yaml), falling back to the flat location (~/.config/piglet/webfetch.yaml) for backward compatibility. If neither exists, it creates one with default values in the namespaced directory.
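The lookup order described above can be sketched as follows. The helper names (`candidatePaths`, `resolveConfigPath`) and the injected `exists` function are hypothetical; the sketch only resolves which path would be used and does not read or create any file.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// candidatePaths lists the two documented locations in priority order,
// relative to a config directory such as ~/.config.
func candidatePaths(configDir string) []string {
	return []string{
		filepath.Join(configDir, "piglet", "extensions", "webfetch", "webfetch.yaml"),
		filepath.Join(configDir, "piglet", "webfetch.yaml"),
	}
}

// resolveConfigPath returns the first existing candidate. If none exists,
// it returns the namespaced path, where a default file would be created,
// and found=false.
func resolveConfigPath(configDir string, exists func(string) bool) (path string, found bool) {
	paths := candidatePaths(configDir)
	for _, p := range paths {
		if exists(p) {
			return p, true
		}
	}
	return paths[0], false
}

func main() {
	// Pretend that only the legacy flat file is present.
	legacy := candidatePaths("cfg")[1]
	path, found := resolveConfigPath("cfg", func(p string) bool { return p == legacy })
	fmt.Println(path, found)
}
```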

type DuckDuckGoConfig

type DuckDuckGoConfig struct {
	SearchURL string `yaml:"search_url"`
}

DuckDuckGoConfig holds DuckDuckGo search endpoint settings.

func DefaultDuckDuckGoConfig

func DefaultDuckDuckGoConfig() DuckDuckGoConfig

DefaultDuckDuckGoConfig returns the default DuckDuckGo configuration.

type ExaConfig

type ExaConfig struct {
	SearchURL   string `yaml:"search_url"`
	ContentsURL string `yaml:"contents_url"`
}

ExaConfig holds Exa API endpoint settings.

func DefaultExaConfig

func DefaultExaConfig() ExaConfig

DefaultExaConfig returns the default Exa configuration.

type ExaProvider

type ExaProvider struct {
	// contains filtered or unexported fields
}

ExaProvider implements SearchProvider and FetchProvider using the Exa API.

func NewExaProvider

func NewExaProvider(apiKey string, cfg ExaConfig) *ExaProvider

NewExaProvider creates an ExaProvider with the given API key and endpoint config. Returns nil if apiKey is empty.

func (*ExaProvider) Fetch

func (e *ExaProvider) Fetch(ctx context.Context, rawURL string) (string, error)

func (*ExaProvider) Name

func (e *ExaProvider) Name() string

func (*ExaProvider) Search

func (e *ExaProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)

type FetchProvider

type FetchProvider interface {
	Name() string
	Fetch(ctx context.Context, rawURL string) (string, error)
}

FetchProvider defines the interface for content fetching.

type GeminiConfig

type GeminiConfig struct {
	APIBase string `yaml:"api_base"`
	Model   string `yaml:"model"`
}

GeminiConfig holds Gemini API endpoint settings.

func DefaultGeminiConfig

func DefaultGeminiConfig() GeminiConfig

DefaultGeminiConfig returns the default Gemini configuration.

type GeminiProvider

type GeminiProvider struct {
	// contains filtered or unexported fields
}

GeminiProvider implements FetchProvider and SearchProvider using the Google Gemini API.

func NewGeminiProvider

func NewGeminiProvider(apiKey string, cfg GeminiConfig) *GeminiProvider

NewGeminiProvider creates a GeminiProvider with the given API key and endpoint config. Returns nil if apiKey is empty.

func (*GeminiProvider) Fetch

func (g *GeminiProvider) Fetch(ctx context.Context, rawURL string) (string, error)

Fetch retrieves content by asking Gemini to summarize the URL.

func (*GeminiProvider) Name

func (g *GeminiProvider) Name() string

Name returns the provider name for logging.

func (*GeminiProvider) Search

func (g *GeminiProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)

Search queries Gemini for search results.

type GitHubClient

type GitHubClient struct {
	// contains filtered or unexported fields
}

GitHubClient handles GitHub repo cloning and API fallback.

func NewGitHubClient

func NewGitHubClient(cfg *GitHubConfig) *GitHubClient

NewGitHubClient creates a new GitHub client.

func (*GitHubClient) Fetch

func (g *GitHubClient) Fetch(ctx context.Context, rawURL string) (*GitHubResult, error)

Fetch retrieves content from a GitHub URL. Returns nil if the URL is not a GitHub URL (caller should try other providers).
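A caller would therefore check the result for nil before falling back to the generic providers. The sketch below shows that pattern with local stand-ins. Everything here is hypothetical: the host check in `isGitHubURL` is a guess at how detection might work, and returning a nil error alongside the nil result is an assumption, since the source only documents the nil result.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repoResult is a trimmed local stand-in for GitHubResult.
type repoResult struct{ README string }

// isGitHubURL is a hypothetical check; the real detection rule is not documented.
func isGitHubURL(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	return host == "github.com" || host == "www.github.com"
}

// githubFetch mimics the documented contract: a nil result for non-GitHub URLs.
// Assumption: the error is also nil in that case.
func githubFetch(rawURL string) (*repoResult, error) {
	if !isGitHubURL(rawURL) {
		return nil, nil
	}
	return &repoResult{README: "# stub readme"}, nil
}

// fetchAny tries the GitHub path first, then a placeholder generic path.
func fetchAny(rawURL string) (string, error) {
	res, err := githubFetch(rawURL)
	if err != nil {
		return "", err
	}
	if res != nil {
		return res.README, nil
	}
	return "generic content for " + rawURL, nil
}

func main() {
	fmt.Println(fetchAny("https://github.com/owner/repo"))
	fmt.Println(fetchAny("https://example.com/page"))
}
```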

type GitHubConfig

type GitHubConfig struct {
	Enabled        bool `yaml:"enabled"`
	SkipLargeRepos bool `yaml:"skip_large_repos"`
}

GitHubConfig configures GitHub clone behavior.

type GitHubResult

type GitHubResult struct {
	LocalPath string   `json:"local_path"`
	README    string   `json:"readme"`
	Tree      []string `json:"tree"`
	UsedAPI   bool     `json:"used_api"`
}

GitHubResult holds the result of fetching a GitHub repo.

type HTTPError

type HTTPError struct {
	URL        string
	StatusCode int
	Err        error
}

HTTPError represents an HTTP error with status code and URL.

func (*HTTPError) Error

func (e *HTTPError) Error() string

Error implements the error interface.

func (*HTTPError) Unwrap

func (e *HTTPError) Unwrap() error

Unwrap returns the underlying error.
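Since HTTPError implements Unwrap, it works with the standard errors.As and errors.Is helpers even when wrapped further. The sketch below uses a local `httpError` type with the same fields; the message format in `Error`, the sentinel `errUpstream`, and the helpers `statusOf` and `sampleError` are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// httpError is a local stand-in with the same fields as webfetch.HTTPError.
type httpError struct {
	URL        string
	StatusCode int
	Err        error
}

// Error uses a hypothetical message format; the real one is not documented.
func (e *httpError) Error() string {
	return fmt.Sprintf("HTTP %d for %s: %v", e.StatusCode, e.URL, e.Err)
}

// Unwrap exposes the underlying error to errors.Is and errors.As.
func (e *httpError) Unwrap() error { return e.Err }

// errUpstream is a hypothetical sentinel for the underlying failure.
var errUpstream = errors.New("upstream refused")

// statusOf extracts the status code from anywhere in the error chain.
func statusOf(err error) (int, bool) {
	var he *httpError
	if errors.As(err, &he) {
		return he.StatusCode, true
	}
	return 0, false
}

// sampleError wraps an httpError the way a calling layer might.
func sampleError() error {
	inner := &httpError{URL: "https://example.com", StatusCode: 429, Err: errUpstream}
	return fmt.Errorf("fetch failed: %w", inner)
}

func main() {
	err := sampleError()
	code, ok := statusOf(err)
	fmt.Println(code, ok, errors.Is(err, errUpstream))
}
```

Matching on the status code this way lets a caller decide, for example, whether a rate-limit response is worth retrying with another provider.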

type JinaConfig

type JinaConfig struct {
	ReaderBase string `yaml:"reader_base"`
	SearchBase string `yaml:"search_base"`
}

JinaConfig holds Jina API endpoint settings.

func DefaultJinaConfig

func DefaultJinaConfig() JinaConfig

DefaultJinaConfig returns the default Jina configuration.

type JinaProvider

type JinaProvider struct {
	// contains filtered or unexported fields
}

JinaProvider implements FetchProvider and SearchProvider using Jina AI readers.

func NewJinaProvider

func NewJinaProvider(apiKey string, cfg JinaConfig) *JinaProvider

NewJinaProvider creates a JinaProvider with the given API key and endpoint config.

func NewJinaProviderWithBase

func NewJinaProviderWithBase(readerBase, searchBase string, apiKey string) *JinaProvider

NewJinaProviderWithBase creates a JinaProvider with custom base URLs (for testing).

func (*JinaProvider) Fetch

func (j *JinaProvider) Fetch(ctx context.Context, rawURL string) (string, error)

Fetch retrieves content from the given URL via the Jina reader.

func (*JinaProvider) Name

func (j *JinaProvider) Name() string

Name returns the provider name for logging.

func (*JinaProvider) Search

func (j *JinaProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)

Search queries the Jina search endpoint.

type PerplexityConfig

type PerplexityConfig struct {
	APIURL string `yaml:"api_url"`
	Model  string `yaml:"model"`
}

PerplexityConfig holds Perplexity API endpoint settings.

func DefaultPerplexityConfig

func DefaultPerplexityConfig() PerplexityConfig

DefaultPerplexityConfig returns the default Perplexity configuration.

type PerplexityProvider

type PerplexityProvider struct {
	// contains filtered or unexported fields
}

PerplexityProvider implements FetchProvider and SearchProvider using the Perplexity API.

func NewPerplexityProvider

func NewPerplexityProvider(apiKey string, cfg PerplexityConfig) *PerplexityProvider

NewPerplexityProvider creates a PerplexityProvider with the given API key and endpoint config. Returns nil if apiKey is empty.

func (*PerplexityProvider) Fetch

func (p *PerplexityProvider) Fetch(ctx context.Context, rawURL string) (string, error)

Fetch retrieves content by asking Perplexity to summarize the URL.

func (*PerplexityProvider) Name

func (p *PerplexityProvider) Name() string

Name returns the provider name for logging.

func (*PerplexityProvider) Search

func (p *PerplexityProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)

Search queries Perplexity for search results.

type RodProvider

type RodProvider struct {
	// contains filtered or unexported fields
}

RodProvider fetches JS-rendered pages using headless Chrome via CDP. Heavier than Colly but handles SPAs, dynamic content, and complex pages.

func NewRodProvider

func NewRodProvider() *RodProvider

func (*RodProvider) Close

func (p *RodProvider) Close()

Close shuts down the browser if running.

func (*RodProvider) Fetch

func (p *RodProvider) Fetch(ctx context.Context, rawURL string) (string, error)

func (*RodProvider) Name

func (p *RodProvider) Name() string

type SearchProvider

type SearchProvider interface {
	Name() string
	Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
}

SearchProvider defines the interface for web search.

type SearchResult

type SearchResult struct {
	Title       string `json:"title"`
	URL         string `json:"url"`
	Description string `json:"description"`
}

SearchResult holds a single search result.

type Storage

type Storage struct {
	// contains filtered or unexported fields
}

Storage holds cached fetch/search results for the session.

func NewStorage

func NewStorage() *Storage

NewStorage creates a new session storage.

func (*Storage) GetFetch

func (s *Storage) GetFetch(url string) string

GetFetch retrieves cached fetch content. Returns empty string if not found.

func (*Storage) GetSearch

func (s *Storage) GetSearch(query string) []SearchResult

GetSearch retrieves cached search results. Returns nil if not found.

func (*Storage) List

func (s *Storage) List() (urls, queries []string)

List returns all stored URLs and queries.

func (*Storage) StoreFetch

func (s *Storage) StoreFetch(url, content string)

StoreFetch saves fetch content for later retrieval.

func (*Storage) StoreSearch

func (s *Storage) StoreSearch(query string, results []SearchResult)

StoreSearch saves search results for later retrieval.
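The fetch half of such a session cache could look roughly like the sketch below. `sessionCache` is a hypothetical stand-in, since Storage's fields are unexported; guarding the map with a mutex is an assumption, as the source does not say whether Storage is safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// sessionCache is a hypothetical stand-in for Storage.
type sessionCache struct {
	mu      sync.RWMutex // assumption: concurrent access is guarded
	fetches map[string]string
}

func newSessionCache() *sessionCache {
	return &sessionCache{fetches: make(map[string]string)}
}

// storeFetch saves content under its URL, overwriting any earlier entry.
func (c *sessionCache) storeFetch(url, content string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.fetches[url] = content
}

// getFetch returns the cached content, or "" when the URL is unknown.
func (c *sessionCache) getFetch(url string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.fetches[url]
}

// list returns the stored URLs in sorted order for stable output.
func (c *sessionCache) list() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	urls := make([]string, 0, len(c.fetches))
	for u := range c.fetches {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	c := newSessionCache()
	c.storeFetch("https://example.com", "# Example")
	fmt.Println(c.getFetch("https://example.com"))
	fmt.Println(c.getFetch("https://missing.example") == "")
	fmt.Println(c.list())
}
```

With a string return and no separate found flag, a cached empty page cannot be told apart from a miss, so callers that care about the difference may need to track it themselves.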

Directories

Path Synopsis
Webfetch extension binary.