Documentation
Overview ¶
Package webfetch provides web fetch and search capabilities via multiple providers.
Index ¶
- Constants
- func CollyCollectorWithLimits(domains string, delay, randomDelay time.Duration, parallelism int) *colly.Collector
- func FormatGitHubResult(result *GitHubResult) string
- func FormatResults(results []SearchResult) string
- func Register(e *sdk.Extension)
- type AgentBrowserProvider
- type BraveConfig
- type BraveProvider
- type Client
- type CollyProvider
- type Config
- type DuckDuckGoConfig
- type ExaConfig
- type ExaProvider
- type FetchProvider
- type GeminiConfig
- type GeminiProvider
- type GitHubClient
- type GitHubConfig
- type GitHubResult
- type HTTPError
- type JinaConfig
- type JinaProvider
- type PerplexityConfig
- type PerplexityProvider
- type RodProvider
- type SearchProvider
- type SearchResult
- type Storage
Constants ¶
const PerplexityRateLimitInterval = 6 * time.Second
PerplexityRateLimitInterval is the minimum time between Perplexity API calls (10 req/min).
Variables ¶
This section is empty.
Functions ¶
func CollyCollectorWithLimits ¶
func CollyCollectorWithLimits(domains string, delay, randomDelay time.Duration, parallelism int) *colly.Collector
CollyCollectorWithLimits returns a Colly collector for the given domains, rate-limited by delay, randomDelay, and parallelism, for batch use. It is not used in the single-fetch provider path but is exported for callers doing multi-page crawls.
func FormatGitHubResult ¶
func FormatGitHubResult(result *GitHubResult) string
FormatGitHubResult renders a GitHubResult as markdown.
func FormatResults ¶
func FormatResults(results []SearchResult) string
FormatResults renders a slice of SearchResults as a markdown list.
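The exact markdown layout FormatResults produces is not documented here. The sketch below is a hypothetical stand-in (formatResults) that assumes a numbered list of links with the description indented underneath. It mirrors the SearchResult fields locally so it compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// SearchResult mirrors the exported fields of webfetch.SearchResult.
type SearchResult struct {
	Title       string
	URL         string
	Description string
}

// formatResults is a hypothetical stand-in for FormatResults; the layout
// (numbered links, indented description) is an assumption.
func formatResults(results []SearchResult) string {
	var b strings.Builder
	for i, r := range results {
		fmt.Fprintf(&b, "%d. [%s](%s)\n", i+1, r.Title, r.URL)
		if r.Description != "" {
			fmt.Fprintf(&b, "   %s\n", r.Description)
		}
	}
	return b.String() // empty string for no results
}

func main() {
	fmt.Print(formatResults([]SearchResult{
		{Title: "Go", URL: "https://go.dev", Description: "The Go programming language"},
	}))
}
```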
Types ¶
type AgentBrowserProvider ¶
type AgentBrowserProvider struct{}
AgentBrowserProvider fetches pages by shelling out to the agent-browser CLI. It handles Cloudflare-protected sites, complex JavaScript, and interactive pages, but it is the heaviest option.
func NewAgentBrowserProvider ¶
func NewAgentBrowserProvider() *AgentBrowserProvider
func (*AgentBrowserProvider) Name ¶
func (p *AgentBrowserProvider) Name() string
type BraveConfig ¶
type BraveConfig struct {
SearchURL string `yaml:"search_url"`
}
BraveConfig holds Brave API endpoint settings.
func DefaultBraveConfig ¶
func DefaultBraveConfig() BraveConfig
DefaultBraveConfig returns the default Brave configuration.
type BraveProvider ¶
type BraveProvider struct {
// contains filtered or unexported fields
}
BraveProvider implements SearchProvider using the Brave Search API. Brave is search-only; it has no fetch or reader capability.
func NewBraveProvider ¶
func NewBraveProvider(apiKey string, cfg BraveConfig) *BraveProvider
NewBraveProvider creates a BraveProvider with the given API key and endpoint config. Returns nil if apiKey is empty.
func (*BraveProvider) Name ¶
func (b *BraveProvider) Name() string
func (*BraveProvider) Search ¶
func (b *BraveProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client performs web fetch and search operations with fallback support. The zero value is not usable; use New, NewWithConfig, Default, or NewForTest.
func Default ¶
func Default() *Client
Default returns a Client that uses only the Jina provider (no API keys required).
func New ¶
func New(fetchProviders []FetchProvider, searchProviders []SearchProvider, github *GitHubClient) *Client
New creates a Client with custom provider lists (used for testing).
func NewForTest ¶
NewForTest creates a Client with mock Jina providers for testing.
func NewWithConfig ¶
NewWithConfig creates a Client with providers based on the given config. Fetch priority: Colly → Jina → Rod → agent-browser → Gemini → Perplexity. Search priority: Brave → Exa → Gemini → Perplexity → Jina → DuckDuckGo.
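Because NewBraveProvider, NewExaProvider, NewGeminiProvider, and NewPerplexityProvider return nil when the API key is empty, a config-driven builder has to drop those providers before adding them to a []SearchProvider. The sketch below assumes that is what NewWithConfig does; keyedProvider, newKeyedProvider, and buildSearchChain are hypothetical. It also shows the Go pitfall the nil check avoids: a nil pointer stored in an interface is not a nil interface.

```go
package main

import (
	"context"
	"fmt"
)

// SearchResult and SearchProvider mirror the documented webfetch types.
type SearchResult struct{ Title, URL, Description string }

type SearchProvider interface {
	Name() string
	Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
}

// keyedProvider is hypothetical; like NewBraveProvider, its constructor
// returns nil when no API key is configured.
type keyedProvider struct{ name string }

func newKeyedProvider(name, apiKey string) *keyedProvider {
	if apiKey == "" {
		return nil
	}
	return &keyedProvider{name: name}
}

func (p *keyedProvider) Name() string { return p.name }

func (p *keyedProvider) Search(ctx context.Context, q string, limit int) ([]SearchResult, error) {
	return nil, nil // placeholder: a real provider would call its API here
}

// buildSearchChain keeps the documented priority order and skips providers
// whose key is missing. The nil check is done on the concrete pointer,
// because a nil *keyedProvider stored in a SearchProvider compares != nil.
func buildSearchChain(keys map[string]string) []SearchProvider {
	var chain []SearchProvider
	for _, name := range []string{"brave", "exa", "gemini", "perplexity"} {
		if p := newKeyedProvider(name, keys[name]); p != nil {
			chain = append(chain, p)
		}
	}
	// Keyless providers (Jina, then DuckDuckGo) would be appended here
	// unconditionally; they are omitted from this sketch.
	return chain
}

func main() {
	chain := buildSearchChain(map[string]string{"exa": "key-1", "perplexity": "key-2"})
	for _, p := range chain {
		fmt.Println(p.Name())
	}
}
```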
func (*Client) Fetch ¶
Fetch retrieves content from the given URL with provider fallback. If raw is false, the content is fetched through reader providers, which return clean markdown. If raw is true, the URL is fetched directly.
func (*Client) GetStorage ¶
GetStorage returns the session storage for cached results.
type CollyProvider ¶
type CollyProvider struct{}
CollyProvider fetches static HTML pages locally using Colly. It is fast and free: no API key and no external service. It falls through to the next provider if the page content is too short, which usually means the page is rendered by JavaScript.
func NewCollyProvider ¶
func NewCollyProvider() *CollyProvider
func (*CollyProvider) Name ¶
func (p *CollyProvider) Name() string
type Config ¶
type Config struct {
JinaAPIKey string `yaml:"jina_api_key"`
PerplexityAPIKey string `yaml:"perplexity_api_key"`
GeminiAPIKey string `yaml:"gemini_api_key"`
BraveAPIKey string `yaml:"brave_api_key"`
ExaAPIKey string `yaml:"exa_api_key"`
GitHub GitHubConfig `yaml:"github"`
Gemini GeminiConfig `yaml:"gemini_config"`
Perplexity PerplexityConfig `yaml:"perplexity_config"`
Exa ExaConfig `yaml:"exa_config"`
Jina JinaConfig `yaml:"jina_config"`
Brave BraveConfig `yaml:"brave_config"`
DuckDuckGo DuckDuckGoConfig `yaml:"duckduckgo_config"`
}
Config holds configuration for webfetch providers.
func LoadConfig ¶
LoadConfig reads configuration from the namespaced extension directory (~/.config/piglet/extensions/webfetch/webfetch.yaml), falling back to the flat location (~/.config/piglet/webfetch.yaml) for backward compatibility. If neither file exists, it creates one with default values in the namespaced directory.
type DuckDuckGoConfig ¶
type DuckDuckGoConfig struct {
SearchURL string `yaml:"search_url"`
}
DuckDuckGoConfig holds DuckDuckGo search endpoint settings.
func DefaultDuckDuckGoConfig ¶
func DefaultDuckDuckGoConfig() DuckDuckGoConfig
DefaultDuckDuckGoConfig returns the default DuckDuckGo configuration.
type ExaConfig ¶
type ExaConfig struct {
SearchURL string `yaml:"search_url"`
ContentsURL string `yaml:"contents_url"`
}
ExaConfig holds Exa API endpoint settings.
func DefaultExaConfig ¶
func DefaultExaConfig() ExaConfig
DefaultExaConfig returns the default Exa configuration.
type ExaProvider ¶
type ExaProvider struct {
// contains filtered or unexported fields
}
ExaProvider implements SearchProvider and FetchProvider using the Exa API.
func NewExaProvider ¶
func NewExaProvider(apiKey string, cfg ExaConfig) *ExaProvider
NewExaProvider creates an ExaProvider with the given API key and endpoint config. Returns nil if apiKey is empty.
func (*ExaProvider) Name ¶
func (e *ExaProvider) Name() string
func (*ExaProvider) Search ¶
func (e *ExaProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
type FetchProvider ¶
type FetchProvider interface {
Name() string
Fetch(ctx context.Context, rawURL string) (string, error)
}
FetchProvider defines the interface for content fetching.
type GeminiConfig ¶
GeminiConfig holds Gemini API endpoint settings.
func DefaultGeminiConfig ¶
func DefaultGeminiConfig() GeminiConfig
DefaultGeminiConfig returns the default Gemini configuration.
type GeminiProvider ¶
type GeminiProvider struct {
// contains filtered or unexported fields
}
GeminiProvider implements FetchProvider and SearchProvider using the Google Gemini API.
func NewGeminiProvider ¶
func NewGeminiProvider(apiKey string, cfg GeminiConfig) *GeminiProvider
NewGeminiProvider creates a GeminiProvider with the given API key and endpoint config. Returns nil if apiKey is empty.
func (*GeminiProvider) Name ¶
func (g *GeminiProvider) Name() string
Name returns the provider name for logging.
func (*GeminiProvider) Search ¶
func (g *GeminiProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
Search queries Gemini for search results.
type GitHubClient ¶
type GitHubClient struct {
// contains filtered or unexported fields
}
GitHubClient handles GitHub repo cloning and API fallback.
func NewGitHubClient ¶
func NewGitHubClient(cfg *GitHubConfig) *GitHubClient
NewGitHubClient creates a new GitHub client.
func (*GitHubClient) Fetch ¶
func (g *GitHubClient) Fetch(ctx context.Context, rawURL string) (*GitHubResult, error)
Fetch retrieves content from a GitHub URL. Returns nil if the URL is not a GitHub URL (caller should try other providers).
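The "not a GitHub URL, return nil" contract implies a URL check before any cloning. A minimal sketch of such a check follows. parseGitHubRepo is hypothetical. Treating only github.com and www.github.com as GitHub, and taking the first two path segments as owner and repo, are assumptions about what the client accepts.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseGitHubRepo is a hypothetical helper: it returns owner and repo for
// github.com URLs and ok=false for anything else, in which case a caller
// would move on to other providers.
func parseGitHubRepo(rawURL string) (owner, repo string, ok bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", false
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	if host != "github.com" {
		return "", "", false
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], strings.TrimSuffix(parts[1], ".git"), true
}

func main() {
	fmt.Println(parseGitHubRepo("https://github.com/golang/go/tree/master/src"))
	fmt.Println(parseGitHubRepo("https://example.com/a/b"))
}
```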
type GitHubConfig ¶
type GitHubConfig struct {
Enabled bool `yaml:"enabled"`
SkipLargeRepos bool `yaml:"skip_large_repos"`
}
GitHubConfig configures GitHub clone behavior.
type GitHubResult ¶
type GitHubResult struct {
LocalPath string `json:"local_path"`
README string `json:"readme"`
Tree []string `json:"tree"`
UsedAPI bool `json:"used_api"`
}
GitHubResult holds the result of fetching a GitHub repo.
type JinaConfig ¶
type JinaConfig struct {
ReaderBase string `yaml:"reader_base"`
SearchBase string `yaml:"search_base"`
}
JinaConfig holds Jina API endpoint settings.
func DefaultJinaConfig ¶
func DefaultJinaConfig() JinaConfig
DefaultJinaConfig returns the default Jina configuration.
type JinaProvider ¶
type JinaProvider struct {
// contains filtered or unexported fields
}
JinaProvider implements FetchProvider and SearchProvider using the Jina AI reader and search endpoints.
func NewJinaProvider ¶
func NewJinaProvider(apiKey string, cfg JinaConfig) *JinaProvider
NewJinaProvider creates a JinaProvider with the given API key and endpoint config.
func NewJinaProviderWithBase ¶
func NewJinaProviderWithBase(readerBase, searchBase string, apiKey string) *JinaProvider
NewJinaProviderWithBase creates a JinaProvider with custom base URLs (for testing).
func (*JinaProvider) Name ¶
func (j *JinaProvider) Name() string
Name returns the provider name for logging.
func (*JinaProvider) Search ¶
func (j *JinaProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
Search queries the Jina search endpoint.
type PerplexityConfig ¶
PerplexityConfig holds Perplexity API endpoint settings.
func DefaultPerplexityConfig ¶
func DefaultPerplexityConfig() PerplexityConfig
DefaultPerplexityConfig returns the default Perplexity configuration.
type PerplexityProvider ¶
type PerplexityProvider struct {
// contains filtered or unexported fields
}
PerplexityProvider implements FetchProvider and SearchProvider using the Perplexity API.
func NewPerplexityProvider ¶
func NewPerplexityProvider(apiKey string, cfg PerplexityConfig) *PerplexityProvider
NewPerplexityProvider creates a PerplexityProvider with the given API key and endpoint config. Returns nil if apiKey is empty.
func (*PerplexityProvider) Fetch ¶
Fetch retrieves content by asking Perplexity to summarize the URL.
func (*PerplexityProvider) Name ¶
func (p *PerplexityProvider) Name() string
Name returns the provider name for logging.
func (*PerplexityProvider) Search ¶
func (p *PerplexityProvider) Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
Search queries Perplexity for search results.
type RodProvider ¶
type RodProvider struct {
// contains filtered or unexported fields
}
RodProvider fetches JS-rendered pages using headless Chrome via CDP. Heavier than Colly but handles SPAs, dynamic content, and complex pages.
func NewRodProvider ¶
func NewRodProvider() *RodProvider
func (*RodProvider) Name ¶
func (p *RodProvider) Name() string
type SearchProvider ¶
type SearchProvider interface {
Name() string
Search(ctx context.Context, query string, limit int) ([]SearchResult, error)
}
SearchProvider defines the interface for web search.
type SearchResult ¶
type SearchResult struct {
Title string `json:"title"`
URL string `json:"url"`
Description string `json:"description"`
}
SearchResult holds a single search result.
type Storage ¶
type Storage struct {
// contains filtered or unexported fields
}
Storage holds cached fetch/search results for the session.
func (*Storage) GetFetch ¶
GetFetch retrieves cached fetch content. Returns empty string if not found.
func (*Storage) GetSearch ¶
func (s *Storage) GetSearch(query string) []SearchResult
GetSearch retrieves cached search results. Returns nil if not found.
func (*Storage) StoreFetch ¶
StoreFetch saves fetch content for later retrieval.
func (*Storage) StoreSearch ¶
func (s *Storage) StoreSearch(query string, results []SearchResult)
StoreSearch saves search results for later retrieval.
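A session cache with the documented "empty string / nil if not found" semantics can be sketched as two maps behind a read-write mutex. sessionStorage is hypothetical. GetSearch and StoreSearch follow the documented signatures. The fetch methods' parameters (keyed by URL) are assumed, because their signatures are not shown above.

```go
package main

import (
	"fmt"
	"sync"
)

// SearchResult mirrors the documented webfetch type.
type SearchResult struct{ Title, URL, Description string }

// sessionStorage is a hypothetical, concurrency-safe stand-in for Storage.
type sessionStorage struct {
	mu      sync.RWMutex
	fetches map[string]string
	search  map[string][]SearchResult
}

func newSessionStorage() *sessionStorage {
	return &sessionStorage{
		fetches: map[string]string{},
		search:  map[string][]SearchResult{},
	}
}

// StoreFetch's (url, content) parameters are an assumption.
func (s *sessionStorage) StoreFetch(url, content string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.fetches[url] = content
}

// GetFetch returns "" when nothing is cached for url.
func (s *sessionStorage) GetFetch(url string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.fetches[url]
}

func (s *sessionStorage) StoreSearch(query string, results []SearchResult) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Copy so later changes to the caller's slice do not alter the cache.
	s.search[query] = append([]SearchResult(nil), results...)
}

// GetSearch returns nil when nothing is cached for query.
func (s *sessionStorage) GetSearch(query string) []SearchResult {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.search[query]
}

func main() {
	s := newSessionStorage()
	s.StoreFetch("https://go.dev", "# Go")
	fmt.Println(s.GetFetch("https://go.dev"), s.GetSearch("missing") == nil)
}
```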