Documentation ¶
Overview ¶
Package metawebsearch provides a multi-engine web search library for Go.
It queries Google, DuckDuckGo, Brave, Mojeek, Yahoo, Yandex, Wikipedia, and Grokipedia through a common EngineConfig type. Engines can be used individually via Execute or concurrently via MultiSearch.
Browser impersonation (TLS + HTTP/2 fingerprinting) is handled by tls-client, wrapped behind the HTTPClient interface for testability.
Source Files ¶
- brave.go
- duckduckgo.go
- engine.go
- extract.go
- google.go
- grokipedia.go
- mojeek.go
- multi.go
- registry.go
- result.go
- wikipedia.go
- yahoo.go
- yandex.go
Index ¶
- Variables
- func CleanText(s string) string
- func UnwrapRedirect(href string, pattern RedirectPattern) string
- func XPathExtract(doc *html.Node, itemsXPath string, fields map[string]string) ([]map[string]string, error)
- type ClientOpts
- type EngineConfig
- type HTTPClient
- type MultiSearch
- type RedirectPattern
- type Result
- type SearchOpts
- type SearchResult
Variables ¶
var Brave = EngineConfig{
	Name:            "brave",
	MinDelay:        2 * time.Second,
	MaxRetries:      3,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    braveBuildRequest,
	ParseResponse:   braveParseResponse,
}
Brave is the EngineConfig for Brave web search. Ported from reference/ddgs/engines/brave.py.
var DuckDuckGo = EngineConfig{
	Name:            "duckduckgo",
	MinDelay:        2 * time.Second,
	MaxRetries:      3,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    ddgBuildRequest,
	ParseResponse:   ddgParseResponse,
	PostProcess:     ddgPostProcess,
}
DuckDuckGo is the EngineConfig for DuckDuckGo web search. Ported from reference/ddgs/engines/duckduckgo.py.
var Google = EngineConfig{
	Name:            "google",
	ClientProfile:   "safari_ios_26_0",
	MinDelay:        3 * time.Second,
	MaxRetries:      3,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    googleBuildRequest,
	ParseResponse:   googleParseResponse,
	PostProcess:     googlePostProcess,
}
Google is the EngineConfig for Google web search.
var Grokipedia = EngineConfig{
	Name:            "grokipedia",
	MinDelay:        1 * time.Second,
	MaxRetries:      2,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    grokipediaBuildRequest,
	ParseResponse:   grokipediaParseResponse,
}
Grokipedia is the EngineConfig for the Grokipedia typeahead API. Ported from reference/ddgs/engines/grokipedia.py.
JSON API: GET https://grokipedia.com/api/typeahead?query=<query>&limit=<limit>
Returns: {"results": [{"title": "...", "snippet": "...", "slug": "..."}]}
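The response shape documented above can be decoded with plain struct tags. The following is a minimal sketch, not the package's actual grokipediaParseResponse: the names typeaheadItem, typeaheadResponse, and decodeTypeahead are hypothetical, and only the three documented fields are read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeaheadItem mirrors one entry of the documented "results" array.
// (Hypothetical type name; only the documented fields are mapped.)
type typeaheadItem struct {
	Title   string `json:"title"`
	Snippet string `json:"snippet"`
	Slug    string `json:"slug"`
}

// typeaheadResponse mirrors the documented top-level JSON object.
type typeaheadResponse struct {
	Results []typeaheadItem `json:"results"`
}

// decodeTypeahead is a hypothetical helper that turns a raw response
// body into items. Unknown JSON fields are ignored by encoding/json.
func decodeTypeahead(body []byte) ([]typeaheadItem, error) {
	var resp typeaheadResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode typeahead response: %w", err)
	}
	return resp.Results, nil
}

func main() {
	body := []byte(`{"results": [{"title": "Go", "snippet": "A programming language", "slug": "Go"}]}`)
	items, err := decodeTypeahead(body)
	if err != nil {
		panic(err)
	}
	for _, it := range items {
		fmt.Printf("%s [%s]: %s\n", it.Title, it.Slug, it.Snippet)
	}
}
```

How the slug is turned into a result URL is not stated here, so the sketch leaves it untouched.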
var Mojeek = EngineConfig{
	Name:            "mojeek",
	MinDelay:        2 * time.Second,
	MaxRetries:      3,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    mojeekBuildRequest,
	ParseResponse:   mojeekParseResponse,
}
Mojeek is the EngineConfig for Mojeek web search. Ported from reference/ddgs/engines/mojeek.py.
var Wikipedia = EngineConfig{
	Name:            "wikipedia",
	MinDelay:        1 * time.Second,
	MaxRetries:      2,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    wikipediaBuildRequest,
	ParseResponse:   wikipediaParseResponse,
}
Wikipedia is the EngineConfig for the Wikipedia OpenSearch API. Ported from reference/ddgs/engines/wikipedia.py.
Unlike other engines, Wikipedia returns JSON (OpenSearch format), not HTML. The response is a JSON array: ["query", ["titles..."], ["descriptions..."], ["urls..."]]
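Because the OpenSearch response is a heterogeneous array (a string followed by three string arrays), it cannot be decoded into a single struct. One way to handle it, sketched below with hypothetical names (openSearchHit, parseOpenSearch), is to decode the outer array into json.RawMessage elements first. This is an illustration of the format, not the package's wikipediaParseResponse.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// openSearchHit is a hypothetical flattened view of one suggestion.
type openSearchHit struct {
	Title       string
	Description string
	URL         string
}

// parseOpenSearch decodes ["query", [titles], [descriptions], [urls]]
// and zips the three parallel arrays into one slice of hits.
func parseOpenSearch(body []byte) ([]openSearchHit, error) {
	var raw []json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, fmt.Errorf("decode outer array: %w", err)
	}
	if len(raw) < 4 {
		return nil, errors.New("expected a 4-element OpenSearch array")
	}
	var titles, descs, urls []string
	for i, dst := range []*[]string{&titles, &descs, &urls} {
		if err := json.Unmarshal(raw[i+1], dst); err != nil {
			return nil, fmt.Errorf("decode element %d: %w", i+1, err)
		}
	}
	hits := make([]openSearchHit, 0, len(titles))
	for i, t := range titles {
		h := openSearchHit{Title: t}
		// Guard against arrays of unequal length.
		if i < len(descs) {
			h.Description = descs[i]
		}
		if i < len(urls) {
			h.URL = urls[i]
		}
		hits = append(hits, h)
	}
	return hits, nil
}

func main() {
	body := []byte(`["go", ["Go"], ["A programming language"], ["https://en.wikipedia.org/wiki/Go"]]`)
	hits, err := parseOpenSearch(body)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h.Title, h.URL)
	}
}
```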
var Yahoo = EngineConfig{
	Name:            "yahoo",
	MinDelay:        2 * time.Second,
	MaxRetries:      3,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    yahooBuildRequest,
	ParseResponse:   yahooParseResponse,
	PostProcess:     yahooPostProcess,
}
Yahoo is the EngineConfig for Yahoo web search. Ported from reference/ddgs/engines/yahoo.py.
var Yandex = EngineConfig{
	Name:            "yandex",
	MinDelay:        2 * time.Second,
	MaxRetries:      3,
	RetryableStatus: defaultRetryableStatus,
	BuildRequest:    yandexBuildRequest,
	ParseResponse:   yandexParseResponse,
}
Yandex is the EngineConfig for Yandex web search. Ported from reference/ddgs/engines/yandex.py.
Functions ¶
func UnwrapRedirect ¶
func UnwrapRedirect(href string, pattern RedirectPattern) string
UnwrapRedirect extracts the real URL from a search engine redirect wrapper.
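To illustrate what unwrapping means for one scheme, the sketch below handles only the DuckDuckGo form (//duckduckgo.com/l/?uddg=...) using the standard library. The function unwrapDDG is a hypothetical stand-in written for this example; the real UnwrapRedirect may differ in how it matches hosts and handles malformed input.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// unwrapDDG is a hypothetical helper: it returns the target URL carried
// in the "uddg" query parameter of a DuckDuckGo redirect link, or the
// input unchanged when the link is not such a redirect.
func unwrapDDG(href string) string {
	candidate := href
	// Protocol-relative links have no scheme; add one so that
	// url.Parse recognizes the host.
	if strings.HasPrefix(candidate, "//") {
		candidate = "https:" + candidate
	}
	u, err := url.Parse(candidate)
	if err != nil {
		return href
	}
	host := u.Hostname()
	isDDG := host == "duckduckgo.com" || strings.HasSuffix(host, ".duckduckgo.com")
	if !isDDG || !strings.HasPrefix(u.Path, "/l/") {
		return href
	}
	// Query().Get percent-decodes the parameter value.
	if target := u.Query().Get("uddg"); target != "" {
		return target
	}
	return href
}

func main() {
	wrapped := "//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1&rut=abc"
	fmt.Println(unwrapDDG(wrapped))
	fmt.Println(unwrapDDG("https://example.com/direct"))
}
```

Returning the input unchanged on any mismatch keeps the helper safe to apply to every link on a results page.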
func XPathExtract ¶
func XPathExtract(doc *html.Node, itemsXPath string, fields map[string]string) ([]map[string]string, error)
XPathExtract finds containers via itemsXPath, then extracts fields from each container using the fields map (field name -> XPath expression). Mirrors ddgs's BaseSearchEngine.extract_results().
Types ¶
type ClientOpts ¶
type ClientOpts struct {
	BrowserProfile string // key into profiles.MappedTLSClients; empty = default
}
ClientOpts configures the TLS-impersonating HTTP client.
type EngineConfig ¶
type EngineConfig struct {
	Name          string
	BuildRequest  func(query string, opts SearchOpts) (*http.Request, error)
	ParseResponse func(resp *http.Response) ([]Result, error)
	PostProcess   func(results []Result) []Result

	// ClientProfile overrides the TLS client profile for this engine.
	// If set, Execute creates a dedicated client with this profile.
	// This is needed when the engine's User-Agent requires a matching
	// TLS fingerprint (e.g. Google's GSA UA needs Safari iOS profile).
	ClientProfile string

	MinDelay        time.Duration
	MaxRetries      int
	RetryableStatus func(statusCode int) bool
}
EngineConfig defines a search engine's scraping pipeline.
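The sketch below shows how the three function fields fit together for a made-up JSON engine. It redeclares trimmed local copies of Result, SearchOpts, and EngineConfig so that it compiles on its own. The endpoint https://search.example/api, the helper runOffline, and the step that stamps Result.Engine with the engine name are assumptions made for illustration, not documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// Local, trimmed copies of the documented types (illustration only).
type Result struct {
	Title  string `json:"title"`
	URL    string `json:"url"`
	Engine string `json:"engine"`
}

type SearchOpts struct{ MaxResults int }

type EngineConfig struct {
	Name          string
	BuildRequest  func(query string, opts SearchOpts) (*http.Request, error)
	ParseResponse func(resp *http.Response) ([]Result, error)
	PostProcess   func(results []Result) []Result
}

// example is a hypothetical engine backed by a made-up JSON endpoint.
var example = EngineConfig{
	Name: "example",
	BuildRequest: func(query string, _ SearchOpts) (*http.Request, error) {
		q := url.Values{"q": {query}}
		return http.NewRequest(http.MethodGet, "https://search.example/api?"+q.Encode(), nil)
	},
	ParseResponse: func(resp *http.Response) ([]Result, error) {
		defer resp.Body.Close()
		var rs []Result
		if err := json.NewDecoder(resp.Body).Decode(&rs); err != nil {
			return nil, err
		}
		return rs, nil
	},
	// PostProcess drops results that have no URL.
	PostProcess: func(rs []Result) []Result {
		out := rs[:0]
		for _, r := range rs {
			if r.URL != "" {
				out = append(out, r)
			}
		}
		return out
	},
}

// runOffline walks BuildRequest -> ParseResponse -> PostProcess over a
// canned body instead of a network call (hypothetical helper).
func runOffline(e EngineConfig, query, body string) ([]Result, error) {
	req, err := e.BuildRequest(query, SearchOpts{})
	if err != nil {
		return nil, err
	}
	resp := &http.Response{StatusCode: http.StatusOK, Body: io.NopCloser(strings.NewReader(body)), Request: req}
	rs, err := e.ParseResponse(resp)
	if err != nil {
		return nil, err
	}
	if e.PostProcess != nil {
		rs = e.PostProcess(rs)
	}
	for i := range rs {
		rs[i].Engine = e.Name // assumption: results are tagged with the engine name
	}
	return rs, nil
}

func main() {
	rs, err := runOffline(example, "golang", `[{"title":"A","url":"https://a.example"},{"title":"B"}]`)
	fmt.Println(rs, err)
}
```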
func EngineByName ¶
func EngineByName(name string) (EngineConfig, bool)
EngineByName looks up any engine by name, including engines not in AllEngines() (e.g. Google). The second return value is false if the engine is not found.
type HTTPClient ¶
HTTPClient is the interface the pipeline calls. Tests substitute a fake.
func NewClient ¶
func NewClient(opts ClientOpts) (HTTPClient, error)
NewClient creates an HTTPClient backed by bogdanfinn/tls-client. This is the only place in the codebase that imports tls-client directly; everything else uses the HTTPClient interface from result.go.
type MultiSearch ¶
type MultiSearch struct {
	Client  HTTPClient
	Engines []EngineConfig

	// EngineTimeout is the maximum time to wait for any single engine.
	// If an engine exceeds this deadline (e.g. due to rate-limit retries),
	// its context is canceled and results from faster engines are returned.
	// Zero means 10 seconds.
	EngineTimeout time.Duration
}
MultiSearch dispatches a query to multiple engines concurrently.
func (*MultiSearch) Search ¶
func (m *MultiSearch) Search(ctx context.Context, query string, opts SearchOpts) (*SearchResult, error)
Search runs all engines concurrently, deduplicates results by URL, and collects per-engine errors.
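The fan-out pattern described here (one goroutine per engine, a per-engine timeout, URL deduplication, and an error map) can be sketched with the standard library alone. Everything below is a hypothetical illustration: namedEngine, fanOut, static, and demo are not part of the package, and the real Search may order, merge, or report results differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

type Result struct{ URL, Engine string }

// namedEngine is a hypothetical stand-in for one configured engine.
type namedEngine struct {
	Name string
	Run  func(ctx context.Context, query string) ([]Result, error)
}

// fanOut runs every engine in its own goroutine with its own deadline,
// then merges results in engine order, keeping the first hit per URL.
// Engines are expected to return promptly once their context is done.
func fanOut(ctx context.Context, engines []namedEngine, query string, timeout time.Duration) ([]Result, map[string]error) {
	type outcome struct {
		results []Result
		err     error
	}
	outs := make([]outcome, len(engines)) // each goroutine writes only its own slot
	var wg sync.WaitGroup
	for i, e := range engines {
		wg.Add(1)
		go func(i int, e namedEngine) {
			defer wg.Done()
			ectx, cancel := context.WithTimeout(ctx, timeout)
			defer cancel()
			rs, err := e.Run(ectx, query)
			outs[i] = outcome{rs, err}
		}(i, e)
	}
	wg.Wait()

	seen := make(map[string]bool)
	errs := make(map[string]error)
	var merged []Result
	for i, o := range outs {
		if o.err != nil {
			errs[engines[i].Name] = o.err
			continue
		}
		for _, r := range o.results {
			if !seen[r.URL] {
				seen[r.URL] = true
				r.Engine = engines[i].Name
				merged = append(merged, r)
			}
		}
	}
	return merged, errs
}

// static builds a fake engine that immediately returns fixed URLs.
func static(urls ...string) func(context.Context, string) ([]Result, error) {
	return func(context.Context, string) ([]Result, error) {
		rs := make([]Result, len(urls))
		for i, u := range urls {
			rs[i] = Result{URL: u}
		}
		return rs, nil
	}
}

// demo wires two fast fake engines and one that never finishes in time.
func demo() ([]Result, map[string]error) {
	engines := []namedEngine{
		{"fast", static("https://a.example", "https://b.example")},
		{"dup", static("https://b.example", "https://c.example")},
		{"slow", func(ctx context.Context, _ string) ([]Result, error) {
			<-ctx.Done() // simulate an engine stuck in retries
			return nil, ctx.Err()
		}},
	}
	return fanOut(context.Background(), engines, "golang", 20*time.Millisecond)
}

func main() {
	merged, errs := demo()
	fmt.Println(len(merged), "unique results")
	fmt.Println("slow timed out:", errors.Is(errs["slow"], context.DeadlineExceeded))
}
```

Writing each outcome into a preallocated slot avoids a mutex and makes the merge order deterministic, which is convenient in tests.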
type RedirectPattern ¶
type RedirectPattern int
RedirectPattern identifies a URL redirect scheme.
const (
	RedirectNone   RedirectPattern = iota
	RedirectDDG                    // //duckduckgo.com/l/?uddg=...
	RedirectYahoo                  // .../RU=.../RK=...
	RedirectGoogle                 // /url?q=...
)
type Result ¶
type Result struct {
	Title   string `json:"title"`
	URL     string `json:"url"`
	Snippet string `json:"snippet"`
	Engine  string `json:"engine"`
}
Result is a single search result from any engine.
func Execute ¶
func Execute(ctx context.Context, client HTTPClient, engine EngineConfig, query string, opts SearchOpts) ([]Result, error)
Execute runs the full engine pipeline: BuildRequest -> HTTP -> ParseResponse -> PostProcess. It handles rate limiting and retries with exponential backoff.
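Retrying with exponential backoff, as mentioned above, can be sketched as a loop that doubles its delay after each retryable status. The code below is an assumption-laden illustration rather than Execute itself: retry, isRetryable, and simulate are hypothetical names, MaxRetries is read as "retries after the first attempt", and treating 429 and 5xx as retryable is a guess at what defaultRetryableStatus might accept.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// isRetryable is a hypothetical status check (assumed: 429 and 5xx).
func isRetryable(status int) bool { return status == 429 || status >= 500 }

// retry calls attempt until it yields a non-retryable status, fails
// with an error, or has been retried maxRetries times. The wait doubles
// after every retry and is cut short if ctx is canceled.
// It returns the last status, the number of attempts made, and an error.
func retry(ctx context.Context, maxRetries int, base time.Duration, attempt func() (int, error)) (int, int, error) {
	delay := base
	for tries := 1; ; tries++ {
		status, err := attempt()
		if err != nil {
			return status, tries, err
		}
		if !isRetryable(status) {
			return status, tries, nil
		}
		if tries > maxRetries {
			return status, tries, fmt.Errorf("giving up after %d attempts (last status %d)", tries, status)
		}
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return status, tries, ctx.Err()
		}
		delay *= 2
	}
}

// simulate replays a fixed sequence of status codes through retry,
// repeating the last code if the sequence runs out.
func simulate(statuses []int, maxRetries int) (int, int, error) {
	i := 0
	attempt := func() (int, error) {
		s := statuses[len(statuses)-1]
		if i < len(statuses) {
			s = statuses[i]
		}
		i++
		return s, nil
	}
	return retry(context.Background(), maxRetries, time.Millisecond, attempt)
}

func main() {
	status, tries, err := simulate([]int{429, 503, 200}, 3)
	fmt.Println(status, tries, err)
}
```

Selecting on ctx.Done() during the wait is what lets a caller-imposed deadline, such as a per-engine timeout, interrupt a long backoff.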
type SearchOpts ¶
type SearchOpts struct {
	MaxResults int
	Page       int    // 1-based page number (default: 1)
	Region     string // e.g. "us-en"
	SafeSearch string // "on", "moderate", "off"
	TimeLimit  string // "d" (day), "w" (week), "m" (month), "y" (year)
}
SearchOpts controls a search request.
type SearchResult ¶
SearchResult is what MultiSearch returns.