Documentation ¶
Overview ¶
Package crawlbase is the official Go client for the Crawlbase API (https://crawlbase.com/docs/api-reference).
The package exposes one client — CrawlingAPI — that covers every Crawlbase product through the unified Crawling API endpoint:
- Plain crawls (default usage)
- Built-in scrapers via options["scraper"] = "amazon-product-details" etc.
- Screenshots via options["screenshot"] = "true"
- Email extraction via options["scraper"] = "email-extractor"
- Async + webhook delivery via options["async"] / options["callback"]
The package aims for idiomatic Go ergonomics and sensible defaults, with no dependencies outside the standard library (net/http and friends).
Quickstart ¶
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil {
log.Fatal(err)
}
if res.StatusCode == 200 {
fmt.Println(res.Body)
}
Tokens ¶
Crawlbase issues two tokens per account — a "normal" (TCP) token for static HTML / JSON endpoints, and a "JavaScript" token for SPAs and pages that hide content behind client-side rendering. Each client is constructed with one token; if you alternate between them, hold two clients.
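A service that alternates might hold one client per token; a sketch of the pattern (the environment variable names are illustrative, not part of the SDK):

```go
tcpAPI, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
if err != nil {
	log.Fatal(err)
}
jsAPI, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))
if err != nil {
	log.Fatal(err)
}

// Static HTML through the TCP token, SPAs through the JavaScript token.
staticRes, _ := tcpAPI.Get("https://example.com/", nil)
spaRes, _ := jsAPI.Get("https://spa.example.com/", map[string]string{"page_wait": "2000"})
```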
Options ¶
Every Crawling API parameter (country, device, page_wait, scroll, scraper, async, callback, etc. — see https://crawlbase.com/docs/crawling-api) is passed as an entry in the options map. Pass nil for no options.
api.Get(url, map[string]string{
"country": "DE",
"page_wait": "2000",
"scroll": "true",
})
Context ¶
Every verb has a *WithContext variant for cancellation, deadlines, and trace propagation:
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, url, nil)
Response ¶
All verbs return a Response with the HTTP status, body, lower-cased headers, and the Crawlbase-specific verdict fields (PCStatus, OriginalStatus, URL, RID) lifted out of the headers for typed access.
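For example, the lifted fields give typed access without parsing the headers map yourself (a sketch; the target URL is illustrative):

```go
res, err := api.Get("https://example.com/", nil)
if err != nil {
	log.Fatal(err)
}
// res.OriginalStatus is the typed form of res.Headers["original_status"].
fmt.Println(res.StatusCode, res.PCStatus, res.OriginalStatus, res.URL)
```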
Index ¶
- Variables
- func ImageBytes(res *Response) ([]byte, error)
- type CrawlingAPI
- func NewCrawlingAPI(token string) (*CrawlingAPI, error)
- func (a *CrawlingAPI) Get(targetURL string, options map[string]string) (*Response, error)
- func (a *CrawlingAPI) GetWithContext(ctx context.Context, targetURL string, options map[string]string) (*Response, error)
- func (a *CrawlingAPI) Post(targetURL string, data any, options map[string]string) (*Response, error)
- func (a *CrawlingAPI) PostWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)
- func (a *CrawlingAPI) Put(targetURL string, data any, options map[string]string) (*Response, error)
- func (a *CrawlingAPI) PutWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)
- type Response
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ErrTokenRequired = errors.New("crawlbase: token is required")
ErrTokenRequired is returned by the constructors when called with an empty token. Most other errors come straight from net/http and are returned to the caller as-is.
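Because ErrTokenRequired is a sentinel, check for it with errors.Is:

```go
api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
if errors.Is(err, crawlbase.ErrTokenRequired) {
	log.Fatal("CRAWLBASE_TOKEN is not set")
}
```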
Functions ¶
func ImageBytes ¶
func ImageBytes(res *Response) ([]byte, error)
ImageBytes decodes the base64-encoded screenshot in res.Body into raw image bytes ready for os.WriteFile / image.Decode. Use this on responses from screenshot calls (CrawlingAPI.Get with options["screenshot"] = "true").
Returns an error if the body isn't valid base64 — verify res.StatusCode and res.PCStatus first.
Types ¶
type CrawlingAPI ¶
type CrawlingAPI struct {
// contains filtered or unexported fields
}
CrawlingAPI is a client for the general-purpose Crawlbase Crawling API. It's the engine the rest of the platform sits on top of — JS rendering, anti-bot bypass, residential proxy routing, and the scraper library are all reachable from here through the options map.
See https://crawlbase.com/docs/crawling-api for the full parameter reference.
Example (JavascriptRendering) ¶
Use the JavaScript token to render SPAs. Combine page_wait / ajax_wait / scroll / css_click_selector based on what the target needs. A sensible order: a fixed wait first, then network-idle, then scrolling for lazy-loaded content, then a click for any gating UI element.
api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://spa.example.com", map[string]string{
"page_wait": "2000",
"ajax_wait": "true",
"scroll": "true",
})
if err != nil {
log.Fatal(err)
}
fmt.Println(res.StatusCode)
Example (Scraper) ¶
Apply a built-in scraper via the Crawling API to skip the parser step on supported sites. The Body comes back as a JSON string and is also pre-decoded into res.JSON for direct field access.
api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
res, err := api.Get(
"https://www.amazon.com/dp/B08N5WRWNW",
map[string]string{"scraper": "amazon-product-details"},
)
if err != nil {
log.Fatal(err)
}
if name, ok := res.JSON["name"].(string); ok {
fmt.Println(name)
}
Example (Screenshot) ¶
Capture a screenshot via the Crawling API. The Body is base64-encoded image bytes; use ImageBytes to decode.
api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://www.apple.com/", map[string]string{
"screenshot": "true",
})
if err != nil {
log.Fatal(err)
}
img, err := crawlbase.ImageBytes(res)
if err != nil {
log.Fatal(err)
}
_ = os.WriteFile("apple.png", img, 0o644)
func NewCrawlingAPI ¶
func NewCrawlingAPI(token string) (*CrawlingAPI, error)
NewCrawlingAPI constructs a Crawling API client with the given token. The token can be either the "normal" (TCP) token or the JavaScript token, depending on whether you need browser rendering. The client doesn't switch tokens per call, so hold two clients if you alternate.
The constructor returns ErrTokenRequired if token is empty.
Example ¶
Minimal quickstart. Replace YOUR_TOKEN with the token from your Crawlbase dashboard; signing up gives 1,000 free requests with no credit card required.
api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
if err != nil {
log.Fatal(err)
}
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil {
log.Fatal(err)
}
if res.StatusCode == 200 {
fmt.Println(len(res.Body), "bytes received")
}
func (*CrawlingAPI) Get ¶
func (a *CrawlingAPI) Get(targetURL string, options map[string]string) (*Response, error)
Get fetches targetURL through Crawlbase. Pass nil options to send just the target URL; otherwise every Crawling API parameter is reachable here as a key in the options map (country, device, page_wait, scroll, scraper, async, callback, store, format, etc.).
func (*CrawlingAPI) GetWithContext ¶
func (a *CrawlingAPI) GetWithContext(ctx context.Context, targetURL string, options map[string]string) (*Response, error)
GetWithContext is Get with cancellation / deadline / trace propagation. Use this from servers and any code path that should respect upstream timeouts.
Example ¶
Use a context with a deadline for any code path that should respect upstream cancellation — HTTP handlers, RPC servers, anything else where a hung request would propagate.
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, "https://example.com/", nil)
if err != nil {
log.Fatal(err)
}
fmt.Println(res.StatusCode)
func (*CrawlingAPI) Post ¶
func (a *CrawlingAPI) Post(targetURL string, data any, options map[string]string) (*Response, error)
Post sends data to targetURL through Crawlbase as an HTTP POST. The data argument can be:
- a url.Values for form-encoded bodies (default)
- a string for raw bodies (JSON, plain text, etc.)
- a []byte for raw bodies
To send JSON, pass options["post_content_type"] = "application/json" and provide the JSON-encoded body as a string or []byte.
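For example, a JSON POST might look like this (the target URL and payload are illustrative):

```go
body, err := json.Marshal(map[string]string{"query": "golang"})
if err != nil {
	log.Fatal(err)
}
res, err := api.Post("https://example.com/search", body, map[string]string{
	"post_content_type": "application/json",
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.StatusCode)
```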
func (*CrawlingAPI) PostWithContext ¶
func (a *CrawlingAPI) PostWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)
PostWithContext is Post with cancellation / deadline / trace propagation.
func (*CrawlingAPI) Put ¶
func (a *CrawlingAPI) Put(targetURL string, data any, options map[string]string) (*Response, error)
Put sends data to targetURL through Crawlbase as an HTTP PUT. The data argument accepts the same types as Post.
func (*CrawlingAPI) PutWithContext ¶
func (a *CrawlingAPI) PutWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)
PutWithContext is Put with cancellation / deadline / trace propagation.
type Response ¶
type Response struct {
// StatusCode is the HTTP status of the request to Crawlbase.
StatusCode int
// Body is the page content returned by the target (or a JSON envelope
// when the call set format=json or scraper=NAME).
Body string
// Headers are the response headers, lower-cased on the way in.
Headers map[string]string
// PCStatus is the Crawlbase verdict on the target — pulled from the
// `pc_status` (or `cb_status`) response header. Branch on this for
// retry decisions. Zero when not present.
PCStatus int
// OriginalStatus is the HTTP status the target returned to Crawlbase —
// pulled from the `original_status` response header. Zero when not
// present.
OriginalStatus int
// URL is the final URL after target-side redirects. Pulled from the
// `url` response header.
URL string
// RID is the Crawlbase request identifier. Set when the call carried
// async=true or store=true; empty otherwise.
RID string
// JSON is the response body pre-parsed into a generic map. Populated
// only when the response Content-Type is JSON (e.g. scraper=... or
// format=json calls). Use it to avoid double-parsing the body.
JSON map[string]any
}
Response is what every Crawlbase API verb returns on success. Fields follow the same naming convention used by the other Crawlbase SDKs (Python / Node / Ruby / PHP) so cross-language porting is mechanical.
StatusCode is the HTTP status of the SDK's request to Crawlbase. PCStatus is Crawlbase's verdict on the *target* (the site you asked it to crawl). They can disagree — a target can return 200 with empty body, in which case StatusCode is 200 but PCStatus is 520. Always branch on PCStatus when deciding whether to retry. See https://crawlbase.com/docs/crawling-api/#errors for the full table.
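A minimal retry guard following that rule might look like this (a sketch, not part of the SDK; treating every non-200 verdict as retryable is a simplification of the error table):

```go
// shouldRetry reports whether a crawl is worth retrying. It branches on
// the pc_status verdict rather than the transport status: StatusCode can
// be 200 while PCStatus signals a failed crawl (e.g. 520 for an empty body).
func shouldRetry(pcStatus int) bool {
	return pcStatus != 200
}
```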