crawlbase

package module
v0.1.0

Published: May 3, 2026 License: MIT Imports: 11 Imported by: 0

README

Crawlbase Go SDK

Official Go client for the Crawlbase API. One package, every Crawlbase product — Crawling API, Scraper, Leads, Screenshots — with idiomatic Go ergonomics, context.Context support, and zero external dependencies (only net/http + stdlib).


Install

go get github.com/crawlbase/crawlbase-go

Requires Go 1.21+.

Quickstart

package main

import (
    "fmt"
    "log"

    "github.com/crawlbase/crawlbase-go"
)

func main() {
    api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
    if err != nil {
        log.Fatal(err)
    }

    res, err := api.Get("https://github.com/anthropic", nil)
    if err != nil {
        log.Fatal(err)
    }

    if res.StatusCode == 200 {
        fmt.Println(res.Body)
    }
}

Get a free token — 1,000 free requests, no credit card.

Tokens

Crawlbase issues two tokens per account:

  • Normal token — for static HTML / JSON endpoints. Faster + cheaper.
  • JavaScript token — for SPAs and pages that need browser rendering. Required to use page_wait, ajax_wait, scroll, css_click_selector.

The client doesn't switch tokens per-call. If you alternate, hold two clients:

api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
js, _  := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))

One client, every product

The Go SDK is intentionally lean: one CrawlingAPI client covers every Crawlbase product through the unified Crawling API endpoint:

Use case                     Pass in options
Plain crawl                  (nothing — the default)
Built-in scraper             "scraper": "amazon-product-details" (and friends)
Screenshot                   "screenshot": "true"
Email extraction             "scraper": "email-extractor"
Async + webhook              "async": "true" + "callback": "https://..."
Push to Enterprise Crawler   "async": "true" + "callback" + "crawler": "YourCrawler"

This is the same surface the other Crawlbase SDKs converge on under the hood. The standalone /scraper, /leads, /screenshots endpoints are closed to new sign-ups since 2024 — the Go SDK ships the modern path only.

The full parameter reference for every option is at /docs/crawling-api.
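
As a sketch of the table's last row, pushing a page into an Enterprise Crawler pairs the async options with a crawler name ("MyCrawler" is a placeholder for a crawler created in your dashboard):

res, _ := api.Get("https://example.com/", map[string]string{
    "async":    "true",
    "callback": "https://your-app.com/webhook",
    "crawler":  "MyCrawler", // placeholder crawler name
})
fmt.Println(res.RID) // the RID correlates the eventual webhook delivery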

Common patterns

JavaScript rendering

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, _ := api.Get("https://spa.example.com", map[string]string{
    "page_wait": "2000",
    "ajax_wait": "true",
    "scroll":    "true",
})

Use a built-in scraper

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, _ := api.Get(
    "https://www.amazon.com/dp/B08N5WRWNW",
    map[string]string{"scraper": "amazon-product-details"},
)
fmt.Println(res.JSON["name"], res.JSON["price"])

Geo-routing

res, _ := api.Get(
    "https://www.amazon.com/dp/B08N5WRWNW",
    map[string]string{"country": "DE"},
)

Retry with backoff

// crawl retries with full-jitter exponential backoff: transport errors and
// 4xx from Crawlbase fail fast; anything else retries up to attempts times.
// Needs "math", "math/rand", and "time" alongside the SDK import.
func crawl(api *crawlbase.CrawlingAPI, url string, attempts int) (*crawlbase.Response, error) {
    for i := 0; i < attempts; i++ {
        res, err := api.Get(url, nil)
        if err != nil {
            return nil, err
        }
        if res.StatusCode == 200 && res.PCStatus == 200 {
            return res, nil
        }
        if res.StatusCode >= 400 && res.StatusCode < 500 {
            return nil, fmt.Errorf("client error %d: %s", res.StatusCode, url)
        }
        d := time.Duration(rand.Float64() * math.Pow(2, float64(i)) * float64(time.Second))
        time.Sleep(d)
    }
    return nil, fmt.Errorf("failed: %s", url)
}

Async + webhook

res, _ := api.Get("https://example.com/", map[string]string{
    "async":    "true",
    "callback": "https://your-app.com/webhook",
})
fmt.Println(res.RID)  // correlate the eventual webhook delivery

Context for cancellation

Every verb has a *WithContext variant:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, "https://example.com/", nil)

Screenshots

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, _ := api.Get("https://www.apple.com/", map[string]string{
    "screenshot": "true",
})
img, _ := crawlbase.ImageBytes(res)
_ = os.WriteFile("apple.png", img, 0o644)

Errors and retries

The Crawlbase platform returns two status codes on every response:

  • Response.StatusCode — the HTTP status of the SDK's request to Crawlbase.
  • Response.PCStatus — Crawlbase's verdict on the target (the site you asked it to crawl). Branch on this for retry decisions.

A target can return 200 with an empty body, in which case StatusCode is 200 but PCStatus is 520. See the Crawling API errors table for the full list.

res, err := api.Get(url, nil)
if err != nil { return err }

switch res.PCStatus {
case 200:
    use(res.Body)
case 520, 525:
    // 520 = empty body, 525 = anti-bot couldn't be solved.
    // Switch to JS token and retry.
case 521, 522, 523:
    // Target unreachable / timed out. Backoff + retry.
default:
    log.Printf("crawl failed: url=%s pc_status=%d", url, res.PCStatus)
}

All retries against the platform are free — only successful responses (PCStatus == 200) count against your quota.

Performance

  • Reuse a single client per token. The constructor is cheap, but each instance has its own http.Client with its own connection pool. Build once, share across goroutines (the SDK is goroutine-safe); see the sketch after this list.
  • Use the cheapest token that works. Don't default to the JavaScript token "just in case" — the normal token is faster and uses less concurrency. Promote on PCStatus == 520 / 525.
  • Prefer ajax_wait over page_wait. Fixed waits burn concurrency even on fast pages.
  • For batch jobs: async + webhook. Synchronous calls hold a concurrency slot until the upstream finishes; async releases the slot the moment the request is queued.
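
A minimal sketch of the first point, assuming the usual imports (log, os, sync); the URLs are placeholders:

api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
if err != nil {
    log.Fatal(err)
}
var wg sync.WaitGroup
for _, u := range []string{"https://example.com/a", "https://example.com/b"} {
    wg.Add(1)
    go func(u string) {
        defer wg.Done()
        res, err := api.Get(u, nil) // one shared client; the SDK is goroutine-safe
        if err != nil || res.PCStatus != 200 {
            log.Printf("crawl failed: %s", u)
            return
        }
        _ = res.Body // process the page here
    }(u)
}
wg.Wait()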

Documentation

Full API reference: crawlbase.com/docs/sdk-go

godoc: pkg.go.dev/github.com/crawlbase/crawlbase-go

License

MIT

Documentation

Overview

Package crawlbase is the official Go client for the Crawlbase API (https://crawlbase.com/docs/api-reference).

The package exposes one client — CrawlingAPI — that covers every Crawlbase product through the unified Crawling API endpoint:

  • Plain crawls (default usage)
  • Built-in scrapers via options["scraper"] = "amazon-product-details" etc.
  • Screenshots via options["screenshot"] = "true"
  • Email extraction via options["scraper"] = "email-extractor"
  • Async + webhook delivery via options["async"] / options["callback"]

Idiomatic Go ergonomics, no external dependencies (only net/http + stdlib), sensible defaults.

Quickstart

api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
if err != nil { log.Fatal(err) }
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil { log.Fatal(err) }
if res.StatusCode == 200 {
    fmt.Println(res.Body)
}

Tokens

Crawlbase issues two tokens per account — a "normal" (TCP) token for static HTML / JSON endpoints, and a "JavaScript" token for SPAs and pages that hide content behind client-side rendering. Each client is constructed with one token; if you alternate between them, hold two clients.
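
For example (env var names as in the README):

api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
js, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))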

Options

Every Crawling API parameter (country, device, page_wait, scroll, scraper, async, callback, etc. — see https://crawlbase.com/docs/crawling-api) is passed as an entry in the options map. Pass nil for no options.

api.Get(url, map[string]string{
    "country":   "DE",
    "page_wait": "2000",
    "scroll":    "true",
})

Context

Every verb has a *WithContext variant for cancellation, deadlines, and trace propagation:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, url, nil)

Response

All verbs return a Response with the HTTP status, body, lower-cased headers, and the Crawlbase-specific verdict fields (PCStatus, OriginalStatus, URL, RID) lifted out of the headers for typed access.
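
For example, after a successful call:

res, err := api.Get("https://example.com/", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.PCStatus, res.OriginalStatus, res.URL)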


Constants

This section is empty.

Variables

var ErrTokenRequired = errors.New("crawlbase: token is required")

ErrTokenRequired is returned by the constructors when called with an empty token. Most other errors come straight from net/http and are returned to the caller as-is.
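
A caller can test for it with errors.Is:

if _, err := crawlbase.NewCrawlingAPI(""); errors.Is(err, crawlbase.ErrTokenRequired) {
	log.Fatal("crawlbase token missing")
}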

Functions

func ImageBytes

func ImageBytes(res *Response) ([]byte, error)

ImageBytes decodes the base64-encoded screenshot in res.Body into raw image bytes ready for os.WriteFile / image.Decode. Use this on responses from screenshot calls (CrawlingAPI.Get with options["screenshot"] = "true").

Returns an error if the body isn't valid base64 — verify res.StatusCode and res.PCStatus first.

Types

type CrawlingAPI

type CrawlingAPI struct {
	// contains filtered or unexported fields
}

CrawlingAPI is a client for the general-purpose Crawlbase Crawling API. It's the engine the rest of the platform sits on top of — JS rendering, anti-bot bypass, residential proxy routing, and the scraper library are all reachable from here through the options map.

See https://crawlbase.com/docs/crawling-api for the full parameter reference.

Example (JavascriptRendering)

Use the JavaScript token to render SPAs. Combine page_wait / ajax_wait / scroll / css_click_selector based on what the target needs. A useful order to reason in: a fixed wait first, then network-idle, then scroll for lazy-loading, then a click for any gating UI element.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://spa.example.com", map[string]string{
	"page_wait": "2000",
	"ajax_wait": "true",
	"scroll":    "true",
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.StatusCode)

Example (Scraper)

Apply a built-in scraper via the Crawling API to skip the parser step on supported sites. The Body comes back as a JSON string and is also pre-decoded into res.JSON for direct field access.

api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
res, err := api.Get(
	"https://www.amazon.com/dp/B08N5WRWNW",
	map[string]string{"scraper": "amazon-product-details"},
)
if err != nil {
	log.Fatal(err)
}
if name, ok := res.JSON["name"].(string); ok {
	fmt.Println(name)
}

Example (Screenshot)

Capture a screenshot via the Crawling API. The Body is base64-encoded image bytes; use ImageBytes to decode.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://www.apple.com/", map[string]string{
	"screenshot": "true",
})
if err != nil {
	log.Fatal(err)
}
img, err := crawlbase.ImageBytes(res)
if err != nil {
	log.Fatal(err)
}
_ = os.WriteFile("apple.png", img, 0o644)

func NewCrawlingAPI

func NewCrawlingAPI(token string) (*CrawlingAPI, error)

NewCrawlingAPI constructs a Crawling API client with the given token. Token can be either the "normal" (TCP) token or the JavaScript token, depending on whether you need browser rendering. The client doesn't switch tokens per-call, so hold two clients if you alternate.

The constructor returns ErrTokenRequired if token is empty.

Example

Minimal quickstart. Replace YOUR_TOKEN with the token from your Crawlbase dashboard — sign-up gives 1,000 free requests, no credit card.

api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
if err != nil {
	log.Fatal(err)
}
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil {
	log.Fatal(err)
}
if res.StatusCode == 200 {
	fmt.Println(len(res.Body), "bytes received")
}

func (*CrawlingAPI) Get

func (a *CrawlingAPI) Get(targetURL string, options map[string]string) (*Response, error)

Get fetches targetURL through Crawlbase. Pass nil for options to send just the target; otherwise every Crawling API parameter is reachable here as a key in the options map (country, device, page_wait, scroll, scraper, async, callback, store, format, etc.).

func (*CrawlingAPI) GetWithContext

func (a *CrawlingAPI) GetWithContext(ctx context.Context, targetURL string, options map[string]string) (*Response, error)

GetWithContext is Get with cancellation / deadline / trace propagation. Use this from servers and any code path that should respect upstream timeouts.

Example

Use a context with a deadline for any code path that should respect upstream cancellation — HTTP handlers, RPC servers, anything else where a hung request would propagate.

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

res, err := api.GetWithContext(ctx, "https://example.com/", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.StatusCode)

func (*CrawlingAPI) Post

func (a *CrawlingAPI) Post(targetURL string, data any, options map[string]string) (*Response, error)

Post sends data to targetURL through Crawlbase as an HTTP POST. The data argument can be:

  • a url.Values for form-encoded bodies (default)
  • a string for raw bodies (JSON, plain text, etc.)
  • a []byte for raw bodies

To send JSON, pass options["post_content_type"] = "application/json" and provide the JSON-encoded body as a string or []byte.
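
A sketch with a placeholder endpoint and payload:

res, err := api.Post(
	"https://example.com/search",
	`{"query": "golang"}`,
	map[string]string{"post_content_type": "application/json"},
)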

func (*CrawlingAPI) PostWithContext

func (a *CrawlingAPI) PostWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)

PostWithContext is Post with cancellation / deadline / trace propagation.

func (*CrawlingAPI) Put

func (a *CrawlingAPI) Put(targetURL string, data any, options map[string]string) (*Response, error)

Put is the PUT counterpart to Post — same body-encoding rules, same options bag.
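
A form-encoded sketch (placeholder endpoint; url.Values is from net/url):

form := url.Values{"status": {"active"}}
res, err := api.Put("https://example.com/resource/42", form, nil)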

func (*CrawlingAPI) PutWithContext

func (a *CrawlingAPI) PutWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)

PutWithContext is Put with cancellation / deadline / trace propagation.

type Response

type Response struct {
	// StatusCode is the HTTP status of the request to Crawlbase.
	StatusCode int

	// Body is the page content returned by the target (or a JSON envelope
	// when the call set format=json or scraper=NAME).
	Body string

	// Headers are the response headers, lower-cased on the way in.
	Headers map[string]string

	// PCStatus is the Crawlbase verdict on the target — pulled from the
	// `pc_status` (or `cb_status`) response header. Branch on this for
	// retry decisions. Zero when not present.
	PCStatus int

	// OriginalStatus is the HTTP status the target returned to Crawlbase —
	// pulled from the `original_status` response header. Zero when not
	// present.
	OriginalStatus int

	// URL is the final URL after target-side redirects. Pulled from the
	// `url` response header.
	URL string

	// RID is the Crawlbase request identifier. Set when the call carried
	// async=true or store=true; empty otherwise.
	RID string

	// JSON is the response body pre-parsed into a generic map. Populated
	// only when the response Content-Type is JSON (e.g. scraper=... or
	// format=json calls). Use it to avoid double-parsing the body.
	JSON map[string]any
}

Response is what every Crawlbase API verb returns on success. Fields follow the same naming convention used by the other Crawlbase SDKs (Python / Node / Ruby / PHP) so cross-language porting is mechanical.

StatusCode is the HTTP status of the SDK's request to Crawlbase. PCStatus is Crawlbase's verdict on the *target* (the site you asked it to crawl). They can disagree — a target can return 200 with an empty body, in which case StatusCode is 200 but PCStatus is 520. Always branch on PCStatus when deciding whether to retry. See https://crawlbase.com/docs/crawling-api/#errors for the full table.
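
A minimal branch, mirroring the README pattern (use is a placeholder for your own handler):

res, err := api.Get(targetURL, nil)
if err != nil {
	return err
}
if res.PCStatus != 200 {
	return fmt.Errorf("crawl failed: pc_status=%d original_status=%d", res.PCStatus, res.OriginalStatus)
}
use(res.Body)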
