crawlbase

package module
v0.1.0

Published: May 3, 2026 License: MIT Imports: 11 Imported by: 0

README

Crawlbase Go SDK

Official Go client for the Crawlbase API. One package, every Crawlbase product — Crawling API, Scraper, Leads, Screenshots — with idiomatic Go ergonomics, context.Context support, and zero external dependencies (only net/http + stdlib).


Install

go get github.com/crawlbase/crawlbase-go

Requires Go 1.21+.

Quickstart

package main

import (
    "fmt"
    "log"

    "github.com/crawlbase/crawlbase-go"
)

func main() {
    api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
    if err != nil {
        log.Fatal(err)
    }

    res, err := api.Get("https://github.com/anthropic", nil)
    if err != nil {
        log.Fatal(err)
    }

    if res.StatusCode == 200 {
        fmt.Println(res.Body)
    }
}

Get a free token — 1,000 free requests, no credit card.

Tokens

Crawlbase issues two tokens per account:

  • Normal token — for static HTML / JSON endpoints. Faster + cheaper.
  • JavaScript token — for SPAs and pages that need browser rendering. Required to use page_wait, ajax_wait, scroll, css_click_selector.

The client doesn't switch tokens per-call. If you alternate, hold two clients:

api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
js, _  := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))

One client, every product

The Go SDK is intentionally lean: one CrawlingAPI client covers every Crawlbase product through the unified Crawling API endpoint:

Use case                     Pass in options
Plain crawl                  (nothing — the default)
Built-in scraper             "scraper": "amazon-product-details" (and friends)
Screenshot                   "screenshot": "true"
Email extraction             "scraper": "email-extractor"
Async + webhook              "async": "true" + "callback": "https://..."
Push to Enterprise Crawler   "async": "true" + "callback" + "crawler": "YourCrawler"

This is the same surface the other Crawlbase SDKs converge on under the hood. The standalone /scraper, /leads, /screenshots endpoints are closed to new sign-ups since 2024 — the Go SDK ships the modern path only.

The full parameter reference for every option is at /docs/crawling-api.
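
As a sketch of the table's last row, pushing a page into an Enterprise Crawler pairs the async options with a crawler name ("MyCrawler" is a placeholder for a crawler created in your dashboard):

res, _ := api.Get("https://example.com/", map[string]string{
    "async":    "true",
    "callback": "https://your-app.com/webhook",
    "crawler":  "MyCrawler", // placeholder crawler name
})
fmt.Println(res.RID) // the RID correlates the eventual webhook delivery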

Common patterns

JavaScript rendering

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, _ := api.Get("https://spa.example.com", map[string]string{
    "page_wait": "2000",
    "ajax_wait": "true",
    "scroll":    "true",
})

Use a built-in scraper

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, _ := api.Get(
    "https://www.amazon.com/dp/B08N5WRWNW",
    map[string]string{"scraper": "amazon-product-details"},
)
fmt.Println(res.JSON["name"], res.JSON["price"])

Geo-routing

res, _ := api.Get(
    "https://www.amazon.com/dp/B08N5WRWNW",
    map[string]string{"country": "DE"},
)

Retry with backoff

// crawl retries with full-jitter exponential backoff: transport errors and
// 4xx from Crawlbase fail fast; anything else retries up to attempts times.
// Needs "math", "math/rand", and "time" alongside the SDK import.
func crawl(api *crawlbase.CrawlingAPI, url string, attempts int) (*crawlbase.Response, error) {
    for i := 0; i < attempts; i++ {
        res, err := api.Get(url, nil)
        if err != nil {
            return nil, err
        }
        if res.StatusCode == 200 && res.PCStatus == 200 {
            return res, nil
        }
        if res.StatusCode >= 400 && res.StatusCode < 500 {
            return nil, fmt.Errorf("client error %d: %s", res.StatusCode, url)
        }
        d := time.Duration(rand.Float64() * math.Pow(2, float64(i)) * float64(time.Second))
        time.Sleep(d)
    }
    return nil, fmt.Errorf("failed: %s", url)
}

Async + webhook

res, _ := api.Get("https://example.com/", map[string]string{
    "async":    "true",
    "callback": "https://your-app.com/webhook",
})
fmt.Println(res.RID)  // correlate the eventual webhook delivery

Context for cancellation

Every verb has a *WithContext variant:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, "https://example.com/", nil)

Screenshots

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, _ := api.Get("https://www.apple.com/", map[string]string{
    "screenshot": "true",
})
img, _ := crawlbase.ImageBytes(res)
_ = os.WriteFile("apple.png", img, 0o644)

Errors and retries

The Crawlbase platform returns two status codes on every response:

  • Response.StatusCode — the HTTP status of the SDK's request to Crawlbase.
  • Response.PCStatus — Crawlbase's verdict on the target (the site you asked it to crawl). Branch on this for retry decisions.

A target can return 200 with an empty body, in which case StatusCode is 200 but PCStatus is 520. See the Crawling API errors table for the full list.

res, err := api.Get(url, nil)
if err != nil { return err }

switch res.PCStatus {
case 200:
    use(res.Body)
case 520, 525:
    // 520 = empty body, 525 = anti-bot couldn't be solved.
    // Switch to JS token and retry.
case 521, 522, 523:
    // Target unreachable / timed out. Backoff + retry.
default:
    log.Printf("crawl failed: url=%s pc_status=%d", url, res.PCStatus)
}

All retries against the platform are free — only successful responses (PCStatus == 200) count against your quota.

Performance

  • Reuse a single client per token. The constructor is cheap, but each instance has its own http.Client with its own connection pool. Build once, share across goroutines (the SDK is goroutine-safe); see the sketch after this list.
  • Use the cheapest token that works. Don't default to the JavaScript token "just in case" — the normal token is faster and uses less concurrency. Promote on PCStatus == 520 / 525.
  • Prefer ajax_wait over page_wait. Fixed waits burn concurrency even on fast pages.
  • For batch jobs: async + webhook. Synchronous calls hold a concurrency slot until the upstream finishes; async releases the slot the moment the request is queued.
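
A minimal sketch of the first point, assuming the usual imports (log, os, sync); the URLs are placeholders:

api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
if err != nil {
    log.Fatal(err)
}
var wg sync.WaitGroup
for _, u := range []string{"https://example.com/a", "https://example.com/b"} {
    wg.Add(1)
    go func(u string) {
        defer wg.Done()
        res, err := api.Get(u, nil) // one shared client; the SDK is goroutine-safe
        if err != nil || res.PCStatus != 200 {
            log.Printf("crawl failed: %s", u)
            return
        }
        _ = res.Body // process the page here
    }(u)
}
wg.Wait()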

Documentation

Full API reference: crawlbase.com/docs/sdk-go

godoc: pkg.go.dev/github.com/crawlbase/crawlbase-go

License

MIT

Documentation

Overview

Package crawlbase is the official Go client for the Crawlbase API (https://crawlbase.com/docs/api-reference).

The package exposes one client — CrawlingAPI — that covers every Crawlbase product through the unified Crawling API endpoint:

  • Plain crawls (default usage)
  • Built-in scrapers via options["scraper"] = "amazon-product-details" etc.
  • Screenshots via options["screenshot"] = "true"
  • Email extraction via options["scraper"] = "email-extractor"
  • Async + webhook delivery via options["async"] / options["callback"]

Idiomatic Go ergonomics, no external dependencies (only net/http + stdlib), sensible defaults.

Quickstart

api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
if err != nil { log.Fatal(err) }
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil { log.Fatal(err) }
if res.StatusCode == 200 {
    fmt.Println(res.Body)
}

Tokens

Crawlbase issues two tokens per account — a "normal" (TCP) token for static HTML / JSON endpoints, and a "JavaScript" token for SPAs and pages that hide content behind client-side rendering. Each client is constructed with one token; if you alternate between them, hold two clients.
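
For example (env var names as in the README):

api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
js, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))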

Options

Every Crawling API parameter (country, device, page_wait, scroll, scraper, async, callback, etc. — see https://crawlbase.com/docs/crawling-api) is passed as an entry in the options map. Pass nil for no options.

api.Get(url, map[string]string{
    "country":   "DE",
    "page_wait": "2000",
    "scroll":    "true",
})

Context

Every verb has a *WithContext variant for cancellation, deadlines, and trace propagation:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, url, nil)

Response

All verbs return a Response with the HTTP status, body, lower-cased headers, and the Crawlbase-specific verdict fields (PCStatus, OriginalStatus, URL, RID) lifted out of the headers for typed access.
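
For example, after a successful call:

res, err := api.Get("https://example.com/", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.PCStatus, res.OriginalStatus, res.URL)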


Constants

This section is empty.

Variables

var ErrTokenRequired = errors.New("crawlbase: token is required")

ErrTokenRequired is returned by the constructors when called with an empty token. Most other errors come straight from net/http and are returned to the caller as-is.
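
A caller can test for it with errors.Is:

if _, err := crawlbase.NewCrawlingAPI(""); errors.Is(err, crawlbase.ErrTokenRequired) {
	log.Fatal("crawlbase token missing")
}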

Functions

func ImageBytes

func ImageBytes(res *Response) ([]byte, error)

ImageBytes decodes the base64-encoded screenshot in res.Body into raw image bytes ready for os.WriteFile / image.Decode. Use this on responses from screenshot calls (CrawlingAPI.Get with options["screenshot"] = "true").

Returns an error if the body isn't valid base64 — verify res.StatusCode and res.PCStatus first.

Types

type CrawlingAPI

type CrawlingAPI struct {
	// contains filtered or unexported fields
}

CrawlingAPI is a client for the general-purpose Crawlbase Crawling API. It's the engine the rest of the platform sits on top of — JS rendering, anti-bot bypass, residential proxy routing, and the scraper library are all reachable from here through the options map.

See https://crawlbase.com/docs/crawling-api for the full parameter reference.

Example (JavascriptRendering)

Use the JavaScript token to render SPAs. Combine page_wait / ajax_wait / scroll / css_click_selector based on what the target needs. A useful order to reason in: a fixed wait first, then network-idle, then scroll for lazy-loading, then a click for any gating UI element.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://spa.example.com", map[string]string{
	"page_wait": "2000",
	"ajax_wait": "true",
	"scroll":    "true",
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.StatusCode)

Example (Scraper)

Apply a built-in scraper via the Crawling API to skip the parser step on supported sites. The Body comes back as a JSON string and is also pre-decoded into res.JSON for direct field access.

api, _ := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
res, err := api.Get(
	"https://www.amazon.com/dp/B08N5WRWNW",
	map[string]string{"scraper": "amazon-product-details"},
)
if err != nil {
	log.Fatal(err)
}
if name, ok := res.JSON["name"].(string); ok {
	fmt.Println(name)
}

Example (Screenshot)

Capture a screenshot via the Crawling API. The Body is base64-encoded image bytes; use ImageBytes to decode.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://www.apple.com/", map[string]string{
	"screenshot": "true",
})
if err != nil {
	log.Fatal(err)
}
img, err := crawlbase.ImageBytes(res)
if err != nil {
	log.Fatal(err)
}
_ = os.WriteFile("apple.png", img, 0o644)

func NewCrawlingAPI

func NewCrawlingAPI(token string) (*CrawlingAPI, error)

NewCrawlingAPI constructs a Crawling API client with the given token. Token can be either the "normal" (TCP) token or the JavaScript token, depending on whether you need browser rendering. The client doesn't switch tokens per-call, so hold two clients if you alternate.

The constructor returns ErrTokenRequired if token is empty.

Example

Minimal quickstart. Replace YOUR_TOKEN with the token from your Crawlbase dashboard — sign-up gives 1,000 free requests, no credit card.

api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
if err != nil {
	log.Fatal(err)
}
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil {
	log.Fatal(err)
}
if res.StatusCode == 200 {
	fmt.Println(len(res.Body), "bytes received")
}

func (*CrawlingAPI) Get

func (a *CrawlingAPI) Get(targetURL string, options map[string]string) (*Response, error)

Get fetches targetURL through Crawlbase. Pass nil for options to send just the target; otherwise every Crawling API parameter is reachable here as a key in the options map (country, device, page_wait, scroll, scraper, async, callback, store, format, etc.).

func (*CrawlingAPI) GetWithContext

func (a *CrawlingAPI) GetWithContext(ctx context.Context, targetURL string, options map[string]string) (*Response, error)

GetWithContext is Get with cancellation / deadline / trace propagation. Use this from servers and any code path that should respect upstream timeouts.

Example

Use a context with a deadline for any code path that should respect upstream cancellation — HTTP handlers, RPC servers, anything else where a hung request would propagate.

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

res, err := api.GetWithContext(ctx, "https://example.com/", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.StatusCode)

func (*CrawlingAPI) Post

func (a *CrawlingAPI) Post(targetURL string, data any, options map[string]string) (*Response, error)

Post sends data to targetURL through Crawlbase as an HTTP POST. The data argument can be:

  • a url.Values for form-encoded bodies (default)
  • a string for raw bodies (JSON, plain text, etc.)
  • a []byte for raw bodies

To send JSON, pass options["post_content_type"] = "application/json" and provide the JSON-encoded body as a string or []byte.
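
A sketch with a placeholder endpoint and payload:

res, err := api.Post(
	"https://example.com/search",
	`{"query": "golang"}`,
	map[string]string{"post_content_type": "application/json"},
)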

func (*CrawlingAPI) PostWithContext

func (a *CrawlingAPI) PostWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)

PostWithContext is Post with cancellation / deadline / trace propagation.

func (*CrawlingAPI) Put

func (a *CrawlingAPI) Put(targetURL string, data any, options map[string]string) (*Response, error)

Put is the PUT counterpart to Post — same body-encoding rules, same options bag.
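
A form-encoded sketch (placeholder endpoint; url.Values is from net/url):

form := url.Values{"status": {"active"}}
res, err := api.Put("https://example.com/resource/42", form, nil)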

func (*CrawlingAPI) PutWithContext

func (a *CrawlingAPI) PutWithContext(ctx context.Context, targetURL string, data any, options map[string]string) (*Response, error)

PutWithContext is Put with cancellation / deadline / trace propagation.

type Response

type Response struct {
	// StatusCode is the HTTP status of the request to Crawlbase.
	StatusCode int

	// Body is the page content returned by the target (or a JSON envelope
	// when the call set format=json or scraper=NAME).
	Body string

	// Headers are the response headers, lower-cased on the way in.
	Headers map[string]string

	// PCStatus is the Crawlbase verdict on the target — pulled from the
	// `pc_status` (or `cb_status`) response header. Branch on this for
	// retry decisions. Zero when not present.
	PCStatus int

	// OriginalStatus is the HTTP status the target returned to Crawlbase —
	// pulled from the `original_status` response header. Zero when not
	// present.
	OriginalStatus int

	// URL is the final URL after target-side redirects. Pulled from the
	// `url` response header.
	URL string

	// RID is the Crawlbase request identifier. Set when the call carried
	// async=true or store=true; empty otherwise.
	RID string

	// JSON is the response body pre-parsed into a generic map. Populated
	// only when the response Content-Type is JSON (e.g. scraper=... or
	// format=json calls). Use it to avoid double-parsing the body.
	JSON map[string]any
}

Response is what every Crawlbase API verb returns on success. Fields follow the same naming convention used by the other Crawlbase SDKs (Python / Node / Ruby / PHP) so cross-language porting is mechanical.

StatusCode is the HTTP status of the SDK's request to Crawlbase. PCStatus is Crawlbase's verdict on the *target* (the site you asked it to crawl). They can disagree — a target can return 200 with an empty body, in which case StatusCode is 200 but PCStatus is 520. Always branch on PCStatus when deciding whether to retry. See https://crawlbase.com/docs/crawling-api/#errors for the full table.
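
A minimal branch, mirroring the README pattern (use is a placeholder for your own handler):

res, err := api.Get(targetURL, nil)
if err != nil {
	return err
}
if res.PCStatus != 200 {
	return fmt.Errorf("crawl failed: pc_status=%d original_status=%d", res.PCStatus, res.OriginalStatus)
}
use(res.Body)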
