scraperapi

Version: v1.0.1 (latest) | Published: Oct 29, 2024 | License: MIT | Imports: 15 | Imported by: 0

Repository

github.com/zenrows/zenrows-go-sdk

README ¶

ZenRows Scraper API Go SDK

This is the Go SDK for interacting with the ZenRows Scraper API, designed to help developers integrate web scraping capabilities into their Go applications. It simplifies the process of making HTTP requests, handling responses, and managing configurations for interacting with the ZenRows Scraper API.

Introduction

The ZenRows® Scraper API is a versatile tool designed to simplify and enhance the process of extracting data from websites. Whether you’re dealing with static or dynamic content, our API provides a range of features to meet your scraping needs efficiently.

With Premium Proxies, ZenRows gives you access to over 55 million residential IPs from 190+ countries, ensuring 99.9% uptime and highly reliable scraping sessions. Our system also handles advanced fingerprinting, header rotation, and IP management, enabling you to scrape even the most protected sites without needing to manually configure these elements.

ZenRows makes it easy to bypass complex anti-bot measures, handle JavaScript-heavy sites, and interact with web elements dynamically — all with the right features enabled.

Installation

To install the SDK, run:

go get github.com/zenrows/zenrows-go-sdk/service/api

Getting Started

To use the SDK, you need a ZenRows API key. You can find your API key in the ZenRows dashboard.

Usage

Client Initialization

Initialize the ZenRows client with your API key by either using the WithAPIKey client option or setting the ZENROWS_API_KEY environment variable:

import (
    "context"
    scraperapi "github.com/zenrows/zenrows-go-sdk/service/api"
)

client := scraperapi.NewClient(
    scraperapi.WithAPIKey("YOUR_API_KEY"),
)

Sending Requests

GET Requests

response, err := client.Get(context.Background(), "https://httpbin.io/anything", nil)
if err != nil {
    // handle error
}

if err = response.Error(); err != nil {
    // handle error
}

fmt.Println("Response Body:", string(response.Body()))

POST or PUT Requests

body := map[string]string{"key": "value"}
response, err := client.Post(context.Background(), "https://httpbin.io/anything", nil, body)
if err != nil {
    // handle error
}

if err = response.Error(); err != nil {
    // handle error
}

fmt.Println("Response Body:", string(response.Body()))

Custom Request Parameters

You can customize your requests using RequestParameters to modify the behavior of the scraping engine:

params := &scraperapi.RequestParameters{
    JSRender:          true,
    UsePremiumProxies: true,
    ProxyCountry:      "US",
}

response, err := client.Get(context.Background(), "https://httpbin.io/anything", params)
if err != nil {
    // handle error
}

if err = response.Error(); err != nil {
    // handle error
}

fmt.Println("Response Body:", response.String())

Handling Responses

The Response object provides several methods to access details about the HTTP response:

Body() []byte: Returns the raw response body.
String() string: Returns the response body as a string.
Status() string: Returns the status text (e.g., "200 OK").
StatusCode() int: Returns the HTTP status code (e.g., 200).
Header() http.Header: Returns the response headers.
Time() time.Duration: Returns the duration of the request.
ReceivedAt() time.Time: Returns the time when the response was received.
Size() int64: Returns the size of the response in bytes.
IsSuccess() bool: Returns true if the response status is in the 2xx range.
IsError() bool: Returns true if the response status is 4xx or higher.
Problem() *problem.Problem: Returns a parsed problem description if the response contains an error.
Error() error: Same as Problem(), but returns an error type.

In order to access additional details about the scraping process, you can use the following methods:

TargetHeaders() http.Header: Returns headers from the target page.
TargetCookies() []*http.Cookie: Returns cookies set by the target page.

Example

response, err := client.Get(context.Background(), "https://httpbin.io/anything", nil)
if err != nil {
    // handle error
} else {
    if prob := response.Problem(); prob != nil {
        fmt.Println("API Error:", prob.Detail)
        return
    }
    
    fmt.Println("Response Body:", response.String())
    fmt.Println("Response Target Headers:", response.TargetHeaders())
    fmt.Println("Response Target Cookies:", response.TargetCookies())
}

Configuration Options

You can customize the client using different options:

WithAPIKey(apiKey string): Sets the API key for authentication. If not provided, the SDK will look for the ZENROWS_API_KEY environment variable.
WithMaxRetryCount(maxRetryCount int): Sets the maximum number of retries for failed requests. Default is 0 (no retries).
WithRetryWaitTime(retryWaitTime time.Duration): Sets the time to wait before retrying a request. Default is 5 seconds.
WithRetryMaxWaitTime(retryMaxWaitTime time.Duration): Sets the maximum time to wait for retries. Default is 30 seconds.
WithMaxConcurrentRequests(maxConcurrentRequests int): Limits the number of concurrent requests. Default is 5. Make sure this value does not exceed your plan's concurrency limit, as it may result in 429 Too Many Requests errors.
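
The API key precedence described for WithAPIKey (an explicit key first, then the ZENROWS_API_KEY environment variable) can be pictured with the standalone sketch below. The resolveAPIKey helper and its error message are hypothetical and exist only for illustration; the SDK's internal lookup may be implemented differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveAPIKey is a hypothetical helper, not part of the SDK: an explicit key
// wins, otherwise the value of ZENROWS_API_KEY (read through getenv) is used.
func resolveAPIKey(explicit string, getenv func(string) string) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	if key := getenv("ZENROWS_API_KEY"); key != "" {
		return key, nil
	}
	return "", errors.New("no API key configured")
}

func main() {
	// Pass os.Getenv in real code; tests can inject a fake lookup instead.
	key, err := resolveAPIKey("", os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using a key of length", len(key))
}
```

Taking the lookup function as a parameter keeps the helper testable without touching the process environment.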

Error Handling

The SDK provides custom error types for better error handling:

NotConfiguredError: Returned when the client is not properly configured (e.g., missing API key).
InvalidHTTPMethodError: Returned when an unsupported HTTP method is used (e.g., when sending PATCH or DELETE requests).
InvalidTargetURLError: Returned when an invalid target URL is provided (e.g., the target URL is empty or malformed).
InvalidParameterError: Returned when invalid parameters are used in the request. See the error message for details.
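
Because these are ordinary Go error values, callers would typically branch on them with errors.As. The sketch below is self-contained and uses a local stand-in type that copies the documented fields of InvalidTargetURLError (URL, Msg, Err); the checkTargetURL and classify helpers and all error messages are hypothetical. With the real SDK you would declare the target variable with the scraperapi type instead.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// InvalidTargetURLError is a local stand-in mirroring the documented fields;
// it is not the SDK's own type.
type InvalidTargetURLError struct {
	URL string
	Msg string
	Err error
}

func (e InvalidTargetURLError) Error() string {
	return fmt.Sprintf("invalid target URL %q: %s", e.URL, e.Msg)
}

// Unwrap exposes the underlying cause so errors.Is and errors.As can reach it.
func (e InvalidTargetURLError) Unwrap() error { return e.Err }

// checkTargetURL is a hypothetical validator that only produces errors for this sketch.
func checkTargetURL(raw string) error {
	if raw == "" {
		return InvalidTargetURLError{URL: raw, Msg: "target URL is empty"}
	}
	if _, err := url.Parse(raw); err != nil {
		return InvalidTargetURLError{URL: raw, Msg: "target URL is malformed", Err: err}
	}
	return nil
}

// classify branches on the concrete error type with errors.As.
func classify(err error) string {
	var urlErr InvalidTargetURLError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &urlErr):
		return "invalid target URL: " + urlErr.Msg
	default:
		return "other error"
	}
}

func main() {
	for _, raw := range []string{"https://httpbin.io/anything", "", "://bad"} {
		fmt.Printf("%q -> %s\n", raw, classify(checkTargetURL(raw)))
	}
}
```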

Examples

Concurrency

Concurrency in web scraping is essential for efficient data extraction, especially when dealing with multiple URLs. Managing the number of concurrent requests helps prevent overwhelming the target server and ensures you stay within rate limits. Depending on your subscription plan, you can perform twenty or more concurrent requests.

To limit the concurrency, the SDK uses a semaphore to control the number of concurrent requests that a single client can make. This value is set by the WithMaxConcurrentRequests option when initializing the client and defaults to 5.

See the example below for a demonstration of how to use the SDK with concurrency:

package main

import (
	"context"
	"fmt"
	"sync"

	scraperapi "github.com/zenrows/zenrows-go-sdk/service/api"
)

const (
	maxConcurrentRequests = 5  // run 5 scraping requests at the same time
	totalRequests         = 10 // send a total of 10 scraping requests
)

func main() {
	client := scraperapi.NewClient(
		scraperapi.WithAPIKey("YOUR_API_KEY"),
		scraperapi.WithMaxConcurrentRequests(maxConcurrentRequests),
	)

	var wg sync.WaitGroup
	for i := 0; i < totalRequests; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()

			res, err := client.Get(context.Background(), "https://httpbin.io/anything", &scraperapi.RequestParameters{})
			if err != nil {
				fmt.Println(i, err)
				return
			}

			if err = res.Error(); err != nil {
				fmt.Println(i, err)
				return
			}

			fmt.Printf("[#%d]: %s\n", i, res.Status())
		}(i)
	}

	wg.Wait()
	fmt.Println("done")
}

This program will output the status of each request, running up to five concurrent requests at a time:

[#1]: 200 OK
[#0]: 200 OK
[#9]: 200 OK
[#5]: 200 OK
[#2]: 200 OK
[#8]: 200 OK
[#7]: 200 OK
[#6]: 200 OK
[#4]: 200 OK
[#3]: 200 OK
done
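
The semaphore mentioned at the start of this section can be sketched with a buffered channel. This is a standalone illustration of the general technique, not the SDK's implementation: runLimited is a hypothetical helper, and the short sleep stands in for a scraping request.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited starts total goroutines but lets at most limit of them work at once.
// It returns the highest number of tasks observed running at the same time.
func runLimited(limit, total int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var inFlight, peak atomic.Int64
	var wg sync.WaitGroup

	for i := 0; i < total; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks while limit tasks are running)
			defer func() { <-sem }() // release the slot when the task finishes

			n := inFlight.Add(1)
			for { // record the peak concurrency seen so far
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for a scraping request
			inFlight.Add(-1)
		}()
	}

	wg.Wait()
	return int(peak.Load())
}

func main() {
	fmt.Println("peak concurrency:", runLimited(5, 10))
}
```

However many goroutines are started, the reported peak never exceeds the limit, which is the property the WithMaxConcurrentRequests option is described as enforcing.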

Retrying

The SDK supports automatic retries for failed requests. You can configure the maximum number of retries and the wait time between retries using the WithMaxRetryCount, WithRetryWaitTime, and WithRetryMaxWaitTime options.

A backoff strategy is used to increase the wait time between retries, starting at the RetryWaitTime and doubling the wait time for each subsequent retry until it reaches the RetryMaxWaitTime.

See the example below for a demonstration of how to use the SDK with retries:

package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	scraperapi "github.com/zenrows/zenrows-go-sdk/service/api"
)

const (
	maxConcurrentRequests = 5  // run 5 scraping requests at the same time
	totalRequests         = 10 // send a total of 10 scraping requests
)

func main() {
	client := scraperapi.NewClient(
		scraperapi.WithAPIKey("YOUR_API_KEY"),
		scraperapi.WithMaxConcurrentRequests(maxConcurrentRequests),
		scraperapi.WithMaxRetryCount(5),                 // retry up to five times
		scraperapi.WithRetryWaitTime(20*time.Second),    // waiting at least 20s between retries (just for demonstration purposes!)
		scraperapi.WithRetryMaxWaitTime(25*time.Second), // and waiting a maximum of 25s between retries
	)

	var wg sync.WaitGroup
	for i := 0; i < totalRequests; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			now := time.Now() // store the time, to be able to print the elapsed duration

			// target the https://httpbin.io/unstable endpoint, which fails about half of the time, so the retry mechanism
			// has to step in before we eventually receive a successful response
			res, err := client.Get(context.Background(), "https://httpbin.io/unstable", &scraperapi.RequestParameters{})
			if err != nil {
				fmt.Println(i, err)
				return
			}

			if err = res.Error(); err != nil {
				fmt.Println(i, err)
				return
			}

			fmt.Printf("[#%d]: %s (in %s)\n", i, res.Status(), time.Since(now))
		}(i)
	}

	wg.Wait()
	fmt.Println("done")
}

This program will output the status of each request, and the elapsed time. As we've set the retry mechanism to retry up to five times, with a minimum wait time of 20 seconds and a maximum of 25 seconds, the output will look like this:

[#6]: 200 OK (in 743.064708ms)
[#2]: 200 OK (in 1.202448208s)
[#1]: 200 OK (in 1.380041292s)
[#5]: 200 OK (in 1.626613583s)
[#8]: 200 OK (in 2.635505541s)
[#4]: 200 OK (in 3.217849791s)
[#9]: 200 OK (in 21.973982334s) <-- this request took longer because it had to retry 1 time
[#3]: 200 OK (in 22.031445708s) <-- this request took longer because it had to retry 1 time
[#7]: 200 OK (in 22.130371583s) <-- this request took longer because it had to retry 1 time
[#0]: 200 OK (in 45.030251042s) <-- this request took longer because it had to retry 2 times
done
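
The doubling backoff described earlier in this section can be written as a small function. This is a sketch under stated assumptions: backoffSchedule is a hypothetical helper, and it ignores any jitter or other adjustments the SDK may apply internally.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the wait before each retry: it starts at wait,
// doubles after every attempt, and never exceeds maxWait.
func backoffSchedule(wait, maxWait time.Duration, retries int) []time.Duration {
	if retries < 0 {
		retries = 0
	}
	schedule := make([]time.Duration, 0, retries)
	next := wait
	for i := 0; i < retries; i++ {
		if next > maxWait {
			next = maxWait // cap the wait at the configured maximum
		}
		schedule = append(schedule, next)
		next *= 2
	}
	return schedule
}

func main() {
	// Same values as the retry example: 20s initial wait, 25s cap.
	fmt.Println(backoffSchedule(20*time.Second, 25*time.Second, 3))
	// Defaults listed under Configuration Options: 5s initial wait, 30s cap.
	fmt.Println(backoffSchedule(5*time.Second, 30*time.Second, 5))
}
```

With the example's settings the first two waits are 20s and 25s, which is consistent with the request above that retried twice and finished after roughly 45 seconds.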

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Documentation ¶

Index ¶

Variables
type Client
- func NewClient(opts ...Option) *Client
- func (c *Client) Get(ctx context.Context, targetURL string, params *RequestParameters) (*Response, error)
- func (c *Client) Post(ctx context.Context, targetURL string, params *RequestParameters, body any) (*Response, error)
- func (c *Client) Put(ctx context.Context, targetURL string, params *RequestParameters, body any) (*Response, error)
- func (c *Client) Scrape(ctx context.Context, method, targetURL string, params *RequestParameters, ...) (*Response, error)
type IClient
type InvalidHTTPMethodError
- func (InvalidHTTPMethodError) Error() string
type InvalidParameterError
- func (e InvalidParameterError) Error() string
type InvalidTargetURLError
- func (e InvalidTargetURLError) Error() string
- func (e InvalidTargetURLError) Unwrap() error
type NotConfiguredError
- func (NotConfiguredError) Error() string
type Option
- func WithAPIKey(apiKey string) Option
- func WithBaseURL(baseURL string) Option
- func WithMaxConcurrentRequests(maxConcurrentRequests int) Option
- func WithMaxRetryCount(maxRetryCount int) Option
- func WithRetryMaxWaitTime(retryMaxWaitTime time.Duration) Option
- func WithRetryWaitTime(retryWaitTime time.Duration) Option
type OutputType
type RequestParameters
- func ParseQueryRequestParameters(query url.Values) (*RequestParameters, error)
- func (p *RequestParameters) ToURLValues() url.Values
- func (p *RequestParameters) Validate() error
type ResourceType
type Response
- func (r *Response) Body() []byte
- func (r *Response) Error() error
- func (r *Response) Header() http.Header
- func (r *Response) IsError() bool
- func (r *Response) IsSuccess() bool
- func (r *Response) Problem() *problem.Problem
- func (r *Response) ReceivedAt() time.Time
- func (r *Response) Size() int64
- func (r *Response) Status() string
- func (r *Response) StatusCode() int
- func (r *Response) String() string
- func (r *Response) TargetCookies() []*http.Cookie
- func (r *Response) TargetHeaders() http.Header
- func (r *Response) Time() time.Duration
type ResponseType
type ScreenshotFormat

Constants ¶

This section is empty.

Variables ¶

var AllOutputTypes = map[OutputType]struct{}{
	OutputTypeEmails:       {},
	OutputTypePhoneNumbers: {},
	OutputTypeHeadings:     {},
	OutputTypeImages:       {},
	OutputTypeAudios:       {},
	OutputTypeVideos:       {},
	OutputTypeLinks:        {},
	OutputTypeTables:       {},
	OutputTypeMenus:        {},
	OutputTypeHashtags:     {},
	OutputTypeMetadata:     {},
	OutputTypeFavicon:      {},
	OutputTypeAll:          {},
}

var AllResourceTypes = map[ResourceType]struct{}{
	ResourceTypeEventSource: {},
	ResourceTypeFetch:       {},
	ResourceTypeFont:        {},
	ResourceTypeImage:       {},
	ResourceTypeManifest:    {},
	ResourceTypeMedia:       {},
	ResourceTypeOther:       {},
	ResourceTypeScript:      {},
	ResourceTypeStylesheet:  {},
	ResourceTypeTextTrack:   {},
	ResourceTypeWebSocket:   {},
	ResourceTypeXHR:         {},
}

var AllResponseTypes = map[ResponseType]struct{}{
	ResponseTypeMarkdown:  {},
	ResponseTypePlainText: {},
	ResponseTypePDF:       {},
}

var AllScreenshotFormats = map[ScreenshotFormat]struct{}{
	ScreenshotFormatPNG:  {},
	ScreenshotFormatJPEG: {},
}

var Version = version.Version

var VersionPrerelease = version.Prerelease

Functions ¶

This section is empty.

Types ¶

type Client ¶

type Client struct {
	// contains filtered or unexported fields
}

Client is the ZenRows Scraper API client

func NewClient ¶

func NewClient(opts ...Option) *Client

NewClient creates and returns a new ZenRows Scraper API client

func (*Client) Get ¶

func (c *Client) Get(ctx context.Context, targetURL string, params *RequestParameters) (*Response, error)

Get sends an HTTP GET request to the ZenRows Scraper API to scrape the given target URL using the specified parameters.

func (*Client) Post ¶

func (c *Client) Post(ctx context.Context, targetURL string, params *RequestParameters, body any) (*Response, error)

Post sends an HTTP POST request to the ZenRows Scraper API to scrape the given target URL using the specified parameters.

func (*Client) Put ¶

func (c *Client) Put(ctx context.Context, targetURL string, params *RequestParameters, body any) (*Response, error)

Put sends an HTTP PUT request to the ZenRows Scraper API to scrape the given target URL using the specified parameters.

func (*Client) Scrape ¶

func (c *Client) Scrape(ctx context.Context, method, targetURL string, params *RequestParameters, body any) (*Response, error)

Scrape sends a request to the ZenRows Scraper API to scrape the given target URL using the specified method and parameters.

type IClient ¶

type IClient interface {
	// Scrape sends a request to the ZenRows Scraper API to scrape the given target URL using the specified method and parameters.
	Scrape(ctx context.Context, targetURL, method string, params RequestParameters) (*Response, error)
	// Get sends a GET request to the ZenRows Scraper API to scrape the given target URL using the specified parameters.
	Get(ctx context.Context, targetURL string, params RequestParameters) (*Response, error)
	// Post sends a POST request to the ZenRows Scraper API to scrape the given target URL using the specified parameters.
	Post(ctx context.Context, targetURL string, params RequestParameters) (*Response, error)
	// Put sends a PUT request to the ZenRows Scraper API to scrape the given target URL using the specified parameters.
	Put(ctx context.Context, targetURL string, params RequestParameters) (*Response, error)
}

type InvalidHTTPMethodError ¶

type InvalidHTTPMethodError struct{}

InvalidHTTPMethodError results when the ZenRows Scraper API client is used with an invalid HTTP method.

func (InvalidHTTPMethodError) Error ¶

func (InvalidHTTPMethodError) Error() string

type InvalidParameterError ¶

type InvalidParameterError struct {
	Msg string
}

InvalidParameterError results when the ZenRows Scraper API client is used with an invalid parameter.

func (InvalidParameterError) Error ¶

func (e InvalidParameterError) Error() string

type InvalidTargetURLError ¶

type InvalidTargetURLError struct {
	URL string
	Msg string
	Err error
}

InvalidTargetURLError results when the ZenRows Scraper API client is used with an invalid target URL.

func (InvalidTargetURLError) Error ¶

func (e InvalidTargetURLError) Error() string

func (InvalidTargetURLError) Unwrap ¶

func (e InvalidTargetURLError) Unwrap() error

type NotConfiguredError ¶

type NotConfiguredError struct{}

NotConfiguredError results when the ZenRows Scraper API client is used without a valid API Key.

func (NotConfiguredError) Error ¶

func (NotConfiguredError) Error() string

type Option ¶

type Option interface {
	// contains filtered or unexported methods
}

Option configures the ZenRows Scraper API client.

func WithAPIKey ¶

func WithAPIKey(apiKey string) Option

WithAPIKey returns an Option which configures the API key of the ZenRows Scraper API client.

func WithBaseURL ¶

func WithBaseURL(baseURL string) Option

WithBaseURL returns an Option which configures the base URL of the ZenRows Scraper API client.

func WithMaxConcurrentRequests ¶

func WithMaxConcurrentRequests(maxConcurrentRequests int) Option

WithMaxConcurrentRequests returns an Option which configures the maximum number of concurrent requests to the ZenRows Scraper API. See https://docs.zenrows.com/scraper-api/features/concurrency for more information.

IMPORTANT: Breaking the concurrency limit will result in a 429 Too Many Requests error. If you exceed the limit repeatedly, your account may be temporarily suspended, so make sure to set this value to a reasonable number according to your subscription plan.

func WithMaxRetryCount ¶

func WithMaxRetryCount(maxRetryCount int) Option

WithMaxRetryCount returns an Option which configures the maximum number of retries to perform.

func WithRetryMaxWaitTime ¶

func WithRetryMaxWaitTime(retryMaxWaitTime time.Duration) Option

WithRetryMaxWaitTime returns an Option which configures the maximum time to wait before retrying the request.

func WithRetryWaitTime ¶

func WithRetryWaitTime(retryWaitTime time.Duration) Option

WithRetryWaitTime returns an Option which configures the time to wait before retrying the request.

type OutputType ¶

type OutputType string

const (
	OutputTypeEmails       OutputType = "emails"
	OutputTypePhoneNumbers OutputType = "phone_numbers"
	OutputTypeHeadings     OutputType = "headings"
	OutputTypeImages       OutputType = "images"
	OutputTypeAudios       OutputType = "audios"
	OutputTypeVideos       OutputType = "videos"
	OutputTypeLinks        OutputType = "links"
	OutputTypeTables       OutputType = "tables"
	OutputTypeMenus        OutputType = "menus"
	OutputTypeHashtags     OutputType = "hashtags"
	OutputTypeMetadata     OutputType = "metadata"
	OutputTypeFavicon      OutputType = "favicon"
	OutputTypeAll          OutputType = "*"
)

type RequestParameters ¶

type RequestParameters struct {
	// Proxy settings
	UsePremiumProxies bool   `json:"premium_proxy,omitempty" structs:"premium_proxy,omitempty" schema:"premium_proxy"`
	ProxyCountry      string `json:"proxy_country,omitempty" structs:"proxy_country,omitempty" schema:"proxy_country"`

	// Output modifiers
	AutoParse    bool         `json:"autoparse,omitempty" structs:"autoparse,omitempty" schema:"autoparse"`
	CSSExtractor string       `json:"css_extractor,omitempty" structs:"css_extractor,omitempty" schema:"css_extractor"`
	JSONResponse bool         `json:"json_response,omitempty" structs:"json_response,omitempty" schema:"json_response"`
	ResponseType ResponseType `json:"response_type,omitempty" structs:"response_type,omitempty" schema:"response_type"`
	Outputs      []OutputType `json:"outputs,omitempty" structs:"outputs,omitempty" schema:"outputs"`

	// JSRender enables JavaScript rendering for the request. If not enabled, the request will be processed by the standard scraping engine,
	// which is faster but does not execute JavaScript and may not bypass some anti-bot systems.
	//
	// See https://docs.zenrows.com/scraper-api/features/js-rendering for more information.
	JSRender bool `json:"js_render,omitempty" structs:"js_render,omitempty" schema:"js_render"`

	// JSInstructions is a serialized JSON object that contains custom JavaScript instructions that will be executed in the page before
	// returning the response (only available when using JSRender).
	//
	// See https://docs.zenrows.com/scraper-api/features/js-rendering#using-the-javascript-instructions for more information.
	JSInstructions string `json:"js_instructions,omitempty" structs:"js_instructions,omitempty" schema:"js_instructions"`

	// WaitMilliseconds will wait for the specified number of milliseconds before returning the response (only available when
	// using JSRender). The maximum wait time is 30 seconds (30000 ms).
	WaitMilliseconds int `json:"wait,omitempty" structs:"wait,omitempty" schema:"wait"`

	// WaitForSelector will wait for the specified element to appear in the page before returning the response (only available when
	// using JSRender).
	//
	// See https://docs.zenrows.com/scraper-api/features/js-rendering#wait-for-selector for more information.
	//
	// IMPORTANT: Make sure that the element you are waiting for is present in the page. If the element does not appear, the request will
	// fail by a timeout error after a few seconds.
	WaitForSelector string `json:"wait_for,omitempty" structs:"wait_for,omitempty" schema:"wait_for"`

	// Screenshot will return a screenshot of the page (only available when using JSRender)
	Screenshot bool `json:"screenshot,omitempty" structs:"screenshot,omitempty" schema:"screenshot"`

	// ScreenshotFullPage will take a screenshot of the full page (only available when using JSRender and Screenshot is set to true)
	ScreenshotFullPage bool `json:"screenshot_fullpage,omitempty" structs:"screenshot_fullpage,omitempty" schema:"screenshot_fullpage"`

	// ScreenshotSelector will take a screenshot of the specified element (only available when using JSRender and Screenshot is set to true)
	ScreenshotSelector string `json:"screenshot_selector,omitempty" structs:"screenshot_selector,omitempty" schema:"screenshot_selector"`

	// ScreenshotFormat will set the format of the screenshot (only available when using JSRender and Screenshot is set to true).
	// The available formats are ScreenshotFormatPNG and ScreenshotFormatJPEG. The default format is ScreenshotFormatPNG.
	ScreenshotFormat ScreenshotFormat `json:"screenshot_format,omitempty" structs:"screenshot_format,omitempty" schema:"screenshot_format"`

	// ScreenshotQuality will set the quality of the screenshot (only available when using JSRender and Screenshot is set to true, and
	// the format is ScreenshotFormatJPEG). The quality must be between 1 and 100. The default quality is 100.
	ScreenshotQuality int `json:"screenshot_quality,omitempty" structs:"screenshot_quality,omitempty" schema:"screenshot_quality"`

	// ReturnOriginalStatus will return the original status code of the response when the request is not successful. When a request is not
	// successful, the ZenRows Scraper API will always return a 422 status code. If you enable this feature, the original status code will
	// be returned instead.
	ReturnOriginalStatus bool `json:"original_status,omitempty" structs:"original_status,omitempty" schema:"original_status"`

	// SessionID is an integer between 0 and 99999 that can be used to group requests together. If you provide a SessionID, all requests
	// with the same SessionID will use the same IP address for up to 10 minutes. This feature is useful for web scraping sites that track
	// sessions or limit IP rotation. It helps simulate a persistent session and avoids triggering anti-bot systems that flag
	// frequent IP changes.
	//
	// See https://docs.zenrows.com/scraper-api/features/other#session-id for more information.
	//
	// IMPORTANT: Use this feature only if you know what you are doing. If you provide a SessionID, the IP rotation feature will be disabled
	// for all requests with the same SessionID. This may affect the scraping quality and increase the chances of being blocked.
	SessionID int `json:"session_id,omitempty" structs:"session_id,omitempty" schema:"session_id"`

	// AllowedStatusCodes will return the response body of a request even if the status code is not a successful one (2xx), but
	// is one of the specified status codes in this list.
	//
	// See https://docs.zenrows.com/scraper-api/features/other#return-content-on-error for more information.
	//
	// IMPORTANT: ZenRows Scraper API only charges for successful requests. If you use this feature, you will also be charged for
	// unsuccessful requests matching the specified status codes.
	AllowedStatusCodes []int `json:"allowed_status_codes,omitempty" structs:"allowed_status_codes,omitempty" schema:"allowed_status_codes"`

	// BlockResources will block the specified resources from loading (only available when using JSRender)
	//
	// See https://docs.zenrows.com/scraper-api/features/js-rendering#block-resources for more information.
	//
	// IMPORTANT: ZenRows Scraper API already blocks some resources by default to improve the scraping quality. Use this feature only if you
	// know what you are doing.
	BlockResources []ResourceType `json:"block_resources,omitempty" structs:"block_resources,omitempty" schema:"block_resources"`

	// CustomHeaders is a http.Header object that will be used to set custom headers in the request.
	//
	// See https://docs.zenrows.com/scraper-api/features/headers for more information.
	//
	// IMPORTANT: ZenRows Scraper API already rotates and selects the best combination of headers (like User-Agent, Accept-Language, etc.)
	// automatically for each request. If you provide custom headers, the scraping quality may be affected. Use this feature only if you
	// know what you are doing.
	CustomHeaders http.Header `json:"custom_headers,omitempty" structs:"-" schema:"-"`

	// CustomParams is a map of custom parameters that will be passed to the ZenRows Scraper API. These parameters will be passed as query
	// parameters in the request, and can be used to pass new features or options that are not available in the standard parameters.
	CustomParams map[string]string `json:"custom_params,omitempty" structs:"-" schema:"-"`
}

RequestParameters represents the parameters that can be passed to the ZenRows Scraper API when making a request to modify the behavior of the scraping engine.

See https://docs.zenrows.com/scraper-api/api-reference for more information.

func ParseQueryRequestParameters ¶

func ParseQueryRequestParameters(query url.Values) (*RequestParameters, error)

ParseQueryRequestParameters parses the provided url.Values object and returns a RequestParameters object, or an error if the parsing fails.

func (*RequestParameters) ToURLValues ¶

func (p *RequestParameters) ToURLValues() url.Values

ToURLValues converts the RequestParameters to a url.Values object
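
To give a feel for this kind of conversion, the sketch below encodes three of the documented fields by hand, using the query names from their struct tags (js_render, premium_proxy, proxy_country) and skipping zero values. The local params type and toURLValues function are stand-ins, and rendering booleans as "true" is an assumption; the SDK's own encoding may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// params is a local stand-in with three of the documented fields.
type params struct {
	JSRender          bool
	UsePremiumProxies bool
	ProxyCountry      string
}

// toURLValues is a hypothetical conversion: zero values are skipped and
// booleans are written as "true" (an assumption, not confirmed by the SDK docs).
func toURLValues(p params) url.Values {
	v := url.Values{}
	if p.JSRender {
		v.Set("js_render", "true")
	}
	if p.UsePremiumProxies {
		v.Set("premium_proxy", "true")
	}
	if p.ProxyCountry != "" {
		v.Set("proxy_country", p.ProxyCountry)
	}
	return v
}

func main() {
	p := params{JSRender: true, UsePremiumProxies: true, ProxyCountry: "US"}
	// Encode sorts by key, so the query string is stable across runs.
	fmt.Println(toURLValues(p).Encode())
}
```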

func (*RequestParameters) Validate ¶

func (p *RequestParameters) Validate() error
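
Validate is not documented here, so the sketch below only illustrates the kind of checks that the field comments on RequestParameters suggest: a wait of at most 30000 ms that requires JS rendering, a screenshot quality between 1 and 100, and a session ID between 0 and 99999. The local params type, the validate function, and the error messages are all hypothetical; the rules the SDK actually enforces may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// params is a local stand-in with a few of the documented fields.
type params struct {
	JSRender          bool
	WaitMilliseconds  int
	ScreenshotQuality int
	SessionID         int
}

// validate is a hypothetical check derived from the field comments only.
func validate(p params) error {
	switch {
	case p.WaitMilliseconds < 0 || p.WaitMilliseconds > 30000:
		return errors.New("wait must be between 0 and 30000 ms")
	case p.WaitMilliseconds > 0 && !p.JSRender:
		return errors.New("wait requires js_render")
	case p.ScreenshotQuality != 0 && (p.ScreenshotQuality < 1 || p.ScreenshotQuality > 100):
		return errors.New("screenshot_quality must be between 1 and 100")
	case p.SessionID < 0 || p.SessionID > 99999:
		return errors.New("session_id must be between 0 and 99999")
	}
	return nil
}

func main() {
	fmt.Println(validate(params{JSRender: true, WaitMilliseconds: 5000}))
	fmt.Println(validate(params{WaitMilliseconds: 5000}))
	fmt.Println(validate(params{SessionID: 100000}))
}
```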

type ResourceType ¶

type ResourceType string

const (
	ResourceTypeEventSource ResourceType = "eventsource"
	ResourceTypeFetch       ResourceType = "fetch"
	ResourceTypeFont        ResourceType = "font"
	ResourceTypeImage       ResourceType = "image"
	ResourceTypeManifest    ResourceType = "manifest"
	ResourceTypeMedia       ResourceType = "media"
	ResourceTypeOther       ResourceType = "other"
	ResourceTypeScript      ResourceType = "script"
	ResourceTypeStylesheet  ResourceType = "stylesheet"
	ResourceTypeTextTrack   ResourceType = "texttrack"
	ResourceTypeWebSocket   ResourceType = "websocket"
	ResourceTypeXHR         ResourceType = "xhr"
)

type Response ¶

type Response struct {
	// RawResponse is the original `*http.Response` object.
	RawResponse *http.Response
	// contains filtered or unexported fields
}

Response struct holds response values of executed requests.

func (*Response) Body ¶

func (r *Response) Body() []byte

Body method returns the HTTP response as `[]byte` slice for the executed request.

func (*Response) Error ¶

func (r *Response) Error() error

Error method returns the error message of the HTTP response if any.

func (*Response) Header ¶

func (r *Response) Header() http.Header

Header method returns the response headers

func (*Response) IsError ¶

func (r *Response) IsError() bool

IsError method returns true if HTTP status `code >= 400` otherwise false.

func (*Response) IsSuccess ¶

func (r *Response) IsSuccess() bool

IsSuccess method returns true if HTTP status `code >= 200 and <= 299` otherwise false.
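
The two ranges documented for IsError and IsSuccess leave a gap: a 3xx status is neither a success nor an error. The standalone sketch below restates the documented conditions with local helper functions (isSuccess and isError are illustrative names, not SDK functions).

```go
package main

import "fmt"

// isSuccess mirrors the documented condition: code >= 200 and <= 299.
func isSuccess(code int) bool { return code >= 200 && code <= 299 }

// isError mirrors the documented condition: code >= 400.
func isError(code int) bool { return code >= 400 }

func main() {
	for _, code := range []int{200, 204, 301, 404, 422, 500} {
		fmt.Printf("%d success=%t error=%t\n", code, isSuccess(code), isError(code))
	}
}
```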

func (*Response) Problem ¶

func (r *Response) Problem() *problem.Problem

Problem method returns the problem description of the HTTP response if any.

func (*Response) ReceivedAt ¶

func (r *Response) ReceivedAt() time.Time

ReceivedAt method returns the time we received a response from the server for the request.

func (*Response) Size ¶

func (r *Response) Size() int64

Size method returns the HTTP response size in bytes.

func (*Response) Status ¶

func (r *Response) Status() string

Status method returns the HTTP status string for the executed request.

Example: 200 OK

func (*Response) StatusCode ¶

func (r *Response) StatusCode() int

StatusCode method returns the HTTP status code for the executed request.

Example: 200

func (*Response) String ¶

func (r *Response) String() string

String method returns the body of the HTTP response as a `string`. It returns an empty string if it is nil or the body is zero length.

func (*Response) TargetCookies ¶

func (r *Response) TargetCookies() []*http.Cookie

TargetCookies method returns all the response cookies that the target page has set, if any.

func (*Response) TargetHeaders ¶

func (r *Response) TargetHeaders() http.Header

TargetHeaders method returns all the response headers that the target page has set, if any. ZenRows Scraper API encodes these headers with a "Z-" prefix, so this method filters out all headers that do not have this prefix.

To get all the headers, see the Response.Header method.
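
The prefix filtering described for TargetHeaders can be sketched as follows. filterTargetHeaders is a hypothetical standalone function, and it keeps the "Z-" prefix on the returned names because the documentation does not say whether the SDK strips it.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// filterTargetHeaders keeps only the headers whose name starts with "Z-".
// Whether the SDK also strips the prefix is not documented, so names are left as-is.
func filterTargetHeaders(all http.Header) http.Header {
	target := http.Header{}
	for name, values := range all {
		if strings.HasPrefix(name, "Z-") {
			target[name] = append([]string(nil), values...) // copy to avoid aliasing the input
		}
	}
	return target
}

func main() {
	all := http.Header{}
	all.Set("Content-Type", "text/html")
	all.Set("Z-Content-Type", "application/json")
	all.Set("Z-Set-Cookie", "session=abc")

	for name, values := range filterTargetHeaders(all) {
		fmt.Println(name, values)
	}
}
```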

func (*Response) Time ¶

func (r *Response) Time() time.Duration

Time method returns the duration between sending the request and receiving the response.

See Response.ReceivedAt to know when the client received a response.

type ResponseType ¶

type ResponseType string

ResponseType represents the type of response that the ZenRows Scraper API should return.

const (
	ResponseTypeMarkdown  ResponseType = "markdown"
	ResponseTypePlainText ResponseType = "plaintext"
	ResponseTypePDF       ResponseType = "pdf"
)

type ScreenshotFormat ¶

type ScreenshotFormat string

const (
	ScreenshotFormatPNG  ScreenshotFormat = "png"
	ScreenshotFormatJPEG ScreenshotFormat = "jpeg"
)

Directories ¶

Path	Synopsis
cmd/nextversion
examples/concurrency
examples/retries
pkg/problem
pkg/version	Package version provides a location to set the release versions for all packages to consume, without creating import cycles.

