zenrows

v0.4.1
Published: Oct 11, 2023 License: MIT Imports: 7 Imported by: 0

README

go-zenrows

go-zenrows is a Go client for the ZenRows API, allowing users to easily scrape web content.

Features

  • Scrape Web Content: Easily scrape content from any website using the ZenRows API.
  • Flexible Configuration: Comes with a default configuration but allows for customization.
  • Various Scrape Options: Customize your scraping with options like JS rendering, custom headers, session ID, and more.
  • Examples Included: A basic example is provided to help you get started quickly.

Installation

go get github.com/renatoaraujo/go-zenrows

Usage

Here's a basic example to get you started:

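// Requires the imports: context, fmt, log, net/http, time,
// and github.com/renatoaraujo/go-zenrows.

// The timeout bounds each scrape request.
hc := &http.Client{
    Timeout: 60 * time.Second,
}
client := zenrows.NewClient(hc).WithApiKey("YOUR_API_KEY")

// Scrape the target with JavaScript rendering enabled.
result, err := client.Scrape(context.TODO(), "https://httpbin.org", zenrows.WithJSRender())
if err != nil {
    log.Fatalf("Failed to scrape the target: %v", err)
}

fmt.Println("Scraped Content:", result)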

View the full example at https://github.com/renatoaraujo/go-zenrows/blob/main/examples/example.go.

Documentation

For a detailed list of all available functions and scrape options, refer to the official documentation:

Credits

License

The MIT License (MIT) - see LICENSE for more details

Documentation

Overview

Package zenrows provides utility functions to set scraping options for the ZenRows API. These functions help in configuring the request parameters for various scraping features and requirements.

Functions

func ApplyParameters

func ApplyParameters(u *url.URL, params ...ScrapeOptions) *url.URL

ApplyParameters applies the chosen scraping options to a URL. It modifies the URL's query string based on the provided scraping options.

u: The target URL.

params: The ScrapeOptions to be applied to the URL.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is the ZenRows API client.

func NewClient

func NewClient(httpClient HttpClient) *Client

NewClient initialises the client with the given HttpClient implementation.

func (*Client) Scrape

func (c *Client) Scrape(ctx context.Context, targetURL string, params ...ScrapeOptions) (string, error)

Scrape fetches content from the specified targetURL using the ZenRows API.

The function constructs the API URL based on the provided targetURL and optional ScrapeOptions. It then sends a GET request to the ZenRows API and returns the scraped content as a string.

The function validates the provided targetURL to ensure it's a full URL with both a scheme and a host. It also checks if the 'js_instructions' parameter is set without enabling 'js_render', and returns an error if so.

Currently, only the GET method is supported.

Parameters:

  • ctx: The request context.
  • targetURL: The URL of the website you want to scrape.
  • params: Optional parameters to customize the scraping process. Refer to ScrapeOptions for available options.

Returns:

  • A string containing the scraped content.
  • An error if there's any issue during the scraping process, such as invalid URLs, failed requests, or reading issues.

Example usage:

content, err := client.Scrape(context.Background(), "https://example.com", zenrows.WithJSRender())
if err != nil {
    log.Fatalf("Failed to scrape the target: %v", err)
}
fmt.Println("Scraped Content:", content)

For more details and examples, refer to the package documentation at https://pkg.go.dev/github.com/renatoaraujo/go-zenrows and the example in the repository at https://github.com/renatoaraujo/go-zenrows/blob/main/examples/example.go.

func (*Client) WithApiKey

func (c *Client) WithApiKey(key string) *Client

WithApiKey sets the API key used by the client and returns the client.

type ClientConfig

type ClientConfig struct {
	BaseURL string
	// contains filtered or unexported fields
}

ClientConfig holds the API key and the base API URL.

func DefaultConfig

func DefaultConfig() ClientConfig

DefaultConfig returns the default configuration. It is currently the only configuration available, but it leaves room for future extension.

func (*ClientConfig) ConfigCredentials

func (c *ClientConfig) ConfigCredentials(key string)

ConfigCredentials adds the API key to the configuration. Keeping this in one place makes it easier to adapt if the credential format changes.

type HttpClient

type HttpClient interface {
	Do(req *http.Request) (*http.Response, error)
}

HttpClient is any HTTP client able to perform a request. It can be an *http.Client or any other type with a matching Do method.
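Because the interface has a single Do method, a test double is easy to write. The sketch below redeclares the interface locally as `doer` so that it compiles on its own. `stubClient` and `fetch` are hypothetical names and are not part of the package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// doer mirrors the documented HttpClient interface. It is redeclared here
// only so the sketch is self-contained.
type doer interface {
	Do(req *http.Request) (*http.Response, error)
}

// stubClient is a hypothetical test double: it returns a canned body and
// records the URL of the last request it received.
type stubClient struct {
	body    string
	lastURL string
}

func (s *stubClient) Do(req *http.Request) (*http.Response, error) {
	s.lastURL = req.URL.String()
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(s.body)),
		Request:    req,
	}, nil
}

// fetch is written against the interface, so it works with *http.Client
// in production and with the stub in tests.
func fetch(c doer, target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	resp, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	var _ doer = &http.Client{} // compile-time check: *http.Client satisfies the interface

	stub := &stubClient{body: "<html>ok</html>"}
	out, err := fetch(stub, "https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(out, stub.lastURL)
}
```

Passing a stub of this kind to NewClient should allow Scrape to be exercised in unit tests without network access or API credits, assuming the client only calls Do.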

type ScrapeOptions

type ScrapeOptions func(values url.Values)

ScrapeOptions defines functions that modify URL query values based on the chosen scraping options.

func WithAIAntiBot added in v0.3.0

func WithAIAntiBot() ScrapeOptions

WithAIAntiBot enables the anti-bot feature. Some websites protect their content with anti-bot solutions such as Cloudflare, Akamai, or DataDome; enabling this option helps bypass them.

func WithAutoparse

func WithAutoparse(value bool) ScrapeOptions

WithAutoparse employs the auto-parser algorithm for the request, which extracts data from the page automatically.

value: A boolean to determine if the auto parser should be used.

func WithBlockResources

func WithBlockResources(value string) ScrapeOptions

WithBlockResources prevents specific resources from loading during the scrape request.

value: The types of resources to block.

func WithCSSExtractor

func WithCSSExtractor(value string) ScrapeOptions

WithCSSExtractor sets CSS Selectors to extract specific data from the HTML.

value: The desired CSS selectors.

func WithCustomHeaders

func WithCustomHeaders(value bool) ScrapeOptions

WithCustomHeaders allows custom headers to be added to the request.

value: A boolean indicating if custom headers are to be included.

func WithDevice

func WithDevice(value string) ScrapeOptions

WithDevice sets the user agent type (either desktop or mobile) for the request.

value: A string specifying the device type ("desktop" or "mobile").

func WithJSInstructions added in v0.2.0

func WithJSInstructions(value string) ScrapeOptions

WithJSInstructions provides JavaScript instructions for the scrape request. It automatically enables WithJSRender to ensure the correct execution of JavaScript instructions.

value: A JSON string representing the JavaScript instructions.
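Since the value is a JSON string, building it with encoding/json avoids hand-escaping mistakes. The sketch below assumes the instructions are an ordered JSON array of single-action objects. The `click` and `wait` action names and the `buildInstructions` helper are assumptions for illustration, so check the ZenRows API documentation for the supported actions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildInstructions is a hypothetical helper that marshals an ordered list
// of single-action steps into a JSON string.
func buildInstructions(steps []map[string]any) (string, error) {
	b, err := json.Marshal(steps)
	if err != nil {
		return "", fmt.Errorf("marshal JS instructions: %w", err)
	}
	return string(b), nil
}

func main() {
	// Action names below are assumed, not taken from this package's docs.
	steps := []map[string]any{
		{"click": ".load-more"},
		{"wait": 500},
	}
	instructions, err := buildInstructions(steps)
	if err != nil {
		panic(err)
	}
	fmt.Println(instructions)
}
```

The resulting string can then be passed as the value argument of WithJSInstructions.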

func WithJSONResponse

func WithJSONResponse(value bool) ScrapeOptions

WithJSONResponse configures the request to return content in JSON format, including any XHR or Fetch requests made.

value: A boolean to determine if the response should be in JSON format.

func WithJSRender

func WithJSRender() ScrapeOptions

WithJSRender enables JavaScript rendering for the scrape request. Consumes 5 credits per request.

func WithOriginalStatus

func WithOriginalStatus(value bool) ScrapeOptions

WithOriginalStatus configures the request to return the status code as provided by the website.

value: A boolean determining if the original status code should be returned.

func WithPremiumProxy

func WithPremiumProxy() ScrapeOptions

WithPremiumProxy enables the use of premium proxies for the request. This makes the request less detectable and consumes 10-25 credits per request.

func WithProxyCountry

func WithProxyCountry(value string) ScrapeOptions

WithProxyCountry specifies the geolocation of the IP for the request. Note: Only applicable for Premium Proxies.

value: The desired country code for the proxy.

func WithResolveCaptcha

func WithResolveCaptcha(value bool) ScrapeOptions

WithResolveCaptcha integrates a CAPTCHA solver for the request, enabling automatic solving of CAPTCHAs on the page.

value: A boolean to determine if the CAPTCHA solver should be used.

func WithSessionID

func WithSessionID(sessionID int) ScrapeOptions

WithSessionID sets the Session ID number for the scrape request. This allows the use of the same IP for each API Request for up to 10 minutes.

sessionID: An integer representing the Session ID.

func WithWait

func WithWait(value int) ScrapeOptions

WithWait introduces a fixed delay before the content is returned.

value: An integer specifying the wait time in milliseconds.

func WithWaitFor

func WithWaitFor(value string) ScrapeOptions

WithWaitFor delays the request until a specific CSS Selector is loaded in the DOM.

value: A string specifying the CSS Selector to wait for.

func WithWindowHeight

func WithWindowHeight(value int) ScrapeOptions

WithWindowHeight defines the browser window height for the request.

value: The desired window height in pixels.

func WithWindowWidth

func WithWindowWidth(value int) ScrapeOptions

WithWindowWidth defines the browser window width for the request.

value: The desired window width in pixels.
