scraper

package
v0.2.1 Latest
Warning

This package is not in the latest version of its module.

Published: Oct 26, 2023 License: MIT Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractCSRF

func ExtractCSRF(resp *http.Response) (csrf string)

ExtractCSRF takes in an http response, extracts the CSRF token from the body of the response and returns it as a string. The token is extracted using regular expressions to match specific patterns in the HTML body of the response.

csrf := ExtractCSRF(resp)
fmt.Println(csrf)

func FetchAndSubmitForm

func FetchAndSubmitForm(client *http.Client, urlStr string, setValues func(values url.Values)) (*http.Response, error)

FetchAndSubmitForm takes in an http client, a url string, and a function that sets values for the form. The function fetches the form from the given url, parses it, and allows the setValues function to fill out the form. The form is then submitted and the response is returned, along with any error that may have occurred.

resp, err := FetchAndSubmitForm(client, "https://example.com/form", func(values url.Values) {
	values.Set("username", "john")
	values.Set("password", "password123")
})
if err != nil {
	fmt.Println(err)
}

func NewTransport

func NewTransport(tripper http.RoundTripper) http.RoundTripper

NewTransport returns a new http.RoundTripper that wraps the provided http.RoundTripper and sets the TLSClientConfig to the value returned by getTLSConfig(). It also sets the userAgent to a specific value.

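rt := NewTransport(&http.Transport{})
req, _ := http.NewRequest("GET", "https://example.com", nil)
resp, err := rt.RoundTrip(req)
if err != nil {
	fmt.Println(err)
	return
}
defer resp.Body.Close() // always close the body of a successful response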

func ParseForms

func ParseForms(node *html.Node) (forms []htmlForm)

ParseForms takes in an html node and returns a slice of htmlForm structs. Each struct represents an HTML form found in the node, including its action, method, and input values.

forms := ParseForms(node)
for _, form := range forms {
	fmt.Println(form.Action)
	fmt.Println(form.Method)
	fmt.Println(form.Values)
}

Types

type Client

type Client struct {
	Client      *http.Client // http client used to make requests to the server
	BaseURL     *url.URL     // base url of the server
	Creds       *Credentials // credentials used for authentication
	MaxFileSize int64        // maximum file size allowed
}

Client stores the http client, base URL, and credentials used to communicate with the server, along with the maximum file size allowed.

func NewClient

func NewClient(transport http.RoundTripper) *Client

NewClient returns a new instance of the Client struct with a specified transport. If no transport is provided, a default transport with a long timeout will be used. The client also uses a cookie jar and sets a default max file size of 25MB.

transport := &http.Transport{}
client := NewClient(transport)

func (*Client) DoRequest

func (c *Client) DoRequest(req *http.Request) (*http.Response, error)

DoRequest sends the given http request using the client. If the response status code is not http.StatusOK, the request is retried up to 5 times, rate-limited to 1 request per second. If the final response status code is between http.StatusBadRequest (400) and http.StatusNetworkAuthenticationRequired (511), an error is returned.

resp, err := client.DoRequest(req)
if err != nil {
	fmt.Println(err)
}

func (*Client) GetDoc

func (c *Client) GetDoc(urlStr string, a ...interface{}) (*goquery.Document, error)

GetDoc takes a URL format string and optional format arguments, formats the URL, and sends a GET request. The response body is parsed into a goquery document, which is returned along with any error that occurred.

doc, err := client.GetDoc("https://example.com/%v", "path")
if err != nil {
	fmt.Println(err)
}
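
The URL formatting step shared by GetDoc, GetFile, and GetJson can be sketched as follows. The helper `formatURL` is hypothetical, and the use of fmt.Sprintf is an assumption inferred from the `%v` verb in the example above; how the package actually builds the URL is not documented here.

```go
package main

import (
	"fmt"
	"net/url"
)

// formatURL substitutes the arguments into the format string and parses the
// result, so that malformed URLs are reported before any request is sent.
func formatURL(urlStr string, a ...interface{}) (*url.URL, error) {
	return url.Parse(fmt.Sprintf(urlStr, a...))
}

func main() {
	u, err := formatURL("https://example.com/%v", "path")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(u.String()) // prints "https://example.com/path"
}
```

If formatting is indeed done with fmt.Sprintf, a literal percent sign in the URL (for example a percent-encoded space) would have to be written as `%%` to avoid being treated as a format verb.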

func (*Client) GetFile added in v0.2.0

func (c *Client) GetFile(urlStr string, a ...interface{}) (*http.Response, error)

GetFile takes a URL format string and optional format arguments, formats the URL, and sends a GET request. The raw response is returned along with any error that occurred.

resp, err := client.GetFile("https://example.com/%v", "path")
if err != nil {
	fmt.Println(err)
}

func (*Client) GetJson

func (c *Client) GetJson(urlStr string, a ...interface{}) (*http.Response, error)

GetJson takes a URL format string and optional format arguments, formats the URL, and sends a GET request. The raw response is returned along with any error that occurred.

resp, err := client.GetJson("https://example.com/%v", "path")
if err != nil {
	fmt.Println(err)
}

type Credentials

type Credentials struct {
	Username string
	Password string
	Token    string
}

Credentials stores the username, password, and token used for authentication.
