recon

Version: v0.0.0-...-3366d8a
Published: Feb 25, 2023 | License: MIT | Imports: 16 | Imported by: 1

README

Package recon is a library to retrieve HTML documents from the Internet and extract OpenGraph-related information from the page.

Documentation

Documentation is available on GoDoc.

Documentation

Overview

Package recon scrapes URLs for OpenGraph information.

Variables

var DefaultImageLookupTimeout = 10 * time.Second

DefaultImageLookupTimeout is the maximum amount of time recon will spend downloading and analyzing images.

var OptimalAspectRatio = 1.91

OptimalAspectRatio is the target aspect ratio that recon favors when looking at images.
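
As a rough illustration of what favoring a target aspect ratio can mean, the sketch below picks the image whose width-to-height ratio is nearest to 1.91. It is not recon's actual selection logic, which this page does not describe; the `image` type, `aspectRatio`, and `closestToTarget` are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"math"
)

// targetAspectRatio mirrors the documented default value of OptimalAspectRatio.
const targetAspectRatio = 1.91

// image is a hypothetical stand-in holding only the fields this sketch needs.
type image struct {
	URL    string
	Width  int
	Height int
}

// aspectRatio returns width divided by height, or 0 when the height is not positive.
func aspectRatio(img image) float64 {
	if img.Height <= 0 {
		return 0
	}
	return float64(img.Width) / float64(img.Height)
}

// closestToTarget returns the index of the image whose aspect ratio is
// nearest to target, or -1 if imgs is empty.
func closestToTarget(imgs []image, target float64) int {
	best := -1
	bestDiff := math.Inf(1)
	for i, img := range imgs {
		diff := math.Abs(aspectRatio(img) - target)
		if diff < bestDiff {
			best, bestDiff = i, diff
		}
	}
	return best
}

func main() {
	imgs := []image{
		{URL: "https://example.com/square.png", Width: 400, Height: 400},
		{URL: "https://example.com/wide.png", Width: 1200, Height: 630},
		{URL: "https://example.com/tall.png", Width: 300, Height: 900},
	}
	i := closestToTarget(imgs, targetAspectRatio)
	fmt.Println(imgs[i].URL) // prints "https://example.com/wide.png"
}
```

The 1200x630 image wins here because 1200/630 is roughly 1.905, far closer to 1.91 than the square (1.0) or tall (about 0.33) candidates.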

Types

type Image

type Image struct {
	URL         string  `json:"url"`
	Type        string  `json:"type"`
	Width       int     `json:"width"`
	Height      int     `json:"height"`
	Alt         string  `json:"alt"`
	AspectRatio float64 `json:"aspectRatio"`
	Preferred   bool    `json:"preferred,omitempty"`
}

Image contains information about an image parsed from the page.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the client object; it holds the configuration used when parsing a URL.

func NewParser

func NewParser() *Parser

NewParser returns a new Parser object.

func (*Parser) Parse

func (p *Parser) Parse(url string) (Result, error)

Parse fetches the page at the given URL and attempts to extract its OpenGraph information, returning a Result or an error.

func (*Parser) WithClient

func (p *Parser) WithClient(client *http.Client) *Parser

WithClient allows the user to specify a custom HTTP client that the parser will use.

func (*Parser) WithHeaders

func (p *Parser) WithHeaders(h http.Header) *Parser

WithHeaders allows the user to set the HTTP request headers.

func (*Parser) WithImageLookupTimeout

func (p *Parser) WithImageLookupTimeout(t time.Duration) *Parser

WithImageLookupTimeout allows the user to set the maximum amount of time recon will spend parsing images.

func (*Parser) WithTokenMaxBuffer

func (p *Parser) WithTokenMaxBuffer(s int) *Parser

WithTokenMaxBuffer limits the amount of memory used by the HTML tokenizer.

type Result

type Result struct {
	// URL is either the URL as-passed or the defined URL (via og:url) if present
	URL string `json:"url"`

	// Host is the domain of the URL as-passed or the defined URL if present
	Host string `json:"host"`

	// Site is the name of the site as defined via og:site_name or site_name
	Site string `json:"site_name"`

	// Title is the title of the page as defined via og:title or title
	Title string `json:"title"`

	// Type is the type of the page (article, video, etc.) as defined via og:type or type.
	Type string `json:"type"`

	// Description is the description of the page as defined via og:description or description.
	Description string `json:"description"`

	// Author is the author of the page as defined via og:author or author.
	Author string `json:"author"`

	// Publisher is the publisher of the page as defined via og:publisher or publisher.
	Publisher string `json:"publisher"`

	// Images is the collection of images parsed from the page using either og:image meta tags or <img> tags.
	Images []Image `json:"images"`

	// Scraped is the time when the page was scraped (or the time Parse was run).
	Scraped time.Time `json:"scraped"`
}

Result is the value returned by Parse.
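
The URL and Host comments describe a fallback: the page-declared URL (og:url) is used if present, otherwise the URL as passed. The following sketch shows one way such a fallback and host extraction could be written with the standard library. It is an assumption-laden illustration, not recon's implementation; `effectiveURL` and `hostOf` are hypothetical helpers, and whether recon keeps or strips a port in Host is not documented here.

```go
package main

import (
	"fmt"
	"net/url"
)

// effectiveURL prefers the page-declared URL (such as an og:url value) and
// falls back to the URL as passed when none was declared.
func effectiveURL(passed, declared string) string {
	if declared != "" {
		return declared
	}
	return passed
}

// hostOf returns the host name of rawURL without any port.
func hostOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

func main() {
	passed := "https://short.example/abc"
	declared := "https://www.example.com/articles/42"

	final := effectiveURL(passed, declared)
	host, err := hostOf(final)
	if err != nil {
		fmt.Println("bad URL:", err)
		return
	}
	fmt.Println(final, host) // prints "https://www.example.com/articles/42 www.example.com"
}
```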

func Parse

func Parse(url string) (Result, error)

Parse takes a URL and attempts to parse it. This function instantiates a fresh Parser each time it's invoked.
