urlresolver

Version: v0.2.1
Published: Apr 19, 2024 License: MIT Imports: 17 Imported by: 4

README

A Go package that "resolves" a given URL by issuing a GET request, following any redirects, canonicalizing the final URL, and attempting to extract the title from the final response body.

Methodology

Resolving

A URL is resolved by issuing a GET request and following any redirects until a non-30x response is received.
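The redirect-following step can be illustrated with the standard library alone. The following is a minimal sketch, not this package's implementation: followRedirects, demoHandler, runDemo, and the limit of 10 hops are all hypothetical names and choices made for illustration. It records each URL visited before the final one by hooking http.Client's CheckRedirect.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// followRedirects (hypothetical helper) issues a GET for rawURL and follows
// redirects until a non-redirect response arrives. It returns the final URL
// and every URL visited before it, oldest first.
func followRedirects(ctx context.Context, transport http.RoundTripper, rawURL string) (string, []string, error) {
	var intermediate []string
	client := &http.Client{
		Transport: transport,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 10 { // assumed limit, chosen for this sketch
				return fmt.Errorf("stopped after %d redirects", len(via))
			}
			// via holds the requests made so far; the last one is the hop
			// whose response triggered this redirect.
			intermediate = append(intermediate, via[len(via)-1].URL.String())
			return nil
		},
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil)
	if err != nil {
		return "", nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", nil, err
	}
	defer resp.Body.Close()
	// resp.Request is the request that produced the final response.
	return resp.Request.URL.String(), intermediate, nil
}

// demoHandler serves a two-hop redirect chain: /a -> /b -> /final.
func demoHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/a", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/b", http.StatusFound)
	})
	mux.HandleFunc("/b", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/final", http.StatusMovedPermanently)
	})
	mux.HandleFunc("/final", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "done")
	})
	return mux
}

// runDemo resolves /a against a local test server and returns the final and
// intermediate URLs with the server's base URL stripped.
func runDemo() (string, []string, error) {
	srv := httptest.NewServer(demoHandler())
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	final, hops, err := followRedirects(ctx, srv.Client().Transport, srv.URL+"/a")
	if err != nil {
		return "", nil, err
	}
	for i, h := range hops {
		hops[i] = strings.TrimPrefix(h, srv.URL)
	}
	return strings.TrimPrefix(final, srv.URL), hops, nil
}

func main() {
	final, hops, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("final:", final, "intermediate:", hops)
}
```

The context timeout in runDemo bounds the whole chain, including every redirect, rather than each request individually.
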

Canonicalizing

The final URL is aggressively canonicalized using a combination of PuerkitoBio/purell and some manual heuristics that remove unnecessary query params (e.g. utm_* tracking params) and normalize case where it is known not to matter (e.g. twitter.com/Thresholderbot and twitter.com/thresholderbot are the same).

Canonicalization is optimized for URLs that are shared on social media.
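To make the idea concrete, here is a rough standard-library sketch of this kind of canonicalization. It does not use purell and is not the package's actual code: canonicalizeSketch is a hypothetical function, and the exact rules it applies (dropping utm_* params, lowercasing twitter.com paths, sorting query params) are assumptions drawn from the description above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalizeSketch (hypothetical) lowercases the scheme and host, drops
// utm_* query parameters, lowercases the path on twitter.com, and re-encodes
// the remaining parameters sorted by key.
func canonicalizeSketch(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)

	// Host-specific heuristic: assume paths are case-insensitive on this host.
	if u.Host == "twitter.com" {
		u.Path = strings.ToLower(u.Path)
		u.RawPath = "" // let String() re-derive the escaped path
	}

	q := u.Query()
	for key := range q {
		if strings.HasPrefix(strings.ToLower(key), "utm_") {
			q.Del(key) // deleting during range is safe for Go maps
		}
	}
	u.RawQuery = q.Encode() // Encode emits parameters sorted by key
	return u.String(), nil
}

func main() {
	out, err := canonicalizeSketch("HTTPS://Twitter.com/Thresholderbot?utm_source=x&b=2&a=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Real code would call the package's Canonicalize function instead, which takes a *url.URL and returns the canonical string.
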

Security

TL;DR: Use safedialer.Control in the transport's dialer to block attempts to resolve URLs pointing at internal, private IP addresses.

Exposing functionality like this on the internet can be dangerous, because it could allow a malicious client to discover information about your internal network by asking your service to resolve URLs whose hostnames point at private IP addresses.

The dangers, along with a Go-specific mitigation, are outlined in Andrew Ayer's excellent "Preventing Server Side Request Forgery in Golang" blog post.

To mitigate that danger, users are strongly encouraged to use safedialer.Control as the Control function in the dialer used by the transport given to urlresolver.New.
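For readers unfamiliar with the mechanism, the sketch below shows what a dialer Control function of this kind might look like and where it plugs in. It is an illustration built only on the standard library, not the source of safedialer.Control: blockPrivateControl, isPublicIP, newSafeTransport, and the specific set of blocked address classes are all assumptions. In real use, safedialer.Control would take the place of blockPrivateControl and the resulting transport would be passed to urlresolver.New.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"syscall"
	"time"
)

var errUnsafeAddr = errors.New("refusing to dial non-public address")

// isPublicIP reports whether ip is acceptable under this sketch's rules.
func isPublicIP(ip net.IP) bool {
	return !(ip.IsLoopback() || ip.IsPrivate() || ip.IsUnspecified() ||
		ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsMulticast())
}

// blockPrivateControl (hypothetical) matches the net.Dialer.Control signature.
// The dialer calls it after DNS resolution, so address is a literal IP and
// port, which is what makes the check robust against DNS tricks.
func blockPrivateControl(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil || !isPublicIP(ip) {
		return errUnsafeAddr
	}
	return nil
}

// newSafeTransport wires the Control function into a dialer and then into an
// HTTP transport.
func newSafeTransport() *http.Transport {
	dialer := &net.Dialer{
		Timeout: 5 * time.Second,
		Control: blockPrivateControl,
	}
	return &http.Transport{DialContext: dialer.DialContext}
}

func main() {
	client := &http.Client{Transport: newSafeTransport(), Timeout: 5 * time.Second}
	// The loopback address is rejected by the Control function before any
	// connection is attempted.
	_, err := client.Get("http://127.0.0.1:1/")
	fmt.Println("request error:", err)
}
```

Checking the address inside Control, rather than validating the hostname up front, means the decision is made on the IP that will actually be dialed, including on every redirect hop.
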

See github.com/mccutchen/urlresolverapi for a productionized example, deployed at https://urlresolver.com.

Documentation

Variables

NormalizationFlags defines the normalization flags the purell package will use during canonicalization.

See https://godoc.org/github.com/PuerkitoBio/purell#NormalizationFlags

Functions

func Canonicalize

func Canonicalize(u *url.URL) string

Canonicalize filters unnecessary query params and then normalizes a URL, ensuring consistent case, encoding, sorting of params, etc.

Types

type Interface

type Interface interface {
	Resolve(context.Context, string) (Result, error)
}

Interface defines the interface for a URL resolver.
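One plausible use of this interface is substituting a stub resolver in tests so that no network requests are made. The sketch below assumes that use case: stubResolver, describe, and runDemo are hypothetical, and Result and Interface are redeclared locally only so the example compiles on its own (real code would import them from this package).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local copies of the package's types, redeclared so this sketch is
// self-contained.
type Result struct {
	ResolvedURL      string
	Title            string
	IntermediateURLs []string
}

type Interface interface {
	Resolve(context.Context, string) (Result, error)
}

// stubResolver (hypothetical) returns canned results keyed by the given URL.
type stubResolver map[string]Result

func (s stubResolver) Resolve(ctx context.Context, givenURL string) (Result, error) {
	if err := ctx.Err(); err != nil {
		return Result{}, err
	}
	res, ok := s[givenURL]
	if !ok {
		return Result{}, errors.New("stub: no canned result for " + givenURL)
	}
	return res, nil
}

// describe depends only on Interface, so it works with a real resolver or
// with the stub.
func describe(ctx context.Context, r Interface, givenURL string) string {
	res, err := r.Resolve(ctx, givenURL)
	if err != nil {
		return "unresolved: " + givenURL
	}
	return fmt.Sprintf("%s <%s>", res.Title, res.ResolvedURL)
}

// runDemo exercises describe with one known and one unknown URL.
func runDemo() []string {
	stub := stubResolver{
		"https://example.com": {
			ResolvedURL: "https://example.com/",
			Title:       "Example Domain",
		},
	}
	ctx := context.Background()
	return []string{
		describe(ctx, stub, "https://example.com"),
		describe(ctx, stub, "https://unknown.invalid/"),
	}
}

func main() {
	for _, line := range runDemo() {
		fmt.Println(line)
	}
}
```
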

type Resolver

type Resolver struct {
	// contains filtered or unexported fields
}

Resolver resolves URLs.

func New

func New(transport http.RoundTripper, timeout time.Duration) *Resolver

New creates a new Resolver that uses the given transport to make HTTP requests and applies the given timeout to the overall process (including any redirects that must be followed).

func (*Resolver) Resolve

func (r *Resolver) Resolve(ctx context.Context, givenURL string) (Result, error)

Resolve resolves the given URL by following any redirects, canonicalizing the final URL, and attempting to extract the title from the final response body.
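The title-extraction step is not specified in detail here, so the following is only a simplified sketch of one way it could work. It uses a regular expression from the standard library instead of a proper HTML parser, which a robust implementation would likely prefer. titleFromHTML, readTitle, and the read limit are hypothetical.

```go
package main

import (
	"fmt"
	"html"
	"io"
	"regexp"
	"strings"
)

// titleRe matches the first <title> element, case-insensitively and across
// newlines. A regular expression is a simplification chosen for this sketch.
var titleRe = regexp.MustCompile(`(?is)<title(?:\s[^>]*)?>(.*?)</title>`)

// titleFromHTML (hypothetical) returns the unescaped, whitespace-collapsed
// contents of the first <title> element, or "" if none is found.
func titleFromHTML(doc string) string {
	m := titleRe.FindStringSubmatch(doc)
	if m == nil {
		return ""
	}
	text := html.UnescapeString(m[1])
	return strings.Join(strings.Fields(text), " ")
}

// readTitle reads at most limit bytes from r before looking for a title, so
// that a very large response body cannot exhaust memory.
func readTitle(r io.Reader, limit int64) (string, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit))
	if err != nil {
		return "", err
	}
	return titleFromHTML(string(data)), nil
}

func main() {
	body := strings.NewReader("<html><head><TITLE>\n  Hello &amp; welcome \n</TITLE></head></html>")
	title, err := readTitle(body, 512*1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(title)
}
```

Because extraction is best-effort, a missing title yields an empty string in this sketch and is not treated as an error, consistent with the "attempting to extract" wording above.
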

type Result

type Result struct {
	ResolvedURL      string
	Title            string
	IntermediateURLs []string
}

Result is the result of resolving a URL.
