xurls

Version: v2.3.0
Published: Jul 24, 2021 License: BSD-3-Clause Imports: 3 Imported by: 75

README

xurls

Extract urls from text using regular expressions. Requires Go 1.15 or later.

package main

import "mvdan.cc/xurls/v2"

func main() {
	rxRelaxed := xurls.Relaxed()
	rxRelaxed.FindString("Do gophers live in golang.org?")  // "golang.org"
	rxRelaxed.FindString("This string does not have a URL") // ""

	rxStrict := xurls.Strict()
	rxStrict.FindAllString("must have scheme: http://foo.com/.", -1) // []string{"http://foo.com/"}
	rxStrict.FindAllString("no scheme, no match: foo.com", -1)       // nil (no matches)
}

Since the API is centered around regexp.Regexp, all of its other methods are available too, such as finding the byte indexes of every match.

Note that each call to the exposed functions compiles a new regular expression, so avoid calling them repeatedly; compile once and reuse the returned *regexp.Regexp instead.

cmd/xurls

To install the tool globally:

cd $(mktemp -d); go mod init tmp; GO111MODULE=on go get mvdan.cc/xurls/v2/cmd/xurls
$ echo "Do gophers live in http://golang.org?" | xurls
http://golang.org

Documentation

Overview

Package xurls extracts urls from plain text using regular expressions.

Example
rx := xurls.Relaxed()
fmt.Println(rx.FindString("Do gophers live in http://golang.org?"))
fmt.Println(rx.FindAllString("foo.com is http://foo.com/.", -1))
Output:

http://golang.org
[foo.com http://foo.com/]

Variables

var AnyScheme = `([a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(SchemesNoAuthority...) + `:)`

AnyScheme can be passed to StrictMatchingScheme to match any possibly valid scheme, and not just the known ones.

var PseudoTLDs = []string{
	`bit`,
	`example`,
	`exit`,
	`gnu`,
	`i2p`,
	`invalid`,
	`local`,
	`localhost`,
	`test`,
	`zkey`,
}

PseudoTLDs is a sorted list of some widely used unofficial TLDs.

Sources:

* https://en.wikipedia.org/wiki/Pseudo-top-level_domain
* https://en.wikipedia.org/wiki/Category:Pseudo-top-level_domains
* https://tools.ietf.org/html/draft-grothoff-iesg-special-use-p2p-names-00
* https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml
var Schemes = []string{}/* 343 elements not displayed */

Schemes is a sorted list of all IANA assigned schemes.

Source:

https://www.iana.org/assignments/uri-schemes/uri-schemes-1.csv
var SchemesNoAuthority = []string{
	`bitcoin`,
	`cid`,
	`file`,
	`magnet`,
	`mailto`,
	`mid`,
	`sms`,
	`tel`,
	`xmpp`,
}

SchemesNoAuthority is a sorted list of some well-known url schemes that are followed by ":" instead of "://". The list includes both officially registered and unofficial schemes.

var SchemesUnofficial = []string{
	`jdbc`,
	`postgres`,
	`postgresql`,
	`slack`,
	`zoommtg`,
	`zoomus`,
}

SchemesUnofficial is a sorted list of some well-known url schemes which aren't officially registered just yet. They tend to correspond to software.

Mostly collected from https://en.wikipedia.org/wiki/List_of_URI_schemes#Unofficial_but_common_URI_schemes.

var TLDs = []string{}/* 1508 elements not displayed */

TLDs is a sorted list of all public top-level domains.

Sources:

* https://data.iana.org/TLD/tlds-alpha-by-domain.txt
* https://publicsuffix.org/list/effective_tld_names.dat

Functions

func Relaxed

func Relaxed() *regexp.Regexp

Relaxed produces a regexp that matches any URL matched by Strict, plus any URL without a scheme and any email address.

func Strict

func Strict() *regexp.Regexp

Strict produces a regexp that matches any URL with a scheme in either the Schemes or SchemesNoAuthority lists.

func StrictMatchingScheme

func StrictMatchingScheme(exp string) (*regexp.Regexp, error)

StrictMatchingScheme produces a regexp similar to Strict, but requiring that the scheme match the given regular expression. See AnyScheme too.

Directories

cmd
generate
