xurls

Version: v1.1.0
Published: Jan 25, 2017
License: BSD-3-Clause
Imports: 1
Imported by: 106

README

xurls

Extract URLs from text using regular expressions.

go get -u github.com/mvdan/xurls

package main

import "github.com/mvdan/xurls"

func main() {
	// Relaxed also matches URLs that lack a scheme.
	xurls.Relaxed.FindString("Do gophers live in golang.org?")
	// "golang.org"

	// Strict only matches URLs that include a scheme.
	xurls.Strict.FindAllString("foo.com is http://foo.com/.", -1)
	// []string{"http://foo.com/"}
}

Relaxed is slower than Strict, roughly four to seven times in the benchmarks below, since it does more work to find URLs without relying on the scheme:

BenchmarkStrictEmpty-4           1000000              1885 ns/op
BenchmarkStrictSingle-4           200000              8356 ns/op
BenchmarkStrictMany-4             100000             22547 ns/op
BenchmarkRelaxedEmpty-4           200000              7284 ns/op
BenchmarkRelaxedSingle-4           30000             58557 ns/op
BenchmarkRelaxedMany-4             10000            130251 ns/op

cmd/xurls

go get -u github.com/mvdan/xurls/cmd/xurls

$ echo "Do gophers live in http://golang.org?" | xurls
http://golang.org

Documentation

Overview

Package xurls extracts URLs from plain text using regular expressions.

Example
package main

import (
	"fmt"

	"github.com/mvdan/xurls"
)

func main() {
	fmt.Println(xurls.Relaxed.FindString("Do gophers live in http://golang.org?"))
	fmt.Println(xurls.Relaxed.FindAllString("foo.com is http://foo.com/.", -1))
}
Output:

http://golang.org
[foo.com http://foo.com/]

Variables

var (
	// Relaxed matches all the urls it can find.
	Relaxed = regexp.MustCompile(relaxed)
	// Strict only matches urls with a scheme to avoid false positives.
	Strict = regexp.MustCompile(strict)
)
var PseudoTLDs = []string{
	`bit`,
	`example`,
	`exit`,
	`gnu`,
	`i2p`,
	`invalid`,
	`local`,
	`localhost`,
	`test`,
	`zkey`,
}

PseudoTLDs is a sorted list of some widely used unofficial TLDs.
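
Because the list is sorted, membership can be checked with a binary search. The following is a minimal sketch, not part of xurls: `pseudoTLDs` is a local copy of the list above and `isPseudoTLD` is a hypothetical helper introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// pseudoTLDs is a local copy of the list shown above, so this sketch does
// not depend on the xurls package itself.
var pseudoTLDs = []string{
	"bit", "example", "exit", "gnu", "i2p",
	"invalid", "local", "localhost", "test", "zkey",
}

// isPseudoTLD is a hypothetical helper (not part of xurls). It relies on the
// list being sorted to locate tld with a binary search.
func isPseudoTLD(tld string) bool {
	i := sort.SearchStrings(pseudoTLDs, tld)
	return i < len(pseudoTLDs) && pseudoTLDs[i] == tld
}

func main() {
	fmt.Println(isPseudoTLD("localhost")) // true
	fmt.Println(isPseudoTLD("com"))       // false
}
```

The same lookup should work against any sorted `[]string`, as long as the caller never reorders the slice.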

var SchemesNoAuthority = []string{
	`bitcoin`,
	`file`,
	`magnet`,
	`mailto`,
	`sms`,
	`tel`,
	`xmpp`,
}

SchemesNoAuthority is a sorted list of some well-known url schemes that are followed by ":" instead of "://". Since these are more prone to false positives, we limit their matching.
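
To illustrate what matching a ":"-only scheme can look like, here is a simplified sketch. It is not the pattern xurls generates: `schemesNoAuthority` is a local copy of the list above, and `noAuthorityURL` is a hypothetical expression that accepts one of the listed schemes, a colon, and then any run of non-space characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// schemesNoAuthority is a local copy of the list shown above.
var schemesNoAuthority = []string{
	"bitcoin", "file", "magnet", "mailto", "sms", "tel", "xmpp",
}

// noAuthorityURL is a simplified, hypothetical pattern (not the one xurls
// uses): a word boundary, one of the listed schemes, a colon, then a run of
// non-space characters.
var noAuthorityURL = regexp.MustCompile(
	`\b(?:` + strings.Join(schemesNoAuthority, "|") + `):\S+`,
)

func main() {
	text := "Write to mailto:gopher@example.com or call tel:+15551234 but skip foo:bar"
	fmt.Println(noAuthorityURL.FindAllString(text, -1))
	// [mailto:gopher@example.com tel:+15551234]
}
```

Restricting the alternation to a known list is what keeps arbitrary `word:word` text such as `foo:bar` from being reported as a URL.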

var TLDs = []string{}/* 1554 elements not displayed */

TLDs is a sorted list of all public top-level domains.

Functions

func StrictMatchingScheme added in v0.8.0

func StrictMatchingScheme(exp string) (*regexp.Regexp, error)

StrictMatchingScheme produces a regexp that matches urls like Strict but whose scheme matches the given regular expression.
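
The idea of restricting matches to a caller-supplied scheme can be sketched with the standard library alone. `strictLike` below is a hypothetical, much simplified stand-in and not the xurls implementation. It assumes the scheme expression includes the separator (for example `https?://`) and treats any run of non-space characters after the scheme as the rest of the URL.

```go
package main

import (
	"fmt"
	"regexp"
)

// strictLike is a hypothetical stand-in for StrictMatchingScheme. It assumes
// schemeExp covers the scheme and its separator, and it uses a far looser
// URL body than xurls does: any run of non-space characters.
func strictLike(schemeExp string) (*regexp.Regexp, error) {
	return regexp.Compile(`(?i)(?:` + schemeExp + `)\S+`)
}

func main() {
	re, err := strictLike(`https?://`)
	if err != nil {
		fmt.Println("invalid scheme expression:", err)
		return
	}
	text := "ftp://a.example is skipped, https://b.example/docs is kept"
	fmt.Println(re.FindAllString(text, -1))
	// [https://b.example/docs]
}
```

Like StrictMatchingScheme, the sketch returns an error instead of panicking, so a scheme expression taken from user input can be rejected cleanly.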

Directories

cmd/xurls (command)
generate/regexgen (command)
generate/tldsgen (command)
