isbot

Version: v1.0.0 | Published: Feb 18, 2022 | License: MIT | Imports: 4 | Imported by: 7

Repository

github.com/arp242/isbot

README

Go library to detect bots based on the HTTP request. A "bot" is defined as any request that isn't a regular browser request initiated by the user. This includes things like web crawlers, but also stuff like "preview" renderers and the like.

Bot() accepts an http.Request because it looks at the whole request, not just the User-Agent header. You can use UserAgent() if all you have is a User-Agent string, but using Bot() is strongly recommended.
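To illustrate why the whole request matters, here is a minimal sketch of a check that a User-Agent string alone cannot answer: whether the request is a prefetch or preview. The function name looksLikePrefetch and the header names and values it inspects are assumptions made for this example; they are not taken from isbot's source.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksLikePrefetch reports whether the headers suggest a prefetch or preview
// request rather than a page view the user asked for.
// Assumption: the header names and values below are illustrative only.
func looksLikePrefetch(h http.Header) bool {
	if strings.EqualFold(h.Get("X-Moz"), "prefetch") {
		return true
	}
	for _, name := range []string{"Purpose", "X-Purpose"} {
		switch strings.ToLower(h.Get(name)) {
		case "prefetch", "preview":
			return true
		}
	}
	return false
}

func main() {
	// A perfectly ordinary browser User-Agent, but sent as a prefetch.
	h := http.Header{}
	h.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0")
	h.Set("X-Moz", "prefetch")
	fmt.Println(looksLikePrefetch(h))
}
```

The User-Agent in this example is a regular Firefox one, so a check based only on that string would have nothing to go on.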

Import as zgo.at/isbot; API docs: https://godocs.io/zgo.at/isbot

There is a command-line tool in cmd/isbot to check if User-Agents are bots:

$ isbot 'Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0' 'Wget/1.13.4 (linux-gnu)'
false (1: NoBotNoMatch) ← Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0
true  (4: BotClientLibrary) ← Wget/1.13.4 (linux-gnu)

It's not 100% reliable, and there are some known cases where it gets things wrong. See isbot_test.go for a list of test cases.

The performance is pretty good; it turns out that running a few strings.Contains() calls is a lot faster than matching a (bot|crawler|search|...) regexp.
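The two approaches being compared can be sketched as follows. This is only an illustration: the needles list, the pattern, and the function names matchContains and matchRegexp are hypothetical, and isbot's actual rules are not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical substrings for illustration; not isbot's real list.
var needles = []string{"bot", "crawler", "search"}

// The same alternatives expressed as one regular expression.
var botRe = regexp.MustCompile(`bot|crawler|search`)

// matchContains checks each substring in turn with strings.Contains.
func matchContains(ua string) bool {
	ua = strings.ToLower(ua)
	for _, n := range needles {
		if strings.Contains(ua, n) {
			return true
		}
	}
	return false
}

// matchRegexp performs the equivalent check with the compiled regexp.
func matchRegexp(ua string) bool {
	return botRe.MatchString(strings.ToLower(ua))
}

func main() {
	for _, ua := range []string{
		"Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
		"Googlebot/2.1 (+http://www.google.com/bot.html)",
	} {
		fmt.Println(matchContains(ua), matchRegexp(ua), ua)
	}
}
```

Both functions give the same answers; to compare their speed on your own inputs, wrap each in a testing.B benchmark and run go test -bench.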

Documentation

Overview

Package isbot attempts to detect HTTP bots.

A "bot" is defined as any request that isn't a regular browser request initiated by the user. This includes things like web crawlers, but also stuff like "preview" renderers and the like.

Constants

const (
	NoBotKnown   = 0 // Known to not be a bot.
	NoBotNoMatch = 1 // None of the rules matches, so probably not a bot.
)

Not bots.

const (
	BotPrefetch      = 2 // Prefetch algorithm
	BotLink          = 3 // User-Agent contained a URL.
	BotClientLibrary = 4 // Known client library.
	BotKnownBot      = 5 // Known bot.
	BotBoty          = 6 // User-Agent string looks "boty".
	BotShort         = 7 // User-Agent is short or strangely formatted.
)

Bots identified by User-Agent.

const (
	BotRangeAWS          = 8  // AWS cloud
	BotRangeDigitalOcean = 9  // Digital Ocean
	BotRangeServersCom   = 10 // servers.com
	BotRangeGoogleCloud  = 11 // Google Cloud
	BotRangeHetzner      = 12 // hetzner.de
)

Bots identified by IP.
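Because the result codes are small integers grouped by range, a caller can bucket them without listing every constant. The sketch below works on plain uint8 values copied from the listings above; the helper names category and isBot are hypothetical and are not part of the package API.

```go
package main

import "fmt"

// category maps a result code to the group it is documented under.
// The numeric ranges mirror the constant listings: 0-1, 2-7, 8-12.
func category(code uint8) string {
	switch {
	case code <= 1:
		return "not a bot"
	case code <= 7:
		return "user-agent"
	case code <= 12:
		return "ip range"
	default:
		return "other"
	}
}

// isBot treats every code other than the two "not bot" values as a bot.
func isBot(code uint8) bool {
	return code > 1
}

func main() {
	for _, code := range []uint8{1, 4, 10} {
		fmt.Println(code, category(code), isBot(code))
	}
}
```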

const (
	BotJSPhanton   = 150 // Phantom headless browser.
	BotJSNightmare = 151 // Nightmare headless browser.
	BotJSSelenium  = 152 // Selenium headless browser.
	BotJSWebDriver = 153 // Generic WebDriver-based headless browser.
)

These are never set by isbot, but can be used to send signals from JS; for example:

var is_bot = function() {
    var w = window, d = document
    if (w.callPhantom || w._phantom || w.phantom)
        return 150
    if (w.__nightmare)
        return 151
    if (d.__selenium_unwrapped || d.__webdriver_evaluate || d.__driver_evaluate)
        return 152
    if (navigator.webdriver)
        return 153
    return 0
}
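
On the server side, a value reported by a script like this arrives as untrusted input and should be validated before use. A minimal sketch, assuming the script's return value is sent back as a decimal string (for example in a query parameter); the function name parseJSSignal and the transport are hypothetical and not part of isbot.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseJSSignal validates a value reported by client-side JavaScript.
// It accepts 0 (nothing detected) and the codes 150-153 listed above,
// and rejects everything else, since the client fully controls the value.
func parseJSSignal(s string) (code uint8, ok bool) {
	n, err := strconv.ParseUint(s, 10, 8)
	if err != nil {
		return 0, false
	}
	if n == 0 || (n >= 150 && n <= 153) {
		return uint8(n), true
	}
	return 0, false
}

func main() {
	for _, s := range []string{"152", "0", "5", "abc"} {
		code, ok := parseJSSignal(s)
		fmt.Println(s, code, ok)
	}
}
```

Rejecting values outside 150-153 keeps a client from claiming one of the codes that only the server is meant to assign.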