isbot

package module
v1.0.0
Published: Feb 18, 2022 License: MIT Imports: 4 Imported by: 7

README

Go library to detect bots based on the HTTP request. A "bot" is defined as any request that isn't a regular browser request initiated by the user. This includes things like web crawlers, but also stuff like "preview" renderers and the like.

Bot() accepts an *http.Request because it looks at all available information, not just the User-Agent header. You can use UserAgent() if you only have a User-Agent string, but using Bot() is highly recommended.

Import as zgo.at/isbot; API docs: https://godocs.io/zgo.at/isbot

There is a command-line tool in cmd/isbot to check if User-Agents are bots:

$ isbot 'Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0' 'Wget/1.13.4 (linux-gnu)'
false (1: NoBotNoMatch) ← Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0
true  (4: BotClientLibrary) ← Wget/1.13.4 (linux-gnu)

It's not 100% reliable, and there are some known cases where it gets things wrong. See isbot_test.go for a list of test cases.

The performance is pretty good; it turns out that running a few strings.Contains() calls is loads faster than a (bot|crawler|search|...) regexp.
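
To make the comparison concrete, the sketch below shows the two approaches side by side. The word list is a hypothetical three-entry example, not the rule set isbot actually uses, and no timings are claimed here; the point is only that both functions answer the same question.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// needles is a hypothetical word list for illustration; it is not the
// rule set isbot actually uses.
var needles = []string{"bot", "crawler", "search"}

// botRe is the regexp equivalent of needles.
var botRe = regexp.MustCompile(`bot|crawler|search`)

// containsAny reports whether the lowercased ua contains any needle,
// using plain substring checks.
func containsAny(ua string) bool {
	ua = strings.ToLower(ua)
	for _, n := range needles {
		if strings.Contains(ua, n) {
			return true
		}
	}
	return false
}

// matchRe performs the same check with the regexp.
func matchRe(ua string) bool {
	return botRe.MatchString(strings.ToLower(ua))
}

func main() {
	for _, ua := range []string{
		"Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
		"ExampleCrawler/2.1",
	} {
		fmt.Println(containsAny(ua), matchRe(ua), ua)
	}
}
```

To measure the difference on your own inputs, both functions can be wrapped in standard testing.B benchmarks.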

Documentation

Overview

Package isbot attempts to detect HTTP bots.

A "bot" is defined as any request that isn't a regular browser request initiated by the user. This includes things like web crawlers, but also stuff like "preview" renderers and the like.

Constants

const (
	NoBotKnown   = 0 // Known to not be a bot.
	NoBotNoMatch = 1 // None of the rules matches, so probably not a bot.
)

Not bots.

const (
	BotPrefetch      = 2 // Prefetch algorithm
	BotLink          = 3 // User-Agent contained a URL.
	BotClientLibrary = 4 // Known client library.
	BotKnownBot      = 5 // Known bot.
	BotBoty          = 6 // User-Agent string looks "boty".
	BotShort         = 7 // User-Agent is short or strangely formatted.
)

Bots identified by User-Agent.

const (
	BotRangeAWS          = 8  // AWS cloud
	BotRangeDigitalOcean = 9  // Digital Ocean
	BotRangeServersCom   = 10 // servers.com
	BotRangeGoogleCloud  = 11 // Google Cloud
	BotRangeHetzner      = 12 // hetzner.de
)

Bots identified by IP.

const (
	BotJSPhanton   = 150 // Phantom headless browser.
	BotJSNightmare = 151 // Nightmare headless browser.
	BotJSSelenium  = 152 // Selenium headless browser.
	BotJSWebDriver = 153 // Generic WebDriver-based headless browser.
)

These are never set by isbot, but can be used to send signals from JS; for example:

var is_bot = function() {
    var w = window, d = document
    if (w.callPhantom || w._phantom || w.phantom)
        return 150
    if (w.__nightmare)
        return 151
    if (d.__selenium_unwrapped || d.__webdriver_evaluate || d.__driver_evaluate)
        return 152
    if (navigator.webdriver)
        return 153
    return 0
}
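
On the server side, a value reported by a script like the one above needs to be validated before it is stored. The sketch below is one possible way to do that; parseJSSignal and the idea of sending the value as a "bot" query parameter are assumptions for illustration, not part of isbot.

```go
package main

import (
	"fmt"
	"strconv"
)

// Bounds mirroring the BotJS* constants listed above (150 to 153).
const (
	jsSignalMin = 150
	jsSignalMax = 153
)

// parseJSSignal converts a client-reported value (for example a
// hypothetical "bot" query parameter) to a result code. It returns
// ok=false for anything outside the 150-153 range, so clients cannot
// inject arbitrary codes.
func parseJSSignal(v string) (code uint8, ok bool) {
	n, err := strconv.ParseUint(v, 10, 8)
	if err != nil || n < jsSignalMin || n > jsSignalMax {
		return 0, false
	}
	return uint8(n), true
}

func main() {
	for _, v := range []string{"152", "0", "abc", "300"} {
		code, ok := parseJSSignal(v)
		fmt.Println(v, code, ok)
	}
}
```

A value of 0, which the script returns when nothing suspicious is found, is reported as ok=false, meaning no JS signal was received.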

Variables

This section is empty.

Functions

func Is

func Is(r Result) bool

Is reports whether this Result constant indicates a bot.

func IsNot

func IsNot(r Result) bool

IsNot is the inverse of Is().
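
Because two different constants (0 and 1) mean "not a bot", comparing a Result against 0 is not enough. The standalone sketch below mirrors a few of the constants and shows the behaviour Is() and IsNot() are assumed to have; the lowercase is and isNot are local reimplementations for illustration, not the package's source.

```go
package main

import "fmt"

// Result mirrors the package's Result type for this standalone sketch.
type Result uint8

// A subset of the constants listed above.
const (
	NoBotKnown       Result = 0
	NoBotNoMatch     Result = 1
	BotClientLibrary Result = 4
	BotJSWebDriver   Result = 153
)

// is is an assumed reimplementation of Is(): every value other than the
// two "not bot" constants counts as a bot.
func is(r Result) bool { return r != NoBotKnown && r != NoBotNoMatch }

// isNot is the inverse of is.
func isNot(r Result) bool { return !is(r) }

func main() {
	for _, r := range []Result{NoBotKnown, NoBotNoMatch, BotClientLibrary, BotJSWebDriver} {
		fmt.Println(uint8(r), is(r), isNot(r))
	}
}
```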

func IsUserAgent

func IsUserAgent(r Result) bool

IsUserAgent reports if this is considered a bot because of the User-Agent header.

func Prefetch

func Prefetch(h http.Header) bool

Prefetch checks if this request is a browser "pre-fetch" request.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ

Types

type Result

type Result uint8

func Bot

func Bot(r *http.Request) Result

Bot checks if this HTTP request looks like a bot.

It returns one of the constants as the reason we think this is a bot.

This assumes that r.RemoteAddr is set to the real IP and does not check X-Forwarded-For or X-Real-IP.

Note that both 0 (NoBotKnown) and 1 (NoBotNoMatch) indicate that it's *not* a bot; use Is() and IsNot() to check instead of comparing the result against 0.

func IPRange

func IPRange(addr string) Result

IPRange checks if this IP address is from a range that should normally never send browser requests, such as AWS and other cloud providers.
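
The general technique behind such a check is a lookup of the address in a list of CIDR blocks. The sketch below illustrates that idea only; the provider names and ranges are placeholders (RFC 5737 documentation ranges), lookupRange is a hypothetical name, and how IPRange() stores or matches its ranges internally is not described on this page.

```go
package main

import (
	"fmt"
	"net"
)

// ipRange pairs a provider label with a CIDR block.
type ipRange struct {
	name  string
	block *net.IPNet
}

// mustRange parses cidr and panics on invalid input.
func mustRange(name, cidr string) ipRange {
	_, n, err := net.ParseCIDR(cidr)
	if err != nil {
		panic(err)
	}
	return ipRange{name: name, block: n}
}

// ranges holds placeholder blocks, not real provider ranges.
var ranges = []ipRange{
	mustRange("example-cloud-a", "192.0.2.0/24"),
	mustRange("example-cloud-b", "198.51.100.0/24"),
}

// lookupRange returns the label of the first block containing addr.
func lookupRange(addr string) (string, bool) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return "", false
	}
	for _, r := range ranges {
		if r.block.Contains(ip) {
			return r.name, true
		}
	}
	return "", false
}

func main() {
	for _, addr := range []string{"192.0.2.10", "203.0.113.1", "not-an-ip"} {
		name, ok := lookupRange(addr)
		fmt.Println(addr, name, ok)
	}
}
```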

func UserAgent

func UserAgent(ua string) Result

UserAgent checks if this User-Agent header looks like a bot.

It returns one of the constants as the reason we think this is a bot.

func (Result) String

func (r Result) String() string

Directories

Path       Synopsis
cmd/isbot  Command isbot checks if a User-Agent is a bot.
