README

Go library to detect bots based on the HTTP request. A "bot" is defined as any request that isn't a regular browser request initiated by the user. This includes web crawlers, but also things such as "preview" renderers.

Bot() accepts an http.Request because it looks at all the information in the request (such as the remote address and prefetch headers), not just the User-Agent. If you only have a User-Agent string you can use UserAgent(), but using Bot() is highly recommended.

Import as zgo.at/isbot; API docs: https://pkg.go.dev/zgo.at/isbot

It's not 100% reliable, and there are some known cases where it gets things wrong. See isbot_test.go for a list of test cases.

The performance is pretty good; it turns out that running a few strings.Contains() calls is much faster than a single (bot|crawler|search|...) regexp.
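To illustrate the two approaches being compared, here is a sketch of a "boty" keyword check written both ways. The keyword list is hypothetical and much shorter than anything isbot would use, and lowercasing the input first is an assumption of this sketch. Both functions give the same answer. Measuring the actual speed difference would need a testing.B benchmark in a _test.go file.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keywords is a hypothetical, shortened list for illustration only.
var keywords = []string{"bot", "crawler", "search", "spider"}

// containsAny lowercases ua once and then does one strings.Contains per keyword.
func containsAny(ua string) bool {
	ua = strings.ToLower(ua)
	for _, k := range keywords {
		if strings.Contains(ua, k) {
			return true
		}
	}
	return false
}

// botyRe is the same check as a single case-insensitive regexp alternation.
var botyRe = regexp.MustCompile(`(?i)(?:bot|crawler|search|spider)`)

func regexpMatch(ua string) bool { return botyRe.MatchString(ua) }

func main() {
	for _, ua := range []string{
		"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
		"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
	} {
		fmt.Println(containsAny(ua), regexpMatch(ua))
	}
}
```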


Documentation

Overview

    Package isbot attempts to detect HTTP bots.

    Constants

    const (
    	NoBotKnown       = 0 // Known to not be a bot.
    	NoBotNoMatch     = 1 // None of the rules matches, so probably not a bot.
    	BotPrefetch      = 2 // Prefetch algorithm
    	BotLink          = 3 // User-Agent contained a URL.
    	BotClientLibrary = 4 // Known client library.
    	BotKnownBot      = 5 // Known bot.
    	BotBoty          = 6 // User-Agent string looks "boty".
    	BotShort         = 7 // User-Agent is short or strangely formatted.
    )

    const (
    	BotRangeAWS          = 8  // AWS cloud
    	BotRangeDigitalOcean = 9  // Digital Ocean
    	BotRangeServersCom   = 10 // servers.com
    	BotRangeGoogleCloud  = 11 // Google Cloud
    	BotRangeHetzner      = 12 // hetzner.de
    )

    const (
    	BotJSPhanton   = 150 // Phantom headless browser.
    	BotJSNightmare = 151 // Nightmare headless browser.
    	BotJSSelenium  = 152 // Selenium headless browser.
    	BotJSWebDriver = 153 // Generic WebDriver-based headless browser.
    )

      These are never set by isbot itself, but can be used to send signals from JavaScript; for example:

          // Returns one of the BotJS* constants, or 0 if nothing was detected.
          var is_bot = function() {
              var w = window, d = document
              if (w.callPhantom || w._phantom || w.phantom)
                  return 150 // BotJSPhanton
              if (w.__nightmare)
                  return 151 // BotJSNightmare
              if (d.__selenium_unwrapped || d.__webdriver_evaluate || d.__driver_evaluate)
                  return 152 // BotJSSelenium
              if (navigator.webdriver)
                  return 153 // BotJSWebDriver
              return 0
          }


    Functions

    func Bot

    func Bot(r *http.Request) uint8

      Bot checks if this HTTP request looks like a bot.

      It returns one of the constants above as the reason; use Is() to check whether the result indicates a bot.

      Note: this assumes that r.RemoteAddr is set to the real IP, and does not check X-Forwarded-For or X-Real-IP.

    func IPRange

    func IPRange(addr string) uint8

      IPRange checks if this IP address is from a range that should normally never send browser requests, such as AWS and other cloud providers.
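The general technique is to test the address against a table of CIDR prefixes. The sketch below uses net/netip with documentation-only prefixes from RFC 5737 as stand-ins. They are NOT the real AWS or Hetzner ranges, and a linear scan is simply the easiest lookup to show, not necessarily what isbot does. Returning 0 for "no range matched" is a choice of this sketch.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cloudRanges pairs placeholder prefixes with the documented range codes.
var cloudRanges = []struct {
	prefix netip.Prefix
	code   uint8
}{
	{netip.MustParsePrefix("192.0.2.0/24"), 8},     // stand-in for BotRangeAWS
	{netip.MustParsePrefix("198.51.100.0/24"), 12}, // stand-in for BotRangeHetzner
}

// ipRange returns the code of the first prefix containing addr, or 0 if addr
// does not parse or no prefix matches.
func ipRange(addr string) uint8 {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return 0
	}
	ip = ip.Unmap() // treat ::ffff:a.b.c.d like a.b.c.d
	for _, c := range cloudRanges {
		if c.prefix.Contains(ip) {
			return c.code
		}
	}
	return 0
}

func main() {
	for _, a := range []string{"192.0.2.55", "198.51.100.1", "203.0.113.1", "not-an-ip"} {
		fmt.Println(a, ipRange(a))
	}
}
```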

    func Is

    func Is(r uint8) bool

      Is reports whether this constant indicates a bot.

    func IsNot

    func IsNot(r uint8) bool

      IsNot is the inverse of Is().
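Judging from the constant comments, only NoBotKnown and NoBotNoMatch describe non-bots, so Is() and IsNot() can be sketched as below. This is an assumption drawn from those comments, not a copy of the library's source; the constants are local copies of the documented values.

```go
package main

import "fmt"

// Local copies of isbot.NoBotKnown and isbot.NoBotNoMatch.
const (
	noBotKnown   = 0
	noBotNoMatch = 1
)

// is treats every result except the two NoBot* values as a bot.
func is(r uint8) bool { return r != noBotKnown && r != noBotNoMatch }

// isNot is the inverse of is.
func isNot(r uint8) bool { return !is(r) }

func main() {
	for _, r := range []uint8{0, 1, 2, 8, 150} {
		fmt.Println(r, is(r), isNot(r))
	}
}
```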

    func IsUserAgent

    func IsUserAgent(r uint8) bool

      IsUserAgent reports whether this is considered a bot because of the User-Agent header.

    func Prefetch

    func Prefetch(h http.Header) bool

      Prefetch checks if this request is a browser "pre-fetch" request.

      https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ

    func UserAgent

    func UserAgent(ua string) uint8

      UserAgent checks if this User-Agent header looks like a bot.

      It returns one of the constants as the reason we think this is a bot.
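To show how a User-Agent string can be mapped to one of the reason constants, here is a toy classifier with two rules in the spirit of BotLink and BotShort. Both rules, and the 20-character threshold in particular, are hypothetical and much simpler than what UserAgent() does; the constants are local copies of the documented values.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of isbot.NoBotNoMatch, isbot.BotLink and isbot.BotShort.
const (
	noBotNoMatch = 1
	botLink      = 3
	botShort     = 7
)

// minLen is a hypothetical threshold chosen for this sketch.
const minLen = 20

// classifyUA returns a reason code: BotLink if ua contains a URL, BotShort if
// it is very short, otherwise NoBotNoMatch.
func classifyUA(ua string) uint8 {
	switch {
	case strings.Contains(ua, "http://") || strings.Contains(ua, "https://"):
		return botLink
	case len(ua) < minLen:
		return botShort
	}
	return noBotNoMatch
}

func main() {
	for _, ua := range []string{
		"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
		"curl/8.4.0",
		"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
	} {
		fmt.Println(classifyUA(ua), ua)
	}
}
```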
