basecrawler

README

BMS Basecrawler

This repository contains the implementation of a crawler framework we call basecrawler. You can use this crawler as a base and extend it with just the protocol implementation of a P2P botnet (see the short example below).

Example Implementation

A very shortened crawler implementation (without any error handling, logging, etc.) for the testnet protocol would be:

func (ci *crawlerImplementation) SendPeerRequest(ctx context.Context, config map[string]any, conn *net.UDPConn, addr *net.UDPAddr) {
	// Ask the bot for its neighbors (errors are deliberately ignored in this shortened example).
	conn.WriteTo([]byte("peer-request"), addr)
}

func (ci *crawlerImplementation) ReadReply(ctx context.Context, config map[string]any, msg []byte, addr *net.UDPAddr) (basecrawler.CrawlResult, []string, []bmsclient.Edge, []bmsclient.BotReply) {
	// The testnet protocol answers with a single peer in ip:port format.
	msgParts := strings.Split(string(msg), ":")

	dstIP := net.ParseIP(msgParts[0])
	dstPort, _ := strconv.ParseUint(msgParts[1], 10, 16)

	// Record the edge from the crawled bot to the announced peer, as well as the bot reply itself.
	edge := bmsclient.Edge{Timestamp: time.Now(), SrcIP: addr.IP, SrcPort: uint16(addr.Port), DstIP: dstIP, DstPort: uint16(dstPort)}
	reply := bmsclient.BotReply{Timestamp: time.Now(), IP: addr.IP, Port: uint16(addr.Port)}

	// Classify as a bot reply and return the announced peer for further crawling.
	return basecrawler.BOT_REPLY, []string{string(msg)}, []bmsclient.Edge{edge}, []bmsclient.BotReply{reply}
}

Configuration

The basecrawler module provides a function that a crawler implementation can use to configure the crawler from environment variables. The following list describes all available variables, each with an example value.

LOG_LEVEL (e.g. info)
    Log level to start the crawler with. Has to be one of debug, info, warn, error. If unset, the crawler won't output anything.

DISCOVERY_INTERVAL (e.g. 30)
    The interval (in seconds) that the crawler will wait before trying to crawl a bot again in the discovery loop. Defaults to 300s if unset.

TRACKING_INTERVAL (e.g. 30)
    The interval (in seconds) that the crawler will wait before trying to crawl a bot again in the tracking loop. Defaults to 300s if unset.

DISCOVERY_REMOVE_INTERVAL (e.g. 300)
    The interval (in seconds) that a bot is allowed to be unresponsive or benign before the crawler removes it from the discovery loop. Defaults to 900s if unset.

TRACKING_REMOVE_INTERVAL (e.g. 300)
    The interval (in seconds) that a bot is allowed to be unresponsive or benign before the crawler removes it from the tracking loop. Defaults to 900s if unset.

DISCOVERY_WORKER_COUNT (e.g. 1000)
    The number of worker goroutines the crawler will start for crawling bots in the discovery loop. Defaults to 10000 if unset.

TRACKING_WORKER_COUNT (e.g. 1000)
    The number of worker goroutines the crawler will start for crawling bots in the tracking loop. Defaults to 10000 if unset.

FINDPEER_WORKER_COUNT (e.g. 10)
    The number of worker goroutines the crawler will start for crawling bots in the find-peer loop. Defaults to 100 if unset; ignored if the crawler implementation does not implement the find-peer loop.

BMS_SERVER (e.g. localhost:8083)
    The BMS server to send the crawled bot replies, edges and failed tries to. If unset, the crawler won't connect to a BMS server; has to be provided together with BMS_MONITOR, BMS_AUTH_TOKEN and BMS_BOTNET.

BMS_MONITOR (e.g. some-monitor)
    The monitor ID to authenticate as. Has to exist in the database of the used BMS server. If unset, the crawler won't connect to a BMS server; has to be provided together with BMS_SERVER, BMS_AUTH_TOKEN and BMS_BOTNET.

BMS_AUTH_TOKEN (e.g. AAAA/some+example+token/AAAAAAAAAAAAAAAAAAA=)
    The auth token to use for authentication. Has to be exactly 32 base64-encoded bytes. If unset, the crawler won't connect to a BMS server; has to be provided together with BMS_SERVER, BMS_MONITOR and BMS_BOTNET.

BMS_BOTNET (e.g. some-botnet)
    The botnet ID to send data for. Has to exist in the database of the used BMS server. If unset, the crawler won't connect to a BMS server; has to be provided together with BMS_SERVER, BMS_MONITOR and BMS_AUTH_TOKEN.

BMS_CAMPAIGN (e.g. some-campaign)
    The campaign ID to send data for. Has to exist in the database of the used BMS server. Ignored if provided without BMS_SERVER, BMS_MONITOR, BMS_AUTH_TOKEN and BMS_BOTNET.

BMS_CRAWLER_PUBLIC_IP (e.g. 198.51.100.1)
    The public IP address that the crawler is reachable at. Ignored if provided without BMS_SERVER, BMS_MONITOR, BMS_AUTH_TOKEN and BMS_BOTNET.

All environment variables are optional. By default the crawler will start without a BMS connection and with defaults where necessary. Please note that the BMS options BMS_SERVER, BMS_MONITOR, BMS_AUTH_TOKEN and BMS_BOTNET have to be provided as a group (while BMS_CAMPAIGN and BMS_CRAWLER_PUBLIC_IP are optional, but can only be used if the other four options are set).
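
For example, a crawler binary might read these variables via CreateOptionsFromEnv (documented below) and pass the resulting options to the constructor. A minimal sketch, where myImplementation and the bootstrap peer are placeholders:

opts, err := basecrawler.CreateOptionsFromEnv()
if err != nil {
	panic(err)
}

// myImplementation is a placeholder for a struct that implements
// TCPBotnetCrawler or UDPBotnetCrawler.
crawler, err := basecrawler.NewCrawler(&myImplementation{}, []string{"192.0.2.1:20001"}, opts...)
if err != nil {
	panic(err)
}

crawler.Start(context.Background())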

Peerlist

The basecrawler module also provides a function to parse a CSV file (commonly peerlist.csv) so that it can be passed to the basecrawler constructor. The CSV file has just one column (IP address with port, e.g. 192.0.2.1:45678) and no header row.

Since not all crawlers need an up-to-date peerlist to bootstrap, providing one might not be necessary for every crawler.
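
As an illustration, a peerlist.csv in the expected format might contain (addresses are placeholders):

192.0.2.1:45678
192.0.2.2:45678
198.51.100.3:45678

and could be loaded with the provided parsing function (see MustParsePeerList below):

bootstrapPeers := basecrawler.MustParsePeerList("peerlist.csv")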

Documentation

Index

Examples

Constants

const (
	// The context key that is used to store which loop something is executed in (e.g. discovery_loop)
	// Mostly useful for logging
	ContextKeyLoop contextKey = iota

	// The context key that is used to store which worker ID is executing something
	// Mostly useful for logging
	ContextKeyWorkerID contextKey = iota
)

Variables

This section is empty.

Functions

func MustParsePeerList

func MustParsePeerList(path string) []string

MustParsePeerList is like ParsePeerList but will panic if it encounters an error.

func ParsePeerList

func ParsePeerList(path string) ([]string, error)

ParsePeerList takes a CSV file of peers (in ip:port format), checks that it's valid and transforms it into a string slice.

It can be used by crawler implementations to parse a peerlist needed to bootstrap a crawler in a common way.
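
A minimal sketch of loading a peerlist with explicit error handling (the file name is illustrative):

peers, err := basecrawler.ParsePeerList("peerlist.csv")
if err != nil {
	// e.g. the file doesn't exist or an entry isn't parsable as ip:port
	panic(err)
}

The resulting slice can then be passed as the bootstrap peerlist to NewCrawler.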

Types

type BMSOption

type BMSOption func(*bmsConfig)

Functions which implement BMSOption can be passed to WithBMS as additional options.

func WithBMSCampaign

func WithBMSCampaign(campaign string) BMSOption

WithBMSCampaign can be used to set a campaign when sending results to BMS. Has to be passed to WithBMS, as it can only be used in combination with a BMS connection.

func WithBMSPublicIP

func WithBMSPublicIP(ip string) BMSOption

WithBMSPublicIP can be used to set the public IP address of a crawler, which will be written to the BMS database. Has to be passed to WithBMS, as it can only be used in combination with a BMS connection.

type BaseCrawler

type BaseCrawler struct {
	// contains filtered or unexported fields
}

The BaseCrawler struct represents a basecrawler instance.

To create a new instance, use NewCrawler.

func NewCrawler

func NewCrawler(botnetImplementation any, bootstrapPeers []string, options ...CrawlerOption) (*BaseCrawler, error)

NewCrawler creates a new crawler based on the basecrawler which embeds the given specific implementation.

The first parameter is the actual implementation of the botnet protocol. The passed struct has to either implement the TCPBotnetCrawler or the UDPBotnetCrawler interface.

You can make sure your struct implements e.g. the UDPBotnetCrawler interface at compile time by putting the following in your code:

// Make sure that someImplementation is implementing the UDPBotnetCrawler interface (at compile time)
var _ basecrawler.UDPBotnetCrawler = &someImplementation{}

The entries of the bootstrap peerlist have to be in a format parsable by net.SplitHostPort (oftentimes simply ip:port).

You can pass optional configuration to the crawler as additional parameters. All options have to implement the CrawlerOption type (see CrawlerOption for available options). If you want to read optional crawler configuration from environment variables (in a common way), you can use CreateOptionsFromEnv (see the configuration section in the readme for possible environment variables).

Although calling this function will already start a BMS session (if configured with BMS), it will not start the actual crawling. To start it, call BaseCrawler.Start. The main reason for this layout is that at some point we want to introduce crawler instrumentation, i.e. a piece of software that is able to crawl multiple botnets (and therefore contains multiple crawler instances) and is controlled by a central management server.
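
Putting it together, a crawler could be constructed like this (a sketch; the implementation struct, peerlist file and interval values are illustrative):

crawler, err := basecrawler.NewCrawler(
	&someImplementation{},
	basecrawler.MustParsePeerList("peerlist.csv"),
	basecrawler.WithLogger(slog.Default()),
	basecrawler.WithCustomCrawlIntervals(60, 60, 0, 0),
)
if err != nil {
	panic(err)
}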

func (*BaseCrawler) IsBlacklisted

func (bc *BaseCrawler) IsBlacklisted(ipPort string) bool

IsBlacklisted returns whether the crawler considers a bot blacklisted and therefore won't crawl it.

The bot should be provided as ip:port (so that it can be parsed by net.SplitHostPort). If the given bot can't be parsed, it will be considered blacklisted (to make sure the crawler won't crawl any bogon IPs).

By default the blacklist contains all special-use IP addresses. This can be changed by providing the WithIncludeSpecialUseIPs option. Additional IP ranges can be blacklisted by providing the WithCustomBlacklist option.
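
For example, assuming a crawler created with default options (a sketch; the return values follow from the behavior described above):

crawler.IsBlacklisted("192.0.2.1:20001") // true: 192.0.2.0/24 is a special-use (documentation) range
crawler.IsBlacklisted("8.8.8.8:20001")   // false: public address, not blacklisted by default
crawler.IsBlacklisted("not-an-address")  // true: unparsable bots are considered blacklisted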

func (*BaseCrawler) Logger

func (bc *BaseCrawler) Logger() *slog.Logger

Logger returns the internal logger of the basecrawler.

You may want to pass this logger to your crawler implementation manually, so that you can use the same logger (which is mostly needed if you let CreateOptionsFromEnv create the logger for you).

func (*BaseCrawler) Start

func (bc *BaseCrawler) Start(ctx context.Context) error

Start starts the crawler instance. Depending on the crawler configuration, it will spawn several goroutines:

  • A goroutine that manages the discovery loop, which itself starts more worker goroutines.
  • A goroutine that manages the tracking loop, which itself starts more worker goroutines.
  • A goroutine that manages the find-peer loop, which itself starts more worker goroutines (if the crawler implementation implements the PeerFinder interface).
  • A goroutine that sends crawled bot replies, edges and failed tries to BMS (if configured to use BMS).
  • A goroutine that tries to reconnect to BMS if the connection broke (if configured to use BMS).

You can stop these Go routines by passing a cancelable context (like in the CancelContext example).

This method is meant for instrumentation of multiple crawler instances (though we never got around to using it). The basic idea is to have multiple crawler instances that can be exchanged on the fly (stop one crawler, start another one) by a central management instance.

Example (CancelContext)
package main

import (
	"context"
	"time"

	"github.com/botnet-monitoring/basecrawler"
)

func main() {
	// In a real crawler this would be a proper protocol implementation. Passing this empty
	// struct to NewCrawler will actually result in an error since it implements neither
	// TCPBotnetCrawler nor UDPBotnetCrawler.
	someProperImplementation := struct{}{}

	crawler, err := basecrawler.NewCrawler(
		someProperImplementation,
		[]string{
			"192.0.2.1:20001",
			"192.0.2.2:20002",
			"192.0.2.3:20003",
		},
	)
	if err != nil {
		panic(err)
	}

	ctx, cancel := context.WithCancel(context.Background())

	crawler.Start(ctx)
	time.Sleep(5 * time.Second)
	cancel()

	// Do other stuff
}
Output:

func (*BaseCrawler) Stop

Stop currently just ends the internal BMS session with the provided disconnect reason (if there's an active BMS session). If you want to stop the crawl loops, pass a cancelable context to BaseCrawler.Start and cancel it.

This method is meant for instrumentation of multiple crawler instances (though we never got around to using it). Please also note that the crawler very likely won't start again once stopped (which we would need to fix before being able to do proper instrumentation).

type CrawlResult

type CrawlResult int

CrawlResult represents the result of a crawl attempt of a bot.

const (
	// Crawled host responded but response classified as benign
	BENIGN_REPLY CrawlResult = iota

	// Crawled host responded and response is definitely from a malicious bot
	BOT_REPLY CrawlResult = iota

	// Crawled host did not respond
	NO_REPLY CrawlResult = iota
)
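
For illustration, a ReadReply implementation might derive the result roughly like this (a sketch; the protocol marker is hypothetical):

func classify(msg []byte) basecrawler.CrawlResult {
	if len(msg) == 0 {
		// nothing usable received
		return basecrawler.NO_REPLY
	}
	if !bytes.HasPrefix(msg, []byte("peer:")) {
		// responded, but not with the (hypothetical) botnet protocol
		return basecrawler.BENIGN_REPLY
	}
	return basecrawler.BOT_REPLY
}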

type CrawlerOption

type CrawlerOption func(*config)

Functions which implement CrawlerOption can be passed to NewCrawler as additional options.

func CreateOptionsFromEnv

func CreateOptionsFromEnv() ([]CrawlerOption, error)

CreateOptionsFromEnv creates a collection of crawler options based on environment variables. It can be used by crawler implementations to configure the basecrawler and is part of this module as every crawler implementation would need to re-implement it otherwise.

See the configuration section in the readme for available environment variables.

If LOG_LEVEL is provided, this function will create a logger. Please note that this might overwrite the logger provided by the crawler implementation (as options are applied in order). To get the created logger you can use BaseCrawler.Logger.
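
Since options are applied in order, a crawler implementation can control whether its own logger or a LOG_LEVEL-created one takes effect. A sketch, where myLogger, myImplementation and bootstrapPeers are placeholders:

opts := basecrawler.MustCreateOptionsFromEnv()

// Appended after the env-derived options, so myLogger overwrites a logger created from LOG_LEVEL.
opts = append(opts, basecrawler.WithLogger(myLogger))

crawler, err := basecrawler.NewCrawler(&myImplementation{}, bootstrapPeers, opts...)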

func MustCreateOptionsFromEnv

func MustCreateOptionsFromEnv() []CrawlerOption

MustCreateOptionsFromEnv is like CreateOptionsFromEnv but will panic if it encounters an error.

func WithAdditionalCrawlerConfig

func WithAdditionalCrawlerConfig(crawlerConfig map[string]any) CrawlerOption

WithAdditionalCrawlerConfig can be used to provide custom crawler configuration to the crawler implementation (e.g. SendPeerRequest or ReadReply).

The given config map will be passed as-is to all functions contained in TCPBotnetCrawler, UDPBotnetCrawler and PeerFinder.
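
For example, a protocol-specific constant could be handed to the implementation like this (a sketch; the key and value are hypothetical):

basecrawler.WithAdditionalCrawlerConfig(map[string]any{
	"xorKey": byte(0x42), // hypothetical per-botnet constant
})

// Later, e.g. inside SendPeerRequest:
xorKey, _ := config["xorKey"].(byte)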

func WithBMS

func WithBMS(server string, monitor string, authToken string, botnet string, options ...BMSOption) CrawlerOption

WithBMS can be used to configure the crawler to send the crawling results to a BMS server.

The server has to be passed as ip:port (e.g. localhost:8083), the authToken as a base64-encoded string (which has to contain exactly 32 bytes). The given monitor ID and botnet ID have to exist on the BMS server, otherwise trying to start the crawler will return an error.

You can pass optional BMS config via the trailing BMSOption parameters.

Please note that this option cannot check the validity of the given values, so creating the BMS client or the BMS session might fail when trying to start the crawler.
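
Using the example values from the configuration section of the readme, a BMS-connected crawler could be configured like this (a sketch):

basecrawler.WithBMS(
	"localhost:8083",
	"some-monitor",
	"AAAA/some+example+token/AAAAAAAAAAAAAAAAAAA=",
	"some-botnet",
	basecrawler.WithBMSCampaign("some-campaign"),
	basecrawler.WithBMSPublicIP("198.51.100.1"),
)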

func WithCustomBlacklist

func WithCustomBlacklist(blacklist []string) CrawlerOption

WithCustomBlacklist can be used to add IP address ranges to the crawler's blacklist (see BaseCrawler.IsBlacklisted).

The given strings have to be in CIDR notation. You might want to exclude your own crawlers from the crawling, so e.g. if you have two crawlers running on 198.51.100.1 and 198.51.100.2, you probably want to pass a string slice with 198.51.100.1/32 and 198.51.100.2/32.
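
For the two-crawler example above, that would be:

basecrawler.WithCustomBlacklist([]string{"198.51.100.1/32", "198.51.100.2/32"})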

func WithCustomCrawlIntervals

func WithCustomCrawlIntervals(discoveryInterval uint32, trackingInterval uint32, discoveryRemoveInterval uint32, trackingRemoveInterval uint32) CrawlerOption

WithCustomCrawlIntervals can be used to change how often the crawler will crawl potential bots and after how long of being unresponsive it will drop them.

All intervals are in seconds. If you only want to change one of the intervals, pass zero for the other parameters. Defaults are 300s for the discovery interval and tracking interval, and 900s for the discovery remove interval and tracking remove interval.
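
For example, to crawl every 60 seconds in both loops while keeping the default remove intervals:

basecrawler.WithCustomCrawlIntervals(60, 60, 0, 0)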

func WithCustomWorkerCounts

func WithCustomWorkerCounts(discoveryWorkerCount uint32, trackingWorkerCount uint32, findPeerWorkerCount uint32) CrawlerOption

WithCustomWorkerCounts can be used to change the number of workers the various loops will spawn.

If you only want to change the worker count of one of the loops, pass zero for the other parameters. By default the discovery loop and tracking loop will spawn 10000 worker goroutines each, and the find-peer loop will spawn 100 worker goroutines (if it's used).

func WithIncludeSpecialUseIPs

func WithIncludeSpecialUseIPs(includeSpecialUseIPs bool) CrawlerOption

WithIncludeSpecialUseIPs can be used to include special-use IP addresses into the crawling (see BaseCrawler.IsBlacklisted).

You probably want to leave this setting at its default (so that the blacklist contains the special-use IP addresses); however, e.g. for local testing, you might want to include them.

func WithLogger

func WithLogger(logger *slog.Logger) CrawlerOption

WithLogger can be used to pass a custom logger to the basecrawler. By default, the basecrawler doesn't log anything, so if you want to have any logs, you have to pass this option.

If you also use CreateOptionsFromEnv, it might create its own logger (depending on the value of LOG_LEVEL) which might overwrite another provided logger (options are applied in the order they are passed). If you want to pass the created logger to your crawler implementation, use BaseCrawler.Logger.

type PeerFinder

type PeerFinder interface {
	SendFindPeersMsg(ctx context.Context, config map[string]any, conn *net.UDPConn, addr *net.UDPAddr)
	ReadFindPeersReply(ctx context.Context, config map[string]any, msg []byte, addr *net.UDPAddr) (CrawlResult, []string)
}
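
A UDP crawler implementation could opt into the find-peer loop by implementing these two methods in addition to UDPBotnetCrawler, e.g. (a sketch; the message format is hypothetical):

func (ci *crawlerImplementation) SendFindPeersMsg(ctx context.Context, config map[string]any, conn *net.UDPConn, addr *net.UDPAddr) {
	// Hypothetical find-peers request (errors ignored for brevity).
	conn.WriteTo([]byte("find-peers"), addr)
}

func (ci *crawlerImplementation) ReadFindPeersReply(ctx context.Context, config map[string]any, msg []byte, addr *net.UDPAddr) (basecrawler.CrawlResult, []string) {
	// Assume the reply is a comma-separated list of ip:port entries.
	return basecrawler.BOT_REPLY, strings.Split(string(msg), ",")
}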

type TCPBotnetCrawler

type TCPBotnetCrawler interface {
	ReadReply(ctx context.Context, config map[string]any, msg []byte, addr *net.TCPAddr) (CrawlResult, []string, []bmsclient.Edge, []bmsclient.BotReply)
	SendPeerRequest(ctx context.Context, config map[string]any, conn *net.TCPConn, addr *net.TCPAddr)
}

type UDPBotnetCrawler

type UDPBotnetCrawler interface {
	ReadReply(ctx context.Context, config map[string]any, msg []byte, addr *net.UDPAddr) (CrawlResult, []string, []bmsclient.Edge, []bmsclient.BotReply)
	SendPeerRequest(ctx context.Context, config map[string]any, conn *net.UDPConn, addr *net.UDPAddr)
}
