iprep
Design for an "IP reputation" database, meant to detect harmful actors
(including distributed ones).
First, an architectural observation: we most likely want to collect
data as events originating from multiple sources, which more or less
dictates an RPC submission model, with probes kept separate from the
central event database. Furthermore, most operations will amount to
per-IP aggregations over the (windowed) event logs themselves.
The system must also perform well under overload (DoS) conditions,
which implies at the very least that traffic from probes should not
scale linearly with the number of events: some time-based aggregation
has to happen on the client side of the data submission protocol.
This can be achieved by implementing a "minimum processing delay" and
a scheduled report thread (or, more simply, via cron).
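
As a rough illustration of client-side aggregation, the sketch below
counts events in memory and submits one batch per flush, so the number
of submissions depends on the flush interval rather than on the event
rate. The class name, the (ip, event_type, count) batch shape and the
submit callable are assumptions made for this example, not part of the
actual probe code.

```python
import threading
from collections import Counter


class EventAggregator:
    """Collects events locally and emits one aggregated batch per flush."""

    def __init__(self, submit):
        # submit: callable taking a list of (ip, event_type, count) tuples
        # (hypothetical stand-in for the RPC submission call).
        self._submit = submit
        self._counts = Counter()
        self._lock = threading.Lock()

    def record(self, ip, event_type):
        # Cheap in-memory increment: no network traffic per event.
        with self._lock:
            self._counts[(ip, event_type)] += 1

    def flush(self):
        # Swap the counter out under the lock, then submit outside it.
        with self._lock:
            counts, self._counts = self._counts, Counter()
        if counts:
            batch = [(ip, etype, n) for (ip, etype), n in sorted(counts.items())]
            self._submit(batch)
        return len(counts)


sent = []
agg = EventAggregator(sent.append)
for _ in range(1000):
    agg.record("192.0.2.1", "auth_failure")
agg.record("192.0.2.2", "auth_failure")
agg.flush()  # 1001 events leave the probe as a single two-entry batch
```

In a long-running probe, flush() would be called from a timer thread
at the chosen minimum processing delay; with cron, the same effect
comes from running the reporter once per interval.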
Database
The event database is a time-based append-only log: the two supported
operations are append and scan (plus, internally, a
delete-older-than operation that periodically wipes entries that are
too old to be relevant).
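
The three operations can be sketched with an in-memory list kept
sorted by timestamp. This is only a toy model under assumed names
(Event, EventLog and their fields are made up for the example); the
real database is presumably persistent and indexed differently.

```python
import bisect
from collections import namedtuple

# Hypothetical event record; field names are assumptions.
Event = namedtuple("Event", "timestamp ip event_type count")


class EventLog:
    """Toy time-ordered log supporting append, scan and expiry."""

    def __init__(self):
        self._events = []  # kept sorted by timestamp

    def append(self, event):
        # insort keeps the list ordered even if a probe reports late.
        bisect.insort(self._events, event)

    def scan(self, ip, since):
        # (since,) sorts before any event with timestamp == since,
        # so this finds the first event inside the window.
        start = bisect.bisect_left(self._events, (since,))
        return [e for e in self._events[start:] if e.ip == ip]

    def delete_older_than(self, cutoff):
        # Drop every event with timestamp < cutoff; return how many.
        start = bisect.bisect_left(self._events, (cutoff,))
        del self._events[:start]
        return start


log = EventLog()
log.append(Event(100, "192.0.2.1", "auth_failure", 3))
log.append(Event(200, "192.0.2.1", "spam", 1))
log.append(Event(150, "192.0.2.2", "spam", 5))
recent = log.scan("192.0.2.1", since=150)
removed = log.delete_older_than(150)
```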
Querying the reputation of an IP consists of scanning the database
over a pre-defined window of time in the past and passing the results
to a scoring script (currently written in an embedded language),
which applies aggregation and weighting and returns the final score.
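
To make "aggregation and weighting" concrete, here is a minimal
Python stand-in for a scoring script. The real script runs in the
embedded language, and the event types and weights below are invented
for the example.

```python
from collections import Counter

# Hypothetical per-event-type weights; real values are a policy choice.
WEIGHTS = {"auth_failure": 1.0, "spam": 5.0}


def score(events, weights=WEIGHTS):
    """Sum counts per event type, then combine them with the weights.

    events: iterable of (event_type, count) pairs for one IP, already
    restricted to the query window.
    """
    totals = Counter()
    for event_type, count in events:
        totals[event_type] += count
    # Event types without a configured weight contribute nothing.
    return sum(weights.get(t, 0.0) * n for t, n in totals.items())


example = score([("auth_failure", 3), ("spam", 1), ("auth_failure", 2)])
```

Here example is 1.0 * 5 + 5.0 * 1 = 10.0. A linear weighted sum is just
one possible choice; a script could equally apply thresholds or decay
by event age.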
RPC interface
The server provides a simple gRPC interface that is used for event
submission and querying. The query API is a simple IP lookup that
returns a score. This could conceivably be turned into a DNS-based
API as well.
External sources
The scoring script can consult other IP-based third-party sources,
such as DNSBLs or GeoIP lookups.
These are configured via YAML snippets in a directory, each file
corresponding to a separate external source.
Each source configuration should specify the following parameters:
- name, the name of the source
- type, one of the supported source types (either dnsbl or geoip
at the moment)
- params, a dictionary of further type-specific configuration
parameters.
The parameters for the dnsbl source type are:
- domain, the DNSBL domain to query
- match, a regexp pattern for the DNS query result. The default is
".*", i.e. any result will be regarded as a positive match.
The parameters for the geoip source type are:
- paths, a list of paths to MaxMind GeoIP databases. The default
list contains a single path, /var/lib/GeoIP/GeoLite2-Country.mmdb
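
The sketch below shows how one parsed source file could be validated
and given its documented defaults. It works on a plain dictionary (as
a YAML parser would return), the function name is made up, and
treating domain as mandatory is an assumption based only on the fact
that no default is documented for it.

```python
# A source file might look like this before parsing (layout assumed):
#   name: example_bl
#   type: dnsbl
#   params:
#     domain: dnsbl.example.org

DEFAULT_GEOIP_PATHS = ["/var/lib/GeoIP/GeoLite2-Country.mmdb"]
SUPPORTED_TYPES = {"dnsbl", "geoip"}


def normalize_source(cfg):
    """Validate one source definition and fill in default parameters."""
    name = cfg.get("name")
    if not name:
        raise ValueError("source is missing 'name'")
    stype = cfg.get("type")
    if stype not in SUPPORTED_TYPES:
        raise ValueError(f"source {name!r}: unsupported type {stype!r}")
    params = dict(cfg.get("params") or {})
    if stype == "dnsbl":
        # Assumption: 'domain' has no documented default, so require it.
        if "domain" not in params:
            raise ValueError(f"source {name!r}: dnsbl needs 'domain'")
        params.setdefault("match", ".*")
    else:
        params.setdefault("paths", list(DEFAULT_GEOIP_PATHS))
    return {"name": name, "type": stype, "params": params}


bl = normalize_source(
    {"name": "example_bl", "type": "dnsbl",
     "params": {"domain": "dnsbl.example.org"}})
geo = normalize_source({"name": "geo", "type": "geoip"})
```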
External sources can be used in the scoring script by calling the
ext() function, with the source name and IP address as parameters.
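
As an illustration of what an ext() call could do for a dnsbl source,
the following Python sketch builds the conventional DNSBL query name
(reversed IPv4 octets followed by the list domain) and applies the
match pattern to the returned records. Everything here is a stand-in:
the real ext() lives in the embedded scripting language, its return
type is not specified above, the pattern is assumed to be applied as
an unanchored search, and the resolver is stubbed so the example runs
offline.

```python
import ipaddress
import re


def dnsbl_query_name(ip, domain):
    """Build the DNS name to look up: reversed IPv4 octets + DNSBL domain."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    return ".".join(reversed(str(addr).split("."))) + "." + domain


def make_dnsbl_source(domain, resolve, match=".*"):
    """Return lookup(ip) -> bool; resolve maps a DNS name to a list of records."""
    pattern = re.compile(match)

    def lookup(ip):
        results = resolve(dnsbl_query_name(ip, domain))
        # No records at all (e.g. NXDOMAIN) means the IP is not listed.
        return any(pattern.search(r) for r in results)

    return lookup


def make_ext(sources):
    """Build an ext(name, ip) function over a dict of named lookups."""
    def ext(name, ip):
        if name not in sources:
            raise KeyError(f"unknown external source: {name}")
        return sources[name](ip)
    return ext


# Stub resolver standing in for real DNS queries.
FAKE_ZONE = {"1.2.0.192.dnsbl.example.org": ["127.0.0.2"]}
ext = make_ext({
    "example_bl": make_dnsbl_source(
        "dnsbl.example.org", resolve=lambda name: FAKE_ZONE.get(name, [])),
})

listed = ext("example_bl", "192.0.2.1")
not_listed = ext("example_bl", "192.0.2.9")
```

A scoring script could then add a fixed penalty whenever the lookup
for a blocklist source reports a match.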