crawler_v2

Module version: v1.2.0 (latest)
Published: Jul 22, 2025 License: MIT

README

Crawler v2

This repository is a rewrite of the original, now-discontinued crawler. It is under active development.

Goals

The goals of this project are:

  • Continuously crawl the Nostr network (24/7/365), searching for follow lists (kind:3) and other relevant events.

  • Quickly assess whether new events should be added to the database based on the author's rank. Approved events are used to build a custom Redis-backed graph database.

  • Generate and maintain random walks for nodes in the graph, updating them as the graph topology evolves.

  • Use these random walks to efficiently compute acyclic Monte Carlo PageRanks (personalized and global). The algorithms are inspired by this paper.
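The following is a minimal Go sketch of how Monte Carlo PageRank can be estimated from random walks. It is not the project's implementation. The Graph type, the function names, the damping factor alpha, and the rule that a walk stops before revisiting a node (one possible reading of "acyclic") are all assumptions. Each walk continues with probability alpha, and a node's rank is its share of all visits. Global ranks start walks from every node; personalized ranks start them from a single source.

```go
package main

import (
	"math/rand"
	"sort"
)

// Graph maps each pubkey to the pubkeys it follows (hypothetical type).
type Graph map[string][]string

// walk runs one random walk from start. Each step happens with probability
// alpha. The walk stops at nodes with no follows and, as an assumed reading
// of "acyclic", before revisiting a node already on the walk.
func walk(g Graph, start string, alpha float64, rng *rand.Rand) []string {
	path := []string{start}
	seen := map[string]bool{start: true}
	curr := start
	for rng.Float64() < alpha {
		succ := g[curr]
		if len(succ) == 0 {
			break
		}
		next := succ[rng.Intn(len(succ))]
		if seen[next] {
			break
		}
		path = append(path, next)
		seen[next] = true
		curr = next
	}
	return path
}

// estimate runs walksPerStart walks from each start node and returns the
// fraction of all visits that landed on each node.
func estimate(g Graph, starts []string, walksPerStart int, alpha float64, seed int64) map[string]float64 {
	rng := rand.New(rand.NewSource(seed))
	visits := make(map[string]float64)
	total := 0.0
	for _, s := range starts {
		for i := 0; i < walksPerStart; i++ {
			for _, v := range walk(g, s, alpha, rng) {
				visits[v]++
				total++
			}
		}
	}
	for v := range visits {
		visits[v] /= total
	}
	return visits
}

// GlobalPagerank starts the same number of walks from every node.
func GlobalPagerank(g Graph, walksPerNode int, alpha float64, seed int64) map[string]float64 {
	nodes := make([]string, 0, len(g))
	for n := range g {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes) // fixed order so a given seed is reproducible
	return estimate(g, nodes, walksPerNode, alpha, seed)
}

// PersonalizedPagerank starts every walk from a single source node.
func PersonalizedPagerank(g Graph, source string, walks int, alpha float64, seed int64) map[string]float64 {
	return estimate(g, []string{source}, walks, alpha, seed)
}
```

The estimate depends only on visit counts. Keeping the walks around, rather than recomputing ranks from scratch, is therefore what makes incremental updates cheap when the graph changes.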

Apps

/cmd/crawler/

The main entry point. It assumes that the event store and Redis are synchronized. If they are empty, the graph is initialized using the INIT_PUBKEYS specified in the environment.
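The following is a sketch of the initialization check. It assumes INIT_PUBKEYS holds a comma-separated list of hex-encoded public keys; the format actually accepted by the config package may differ. Nostr public keys are 32 bytes, so each hex entry must be 64 characters. ParseInitPubkeys and SeedPubkeys are hypothetical names, and graphSize stands in for however the crawler detects an empty database.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// ParseInitPubkeys splits a comma-separated list (an assumed format) and
// checks that each entry is a 32-byte public key in hex, i.e. 64 hex chars.
func ParseInitPubkeys(raw string) ([]string, error) {
	var pubkeys []string
	for _, field := range strings.Split(raw, ",") {
		pk := strings.ToLower(strings.TrimSpace(field))
		if pk == "" {
			continue // tolerate trailing commas and blank entries
		}
		b, err := hex.DecodeString(pk)
		if err != nil || len(b) != 32 {
			return nil, fmt.Errorf("invalid pubkey %q: want 64 hex characters", pk)
		}
		pubkeys = append(pubkeys, pk)
	}
	if len(pubkeys) == 0 {
		return nil, fmt.Errorf("INIT_PUBKEYS is empty")
	}
	return pubkeys, nil
}

// SeedPubkeys returns the pubkeys to initialize an empty graph with, or nil
// when the graph already has nodes and the crawler can resume from Redis.
func SeedPubkeys(graphSize int) ([]string, error) {
	if graphSize > 0 {
		return nil, nil
	}
	return ParseInitPubkeys(os.Getenv("INIT_PUBKEYS"))
}
```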

/cmd/sync/

This mode builds the Redis graph database from the event store. In other words, it synchronizes Redis with the events in the event store, starting from the INIT_PUBKEYS specified in the environment and expanding outward.
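The outward expansion can be pictured as a breadth-first search over kind:3 follow lists. This sketch is hypothetical. The event store is an in-memory map from each pubkey to that author's latest follow list, the Event struct keeps only the NIP-01 fields it needs, and the author-rank filter mentioned under Goals is left out. Following NIP-02, each followed pubkey is read from a tag of the form ["p", <pubkey>, ...].

```go
package main

// Event keeps only the NIP-01 fields this sketch needs (hypothetical type).
type Event struct {
	Kind   int
	PubKey string
	Tags   [][]string
}

// Follows extracts followed pubkeys from a kind:3 follow list. Per NIP-02,
// each follow is a tag of the form ["p", <pubkey>, <relay>, <petname>].
func Follows(e Event) []string {
	if e.Kind != 3 {
		return nil
	}
	var out []string
	for _, tag := range e.Tags {
		if len(tag) >= 2 && tag[0] == "p" {
			out = append(out, tag[1])
		}
	}
	return out
}

// Sync rebuilds a follow graph breadth-first from the seed pubkeys, reading
// each author's latest follow list from store (a stand-in for the event store).
func Sync(store map[string]Event, seeds []string) map[string][]string {
	graph := make(map[string][]string)
	var queue []string
	for _, s := range seeds {
		if _, seen := graph[s]; !seen {
			graph[s] = nil
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		pk := queue[0]
		queue = queue[1:]
		follows := Follows(store[pk]) // missing authors yield no follows
		graph[pk] = follows
		for _, f := range follows {
			if _, seen := graph[f]; !seen {
				graph[f] = nil // mark as discovered so it is enqueued once
				queue = append(queue, f)
			}
		}
	}
	return graph
}
```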

Directories

Path           Synopsis
cmd/crawler    command
cmd/sync       command
pkg/config     The config package loads and validates the variables in the environment into a Config
pkg/graph      The graph package defines the fundamental structures (e.g.
pkg/pagerank   The pagerank package uses the random walks to compute graph algorithms like global and personalized PageRanks.
pkg/pipe       The pipe package defines high-level pipeline functions (e.g.
pkg/redb       The package redb defines the Redis implementation of the database, which stores graph relationships (e.g.
pkg/walks      The walks package is responsible for defining, generating, removing and updating random walks.
tests
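The walks package is described as removing and updating walks as the topology changes. One common incremental strategy handles an unfollow u->v by touching only the walks that traversed that edge: each is cut right after u and regrown on the new graph, while every other walk stays valid. This is sketched here as an assumption, not the package's actual logic. Graph, Walk, RemoveFollow, and the no-revisit rule are hypothetical.

```go
package main

import "math/rand"

// Graph maps each pubkey to the pubkeys it follows (hypothetical type).
type Graph map[string][]string

// Walk is one stored random walk: the pubkeys it visited, in order.
type Walk []string

// extend grows w from its last node. Each step happens with probability
// alpha, except that a walk marked as continued takes its first step
// unconditionally. Walks stop at nodes with no follows and (an assumed
// "acyclic" rule) before revisiting a node.
func extend(g Graph, w Walk, alpha float64, rng *rand.Rand, continued bool) Walk {
	seen := make(map[string]bool, len(w))
	for _, node := range w {
		seen[node] = true
	}
	for continued || rng.Float64() < alpha {
		continued = false
		succ := g[w[len(w)-1]]
		if len(succ) == 0 {
			break
		}
		next := succ[rng.Intn(len(succ))]
		if seen[next] {
			break
		}
		w = append(w, next)
		seen[next] = true
	}
	return w
}

// RemoveFollow deletes the edge u->v and repairs only the walks that used
// it: each is cut right after u and regrown on the updated graph. It
// returns the number of walks that were updated.
func RemoveFollow(g Graph, walks []Walk, u, v string, alpha float64, seed int64) int {
	kept := g[u][:0] // filter in place, dropping v from u's follows
	for _, f := range g[u] {
		if f != v {
			kept = append(kept, f)
		}
	}
	g[u] = kept

	rng := rand.New(rand.NewSource(seed))
	updated := 0
	for i, w := range walks {
		for j := 0; j+1 < len(w); j++ {
			if w[j] == u && w[j+1] == v {
				// Full slice expression caps capacity so append copies.
				walks[i] = extend(g, w[:j+1:j+1], alpha, rng, true)
				updated++
				break
			}
		}
	}
	return updated
}
```

The walk had already decided to leave u, so the first regrown step skips the continuation coin flip and only resamples the successor. Re-flipping it would shorten rerouted walks on average.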