mempool-archiver

module
v0.0.0-...-e3ee2d7
Published: Aug 11, 2023 License: MIT

README

Mempool Dumpster 🗑️♻️

Dump mempool transactions from EL nodes and archive them in Parquet and CSV formats.

  • Parquet: Transaction metadata (timestamp in millis, hash, attributes; about 150MB / day)
  • CSV: Raw transactions (RLP hex + timestamp in millis + tx hash; about 1.2GB / day zipped)
  • This is work in progress and under heavy development (mempool collector is relatively stable now though!)
  • Observing about 2-4M mempool transactions per day

Getting started

Mempool Collector

  1. Connects to one or more EL nodes via websocket
  2. Listens for new pending transactions
  3. Writes timestamp + hash + rawTx to CSV file (one file per hour by default)
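
As a rough illustration of step 3, the sketch below writes one record with the standard library's CSV writer. The `pendingTx` type, its field names, and the sample values are assumptions made for this example, and the column order simply follows the list above; the collector's actual code may differ.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

// pendingTx is a hypothetical record for one observed transaction.
type pendingTx struct {
	TimestampMs int64  // time the transaction was first seen, Unix milliseconds
	Hash        string // transaction hash, hex encoded
	RawTx       string // RLP-encoded transaction, hex encoded
}

// row returns the CSV fields in the order timestamp, hash, rawTx.
func (tx pendingTx) row() []string {
	return []string{strconv.FormatInt(tx.TimestampMs, 10), tx.Hash, tx.RawTx}
}

func main() {
	// A real collector would write to the current hourly file instead of stdout.
	w := csv.NewWriter(os.Stdout)

	// Placeholder values, not a real transaction.
	tx := pendingTx{TimestampMs: 1691402400000, Hash: "0xabc123", RawTx: "0x02f870"}

	if err := w.Write(tx.row()); err != nil {
		log.Fatal(err)
	}
	// csv.Writer buffers its output, so flush and check for write errors.
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatal(err)
	}
}
```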

Default filename:

  • Schema: <out_dir>/<date>/transactions/txs-<datetime>.csv
  • Example: out/2023-08-07/transactions/txs-2023-08-07-10-00.csv
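
The schema above can be derived from a millisecond timestamp as in the following sketch. The function name `csvPath` is hypothetical, and using UTC for the date and hour boundaries is an assumption of this example, not something the README states.

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// csvPath returns the hourly CSV file for a transaction seen at tsMillis
// (Unix milliseconds). Using UTC here is an assumption of this sketch.
func csvPath(outDir string, tsMillis int64) string {
	// Round down to the start of the hour so all transactions seen
	// within the same hour map to the same file.
	t := time.UnixMilli(tsMillis).UTC().Truncate(time.Hour)
	return filepath.Join(
		outDir,
		t.Format("2006-01-02"),
		"transactions",
		"txs-"+t.Format("2006-01-02-15-04")+".csv",
	)
}

func main() {
	// 1691402400000 ms is 2023-08-07 10:00:00 UTC.
	fmt.Println(csvPath("out", 1691402400000))
}
```

On a Unix-like system this prints the example path shown above. A timestamp from later in the same hour maps to the same file.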

Running the mempool collector:

# Connect to ws://localhost:8546 and write CSVs into ./out
go run cmd/collector/main.go -out ./out

# Connect to multiple nodes
go run cmd/collector/main.go -out ./out -nodes ws://server1.com:8546,ws://server2.com:8546
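
For illustration, a comma-separated `-nodes` value like the one above could be parsed as follows. This is a minimal sketch using the standard `flag` package; `parseNodes` is a hypothetical helper, and the default of `ws://localhost:8546` is inferred from the first command, not taken from the collector's source.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseNodes splits a comma-separated list of node URLs,
// trimming whitespace and dropping empty entries.
func parseNodes(s string) []string {
	var nodes []string
	for _, n := range strings.Split(s, ",") {
		if n = strings.TrimSpace(n); n != "" {
			nodes = append(nodes, n)
		}
	}
	return nodes
}

func main() {
	out := flag.String("out", "", "output directory for CSV files")
	nodes := flag.String("nodes", "ws://localhost:8546", "comma-separated EL node websocket URLs")
	flag.Parse()

	// A real collector would open one websocket connection per node here.
	for _, n := range parseNodes(*nodes) {
		fmt.Printf("would connect to %s and write into %q\n", n, *out)
	}
}
```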

Summarizer

WIP

  • Iterates over collector output directory
  • Creates summary file in Parquet format with key transaction attributes
  • TODO: create archive from output of multiple collectors

go run cmd/summarizer/main.go -h

Architecture

General design goals

  • Keep it simple, stupid (KISS)
  • Vendor-agnostic (main flow should work on any server, independent of a cloud provider)
  • Downtime-resilience to minimize any gaps in the archive
  • Multiple collector instances can run concurrently without getting in each other's way
  • Summarizer script produces the final archive (based on the input of multiple collector outputs)
  • The final archive:
    • Includes (1) a Parquet file with transaction metadata, and (2) a compressed archive of the raw transaction CSV files
    • Compatible with ClickHouse and S3 Select (Parquet using gzip compression)
    • Easily distributable as torrent

Mempool Collector

  • NodeConnection
    • One for each EL connection
    • New pending transactions are sent to TxProcessor via a channel
  • TxProcessor
    • Check if it already processed that tx
    • Store it in the output directory
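
The flow described above can be sketched with plain goroutines and a channel. Here, two simulated node connections send transactions to a single processor that skips hashes it has already seen. All names (`tx`, `processTxs`, the sample hashes) are hypothetical, and the in-memory map is only one possible way to track processed transactions.

```go
package main

import (
	"fmt"
	"sync"
)

// tx is a hypothetical pending transaction as received from a node.
type tx struct {
	Hash  string
	RawTx string
}

// processTxs drains txC, calls store once per previously unseen hash,
// and returns how many transactions were stored.
func processTxs(txC <-chan tx, store func(tx)) int {
	seen := make(map[string]struct{})
	stored := 0
	for t := range txC {
		if _, dup := seen[t.Hash]; dup {
			continue // another node already delivered this transaction
		}
		seen[t.Hash] = struct{}{}
		store(t)
		stored++
	}
	return stored
}

func main() {
	txC := make(chan tx)

	// Two simulated node connections with overlapping transactions.
	feeds := [][]tx{
		{{Hash: "0x01", RawTx: "0xaa"}, {Hash: "0x02", RawTx: "0xbb"}},
		{{Hash: "0x02", RawTx: "0xbb"}, {Hash: "0x03", RawTx: "0xcc"}},
	}
	var wg sync.WaitGroup
	for _, feed := range feeds {
		wg.Add(1)
		go func(feed []tx) {
			defer wg.Done()
			for _, t := range feed {
				txC <- t
			}
		}(feed)
	}
	// Close the channel once all connections are done so processTxs returns.
	go func() { wg.Wait(); close(txC) }()

	n := processTxs(txC, func(t tx) { fmt.Println("store", t.Hash) })
	fmt.Println("stored", n)
}
```

The order of the "store" lines depends on goroutine scheduling, but the duplicate `0x02` is stored only once. A long-running process would also need to bound or expire the `seen` map, which this sketch leaves out.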

Summarizer


Contributing

Install dependencies

go install mvdan.cc/gofumpt@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/daixiang0/gci@latest

Lint, test, format

make lint
make test
make fmt

TODO

Lots, this is WIP

should:

  • collector: support multiple -node CLI args (like mev-boost)

could:

  • stats on how many transactions each node saw first
  • HTTP server to add/remove nodes, see stats, pprof?

Further notes


License

MIT


Maintainers

Directories

Path      Synopsis
cmd       Package collector contains the mempool collector service
scripts   Package summarizer contains stuff for the summarizer script
