# Mempool Dumpster 🗑️♻️

Dump mempool transactions from EL nodes, and archive them in Parquet and CSV format.
- Parquet: Transaction metadata (timestamp in millis, hash, attributes; about 150MB / day)
- CSV: Raw transactions (RLP hex + timestamp in millis + tx hash; about 1.2GB / day zipped)
- This project is a work in progress and under heavy development (the mempool collector is relatively stable now, though!)
- Observing about 2-4M mempool transactions per day
## Getting started
### Mempool Collector
- Connects to one or more EL nodes via websocket
- Listens for new pending transactions
- Writes `timestamp` + `hash` + `rawTx` to a CSV file (one file per hour by default)
Default filename:

- Schema: `<out_dir>/<date>/transactions/txs-<datetime>.csv`
- Example: `out/2023-08-07/transactions/txs-2023-08-07-10-00.csv`
Running the mempool collector:

```bash
# Connect to ws://localhost:8546 and write CSVs into ./out
go run cmd/collector/main.go -out ./out

# Connect to multiple nodes
go run cmd/collector/main.go -out ./out -nodes ws://server1.com:8546,ws://server2.com:8546
```
### Summarizer
WIP

- Iterates over the collector output directory
- Creates a summary file in Parquet format with key transaction attributes
- TODO: create an archive from the output of multiple collectors
```bash
go run cmd/summarizer/main.go -h
```
## Architecture
### General design goals
- Keep it simple and stupid
- Vendor-agnostic (main flow should work on any server, independent of a cloud provider)
- Downtime-resilience to minimize any gaps in the archive
- Multiple collector instances can run concurrently without getting in each other's way
- Summarizer script produces the final archive (based on the input of multiple collector outputs)
- The final archive:
  - Includes (1) a Parquet file with transaction metadata, and (2) a compressed file of the raw-transaction CSV files
  - Compatible with ClickHouse and S3 Select (Parquet using gzip compression)
- Easily distributable as torrent
### Mempool Collector
`NodeConnection`

- One for each EL connection
- New pending transactions are sent to `TxProcessor` via a channel
`TxProcessor`

- Checks whether it has already processed that tx
- Stores it in the output directory
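The fan-in from several node connections into one deduplicating processor can be sketched as below (type and function names are illustrative, not the project's actual types; the real `TxProcessor` writes a CSV row where this sketch appends to a slice):

```go
package main

import (
	"fmt"
	"sync"
)

// tx is a minimal stand-in for a pending transaction (hash + raw RLP).
type tx struct{ hash, rawTx string }

// txProcessor drains a shared channel fed by all node connections and
// deduplicates by hash, mirroring the NodeConnection -> TxProcessor
// design described above.
func txProcessor(in <-chan tx) []tx {
	seen := make(map[string]bool)
	var out []tx
	for t := range in {
		if seen[t.hash] {
			continue // already received from another node
		}
		seen[t.hash] = true
		out = append(out, t) // the real collector writes a CSV row here
	}
	return out
}

func main() {
	ch := make(chan tx)
	var wg sync.WaitGroup
	// Two "node connections" sending overlapping pending transactions.
	for _, txs := range [][]tx{
		{{"0xaa", "0x01"}, {"0xbb", "0x02"}},
		{{"0xbb", "0x02"}, {"0xcc", "0x03"}},
	} {
		wg.Add(1)
		go func(txs []tx) {
			defer wg.Done()
			for _, t := range txs {
				ch <- t
			}
		}(txs)
	}
	go func() { wg.Wait(); close(ch) }()
	fmt.Println(len(txProcessor(ch))) // 3 unique hashes
}
```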
### Summarizer
## Contributing
Install dependencies:

```bash
go install mvdan.cc/gofumpt@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/daixiang0/gci@latest
```
Lint, test, format:

```bash
make lint
make test
make fmt
```
## TODO
Lots, this is WIP.

should:

- collector should support multiple `-node` CLI args (like mev-boost)
could:
- stats about which node saw how many txs first
- http server to add/remove nodes, see stats, pprof?
## Further notes
## License
MIT
## Maintainers