# Mempool Dumpster 🗑️♻️

Dump mempool transactions from EL nodes, and archive them in Parquet and CSV format.
- Parquet: Transaction metadata (timestamp in millis, hash, attributes; about 150MB / day)
- CSV: Raw transactions (RLP hex + timestamp in millis + tx hash; about 1.2GB / day zipped)
- This project is a work in progress and under heavy development (the mempool collector is relatively stable now, though!)
- Observing about 2-4M mempool transactions per day
## Getting started
### Mempool Collector
- Connects to one or more EL nodes via websocket
- Listens for new pending transactions
- Writes `timestamp` + `hash` + `rawTx` to a CSV file (one file per hour by default)
Default filename:

- Schema: `<out_dir>/<date>/transactions/txs-<datetime>.csv`
- Example: `out/2023-08-07/transactions/txs-2023-08-07-10-00.csv`
Running the mempool collector:

```bash
# Connect to ws://localhost:8546 and write CSVs into ./out
go run cmd/collector/main.go -out ./out

# Connect to multiple nodes
go run cmd/collector/main.go -out ./out -nodes ws://server1.com:8546,ws://server2.com:8546
```
### Summarizer
WIP

- Iterates over the collector output directory
- Creates a summary file in Parquet format with key transaction attributes
- TODO: create an archive from the output of multiple collectors
```bash
go run cmd/summarizer/main.go -h
```
## Architecture
### General design goals
- Keep it simple and stupid
- Vendor-agnostic (main flow should work on any server, independent of a cloud provider)
- Downtime-resilience to minimize any gaps in the archive
- Multiple collector instances can run concurrently without getting in each other's way
- Summarizer script produces the final archive (based on the input of multiple collector outputs)
- The final archive:
  - Includes (1) a Parquet file with transaction metadata, and (2) a compressed file of the raw-transaction CSV files
  - Compatible with ClickHouse and S3 Select (Parquet using gzip compression)
- Easily distributable as torrent
### Mempool Collector
`NodeConnection`

- One for each EL connection
- New pending transactions are sent to `TxProcessor` via a channel
`TxProcessor`

- Checks whether it has already processed that tx
- Stores it in the output directory
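The fan-in from several node connections into one deduplicating processor can be sketched as below (type and function names are illustrative, not the project's actual types; the real `TxProcessor` writes a CSV row where this sketch appends to a slice):

```go
package main

import (
	"fmt"
	"sync"
)

// tx is a minimal stand-in for a pending transaction (hash + raw RLP).
type tx struct{ hash, rawTx string }

// txProcessor drains a shared channel fed by all node connections and
// deduplicates by hash, mirroring the NodeConnection -> TxProcessor
// design described above.
func txProcessor(in <-chan tx) []tx {
	seen := make(map[string]bool)
	var out []tx
	for t := range in {
		if seen[t.hash] {
			continue // already received from another node
		}
		seen[t.hash] = true
		out = append(out, t) // the real collector writes a CSV row here
	}
	return out
}

func main() {
	ch := make(chan tx)
	var wg sync.WaitGroup
	// Two "node connections" sending overlapping pending transactions.
	for _, txs := range [][]tx{
		{{"0xaa", "0x01"}, {"0xbb", "0x02"}},
		{{"0xbb", "0x02"}, {"0xcc", "0x03"}},
	} {
		wg.Add(1)
		go func(txs []tx) {
			defer wg.Done()
			for _, t := range txs {
				ch <- t
			}
		}(txs)
	}
	go func() { wg.Wait(); close(ch) }()
	fmt.Println(len(txProcessor(ch))) // 3 unique hashes
}
```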
### Summarizer
## Contributing
Install dependencies:

```bash
go install mvdan.cc/gofumpt@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/daixiang0/gci@latest
```
Lint, test, format:

```bash
make lint
make test
make fmt
```
## TODO
Lots, this is WIP.

should:

- collector should support multiple `-node` CLI args (like mev-boost)
could:
- stats about which node saw how many txs first
- http server to add/remove nodes, see stats, pprof?
## Further notes
## License
MIT
## Maintainers