fproof
Notarize documents on the Ethereum blockchain.
fproof is a CLI tool that notarizes S3 objects on the Ethereum blockchain. For each object it stores a digital fingerprint that can no longer be modified and serves as proof of the original document. If you need to prove that documents have not been modified since they were first stored, fproof gives you that guarantee, backed by the immutability and transparency of blockchains.
Storage on blockchain is very expensive and transaction fees are volatile. You can't store the documents on blockchain directly. Instead, fproof uses blockchain as a notary service, storing proofs of documents. Merkle trees enable compressing all digital fingerprints into a single root hash, so the number of input documents doesn't affect the number of transactions. The root hash is sent as a transaction to blockchain, while the Merkle proof of each individual hash is stored as Amazon S3 metadata with the document. That way, the proof can never be separated from the document itself — you can retrieve it by querying the object's metadata.
Verification can then be done by recomputing the branch of the Merkle tree. If the original document is still the same as during Merkle tree creation, the verification step results in the same root hash. With the root hash retrieved from the blockchain, fproof proves two things: the original document was part of the Merkle tree at its original creation, and the document existed when the root hash was stored on blockchain.
Based on the approach described in Notarize documents on the Ethereum Blockchain.
Installation
Requires Go 1.25+.
go install github.com/eerzho/fproof@latest
Configuration
Create a .fproof.yaml file or pass flags directly. Every config field can be overridden with a CLI flag.
s3:
  endpoint: https://s3.amazonaws.com
  access-key: YOUR_ACCESS_KEY
  secret-key: YOUR_SECRET_KEY
  bucket: your-bucket
  prefix: documents/
  region: us-east-1
  use-path-style: false
eth:
  rpc-url: https://mainnet.infura.io/v3/YOUR_PROJECT_ID
  private-key: YOUR_HEX_PRIVATE_KEY
  chain-id: 1
| Flag | Description | Default |
|---|---|---|
| --config | Path to YAML config file | .fproof.yaml |
| --s3-endpoint | S3-compatible endpoint URL | - |
| --s3-access-key | S3 access key ID | - |
| --s3-secret-key | S3 secret access key | - |
| --s3-bucket | Target S3 bucket name | - |
| --s3-prefix | S3 object key prefix | - |
| --s3-region | S3 region | us-east-1 |
| --s3-use-path-style | Use path-style S3 addressing | false |
| --eth-rpc-url | Ethereum JSON-RPC endpoint | - |
| --eth-private-key | Hex-encoded private key for signing | - |
| --eth-chain-id | Ethereum chain ID | - |
Commands
fproof commit
Hash S3 objects and anchor Merkle roots to Ethereum.
fproof commit --config .fproof_example.yaml --prefix 100mb
The commit pipeline takes all objects from Amazon S3 and hashes them, aggregates the individual hashes into a Merkle tree, sends the root hash as a transaction to blockchain, and stores the Merkle proof of each individual hash as Amazon S3 metadata with the document:
- Gets a list of all objects in Amazon S3 with a specific prefix, grouped into chunks (--chunk-size, default 1000)
- For each object, retrieves it from Amazon S3 and generates its SHA-256 hash in parallel (--concurrency, default 5)
- Creates the Merkle tree as a pairwise hash tree: the hashes form the tree's leaves, and the tree is built bottom-up until one root hash remains
- Sends the root hash as a transaction to blockchain (zero-value self-transaction with root as calldata)
- Stores the Merkle proof of each individual hash as Amazon S3 metadata with the document (fproof-tx-id, fproof-root, fproof-path, fproof-siblings)
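The first two steps of the pipeline (chunking the key list and hashing objects with bounded parallelism) can be sketched as follows. This is a minimal illustration, not fproof's actual code: `chunkKeys`, `hashObjects`, and the `fetch` callback are hypothetical names, and `fetch` stands in for an S3 download so the example runs without network access.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// chunkKeys splits keys into consecutive groups of at most size elements,
// mirroring what --chunk-size controls. (Hypothetical helper.)
func chunkKeys(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var chunks [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		chunks = append(chunks, keys[start:end])
	}
	return chunks
}

// hashObjects returns the SHA-256 digest of every object, in key order,
// running at most `concurrency` fetches at once. fetch is an assumed
// stand-in for retrieving the object body from S3.
func hashObjects(keys []string, concurrency int, fetch func(string) ([]byte, error)) ([][32]byte, error) {
	if concurrency < 1 {
		concurrency = 1
	}
	hashes := make([][32]byte, len(keys))
	errs := make([]error, len(keys))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, key := range keys {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` workers are busy
		go func(i int, key string) {
			defer wg.Done()
			defer func() { <-sem }()
			data, err := fetch(key)
			if err != nil {
				errs[i] = err
				return
			}
			hashes[i] = sha256.Sum256(data)
		}(i, key)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return hashes, nil
}

func main() {
	// In-memory stand-in for a bucket.
	store := map[string][]byte{
		"docs/a": []byte("alpha"),
		"docs/b": []byte("beta"),
		"docs/c": []byte("gamma"),
	}
	fetch := func(key string) ([]byte, error) {
		data, ok := store[key]
		if !ok {
			return nil, fmt.Errorf("no such object: %s", key)
		}
		return data, nil
	}
	keys := []string{"docs/a", "docs/b", "docs/c"}
	for _, chunk := range chunkKeys(keys, 2) {
		hashes, err := hashObjects(chunk, 5, fetch)
		if err != nil {
			panic(err)
		}
		for i, h := range hashes {
			fmt.Printf("%s %x\n", chunk[i], h)
		}
	}
}
```

Each goroutine writes to its own slice index, so the results need no mutex and stay in the same order as the keys, which matters because leaf order determines the Merkle root.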
| Flag | Description | Default |
|---|---|---|
| -p, --prefix | S3 key prefix filter | - |
| -c, --concurrency | Max parallel S3 operations | 5 |
| -s, --chunk-size | Objects per Merkle tree chunk | 1000 |
fproof verify
Verify an S3 object's proof against the blockchain.
fproof verify --config .fproof_example.yaml --key 100mb/file_001.bin
Verification is a fairly simple computation. It only requires a sequence of hash operations, whose length is bounded by the height of the tree:
- Retrieves the object from Amazon S3 and generates its SHA-256 hash
- Retrieves the Merkle proof stored as Amazon S3 metadata with the document (fproof-tx-id, fproof-root, fproof-path, fproof-siblings)
- Recomputes the branch of the Merkle tree by doing the pairwise hashing from the leaf to the root using the proof hashes
- Retrieves the root hash from the blockchain transaction and checks if the calculated root matches the one retrieved from the blockchain
If the calculated root matches the one retrieved from the blockchain, it proves that the original document was part of the Merkle tree at its original creation and that the document existed when the root hash was stored on blockchain. If any step fails — the document was modified, metadata was tampered with, or the on-chain root doesn't match — verification fails.
| Flag | Description | Default |
|---|---|---|
| -k, --key | S3 object key to verify | - |
How Merkle Trees Work
Merkle trees are very useful to prove that a particular data point is part of a data structure. fproof stores the document hashes in a Merkle tree, which aggregates many hashes (the leaves of the tree) into one so-called root hash. Bottom up, we hash the nodes pairwise until we end up with one hash only, which forms the root of the tree.
Given n objects with hashes h₁, h₂, …, hₙ, the tree is constructed bottom-up:
                 root = H(h₁₂ ‖ h₃₄)
                /                    \
     h₁₂ = H(h₁ ‖ h₂)          h₃₄ = H(h₃ ‖ h₄)
        /        \                /        \
      h₁          h₂            h₃          h₄
Where H is SHA-256 and ‖ denotes concatenation.
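The bottom-up construction can be sketched in Go as follows. This is an illustrative sketch rather than fproof's implementation: `hashPair`, `buildLevels`, and `proofFor` are hypothetical names, and pairing an unpaired last node with itself is an assumption, since the handling of odd-sized levels is not specified here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Hash = [32]byte

// hashPair returns H(left ‖ right) with H = SHA-256.
func hashPair(left, right Hash) Hash {
	buf := make([]byte, 0, 64)
	buf = append(buf, left[:]...)
	buf = append(buf, right[:]...)
	return sha256.Sum256(buf)
}

// buildLevels returns every level of the tree, leaves first, root last.
// ASSUMPTION: a node without a partner is paired with itself.
func buildLevels(leaves []Hash) [][]Hash {
	if len(leaves) == 0 {
		return nil
	}
	levels := [][]Hash{leaves}
	cur := leaves
	for len(cur) > 1 {
		next := make([]Hash, 0, (len(cur)+1)/2)
		for i := 0; i < len(cur); i += 2 {
			right := cur[i]
			if i+1 < len(cur) {
				right = cur[i+1]
			}
			next = append(next, hashPair(cur[i], right))
		}
		levels = append(levels, next)
		cur = next
	}
	return levels
}

// proofFor collects the sibling hash and direction bit at each level for
// the leaf at index. A bit of 0 means the node is the left child, 1 the right.
func proofFor(levels [][]Hash, index int) (siblings []Hash, path []int) {
	if len(levels) == 0 || index < 0 || index >= len(levels[0]) {
		return nil, nil
	}
	for _, level := range levels[:len(levels)-1] {
		sib := index ^ 1 // neighbour within the pair
		if sib >= len(level) {
			sib = index // unpaired node is its own sibling
		}
		siblings = append(siblings, level[sib])
		path = append(path, index&1)
		index /= 2
	}
	return siblings, path
}

func main() {
	var leaves []Hash
	for _, doc := range []string{"doc-1", "doc-2", "doc-3", "doc-4"} {
		leaves = append(leaves, sha256.Sum256([]byte(doc)))
	}
	levels := buildLevels(leaves)
	root := levels[len(levels)-1][0]
	siblings, path := proofFor(levels, 2)
	fmt.Printf("root: %x\n", root)
	fmt.Printf("proof for leaf 2: path=%v, %d siblings\n", path, len(siblings))
}
```

Keeping every level in memory makes proof extraction a simple index walk; for a chunk of 1,000 leaves that is fewer than 2,000 hashes in total.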
We can verify the existence of a specific document. We need two additional data points: first, the so-called proof (sibling hashes) for an element — the hashes to do the pairwise hashing without recreating the entire tree each time. Second, the actual root hash, which we can retrieve from the blockchain.
Formally: given a leaf hash hᵢ, a path direction vector p ∈ {0, 1}^d (where d = ⌈log₂(n)⌉), and sibling hashes s₁, s₂, …, s_d:
v₀ = hᵢ
vⱼ = H(vⱼ₋₁ ‖ sⱼ) if pⱼ = 0
vⱼ = H(sⱼ ‖ vⱼ₋₁) if pⱼ = 1
The proof is valid iff v_d = R (the root hash from blockchain).
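The recurrence above translates almost line for line into Go. The sketch below is an illustration under assumptions, not fproof's own code: `recomputeRoot` and `verifyProof` are hypothetical names, and the path and siblings are assumed to be already decoded from the object metadata. Indices are 0-based in the code, whereas the formulas count from 1.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// recomputeRoot folds the sibling hashes into the leaf hash:
// v = H(v ‖ s) when the path bit is 0, v = H(s ‖ v) when it is 1.
func recomputeRoot(leaf [32]byte, path []int, siblings [][32]byte) ([32]byte, error) {
	if len(path) != len(siblings) {
		return [32]byte{}, fmt.Errorf("path has %d entries, siblings has %d", len(path), len(siblings))
	}
	v := leaf
	for j, s := range siblings {
		var buf [64]byte
		switch path[j] {
		case 0: // v is the left child
			copy(buf[:32], v[:])
			copy(buf[32:], s[:])
		case 1: // v is the right child
			copy(buf[:32], s[:])
			copy(buf[32:], v[:])
		default:
			return [32]byte{}, fmt.Errorf("path[%d] = %d, want 0 or 1", j, path[j])
		}
		v = sha256.Sum256(buf[:])
	}
	return v, nil
}

// verifyProof reports whether the recomputed value equals the expected root R.
func verifyProof(leaf, root [32]byte, path []int, siblings [][32]byte) bool {
	got, err := recomputeRoot(leaf, path, siblings)
	return err == nil && got == root
}

func main() {
	// Four leaf hashes h1..h4, built by hand for the demonstration.
	h := make([][32]byte, 4)
	for i, doc := range []string{"doc-1", "doc-2", "doc-3", "doc-4"} {
		h[i] = sha256.Sum256([]byte(doc))
	}
	h12, _ := recomputeRoot(h[0], []int{0}, [][32]byte{h[1]})
	h34, _ := recomputeRoot(h[2], []int{0}, [][32]byte{h[3]})
	root, _ := recomputeRoot(h12, []int{0}, [][32]byte{h34})

	// Proof for h3: sibling h4 sits on the right (bit 0), then h12 on the left (bit 1).
	path, siblings := []int{0, 1}, [][32]byte{h[3], h12}
	fmt.Println(verifyProof(h[2], root, path, siblings)) // true

	tampered := sha256.Sum256([]byte("doc-3 (edited)"))
	fmt.Println(verifyProof(tampered, root, path, siblings)) // false
}
```

The check performs exactly d hash operations and never touches the other leaves, which is why verification stays cheap even for large chunks.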
With this approach:
- Compression: with x elements in the tree, we only need log(x) hash operations for verification. With 1,000,000 objects, that's ~20 hashes instead of 1,000,000. The number of input documents doesn't affect the number of transactions.
- Tamper detection: changing even a single bit in any object produces a completely different root hash. If the original document is still the same as during Merkle tree creation, the verification step results in the same root hash. Forging a valid proof would require finding a SHA-256 collision (~2¹²⁸ operations), which is computationally infeasible.
- Independent verification: anyone with the object, proof metadata, and access to the blockchain can independently verify integrity — no trusted third party required.
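The "~20 hashes" figure follows from d = ⌈log₂(n)⌉. A short Go sketch makes the arithmetic concrete; `proofDepth` is a hypothetical helper, not part of fproof.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proofDepth returns ceil(log2(n)) for n >= 1: the number of sibling hashes
// (and hash operations) needed to verify one leaf among n.
func proofDepth(n uint) int {
	if n <= 1 {
		return 0
	}
	// bits.Len(n-1) is the number of bits needed to represent n-1,
	// which equals ceil(log2(n)) for n >= 2.
	return bits.Len(n - 1)
}

func main() {
	for _, n := range []uint{1000, 1000000} {
		fmt.Printf("%d objects: %d hashes per verification\n", n, proofDepth(n))
	}
	// Output:
	// 1000 objects: 10 hashes per verification
	// 1000000 objects: 20 hashes per verification
}
```

With the default chunk size of 1,000, each stored proof therefore holds about 10 sibling hashes.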
Why Store on Blockchain
Although blockchain can store data immutably, it's very restricted on the amount of data. Each byte stored on blockchain is fairly expensive. The high transaction fees and volatility in those fees lead to two insights: we can't store the documents on blockchain directly, and even storing proofs only for every document individually is too expensive. Instead, we have to compress many proofs into one transaction so that we can reduce the number of transactions drastically.
fproof solves this by storing only the root hash of each Merkle tree on blockchain. The root hash is sent as the calldata of a zero-value self-transaction (sending 0 ETH to your own address). This costs close to the minimum possible gas on Ethereum. Because fproof batches objects into Merkle trees (up to 1,000 per chunk by default), the per-object cost is negligible. Transaction cost remains manageable because each chunk needs exactly one transaction carrying one root hash, so the total cost grows with the number of chunks rather than the number of documents.
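As a rough back-of-the-envelope estimate of the per-object cost, the sketch below computes the gas for a transaction whose calldata is a 32-byte root. The constants are assumptions taken from Ethereum's published gas schedule (21,000 base gas, EIP-2028 calldata pricing, and the EIP-7623 calldata floor) and may not match every network or future fork; `calldataGas` and `perObjectGas` are hypothetical helpers.

```go
package main

import "fmt"

// Gas constants, stated as assumptions rather than fproof internals.
const (
	baseTxGas        = 21000 // intrinsic gas of any transaction
	zeroByteGas      = 4     // per zero calldata byte (EIP-2028)
	nonZeroByteGas   = 16    // per non-zero calldata byte (EIP-2028)
	floorGasPerToken = 10    // EIP-7623 floor; tokens = zero + 4*nonzero
)

// calldataGas returns the standard intrinsic cost and the EIP-7623 floor
// for a transaction that executes no code and carries data as calldata.
func calldataGas(data []byte) (standard, floor uint64) {
	var zero, nonZero uint64
	for _, b := range data {
		if b == 0 {
			zero++
		} else {
			nonZero++
		}
	}
	standard = baseTxGas + zero*zeroByteGas + nonZero*nonZeroByteGas
	floor = baseTxGas + floorGasPerToken*(zero+4*nonZero)
	return standard, floor
}

// perObjectGas spreads one transaction's gas across the objects in a chunk.
func perObjectGas(gas uint64, chunkSize int) float64 {
	return float64(gas) / float64(chunkSize)
}

func main() {
	// Worst case: a 32-byte root with no zero bytes.
	root := make([]byte, 32)
	for i := range root {
		root[i] = 0xff
	}
	standard, floor := calldataGas(root)
	charged := standard
	if floor > charged {
		charged = floor // where the floor applies, the larger value is charged
	}
	fmt.Printf("standard=%d floor=%d\n", standard, floor)
	fmt.Printf("per object at chunk size 1000: %.2f gas\n", perObjectGas(charged, 1000))
}
```

Under these assumptions a chunk costs between roughly 21,500 and 22,300 gas, or about 22 gas per object at the default chunk size.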
Due to the immutability and transparency of blockchains, they can be a useful tool for notarizing documents:
- Immutability: blockchains can store values immutably so that they can be audited at a later point. Once a transaction is included in a block, the root hash stored in the transaction's calldata cannot be altered or deleted. Ethereum's consensus mechanism with thousands of independent validators makes retroactive changes practically impossible.
- Timestamping: each block has a timestamp agreed upon by the network's validators. When fproof stores a root hash on blockchain, this serves as proof that the document existed when the root hash was stored on blockchain — a stronger guarantee than a traditional trusted third party because it doesn't rely on a single entity's honesty.
- Public verifiability: anyone can independently query any transaction and read the calldata. Verification requires no special permissions or trust relationships. You can verify a proof using any Ethereum node, any block explorer, or fproof itself.
- No smart contract required: fproof uses simple self-transactions rather than deploying a smart contract. This minimizes gas costs and eliminates smart contract risk. The data is stored in the transaction itself — permanently available through any Ethereum node.
Differences from the AWS Approach
The AWS approach described in Notarize documents on the Ethereum Blockchain deploys a smart contract with storeNewRootHash and verify functions. fproof takes a different path that is simpler, cheaper, and has a smaller attack surface.
No smart contract. The AWS solution stores the root hash by calling a smart contract function that emits an event. A smart contract is code on blockchain — it can contain bugs, it can be deployed as upgradeable (allowing the owner to change verification logic after the fact), and it introduces attack surface (reentrancy, access control errors, etc.). fproof stores the root hash directly in the transaction's calldata as raw bytes. There is no code on blockchain, nothing to exploit, nothing that can be upgraded. The data in a transaction is immutable by definition — not by the correctness of a contract, but by the protocol itself.
Off-chain verification. The AWS smart contract has an on-chain verify function that recomputes the Merkle branch inside the EVM. This is convenient (anyone can call it without installing software), but every verification costs gas and relies on the contract code being correct. fproof verifies entirely off-chain: it reads tx.Data from any Ethereum node, recomputes the branch locally, and compares the root. Verification is free, can be done offline once the transaction data is fetched, and doesn't depend on any deployed code.
Lower gas costs. A smart contract deployment costs hundreds of thousands of gas. Each storeNewRootHash call costs more than a simple transaction because of contract execution overhead and event emission. fproof uses a simple self-transaction with the root hash as calldata, which costs close to the minimum possible on Ethereum.
Local Development
Start MinIO and Anvil via Docker Compose:
task up
Build the binary:
task build
Run locally:
./fproof commit --config .fproof_example.yaml --prefix 100mb
./fproof verify --config .fproof_example.yaml --key 100mb/file_001.bin