supacrawl

Version: v0.1.0. Published: May 10, 2026. License: MIT.

Repository

github.com/davemorin/supacrawl

supacrawl mirrors Supabase/Postgres project metadata into local SQLite so you can search, inspect, and query your backend without poking around the Supabase dashboard.

It follows the local-first shape of tools like discrawl and slacrawl: crawl once, keep the archive on your machine, use fast local search, and run read-only SQL against the snapshot.

supacrawl is part of the OpenClaw crawler family and is intended for teams, maintainers, and agents that need fast local access to Supabase project shape, policies, functions, Storage inventory, and copied table rows.

Included

local SQLite archive with FTS5 search
Supabase/Postgres schema crawl over a normal Postgres connection URL
schemas, tables, columns, indexes, constraints, RLS policies, functions, triggers, extensions
optional full table row mirror into SQLite as schema-agnostic JSON rows
optional row-copy mode without FTS for smaller archives
local archive size reports
automatic metadata refresh for read commands, with opt-out flags
per-table JSONL/CSV export from the local copy
Storage blob downloads from copied storage.objects rows
encrypted local backup shards with age
Supabase Storage buckets and object counts when the storage schema is visible
doctor diagnostics for local archive and database connectivity
metadata --json, status --json, doctor --json, and read-only sql
JSON output for automation with --json or --format json

Not Yet Included

Supabase Management API crawl for edge functions, auth settings, branches, or secrets
git-backed archive publish/subscribe and snapshot sharing through crawlkit
terminal archive UI through crawlkit
Homebrew/release packaging

Requirements

Go 1.26+
a Supabase direct or pooler Postgres connection URL
enough Postgres privileges to read catalog metadata
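
Before handing a connection URL to the crawler, it can help to check that it parses and to print it without the password. The sketch below is illustrative only and is not taken from supacrawl: describeDBURL and dbURLInfo are hypothetical names, and the URL uses a placeholder project ref and password.

```go
package main

import (
	"fmt"
	"net/url"
)

// dbURLInfo is a hypothetical summary type, not part of supacrawl.
type dbURLInfo struct {
	Host     string
	Port     string
	SSLMode  string
	Redacted string // URL with the password masked, safe to log
}

// describeDBURL parses a Postgres connection URL and extracts the parts
// that are useful to show in diagnostics.
func describeDBURL(raw string) (dbURLInfo, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return dbURLInfo{}, fmt.Errorf("parse database URL: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return dbURLInfo{}, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	if u.Hostname() == "" {
		return dbURLInfo{}, fmt.Errorf("database URL has no host")
	}
	return dbURLInfo{
		Host:     u.Hostname(),
		Port:     u.Port(),
		SSLMode:  u.Query().Get("sslmode"),
		Redacted: u.Redacted(),
	}, nil
}

func main() {
	// Placeholder ref and password; substitute your own values.
	raw := "postgres://postgres.abc123:s3cret@aws-0-us-west-1.pooler.supabase.com:6543/postgres?sslmode=require"
	info, err := describeDBURL(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("host=%s port=%s sslmode=%s\n", info.Host, info.Port, info.SSLMode)
	fmt.Println("safe to log:", info.Redacted)
}
```

url.URL.Redacted masks only the password, so the username that carries the project ref still appears in the output.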

Install

Homebrew:

brew install davemorin/tap/supacrawl

Build from source:

git clone https://github.com/davemorin/supacrawl.git
cd supacrawl
go build -o bin/supacrawl ./cmd/supacrawl
./bin/supacrawl version

Quick Start

export SUPABASE_DB_URL="postgres://postgres.<ref>:<password>@aws-0-us-west-1.pooler.supabase.com:6543/postgres?sslmode=require"

go run ./cmd/supacrawl init --project-id <ref>
go run ./cmd/supacrawl doctor
go run ./cmd/supacrawl metadata --json
go run ./cmd/supacrawl sync
go run ./cmd/supacrawl sync --full
go run ./cmd/supacrawl sync --full --no-row-fts
go run ./cmd/supacrawl status
go run ./cmd/supacrawl status --sync never
go run ./cmd/supacrawl size
go run ./cmd/supacrawl search "auth policies"
go run ./cmd/supacrawl export --type jsonl --out companies.jsonl public.companies
go run ./cmd/supacrawl storage pull --dir ./supabase-storage --limit 10
go run ./cmd/supacrawl backup keygen --out ~/.supacrawl/age.key
go run ./cmd/supacrawl backup init --recipient age1...
go run ./cmd/supacrawl backup push
go run ./cmd/supacrawl backup status
go run ./cmd/supacrawl sql "select schema_name, name, rls_enabled from tables order by schema_name, name limit 20"
go run ./cmd/supacrawl sql "select row_json from table_rows where schema_name = 'public' and table_name = 'companies' limit 5"

If you build the binary, replace go run ./cmd/supacrawl with ./bin/supacrawl.

Commands

init creates ~/.supacrawl/config.toml
doctor checks the local SQLite archive and Supabase/Postgres connection
metadata prints command/capability metadata for launchers and agents
version prints the CLI version
sync crawls metadata into the local archive
sync --data or sync --full also copies base table rows into table_rows
status prints archive counts
report summarizes schemas and policy coverage
size reports archive file size and largest copied source tables
search searches crawled tables, functions, storage buckets, and extensions
export writes copied source rows for one table as JSONL or CSV
storage pull downloads Storage blobs into a local directory
backup keygen creates an age identity and prints the public recipient
backup init stores local backup settings in config
backup push writes encrypted JSONL gzip shards plus a manifest
backup status prints backup manifest metadata
backup pull decrypts backup shards into a local restore directory
sql runs read-only SQL against the local SQLite archive

Configuration

Default config path:

~/.supacrawl/config.toml

Default archive path:

~/.supacrawl/supacrawl.db

The default secret handling expects the connection string in SUPABASE_DB_URL. You can change that during init:

supacrawl init --database-url-env MY_SUPABASE_DB_URL

--database-url is available for local experiments, but the env-var flow is the safer default: the config file persists on disk, so a URL stored there keeps its password around long after the experiment is over.

Storage downloads also need:

export SUPABASE_URL="https://<ref>.supabase.co"
export SUPABASE_SERVICE_ROLE_KEY="..."

If SUPABASE_URL is not set, supacrawl will also try NEXT_PUBLIC_SUPABASE_URL.
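
The fallback order described above can be sketched in a few lines of Go. This is an illustration of the lookup order only, not supacrawl's actual code; resolveSupabaseURL is a hypothetical helper, and the getenv parameter exists so the lookup can be exercised without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveSupabaseURL returns the first non-empty value among the two
// environment variables named in this README, in priority order.
func resolveSupabaseURL(getenv func(string) string) (string, error) {
	for _, name := range []string{"SUPABASE_URL", "NEXT_PUBLIC_SUPABASE_URL"} {
		if v := getenv(name); v != "" {
			return v, nil
		}
	}
	return "", errors.New("neither SUPABASE_URL nor NEXT_PUBLIC_SUPABASE_URL is set")
}

func main() {
	u, err := resolveSupabaseURL(os.Getenv)
	if err != nil {
		fmt.Println("storage downloads unavailable:", err)
		return
	}
	fmt.Println("using", u)
}
```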

Read Freshness

Read commands refresh metadata automatically when the local snapshot is stale:

supacrawl status
supacrawl search "profiles"
supacrawl sql "select count(*) from tables"

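The default policy is auto with a 15m staleness window: a read command refreshes metadata first only when the local snapshot is older than that window. If the database URL is not available, auto skips the refresh and continues with the local archive. Use explicit flags when you need a deterministic read: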

supacrawl status --sync never
supacrawl report --sync always
supacrawl search --stale-after 1h "auth.uid"
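
The refresh decision can be pictured as a small pure function. The sketch below is an assumption about the shape of that logic, not supacrawl's implementation: the type and function names are hypothetical, and whether the real comparison is inclusive at exactly 15m is not stated in this README.

```go
package main

import (
	"fmt"
	"time"
)

type syncPolicy string

const (
	syncAuto   syncPolicy = "auto"
	syncNever  syncPolicy = "never"
	syncAlways syncPolicy = "always"
)

// defaultStaleAfter mirrors the 15m window documented above.
const defaultStaleAfter = 15 * time.Minute

// shouldRefresh reports whether a read command would refresh metadata
// before answering. age is the time since the last successful sync.
func shouldRefresh(policy syncPolicy, age, staleAfter time.Duration, dbURLAvailable bool) bool {
	switch policy {
	case syncNever:
		return false
	case syncAlways:
		// The caller is expected to fail loudly if the database is unreachable.
		return true
	default:
		// auto: refresh only when stale, and fall back to the local
		// archive when no database URL is available.
		return dbURLAvailable && age >= staleAfter
	}
}

func main() {
	fmt.Println(shouldRefresh(syncAuto, 20*time.Minute, defaultStaleAfter, true))  // true
	fmt.Println(shouldRefresh(syncAuto, 5*time.Minute, defaultStaleAfter, true))   // false
	fmt.Println(shouldRefresh(syncAuto, 20*time.Minute, defaultStaleAfter, false)) // false
}
```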

Full Local Copy Mode

sync --full mirrors accessible base-table rows into SQLite:

supacrawl sync --full

Rows are stored as JSON in table_rows:

select
  schema_name,
  table_name,
  json_extract(row_json, '$.id') as id
from table_rows
where schema_name = 'public'
limit 20

Storing every row as schema-agnostic JSON keeps this first full-copy mode robust across arbitrary Supabase schemas. It is not yet a byte-for-byte Supabase backup: Edge Function source and project dashboard settings are still future work.
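
To make the JSON row shape concrete, here is a minimal Go sketch of turning one source row into a table_rows-style record and reading a field back. The struct fields follow the column names used in the query above (schema_name, table_name, row_json); encodeRow and field are hypothetical helpers, and the real archive may store additional columns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableRow mirrors the three table_rows columns shown in this README.
type tableRow struct {
	SchemaName string
	TableName  string
	RowJSON    string
}

// encodeRow packs column names and values into one JSON object, so no
// per-table SQLite schema is needed.
func encodeRow(schema, table string, columns []string, values []any) (tableRow, error) {
	if len(columns) != len(values) {
		return tableRow{}, fmt.Errorf("got %d columns but %d values", len(columns), len(values))
	}
	obj := make(map[string]any, len(columns))
	for i, c := range columns {
		obj[c] = values[i]
	}
	b, err := json.Marshal(obj) // map keys are emitted in sorted order
	if err != nil {
		return tableRow{}, err
	}
	return tableRow{SchemaName: schema, TableName: table, RowJSON: string(b)}, nil
}

// field is the Go-side counterpart of json_extract(row_json, '$.key').
func field(r tableRow, key string) (any, error) {
	var obj map[string]any
	if err := json.Unmarshal([]byte(r.RowJSON), &obj); err != nil {
		return nil, err
	}
	return obj[key], nil
}

func main() {
	row, err := encodeRow("public", "companies", []string{"id", "name"}, []any{7, "Acme"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(row.RowJSON)
	name, _ := field(row, "name")
	fmt.Println(name)
}
```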

Use --no-row-fts when archive size matters more than full-text search over row JSON:

supacrawl sync --full --no-row-fts

Export

supacrawl export --type jsonl --out companies.jsonl public.companies
supacrawl export --type csv --out companies.csv public.companies
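
Because copied rows are JSON objects, a CSV export has to pick a column order. The sketch below shows one plausible approach (the sorted union of keys across rows); it is an assumption for illustration, and supacrawl's real column ordering and value formatting may differ. rowsToCSV and cell are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// cell renders one decoded JSON value as a CSV field.
func cell(v any) (string, error) {
	switch t := v.(type) {
	case nil:
		return "", nil
	case string:
		return t, nil
	case json.Number:
		return t.String(), nil
	default: // booleans, arrays, and objects keep their JSON form
		b, err := json.Marshal(t)
		return string(b), err
	}
}

// rowsToCSV converts JSON object rows into CSV with a sorted header.
func rowsToCSV(rows []string) (string, error) {
	decoded := make([]map[string]any, 0, len(rows))
	keys := map[string]struct{}{}
	for _, raw := range rows {
		dec := json.NewDecoder(strings.NewReader(raw))
		dec.UseNumber() // avoid float64 rounding of large integer ids
		var obj map[string]any
		if err := dec.Decode(&obj); err != nil {
			return "", fmt.Errorf("decode row: %w", err)
		}
		decoded = append(decoded, obj)
		for k := range obj {
			keys[k] = struct{}{}
		}
	}
	header := make([]string, 0, len(keys))
	for k := range keys {
		header = append(header, k)
	}
	sort.Strings(header)

	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, obj := range decoded {
		rec := make([]string, len(header))
		for i, k := range header {
			s, err := cell(obj[k])
			if err != nil {
				return "", err
			}
			rec[i] = s
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	out, err := rowsToCSV([]string{`{"id":1,"name":"Acme"}`, `{"id":2}`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```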

Storage Blobs

After a full sync has copied storage.objects, download blobs:

supacrawl storage pull --dir ./supabase-storage

The downloader preserves bucket/object paths under the target directory.
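
A sketch of how a bucket and object name might map to a local file path is shown below. It is not the downloader's actual code: localBlobPath is a hypothetical helper, and the rejection of paths that escape the bucket directory is a defensive assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// localBlobPath maps a Storage bucket and object name to a file path
// under dir, keeping the bucket/object layout intact.
func localBlobPath(dir, bucket, objectName string) (string, error) {
	obj := filepath.FromSlash(objectName)
	// IsLocal rejects empty paths, absolute paths, and any path that
	// climbs out of its starting directory with "..".
	if !filepath.IsLocal(bucket) || !filepath.IsLocal(obj) {
		return "", fmt.Errorf("unsafe object path %q in bucket %q", objectName, bucket)
	}
	return filepath.Join(dir, bucket, obj), nil
}

func main() {
	p, err := localBlobPath("supabase-storage", "avatars", "users/1/a.png")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)

	if _, err := localBlobPath("supabase-storage", "avatars", "../../etc/passwd"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

filepath.IsLocal requires Go 1.20 or newer, which is within the Go 1.26+ requirement listed above.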

Encrypted Backups

Generate a local age identity once:

supacrawl backup keygen --out ~/.supacrawl/age.key

Store the public recipient in config or in SUPACRAWL_AGE_RECIPIENT:

supacrawl backup init --recipient age1...

Create an encrypted backup from the SQLite archive:

supacrawl backup push
supacrawl backup status

Restore decrypted JSONL gzip shards to a directory:

supacrawl backup pull --out ./supacrawl-restore

backup push is local-only today. It writes encrypted shards under ~/.supacrawl/backups by default and does not push to a remote.
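
For readers unfamiliar with the shard format, the following sketch shows only the JSONL-plus-gzip layer using the Go standard library. The age encryption step is deliberately left out because it needs a third-party package, and the real shard layout, naming, and manifest are not described in this README; packShard and unpackShard are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// packShard writes one JSON document per line and gzips the result.
// In a real backup the returned bytes would then be encrypted.
func packShard(rows []string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, r := range rows {
		if _, err := io.WriteString(zw, r+"\n"); err != nil {
			return nil, err
		}
	}
	// Close flushes the remaining compressed data and the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpackShard reverses packShard on already-decrypted bytes.
func unpackShard(data []byte) ([]string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	var rows []string
	sc := bufio.NewScanner(zr)
	// Allow long JSON lines; the default scanner limit is 64 KiB.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for sc.Scan() {
		rows = append(rows, sc.Text())
	}
	return rows, sc.Err()
}

func main() {
	shard, err := packShard([]string{`{"id":1}`, `{"id":2}`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	rows, err := unpackShard(shard)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(rows), "rows restored")
}
```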

OpenClaw Compatibility

supacrawl is designed to fit the OpenClaw local-first crawler family:

doctor is the fastest live sanity check.
metadata --json, status --json, and doctor --json are stable surfaces for launchers, agents, and CI.
sync is metadata-first; sync --full is explicit for copying rows.
sync --full prints per-table progress to stderr so JSON stdout stays parseable.
read commands default to a bounded auto-refresh policy and accept --sync never.
backups are encrypted local shards; private identities are read from env or disk.
Secrets are resolved from environment variables and are not written into the archive by default.
Analysis happens against local SQLite through read-only sql.

See docs/openclaw.md for agent rules and convention notes.
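
The stdout/stderr split mentioned in the list above is a general CLI pattern, sketched here in Go. This is not supacrawl's code: syncTables, capture, the progress line format, and the tables_copied field are all hypothetical and exist only to show why keeping progress off stdout leaves the JSON parseable.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// syncTables writes human-readable progress to one writer and a single
// machine-readable JSON result to another.
func syncTables(tables []string, out, progress io.Writer) error {
	for i, t := range tables {
		fmt.Fprintf(progress, "[%d/%d] copying %s\n", i+1, len(tables), t)
	}
	return json.NewEncoder(out).Encode(map[string]any{"tables_copied": len(tables)})
}

// capture runs syncTables against in-memory buffers, the way a test or
// launcher might inspect the two streams separately.
func capture(tables []string) (stdout, stderr string, err error) {
	var out, progress bytes.Buffer
	err = syncTables(tables, &out, &progress)
	return out.String(), progress.String(), err
}

func main() {
	// Progress goes to stderr, so piping stdout into a JSON parser works.
	if err := syncTables([]string{"public.companies", "public.profiles"}, os.Stdout, os.Stderr); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```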

Output Modes

supacrawl status --json
supacrawl --format json search "profiles"
supacrawl --format log sync

Local Development

go test ./...
go vet ./...
go build -o bin/supacrawl ./cmd/supacrawl
./scripts/validate-local.sh

Directories

Path	Synopsis
cmd/supacrawl	command
internal/backup
internal/cli
internal/config
internal/postgres
internal/search
internal/storage
internal/store