dox
dox is a Go CLI that syncs library and framework docs from GitHub repos or URLs into a local directory for indexing tools and AI assistants.
Why
Keep documentation config in your repo, keep downloaded docs out of git, and reproduce docs locally on any machine with one command.
Installation
Homebrew (macOS/Linux)
brew install g5becks/tap/dox
Download Binary
Download pre-built binaries from the releases page.
Linux/macOS:
# Download and extract (adjust version and platform)
curl -L https://github.com/g5becks/dox/releases/latest/download/dox_Linux_x86_64.tar.gz | tar xz
sudo mv dox /usr/local/bin/
Windows:
Download the .zip from releases, extract, and add to PATH.
Linux Packages
Debian/Ubuntu:
wget https://github.com/g5becks/dox/releases/latest/download/dox_0.1.0_linux_amd64.deb
sudo dpkg -i dox_0.1.0_linux_amd64.deb
RedHat/Fedora/CentOS:
wget https://github.com/g5becks/dox/releases/latest/download/dox_0.1.0_linux_amd64.rpm
sudo rpm -i dox_0.1.0_linux_amd64.rpm
Alpine Linux:
wget https://github.com/g5becks/dox/releases/latest/download/dox_0.1.0_linux_amd64.apk
sudo apk add --allow-untrusted dox_0.1.0_linux_amd64.apk
Arch Linux:
wget https://github.com/g5becks/dox/releases/latest/download/dox_0.1.0_linux_amd64.pkg.tar.zst
sudo pacman -U dox_0.1.0_linux_amd64.pkg.tar.zst
Docker
docker pull ghcr.io/g5becks/dox:latest
docker run --rm ghcr.io/g5becks/dox:latest --version
Go Install
go install github.com/g5becks/dox/cmd/dox@latest
Build from Source
git clone https://github.com/g5becks/dox.git
cd dox
task build
./bin/dox --help
Quick Start
- Create a starter config:
dox init
This generates dox.toml with sensible defaults, including global exclude patterns for common files (images, node_modules, build artifacts, etc.).
- Edit dox.toml and add sources.
- Sync docs:
dox sync
- Ignore downloaded docs by adding the output directory to .gitignore:
.dox/
Example Config
# dox.toml
output = ".dox"
# Optional: Set default parallelism for syncing (default: 4x CPU cores, min 10)
# Override per-command with: dox sync --parallel N
max_parallel = 20
# Global exclude patterns applied to all git sources
# Per-source excludes add to (not replace) these global patterns
excludes = [
"**/*.png",
"**/*.jpg",
"node_modules/**",
".github/**",
]
# GitHub source - type inferred from 'repo'
[sources.goreleaser]
repo = "goreleaser/goreleaser"
path = "www/docs"
patterns = ["**/*.md"] # Override default patterns
exclude = ["**/changelog.md"] # Adds to global excludes
# URL source - type inferred from 'url'
[sources.hono]
url = "https://hono.dev/llms-full.txt"
Commands
Sync
dox sync
dox sync goreleaser hono
dox sync --force
dox sync --clean
dox sync --dry-run
dox sync --parallel 5
All commands accept --config path/to/dox.toml (-c) to override config discovery.
List
dox list
dox list --verbose
dox list --json
dox list --files
Add
# Add GitHub source (type inferred from --repo)
dox add goreleaser --repo goreleaser/goreleaser --path www/docs
# Add URL source (type inferred from --url)
dox add hono --url https://hono.dev/llms-full.txt
# Add GitLab source (specify host)
dox add myproject --repo owner/repo --path docs --host gitlab.com
# Overwrite existing source
dox add goreleaser --repo goreleaser/goreleaser --path www/docs --force
Use --force to overwrite an existing source with the same name.
Clean
dox clean
dox clean goreleaser
Init
dox init
Query Documentation
Browse and read synced documentation directly from the CLI. After syncing, dox generates a manifest that enables fast querying of your documentation collections.
Quick Start
# Sync documentation
dox sync
# List all collections
dox collections
# List files in a collection
dox files goreleaser
# Read a file with line numbers
dox cat goreleaser docs/install.md
# Show document structure (headings/exports)
dox outline goreleaser docs/install.md
Collections
List all synced documentation collections:
dox collections # Table output
dox collections --json # JSON output
dox collections --limit 5 # Limit results
Files
List files in a collection with customizable output:
# Default table output
dox files goreleaser
# JSON output
dox files goreleaser --json
# CSV output
dox files goreleaser --format csv
# Custom fields
dox files goreleaser --fields path,type,size
# Limit results
dox files goreleaser --limit 10
# Show all files (no limit)
dox files goreleaser --all
# Shorter descriptions
dox files goreleaser --desc-length 100
Available fields: path, type, lines, size, description, modified
Cat
Read file contents with line numbers:
# Show entire file
dox cat goreleaser docs/install.md
# Start at line 10
dox cat goreleaser docs/install.md --offset 10
# Show 20 lines
dox cat goreleaser docs/install.md --limit 20
# Start at line 10, show 20 lines
dox cat goreleaser docs/install.md --offset 10 --limit 20
# Without line numbers
dox cat goreleaser docs/install.md --no-line-numbers
# JSON output with metadata
dox cat goreleaser docs/install.md --json
Outline
Show document structure (headings for markdown/MDX, exports for TypeScript):
dox outline goreleaser docs/install.md
dox outline goreleaser docs/install.md --json
Output shows line numbers alongside:
- Markdown/MDX: Heading hierarchy with levels and indentation
- TypeScript/TSX: Exported functions, classes, types
- Text files: First line description
Search
Search across documentation metadata or file contents to discover relevant docs without guessing paths.
Metadata Search (Default)
Fuzzy search across file paths, descriptions, headings, and exports:
# Search all collections
dox search "installation"
# Search specific collection
dox search "hooks" --collection react
# Limit results
dox search "config" --limit 10
# JSON output
dox search "api" --json
# CSV output
dox search "guide" --format csv
# Truncate descriptions
dox search "guide" --desc-length 80
Metadata search returns:
- Collection name
- File path
- File type
- Match field (path, description, heading, or export)
- Fuzzy match score
- File description
Content Search
Search within file contents using literal or regex patterns:
# Literal search (case-insensitive)
dox search "useState" --content
# Search specific collection
dox search "configuration" --content --collection goreleaser
# Regex search
dox search "func.*Logger" --content --regex
# Limit results
dox search "import" --content --limit 20
Content search returns:
- Collection name
- File path
- Line number (1-based)
- Matching line text
Search Behavior
- Metadata mode: Fuzzy matches across paths, descriptions, headings, and TypeScript exports
- Content mode: Searches actual file contents (skips binary files and files >50MB)
- Regex mode: Requires the --content flag, uses case-insensitive patterns
- Collection filter: Restricts search to a single collection
- Limit: 0 means unlimited (default uses config default_limit)
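The content-mode rules above can be approximated in a few lines of Python. This is only an illustrative sketch, not dox's implementation: the helper name search_lines is hypothetical, and it models just the behaviors listed above (case-insensitive literal or regex matching, 1-based line numbers, and a limit of 0 meaning unlimited).

```python
import re
from typing import Iterable, List, Tuple


def search_lines(lines: Iterable[str], query: str, *, regex: bool = False,
                 limit: int = 0) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines that match query.

    Hypothetical helper: literal queries are escaped so they match verbatim,
    regex queries are compiled as given; both ignore case.
    """
    pattern = re.compile(query if regex else re.escape(query), re.IGNORECASE)
    hits: List[Tuple[int, str]] = []
    for number, text in enumerate(lines, start=1):  # 1-based line numbers
        if pattern.search(text):
            hits.append((number, text))
            if limit and len(hits) >= limit:  # limit == 0 means unlimited
                break
    return hits


doc = ["import React", "const [n, setN] = usestate(0)", "func NewLogger() {}"]
print(search_lines(doc, "useState"))                  # literal, ignores case
print(search_lines(doc, "func.*Logger", regex=True))  # regex
```

Escaping the literal query keeps characters such as `.` or `[` from being read as regex syntax, which is why the two modes share one matching path here.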
AI Agent Integration
Query commands are designed for AI agents and automation:
# Agent workflow example
dox sync # Ensure docs are current
dox collections --json # Discover available docs
dox search "authentication" --json # Find relevant topics
dox files react --json --limit 100 # Browse collection files
dox cat react docs/hooks.md # Read specific content
Display Configuration
Customize query output in dox.toml:
[display]
default_limit = 50 # Default result limit for dox files and dox search
description_length = 200 # Max description/text length in table output
line_numbers = true # Show line numbers in dox cat
format = "table" # Default format: table, json, csv
list_fields = ["path", "type", "lines", "size", "description"]
Troubleshooting
"Manifest not found"
- Run dox sync to generate the manifest
- Manifest is created automatically after each sync
"Collection not found"
- Run dox collections to see available collections
- Ensure the source name matches your dox.toml configuration
Global Excludes
Use the top-level excludes key to define patterns that apply to all git sources (GitHub, GitLab, Codeberg, etc.):
excludes = [
"**/*.png",
"**/*.jpg",
"node_modules/**",
".vitepress/**",
]
Key behaviors:
- Global excludes apply to all git sources (GitHub, GitLab, Codeberg, self-hosted)
- Global excludes do NOT apply to URL sources
- Per-source exclude patterns add to (not replace) global excludes
- Duplicate patterns are automatically removed
- dox init generates a config with comprehensive defaults
Example:
excludes = ["**/*.png", "node_modules/**"]
[sources.docs]
repo = "owner/repo"
path = "docs"
exclude = ["**/*.jpg", "**/*.png"] # Final excludes: **/*.png, **/*.jpg, node_modules/**
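The merge in this example can be pictured with a small Python sketch. It is not taken from dox's source: merge_excludes is a hypothetical name, and the resulting order (global patterns first, first occurrence kept) is an assumption, since the text above only says that per-source patterns are added and duplicates are removed.

```python
from typing import Iterable, List


def merge_excludes(global_excludes: Iterable[str],
                   source_excludes: Iterable[str]) -> List[str]:
    """Combine global and per-source patterns, dropping duplicates.

    Ordering is an assumption made for this sketch; only the
    add-and-deduplicate behavior is documented.
    """
    merged: List[str] = []
    seen = set()
    for pattern in [*global_excludes, *source_excludes]:
        if pattern not in seen:
            seen.add(pattern)
            merged.append(pattern)
    return merged


final = merge_excludes(["**/*.png", "node_modules/**"], ["**/*.jpg", "**/*.png"])
print(final)  # ['**/*.png', 'node_modules/**', '**/*.jpg']
```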
Configuration Reference
Minimal Configuration
dox infers source types automatically and applies sensible defaults to minimize config verbosity:
# Minimal GitHub source - type inferred from 'repo' presence
[sources.goreleaser]
repo = "goreleaser/goreleaser"
path = "www/docs"
# Minimal URL source - type inferred from 'url' presence
[sources.hono]
url = "https://hono.dev/llms-full.txt"
What's inferred:
- type = "github" when repo is present
- type = "url" when url is present
- host = "github.com" for git sources (GitHub default)
- patterns = ["**/*.md", "**/*.mdx", "**/*.txt"] for git sources
- ref = <default branch> for git sources (fetched from remote)
- filename = <basename from URL> for URL sources
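To make the inference rules concrete, here is a hedged Python sketch of how such defaults could be applied to a parsed source table. The function apply_defaults and the HOST_TYPES mapping are hypothetical; in particular, mapping codeberg.org to codeberg and falling back to the generic git type for unknown hosts are assumptions, and the ref and filename defaults are left out because they depend on the remote and the URL.

```python
from typing import Dict

DEFAULT_PATTERNS = ["**/*.md", "**/*.mdx", "**/*.txt"]

# Assumed host-to-type mapping for this sketch.
HOST_TYPES = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "codeberg.org": "codeberg",
}


def apply_defaults(source: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of source with inferred type, host, and patterns."""
    has_repo, has_url = "repo" in source, "url" in source
    if has_repo == has_url:
        raise ValueError("a source must set exactly one of 'repo' or 'url'")
    resolved = dict(source)
    if has_url:
        resolved.setdefault("type", "url")
        return resolved
    resolved.setdefault("host", "github.com")
    # Unknown hosts fall back to "git" here; that fallback is an assumption.
    resolved.setdefault("type", HOST_TYPES.get(str(resolved["host"]), "git"))
    resolved.setdefault("patterns", list(DEFAULT_PATTERNS))
    return resolved


print(apply_defaults({"repo": "goreleaser/goreleaser", "path": "www/docs"}))
print(apply_defaults({"url": "https://hono.dev/llms-full.txt"}))
```

Explicit values always win in this sketch because setdefault only fills keys that are missing.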
Git Hosting Sources
GitHub (default)
[sources.docs]
repo = "owner/repo"
path = "docs"
# type inferred as "github", host defaults to "github.com"
GitLab
[sources.docs]
repo = "owner/repo"
path = "docs"
host = "gitlab.com" # Explicit host triggers gitlab type inference
Or explicitly set type:
[sources.docs]
type = "gitlab"
repo = "owner/repo"
path = "docs"
# host defaults to "github.com" but gitlab client handles gitlab.com URLs
Codeberg
[sources.docs]
repo = "owner/repo"
path = "docs"
host = "codeberg.org"
Self-Hosted Git (GitHub Enterprise, GitLab CE/EE, Gitea, etc.)
[sources.internal-docs]
repo = "company/documentation"
path = "guides"
host = "git.company.com"
Advanced Git Source Options
[sources.advanced]
type = "github" # Optional: github|gitlab|codeberg|git (inferred if omitted)
repo = "owner/repo" # Required: owner/repo format
host = "github.com" # Optional: git host (default: github.com)
path = "docs/content" # Required: path to directory or file in repo
ref = "v2.0.0" # Optional: branch, tag, or commit SHA (default: repo default branch)
patterns = ["**/*.md"] # Optional: glob patterns (default: ["**/*.md", "**/*.mdx", "**/*.txt"])
exclude = ["**/draft/**"] # Optional: exclude patterns (merges with global excludes)
out = "custom-dir" # Optional: output subdirectory (default: source name)
URL Sources
For direct file downloads from any HTTP/HTTPS URL:
# Minimal
[sources.hono]
url = "https://hono.dev/llms-full.txt"
# With options
[sources.spec]
url = "https://example.com/api/spec.yaml"
filename = "openapi.yaml" # Optional: custom filename (default: spec.yaml from URL)
out = "specs" # Optional: output subdirectory (default: source name)
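The default filename is described as the basename of the URL. A minimal Python sketch of that rule follows; default_filename is a hypothetical helper, and ignoring the query string is an assumption about edge cases the text does not cover.

```python
import posixpath
from urllib.parse import urlparse


def default_filename(url: str) -> str:
    """Return the last path segment of url, for example 'spec.yaml'."""
    # urlparse separates the path from any query string or fragment.
    return posixpath.basename(urlparse(url).path)


print(default_filename("https://hono.dev/llms-full.txt"))     # llms-full.txt
print(default_filename("https://example.com/api/spec.yaml"))  # spec.yaml
```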
Complete Field Reference
Global Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| output | string | .dox | Output directory root (relative to config file or absolute) |
| github_token | string | $GITHUB_TOKEN or $GH_TOKEN | GitHub API token for private repos and higher rate limits |
| max_parallel | int | 4 × CPU cores (min 10) | Max concurrent source syncs (I/O-bound operations) |
| excludes | []string | [] | Global exclude patterns applied to all git sources |
Source Fields (Git Hosting)
| Field | Required | Default | Description |
|---|---|---|---|
| type | No | Inferred | Source type: github, gitlab, codeberg, git, url |
| repo | Yes* | — | Repository in owner/repo format |
| host | No | github.com | Git hosting domain (e.g., gitlab.com, git.company.com) |
| path | Yes | — | Path to directory or file in repository |
| ref | No | Default branch | Branch, tag, or commit SHA to sync |
| patterns | No | ["**/*.md", "**/*.mdx", "**/*.txt"] | Glob patterns for files to include |
| exclude | No | [] | Glob patterns to exclude (merged with global excludes) |
| out | No | Source name | Custom output subdirectory name |
*Required for git sources. Must have either repo OR url, not both.
Source Fields (URL)
| Field | Required | Default | Description |
|---|---|---|---|
| type | No | url (inferred) | Automatically set to url when url field is present |
| url | Yes* | — | Direct HTTP/HTTPS URL to a file |
| filename | No | Basename from URL | Custom filename for downloaded file |
| out | No | Source name | Custom output subdirectory name |
*Required for URL sources. Must have either repo OR url, not both.
Configuration Comparison: Before vs. After
Before (verbose, redundant):
[sources.goreleaser]
type = "github" # Redundant - obvious from 'repo'
repo = "goreleaser/goreleaser"
ref = "main" # Redundant - same as default branch
path = "www/docs"
patterns = ["**/*.md", "**/*.mdx", "**/*.txt"] # Redundant - same as default
[sources.hono]
type = "url" # Redundant - obvious from 'url'
url = "https://hono.dev/llms-full.txt"
filename = "llms-full.txt" # Redundant - same as URL basename
After (slim, clear):
[sources.goreleaser]
repo = "goreleaser/goreleaser"
path = "www/docs"
[sources.hono]
url = "https://hono.dev/llms-full.txt"
Both configs produce identical behavior. The slimmer version relies on type inference and sensible defaults.
Output Layout
Default output root is .dox/:
.dox/
  .dox.lock
  goreleaser/
  hono/
Each source writes into its own directory (source name by default, or out override).
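As a rough illustration of the layout rule, the sketch below computes where a source would land. It assumes, per the field reference, that a relative output root resolves against the config file's directory; source_dir is a hypothetical helper, and POSIX-style paths are used only to keep the example deterministic.

```python
from pathlib import PurePosixPath
from typing import Optional


def source_dir(config_dir: str, output: str, name: str,
               out: Optional[str] = None) -> PurePosixPath:
    """Return <output root>/<out or source name>."""
    root = PurePosixPath(output)
    if not root.is_absolute():
        # Relative roots resolve from the directory holding the config file.
        root = PurePosixPath(config_dir) / root
    return root / (out or name)


print(source_dir("/repo", ".dox", "goreleaser"))         # /repo/.dox/goreleaser
print(source_dir("/repo", ".dox", "spec", out="specs"))  # /repo/.dox/specs
```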
Integrations
dox downloads docs to a local directory that can be indexed by various tools:
AI Code Search & RAG
- ck-search: Semantic code search powered by embeddings. Index your .dox/ directory for context-aware documentation search.
dox sync
ck index .dox/
ck search "how to configure goreleaser"
- kit: AI-powered development assistant. Point kit at your synced docs for enhanced context.
dox sync
kit configure --docs-path .dox/
- aichat RAG: Retrieval-Augmented Generation for LLMs. Build a RAG system from your documentation.
dox sync
aichat --rag .dox/
Documentation Indexers
- Codana: Documentation search and indexing.
dox sync
codana documents add-collection goreleaser .dox/goreleaser/
codana documents add-collection hono .dox/hono/
codana documents index
- Meilisearch: Fast, typo-tolerant search engine.
dox sync
# Index .dox/ directory with Meilisearch
- Typesense: Open-source search engine.
dox sync
# Index .dox/ with Typesense for fast doc search
IDE & Editor Extensions
Configure your editor to use .dox/ as a documentation source:
- VSCode: Point Copilot or Codeium context to .dox/
- Cursor: Add .dox/ to workspace context
- Vim/Neovim: Use with coc.nvim or native LSP doc providers
Custom Scripts
Since dox outputs plain files, you can build custom tooling:
# Full-text search
rg "pattern" .dox/
# Convert markdown to HTML
find .dox/ -name "*.md" -exec pandoc -o {}.html {} \;
# Generate embeddings
python embed_docs.py .dox/
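As one more example of custom tooling, a short Python script can walk the output directory and summarize what was synced. This sketch is not part of dox: line_counts is a hypothetical helper, and the suffix list simply mirrors the default include patterns described in this README.

```python
from pathlib import Path
from typing import Dict, Tuple


def line_counts(root: str,
                suffixes: Tuple[str, ...] = (".md", ".mdx", ".txt")) -> Dict[str, int]:
    """Map each doc file under root (relative POSIX path) to its line count."""
    base = Path(root)
    counts: Dict[str, int] = {}
    if not base.is_dir():  # nothing synced yet
        return counts
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            counts[path.relative_to(base).as_posix()] = len(text.splitlines())
    return counts


if __name__ == "__main__":
    for name, count in line_counts(".dox").items():
        print(f"{count:6d}  {name}")
```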
Notes
- Config discovery searches for dox.toml or .dox.toml from CWD up to filesystem root.
- Relative config paths resolve from the config file directory.
- GitHub token resolution order: github_token in config, then GITHUB_TOKEN, then GH_TOKEN.
- Parallelism: Default is 4x CPU cores (minimum 10) since syncing is I/O-bound. Set max_parallel in config or use the --parallel flag to override.
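The rules in these notes can be sketched in Python for readers who want to reason about them. None of this is dox's code: find_config, resolve_token, and default_parallelism are hypothetical names, and checking dox.toml before .dox.toml within a directory is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAMES = ("dox.toml", ".dox.toml")  # per-directory order is assumed


def find_config(start: Path) -> Optional[Path]:
    """Walk from start up to the filesystem root looking for a config file."""
    start = start.resolve()
    for directory in (start, *start.parents):
        for name in CONFIG_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None


def resolve_token(config_token: str, env: Mapping[str, str]) -> str:
    """Config value first, then GITHUB_TOKEN, then GH_TOKEN."""
    return config_token or env.get("GITHUB_TOKEN", "") or env.get("GH_TOKEN", "")


def default_parallelism(cpu_count: Optional[int] = None) -> int:
    """Four times the CPU core count, with a floor of 10."""
    cores = cpu_count or os.cpu_count() or 1
    return max(10, 4 * cores)


print(find_config(Path.cwd()))
print(default_parallelism(2), default_parallelism(8))  # 10 32
```

The floor of 10 matters on small machines: with 2 cores, 4x would give only 8 workers, so the minimum applies instead.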
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
License
MIT