pluckr

Module version: v0.3.0
Published: Apr 29, 2026 License: MIT

Local-first, agent-native docs cache.

Pull docs sites, GitHub repos, llms.txt endpoints, or local folders into a markdown cache. Search via SQLite FTS5. Serve to Claude Code, Cursor, and any MCP host.


Why

Today, LLM agents have only three options for docs:

  • dump entire sites into context — wasteful, expensive, fragile
  • browse live — slow, flaky, blocked by SPAs and rate limits
  • hope training data is fresh — it isn't

pluckr is option four. One static binary keeps a folder of clean markdown plus a tiny SQLite full-text index. An MCP server exposes that cache to any compatible host (Claude Code, Cursor, Claude Desktop, Cline, Zed, …). Your agent searches once, gets exactly the section it needs, and cites the original URL.

See it in action

$ pluckr add https://react.dev/reference --pull
  added react.dev (website) → https://react.dev/reference
  pulling react.dev (website)...
    react.dev: 412 pages, 3,180 chunks, 6 skipped, 0 errors in 18.2s

$ pluckr search "useEffect cleanup"
1. react.dev › Reference › useEffect › Cleaning up an effect
   https://react.dev/reference/react/useEffect#cleaning-up-an-effect
   ... return a [cleanup] function from the Effect ...

$ pluckr mcp     # serve the cache over MCP/stdio

Install

pluckr ships as a single static Go binary. Three install paths today:

# 1. Latest release (recommended)
go install github.com/SarthakShrivastav-a/pluckr/cmd/pluckr@latest

# 2. Specific version
go install github.com/SarthakShrivastav-a/pluckr/cmd/pluckr@v0.2.2

# 3. From source
git clone https://github.com/SarthakShrivastav-a/pluckr.git
cd pluckr && go build -o pluckr ./cmd/pluckr

The go install and from-source paths require Go 1.25+. Alternatively, pre-built binaries for Linux, macOS, and Windows on amd64 and arm64 are attached to every GitHub release: download one, unzip it, and put the binary on your PATH.

Quick start

# subscribe to a few sources (kind is detected from the spec)
pluckr add https://react.dev/reference
pluckr add https://docs.python.org/3 --max 200
pluckr add facebook/react/docs              # github
pluckr add https://example.com/llms.txt     # llms.txt convention
pluckr add ~/internal-docs                  # local markdown folder

# fetch + index everything
pluckr pull --all

# search from the CLI
pluckr search "useState"

# serve to your agent
pluckr mcp

Source kinds

| Kind | Spec | What it does |
| --- | --- | --- |
| website | https://react.dev/reference | Sitemap → nav → BFS crawl, fetch each URL, render to clean markdown |
| llms_txt | https://example.com/llms.txt | Prefer /llms-full.txt if present; otherwise parse links from /llms.txt |
| github | facebook/react/docs | List the repo tree, pull every .md / .markdown / .mdx under the optional subdir |
| local | ~/internal-docs | Walk a folder, pick up .md / .markdown / .mdx / .txt. No network |

The kind is detected from the spec; pass --kind to override.
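
As a rough illustration of what detecting the kind from the spec could look like, the sketch below classifies the example specs from the table. The function name detectKind and every rule in it are assumptions made for illustration; pluckr's actual detection logic is not shown in this README and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// githubSpec matches owner/repo with an optional subdirectory,
// e.g. facebook/react/docs. Hypothetical pattern, not taken from pluckr.
var githubSpec = regexp.MustCompile(`^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+(/.*)?$`)

// detectKind guesses a source kind from a spec string.
// Illustrative heuristic only; the real rules may differ.
func detectKind(spec string) string {
	lower := strings.ToLower(spec)
	isURL := strings.HasPrefix(lower, "http://") || strings.HasPrefix(lower, "https://")
	switch {
	case isURL && (strings.HasSuffix(lower, "/llms.txt") || strings.HasSuffix(lower, "/llms-full.txt")):
		return "llms_txt"
	case isURL:
		return "website"
	case strings.HasPrefix(spec, "~"), strings.HasPrefix(spec, "/"), strings.HasPrefix(spec, "."):
		return "local"
	case githubSpec.MatchString(spec):
		return "github"
	default:
		// Anything else is treated as a relative local path in this sketch.
		return "local"
	}
}

func main() {
	specs := []string{
		"https://react.dev/reference",
		"https://example.com/llms.txt",
		"facebook/react/docs",
		"~/internal-docs",
	}
	for _, spec := range specs {
		fmt.Printf("%-30s -> %s\n", spec, detectKind(spec))
	}
}
```

A heuristic like this is ambiguous for relative paths such as docs/guides, which also look like owner/repo. That is the kind of case the --kind override exists for.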

Hooking it up to an agent

In Claude Code, install the bundled plugin, which sets up the MCP server, a skill, slash commands, and a SessionStart hook in one shot:

/plugin marketplace add SarthakShrivastav-a/pluckr
/plugin install pluckr@pluckr
/reload-plugins

What you get:

  • MCP server with seven tools (search/get/list/outline + refresh/add/remove)
  • Skill that tells Claude when to use the cache before reaching for the web
  • Slash commands: /pluckr-add, /pluckr-list, /pluckr-search, /pluckr-refresh
  • SessionStart hook — runs pluckr list so Claude knows what sources exist from message zero

Or manually wire just the MCP server:

claude mcp add pluckr -- pluckr mcp

Cursor

Add to ~/.cursor/mcp.json:

{ "mcpServers": { "pluckr": { "command": "pluckr", "args": ["mcp"] } } }

Claude Desktop

Add to claude_desktop_config.json:

{ "mcpServers": { "pluckr": { "command": "pluckr", "args": ["mcp"] } } }

MCP tool surface

| Tool | Purpose |
| --- | --- |
| search_docs(query, sources?, limit?) | BM25 search across subscribed sources. Returns chunks with heading path, snippet, freshness |
| get_page(source, path) | Full markdown of one cached page |
| list_sources() | Subscribed sources with kind, root, page count, last sync, stale flag |
| get_outline(source) | Heading tree of an entire source |
| refresh_source(name) | Re-run the pipeline for one source. Mutating |
| add_source(spec, …) | Subscribe a new source. Mutating |
| remove_source(name, …) | Drop a source from the registry. Mutating |

The MCP host's per-call consent UI is the user-facing safety gate for mutating tools. Set PLUCKR_MCP_NO_MUTATIONS=true to disable them server-side.
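
One way a server could honor that switch is to leave the mutating tools out when it registers its tool list. The sketch below assumes the variable is parsed with strconv.ParseBool; the tool struct and the functions mutationsDisabled and enabledTools are hypothetical names, not pluckr's API. Only the tool names and the environment variable come from this README.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// tool describes one MCP tool; Mutating marks tools that change the cache.
type tool struct {
	Name     string
	Mutating bool
}

// allTools mirrors the tool table above.
var allTools = []tool{
	{"search_docs", false},
	{"get_page", false},
	{"list_sources", false},
	{"get_outline", false},
	{"refresh_source", true},
	{"add_source", true},
	{"remove_source", true},
}

// mutationsDisabled reports whether PLUCKR_MCP_NO_MUTATIONS holds a true value.
// Assumption: values are parsed with strconv.ParseBool; unset or invalid means false.
func mutationsDisabled(getenv func(string) string) bool {
	v, err := strconv.ParseBool(getenv("PLUCKR_MCP_NO_MUTATIONS"))
	return err == nil && v
}

// enabledTools returns the tools a server would register under the current settings.
func enabledTools(getenv func(string) string) []tool {
	noMutations := mutationsDisabled(getenv)
	var out []tool
	for _, t := range allTools {
		if t.Mutating && noMutations {
			continue // hide mutating tools entirely
		}
		out = append(out, t)
	}
	return out
}

func main() {
	for _, t := range enabledTools(os.Getenv) {
		fmt.Println(t.Name)
	}
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the gate easy to exercise in tests.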

CLI reference

pluckr add <spec> [--name --kind --refresh --max --pull]
pluckr list
pluckr remove <name> [--keep-files]
pluckr pull [name...] [--all]
pluckr search <query> [--source --limit]
pluckr reindex <name>
pluckr mcp
pluckr root

On-disk layout

~/.pluckr/
  registry.json                 # subscribed sources + freshness + auth refs
  sources/
    react.dev/
      pages/                    # markdown is the source of truth
        reference/hooks/useState.md
      manifest.json             # per-page hash, fetched_at, token_count
      index.db                  # SQLite FTS5 — rebuildable from pages/

Markdown files are the source of truth. Hand-edit them, run pluckr reindex <source>, and the FTS5 index catches up.
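
Because the markdown files are authoritative, derived data such as per-page hashes can always be recomputed from pages/. The sketch below walks a pages directory and hashes each .md file with SHA-256. The choice of SHA-256, the helper names hashPages and demoTree, and the restriction to .md files are all assumptions for illustration; the README does not specify how manifest.json computes its hashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// hashPages walks pagesDir and maps each page's slash-separated relative
// path to the hex SHA-256 digest of its contents.
func hashPages(pagesDir string) (map[string]string, error) {
	hashes := make(map[string]string)
	err := filepath.WalkDir(pagesDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(d.Name(), ".md") {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(pagesDir, path)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		hashes[filepath.ToSlash(rel)] = hex.EncodeToString(sum[:])
		return nil
	})
	return hashes, err
}

// demoTree builds a throwaway pages/ tree so the sketch runs anywhere.
func demoTree() (string, error) {
	dir, err := os.MkdirTemp("", "pluckr-demo-")
	if err != nil {
		return "", err
	}
	page := filepath.Join(dir, "reference", "hooks", "useState.md")
	if err := os.MkdirAll(filepath.Dir(page), 0o755); err != nil {
		return "", err
	}
	return dir, os.WriteFile(page, []byte("# useState\n"), 0o644)
}

func main() {
	dir, err := demoTree()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	hashes, err := hashPages(dir)
	if err != nil {
		panic(err)
	}
	for page, sum := range hashes {
		fmt.Println(sum[:12], page)
	}
}
```

Comparing a fresh hash against a stored one is a cheap way to notice hand-edited pages that need reindexing.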

Auth for private docs

Headers and cookies expand ${ENV} references at fetch time, so secrets stay out of the registry file:

{
  "name": "internal-confluence",
  "kind": "website",
  "root": "https://wiki.corp.example/spaces/DOCS",
  "headers": { "Cookie": "JSESSIONID=${WIKI_SESSION}" }
}
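
The expansion step can be approximated with the standard library's os.Expand, as in the minimal sketch below. The function name expandHeaders is hypothetical, and pluckr may implement expansion differently. Note that os.Expand also expands bare $VAR references, not just the ${VAR} form.

```go
package main

import (
	"fmt"
	"os"
)

// expandHeaders returns a copy of headers with ${VAR} references resolved
// through lookup. The stored map is left untouched, so the secret never
// has to be written back to disk.
func expandHeaders(headers map[string]string, lookup func(string) string) map[string]string {
	out := make(map[string]string, len(headers))
	for name, value := range headers {
		out[name] = os.Expand(value, lookup)
	}
	return out
}

func main() {
	// What the registry stores: a reference, not the secret itself.
	stored := map[string]string{"Cookie": "JSESSIONID=${WIKI_SESSION}"}

	os.Setenv("WIKI_SESSION", "abc123") // demo value only

	// Resolved just before the request is sent.
	fmt.Println(expandHeaders(stored, os.Getenv)["Cookie"])
	// prints: JSESSIONID=abc123
}
```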

The refresh field accepts 7d, 30d, manual, or never. The MCP server kicks off background refresh of overdue sources at session start.
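
The overdue check behind that background refresh might look like the sketch below. It is hedged in several ways: parseRefresh and overdue are hypothetical names, the parser accepts any positive Nd value even though only 7d and 30d are documented, and it treats manual and never identically (no automatic refresh) because the README does not spell out how they differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseRefresh converts a refresh value into an interval.
// auto is false for "manual" and "never", which this sketch treats alike.
func parseRefresh(s string) (time.Duration, bool, error) {
	if s == "manual" || s == "never" {
		return 0, false, nil
	}
	if !strings.HasSuffix(s, "d") {
		return 0, false, fmt.Errorf("unsupported refresh value %q", s)
	}
	days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
	if err != nil || days <= 0 {
		return 0, false, fmt.Errorf("unsupported refresh value %q", s)
	}
	return time.Duration(days) * 24 * time.Hour, true, nil
}

// overdue reports whether a source last synced at lastSync is due at now.
// Invalid or non-automatic refresh values are never overdue.
func overdue(refresh string, lastSync, now time.Time) bool {
	interval, auto, err := parseRefresh(refresh)
	if err != nil || !auto {
		return false
	}
	return now.Sub(lastSync) >= interval
}

func main() {
	now := time.Date(2026, time.April, 29, 0, 0, 0, 0, time.UTC)
	lastSync := now.AddDate(0, 0, -10) // synced ten days ago
	for _, r := range []string{"7d", "30d", "manual", "never"} {
		fmt.Printf("%-6s overdue=%v\n", r, overdue(r, lastSync, now))
	}
}
```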

Architecture

Small Go interfaces, each implementation isolated in its own package:

fetch    →  render    →  chunk      →  store + retriever  →  mcp / cli
HTTP        HTML→md      heading-       FTS5 (modernc.org/      stdio
            with         bounded,       sqlite, no CGo)
            empty-       800-token
            content      cap
            detection

See docs/design.md for the full v0.1 design — locked decisions, package boundaries, MCP tool semantics, sync model, and explicit out-of-scope items.
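
To make the chunk stage of that pipeline concrete, here is a minimal sketch of heading-bounded chunking with a token cap. It is not pluckr's chunker. The chunk type and chunkMarkdown function are hypothetical, whitespace-separated words stand in for tokens, and only ATX headings (lines starting with #) are recognized, so a # line inside a code fence would be misread as a heading.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is one indexable unit: the heading it sits under plus its text.
type chunk struct {
	Heading string
	Text    string
}

// chunkMarkdown splits markdown at headings, then splits any section
// longer than maxTokens words into several chunks under the same heading.
func chunkMarkdown(md string, maxTokens int) []chunk {
	if maxTokens < 1 {
		maxTokens = 1 // guard against a non-terminating flush
	}
	var chunks []chunk
	var words []string
	heading := ""

	// flush emits the buffered words as one or more capped chunks.
	flush := func() {
		for len(words) > 0 {
			n := len(words)
			if n > maxTokens {
				n = maxTokens
			}
			chunks = append(chunks, chunk{Heading: heading, Text: strings.Join(words[:n], " ")})
			words = words[n:]
		}
	}

	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(line, "#") {
			flush() // a heading always closes the current chunk
			heading = strings.TrimSpace(strings.TrimLeft(line, "#"))
			continue
		}
		words = append(words, strings.Fields(line)...)
	}
	flush()
	return chunks
}

func main() {
	md := "# useEffect\nIntro text here.\n## Cleaning up an effect\nReturn a cleanup function from the Effect.\n"
	// A cap of 4 keeps the demo small; the diagram above uses 800.
	for i, c := range chunkMarkdown(md, 4) {
		fmt.Printf("%d. [%s] %s\n", i+1, c.Heading, c.Text)
	}
}
```

Keeping the heading on every chunk is what lets a search hit report where in the page it came from.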

Status

Active development. Source kinds, FTS5 search, the MCP read tools, and the CLI all work. Headless rendering for SPA-only docs sites and pluggable vector retrievers are designed in but not yet built. Every push to main auto-publishes the next semver patch release with binaries.

Contributing

See CONTRIBUTING.md. Bug reports, source-kind PRs, renderer improvements, and CLI / MCP polish are all welcome.

License

MIT.

Directories

| Path | Synopsis |
| --- | --- |
| cmd/pluckr (command) | Command pluckr is the CLI entry point. |
| internal/chunk | Package chunk splits a rendered Document into the indexable units used by the retriever. |
| internal/cli | Package cli implements the pluckr command-line interface using cobra. |
| internal/fetch | Package fetch defines a minimal Fetcher abstraction. |
| internal/mcp | Package mcp implements the pluckr MCP server. |
| internal/pipeline | Package pipeline wires Source -> Renderer -> Chunker -> Store + Retriever into a single end-to-end ingest. |
| internal/registry | Package registry owns the user-managed list of subscribed sources. |
| internal/render | Package render turns the bytes a fetcher returned into a clean Document. |
| internal/retriever | Package retriever defines the search interface and the value types callers use to drive it. |
| internal/retriever/fts5 | Package fts5 implements retriever.Retriever on top of SQLite FTS5 via the pure-Go modernc.org/sqlite driver. |
| internal/source | Package source defines the abstraction over an ingestible thing - a public docs site, an llms.txt endpoint, a GitHub repo, or a local folder - and the helper utilities that every implementation needs. |
| internal/source/github | Package github implements the github Source kind: ingest the markdown files from a public GitHub repository via the public API (tree listing) and raw.githubusercontent.com (file contents). |
| internal/source/llmstxt | Package llmstxt implements the llms.txt Source kind, a fast lane for sites that publish the emerging /llms.txt or /llms-full.txt convention (see https://llmstxt.org). |
| internal/source/local | Package local implements the local Source kind: read markdown / text files from a directory tree on disk. |
| internal/source/website | Package website implements the website Source kind: discover URLs via sitemap, then nav extraction, then BFS link crawl, then fetch each URL through the supplied Fetcher. |
| internal/store | Package store owns the on-disk layout of a pluckr cache: |
| internal/types | Package types defines the small set of value types that flow through the fetch -> render -> chunk -> index -> serve pipeline. |
