codamigo

module

v1.0.0 (latest) | Published: May 14, 2026 | License: MPL-2.0

Repository

github.com/ieshan/codamigo

README

Codamigo - Code Amigo

Semantic code search for your local machine. codamigo walks a source tree, chunks files into semantically coherent pieces using tree-sitter ASTs, embeds the chunks with a configurable embedding provider, stores them in a local SQLite + sqlite-vec database, and exposes hybrid search (KNN + BM25) via a CLI and an MCP stdio server.

How it works

codamigo runs a four-stage pipeline:

Walk — recursive filesystem walk with .gitignore / .caignore and include/exclude glob filtering
Chunk — tree-sitter AST-based splitting into semantically coherent units (functions, classes, declarations)
Embed — each chunk is converted to a float32 vector via any OpenAI-compatible embedding API
Search — queries are embedded and matched against the store using hybrid KNN + BM25 (Reciprocal Rank Fusion), backed by sqlite-vec and FTS5

The index lives in a single .codamigo/store.db file in your project. No external services are required — any local embedding server (Ollama, LM Studio) works.
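The fusion step named in the Search stage can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion in general, not codamigo's actual code: the function name `rrfFuse`, the chunk IDs, and the constant k = 60 (a commonly used default) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges ranked ID lists with Reciprocal Rank Fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
func rrfFuse(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties are broken by ID for deterministic output.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	knn := []string{"chunk-a", "chunk-b", "chunk-c"}  // hypothetical vector (KNN) ranking
	bm25 := []string{"chunk-b", "chunk-d", "chunk-a"} // hypothetical keyword (BM25) ranking
	fmt.Println(rrfFuse(60, knn, bm25))
}
```

A chunk that ranks well in both lists (chunk-b here) rises above one that tops only a single list, which is the point of combining the two rankings.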

Installation

Prerequisites: Go 1.26+ and a C compiler (cgo is required for the tree-sitter grammars).

go install github.com/ieshan/codamigo/cmd/codamigo@latest

If cc is not on your PATH, set the CC environment variable to point to your compiler before building.

Quick start

codamigo init            # guided setup — writes ~/.codamigo/global_settings.yml
codamigo index           # walk + chunk + embed + store
codamigo search "query"  # semantic search, prints top 10 results
codamigo map             # structural map of packages, files, and symbols
codamigo serve           # start MCP stdio server for AI assistant integration

Usage modes

Choose the mode that fits your workflow.

Mode A — CLI only

Run index whenever you want to refresh the store, then search interactively. No AI assistant required. Useful for one-off exploration or scripting.

codamigo index
codamigo search "authentication middleware" 5

Mode B — MCP with pre-indexed store

Pre-run index before starting serve (e.g. in CI, as a cron job, or on a git hook). When serve starts it also runs an initial index pass, but if the store is already fresh all files are hash-matched and skipped quickly. Use this when you want indexing managed outside the MCP server lifecycle.

# scheduled or on commit:
codamigo index

# AI assistant launches this (serve re-checks hashes on startup, then watches):
codamigo serve

Mode C — MCP with auto-indexing (recommended)

Run serve once. It performs an initial full index on startup, then watches the filesystem for changes and re-indexes modified files continuously. The MCP refresh_index=true parameter triggers a manual re-index on demand.

# configure your AI assistant to run:
codamigo serve

Decision guide:

Simplest local setup → Mode C
Indexing managed separately (CI/CD, cron, git hooks) → Mode B
No AI assistant, just want semantic grep → Mode A

Commands

`codamigo init`

Guided first-time setup. Prompts for embedding base URL, model, and API key. Writes ~/.codamigo/global_settings.yml and .codamigo/settings.yml (if absent). Appends .codamigo/ to .gitignore. Runs a smoke-test against the embedding model.

No flags. Reads from stdin interactively.

`codamigo index`

Walks the project root, chunks every matched file, embeds each chunk, and upserts records into the store. Skips files whose content hash is unchanged since the last run. Removes records for files that no longer exist on disk.

When stderr is a TTY, a live 2-line progress display shows running processed/skipped counts and the file currently being indexed. The display is suppressed automatically in CI and piped environments.
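The skip-and-remove behavior described above can be sketched as a comparison between stored hashes and current file contents. This is an illustration only: the README does not say which hash codamigo uses, so SHA-256 is an assumption, as are the names `contentHash` and `plan`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// contentHash returns the hex-encoded SHA-256 digest of a file's bytes.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// plan compares current file contents against previously stored hashes and
// reports which paths need re-indexing and which stored paths no longer exist.
func plan(stored map[string]string, current map[string][]byte) (reindex, removed []string) {
	for path, data := range current {
		if stored[path] != contentHash(data) {
			reindex = append(reindex, path) // new or changed file
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok {
			removed = append(removed, path) // file deleted from disk
		}
	}
	sort.Strings(reindex)
	sort.Strings(removed)
	return reindex, removed
}

func main() {
	stored := map[string]string{
		"a.go":   contentHash([]byte("package a")),
		"old.go": contentHash([]byte("package old")),
	}
	current := map[string][]byte{
		"a.go": []byte("package a"), // unchanged: skipped
		"b.go": []byte("package b"), // new: indexed
	}
	reindex, removed := plan(stored, current)
	fmt.Println("reindex:", reindex, "removed:", removed)
}
```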

Common flags (shared by all commands):

Flag	Env var	Default	Purpose
`--api-key`	`CODAMIGO_API_KEY`	—	Embedding API key
`--model`	`CODAMIGO_MODEL`	`text-embedding-3-small`	Embedding model name
`--base-url`	`CODAMIGO_BASE_URL`	`https://api.openai.com/v1`	Embedding API base URL
`--store-path`	`CODAMIGO_STORE_PATH`	`.codamigo/store.db`	SQLite store file path
`--project-root`	`CODAMIGO_PROJECT_ROOT`	current directory	Root directory to index
`--dimensions`	`CODAMIGO_DIMENSIONS`	`1536`	Embedding vector dimensions
`--global-config`	`CODAMIGO_GLOBAL_CONFIG`	`~/.codamigo/global_settings.yml`	Path to global config
`--project-config`	`CODAMIGO_PROJECT_CONFIG`	`.codamigo/settings.yml`	Path to project config

`codamigo search <query> [limit]`

Embeds the query and runs hybrid KNN + BM25 search. Prints results as score filepath:startLine-endLine [language] followed by the chunk content.

Additional flags:

Flag	Default	Purpose
`--limit`	`10`	Maximum results to return
`--offset`	`0`	Results to skip (pagination)
`--lang`	—	Filter by language, repeatable: `--lang go --lang python`
`--path`	—	Filter by file path glob, repeatable: `--path 'cmd/**'`
`--max-tokens`	`0`	Token budget for results (0 = no limit)
`--package`	—	Filter results to a package (e.g. `--package store`)
`--name`	—	Filter by symbol name
`--node-kind`	—	Filter by AST node kind, repeatable
`--metadata-only`	`false`	Return only file paths, line numbers, and symbol names

limit can also be passed as a second positional argument: codamigo search "auth" 5.
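For scripting around the CLI, the result header line can be parsed with a small helper. This sketch assumes the header is exactly `score filepath:startLine-endLine [language]` with single-space separators and a decimal score; the sample line, the type `resultHeader`, and the function `parseHeader` are hypothetical, so check real output before relying on it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// resultHeader holds the fields of one search result header line.
type resultHeader struct {
	Score     float64
	Path      string
	StartLine int
	EndLine   int
	Language  string
}

// headerRE matches "score filepath:start-end [language]" (assumed layout).
var headerRE = regexp.MustCompile(`^(\S+)\s+(.+):(\d+)-(\d+)\s+\[([^\]]+)\]$`)

// parseHeader extracts the fields from a header line or returns an error.
func parseHeader(line string) (resultHeader, error) {
	m := headerRE.FindStringSubmatch(line)
	if m == nil {
		return resultHeader{}, fmt.Errorf("not a result header: %q", line)
	}
	score, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return resultHeader{}, fmt.Errorf("bad score %q: %w", m[1], err)
	}
	start, err := strconv.Atoi(m[3])
	if err != nil {
		return resultHeader{}, fmt.Errorf("bad start line %q: %w", m[3], err)
	}
	end, err := strconv.Atoi(m[4])
	if err != nil {
		return resultHeader{}, fmt.Errorf("bad end line %q: %w", m[4], err)
	}
	return resultHeader{Score: score, Path: m[2], StartLine: start, EndLine: end, Language: m[5]}, nil
}

func main() {
	// Hypothetical header line in the documented shape.
	h, err := parseHeader("0.0325 store/sqlite.go:42-87 [go]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```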

`codamigo map`

Prints a structural map of the indexed codebase showing packages, files, and symbol names. Useful for orientation before searching. Built entirely from stored data — no embedding API calls.

By default, the map excludes configured non-code files (default: markdown, yaml, json), shows line ranges on symbols, includes per-file type summaries, and marks exported/internal symbols.

Flag	Default	Purpose
`--max-tokens`	`2000`	Token budget for the map output
`--no-code-only`	`false`	Include configured non-code language files in the map
`--no-summary`	`false`	Hide per-file type summary from file headers
`--no-visibility`	`false`	Hide export/visibility markers from symbols

`codamigo serve`

Starts the MCP stdio server. On startup it runs a full index pass, then launches a background filesystem watcher that re-indexes changed files. Accepts MCP tool calls from an AI assistant over stdin/stdout.

MCP tool: search

query (string) — the search text
limit (int, default 10) — how many results to return
languages (array, optional) — filter by programming language
paths (array, optional) — glob patterns to restrict search scope
max_tokens (int, default 0) — token budget for results (0 = no limit)
package (string, optional) — filter to a package name
refresh_index (bool, default false) — trigger a full re-index before searching
name (string, optional) — filter results to chunks matching this symbol name
node_kinds (array, optional) — filter by AST node kind (e.g. ["function_declaration"])
metadata_only (bool, default false) — return only file paths, line numbers, and symbol names (no source content)
offset (int, default 0) — number of results to skip for pagination

MCP tool: get_map

max_tokens (int, default 2000) — token budget for the map output
code_only (bool, default true) — exclude configured non-code languages from the map
show_summary (bool, default true) — show per-file type summary in file headers
show_visibility (bool, default true) — show export markers (+ public, - internal)

codamigo serve accepts the same common flags as index.
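For reference, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request over the server's stdin. The sketch below builds such a request for the search tool using the parameter names listed above. The Go types and the function `searchRequest` are illustrative assumptions, and a real session must complete the MCP initialize handshake before calling tools.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string   `json:"jsonrpc"`
	ID      int      `json:"id"`
	Method  string   `json:"method"`
	Params  toolCall `json:"params"`
}

// toolCall is the params object of an MCP tools/call request.
type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// searchRequest encodes a call to the search tool with a few of its arguments.
func searchRequest(id int, query string, limit int, languages []string) ([]byte, error) {
	args := map[string]any{"query": query, "limit": limit}
	if len(languages) > 0 {
		args["languages"] = languages // optional filter, omitted when empty
	}
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCall{Name: "search", Arguments: args},
	})
}

func main() {
	b, err := searchRequest(2, "authentication middleware", 5, []string{"go"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b))
}
```

In practice the AI assistant's MCP client produces these messages, so this is mainly useful for debugging the server by hand.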

`codamigo reset`

Deletes the vector store database file. Prompts for confirmation unless --force is passed.

Flag	Purpose
`--force`	Skip the confirmation prompt

`codamigo doctor`

Diagnoses configuration, store health, and embedding model reachability. Reports: global config, project config, store file existence, index stats (chunks, files, per-language counts), walker file count, embedding smoke-test.

Flag	Purpose
`--quick`	Skip the live embedding smoke-test

Configuration

Configuration is loaded in five layers, and later layers win:

built-in defaults
  → ~/.codamigo/global_settings.yml   (shared across all projects)
    → .codamigo/settings.yml          (per-project; safe to commit)
      → environment variables
        → CLI flags

codamigo init writes the global file. The project file holds project-specific patterns. Both are YAML.
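The "later layers win" rule amounts to a field-by-field merge in which a layer overrides a value only if it sets one. The sketch below illustrates that rule for three of the documented keys. The `settings` type and `merge` function are hypothetical and are not codamigo's config package.

```go
package main

import "fmt"

// settings holds a few documented keys; a zero value means "not set in this layer".
type settings struct {
	Model      string
	BaseURL    string
	Dimensions int
}

// merge applies layers in order, so values from later layers override earlier ones.
func merge(layers ...settings) settings {
	var out settings
	for _, l := range layers {
		if l.Model != "" {
			out.Model = l.Model
		}
		if l.BaseURL != "" {
			out.BaseURL = l.BaseURL
		}
		if l.Dimensions != 0 {
			out.Dimensions = l.Dimensions
		}
	}
	return out
}

func main() {
	defaults := settings{Model: "text-embedding-3-small", BaseURL: "https://api.openai.com/v1", Dimensions: 1536}
	global := settings{Model: "nomic-embed-text", BaseURL: "http://localhost:11434/v1", Dimensions: 768}
	project := settings{}                                        // sets none of these keys
	env := settings{Model: "mxbai-embed-large", Dimensions: 1024} // e.g. CODAMIGO_MODEL, CODAMIGO_DIMENSIONS
	flags := settings{}                                          // no CLI flags given

	fmt.Printf("%+v\n", merge(defaults, global, project, env, flags))
}
```

Here the base URL comes from the global file while the model and dimensions come from the environment, because no later layer overrides them.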

Full config reference:

# Embedding provider
embedding_provider: openai            # informational label only
embedding_model: text-embedding-3-small
embedding_api_key: sk-...             # use CODAMIGO_API_KEY env var instead
embedding_base_url: https://api.openai.com/v1
embedding_dimensions: 1536
embedding_index_input_type: ""        # e.g. "document" for Voyage AI
embedding_query_input_type: ""        # e.g. "query" for Voyage AI

# Rate limiting and retries
embedding_max_batch_size: 256
embedding_rate_limit: 500.0           # sustained requests/second
embedding_rate_burst: 100             # max burst above sustained rate
embedding_max_retries: 3
embedding_retry_base_delay: "500ms"   # e.g. "500ms", "1s"

# File filtering
include_patterns: []                  # empty = include all matched extensions
exclude_patterns: []                  # gitignore rules are also applied

# Map display
non_code_languages:           # languages excluded by code_only filter
  - markdown                  # default: ["markdown", "yaml", "json"]
  - yaml
  - json

# Storage
store_path: .codamigo/store.db

# Project
project_root: ""                      # defaults to current working directory

# Indexing
index_concurrency: 20                 # files processed concurrently during indexing
max_file_size: 1048576                # skip files larger than this (bytes); 0 = no limit
write_batch_size: 50                  # files per DB write transaction during batch indexing; 0 = use default (50)

# File watching (serve only)
watch_mode: auto                      # "auto" | "fsnotify" | "poll"
poll_interval: "5s"
debounce_window: "500ms"

Keep embedding_api_key in the global config (written with mode 0600 by init) or in CODAMIGO_API_KEY. Do not put API keys in the project config.
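As a rough picture of what embedding_max_batch_size controls, the sketch below splits a list of chunk texts into request-sized groups. How codamigo actually batches requests is not documented here, so the function `batches` and its treatment of non-positive sizes are assumptions.

```go
package main

import "fmt"

// batches splits chunks into consecutive groups of at most maxBatch items.
// A non-positive maxBatch is treated as "one batch with everything".
func batches(chunks []string, maxBatch int) [][]string {
	if maxBatch <= 0 {
		maxBatch = len(chunks)
	}
	var out [][]string
	for start := 0; start < len(chunks); start += maxBatch {
		end := start + maxBatch
		if end > len(chunks) {
			end = len(chunks)
		}
		out = append(out, chunks[start:end])
	}
	return out
}

func main() {
	chunks := make([]string, 600) // 600 hypothetical chunk texts
	for i, b := range batches(chunks, 256) {
		fmt.Printf("request %d: %d inputs\n", i+1, len(b))
	}
}
```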

Embedding providers

OpenAI

# ~/.codamigo/global_settings.yml
embedding_base_url: https://api.openai.com/v1
embedding_model: text-embedding-3-small
embedding_dimensions: 1536

export CODAMIGO_API_KEY=sk-...

Models: text-embedding-3-small (fast, 1536 dims), text-embedding-3-large (higher quality, 3072 dims).

Voyage AI

Voyage uses input_type to distinguish document vs. query vectors, which improves retrieval quality.

# ~/.codamigo/global_settings.yml
embedding_base_url: https://api.voyageai.com/v1
embedding_model: voyage-code-3
embedding_dimensions: 1024
embedding_index_input_type: document
embedding_query_input_type: query

export CODAMIGO_API_KEY=pa-...

Ollama (local)

No real API key is required. Pull a model first:

ollama pull nomic-embed-text

# ~/.codamigo/global_settings.yml
embedding_base_url: http://localhost:11434/v1
embedding_model: nomic-embed-text
embedding_dimensions: 768
embedding_rate_limit: 50
embedding_rate_burst: 10

Ollama requires a non-empty Authorization header; set any placeholder value:

export CODAMIGO_API_KEY=ollama

Good models: nomic-embed-text (768 dims), mxbai-embed-large (1024 dims).

LM Studio (local)

Enable Local Server in LM Studio and load an embedding model, then:

# ~/.codamigo/global_settings.yml
embedding_base_url: http://localhost:1234/v1
embedding_model: <model-id-shown-in-lm-studio>
embedding_dimensions: <see model card>
embedding_rate_limit: 20
embedding_rate_burst: 5

export CODAMIGO_API_KEY=lm-studio

Check the model card for embedding_dimensions. A mismatch between the stored value and the configured value causes an error on the second index run — the store enforces model consistency.
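One plausible reading of this behavior is that the store records the vector size on the first write and rejects anything different afterwards, which is why a mismatch would surface only on a later run. The sketch below models that idea. The `store` type and its `insert` method are hypothetical and do not reflect the real store package.

```go
package main

import "fmt"

// store remembers the dimensionality of the first vector it accepts.
type store struct {
	dims int // 0 means nothing has been stored yet
}

// insert accepts a vector only if it matches the recorded dimensionality.
func (s *store) insert(vec []float32) error {
	if s.dims == 0 {
		s.dims = len(vec) // first write fixes the expected size
		return nil
	}
	if len(vec) != s.dims {
		return fmt.Errorf("dimension mismatch: store has %d, got %d", s.dims, len(vec))
	}
	return nil
}

func main() {
	s := &store{}
	fmt.Println(s.insert(make([]float32, 768)))  // first run with a 768-dim model
	fmt.Println(s.insert(make([]float32, 1024))) // later run after switching models
}
```

If you change models, codamigo reset followed by a fresh index avoids mixing vectors of different sizes.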

MCP integration

codamigo speaks the MCP stdio protocol. Configure your AI assistant to launch codamigo serve as a stdio MCP server. The server indexes on startup and keeps the index fresh via filesystem watching.

Claude Code

Add to ~/.claude/settings.json (global) or .claude/settings.json (project):

{
  "mcpServers": {
    "codamigo": {
      "command": "codamigo",
      "args": ["serve"],
      "env": {
        "CODAMIGO_API_KEY": "<your-api-key>"
      }
    }
  }
}

If your API key is already in ~/.codamigo/global_settings.yml, the env block can be omitted.

The tools are available in Claude as mcp__codamigo__search and mcp__codamigo__get_map.

OpenAI Codex

Add to ~/.codex/config.toml (global) or codex.toml (project):

[mcp_servers.codamigo]
command = "codamigo"
args = ["serve"]

[mcp_servers.codamigo.env]
CODAMIGO_API_KEY = "<your-api-key>"

Tip: For large repos, run codamigo index once before starting your AI session. When serve starts it re-checks all files, but if the store is already fresh the pass completes in seconds.

Using with AI coding agents

codamigo is designed to be used as an MCP server by AI coding agents such as Claude Code, OpenAI Codex, Cursor, Windsurf, and others. Once codamigo serve is running, the agent has access to two tools:

Tool	Purpose
`search`	Semantic search — embed a query and return matching code chunks
`get_map`	Structural overview — packages, files, and symbol names from the index

Recommended workflow

1. Orient with get_map first. Before searching, call get_map (with a reasonable max_tokens budget, e.g. 2000) to get a structural overview of the codebase. This shows which packages exist, how many symbols each contains, and what the key files are. Use this to decide which package or file to scope your search to.

2. Search semantically. Use natural-language queries rather than exact symbol names. Hybrid KNN + BM25 search matches on meaning as well as keywords, so "parse config file" can find the config loading logic even if the function is called Load.

3. Scope searches to reduce noise. Narrow results with the available filters:

package — restrict to one package, e.g. "store" or "embedder/openaicompat"
languages — e.g. ["go"] to skip test fixtures in other languages
node_kinds — e.g. ["function_declaration", "method_declaration"] to see only functions
name — exact symbol lookup, e.g. "NewChunker"

4. Use metadata_only for exploratory queries. When you want to find which files or functions are relevant without reading their full source, set metadata_only=true. Results include file path, line numbers, and symbol name but omit the source text — typically 10–20× fewer tokens. Follow up with a targeted search (or a direct file read) once you've identified the right symbols.

5. Control context budget with max_tokens. For agents with limited context windows, set max_tokens to cap the total tokens returned. Results are ranked by relevance and truncated at the budget; a truncated: true flag signals that more results exist.

6. Refresh the index when needed. Set refresh_index=true on a search call to trigger a full re-index before querying. A 30-second cooldown prevents hammering the embedder on rapid back-to-back calls. Alternatively, run codamigo index from the shell.
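The max_tokens behavior in step 5 can be pictured as a cut-off applied to an already ranked list. The sketch below is an assumption-laden illustration: the `result` type, the per-result token counts, and the choice to stop at the first result that would exceed the budget are not taken from codamigo's source.

```go
package main

import "fmt"

// result is a ranked search hit with an estimated token cost.
type result struct {
	Path   string
	Tokens int
}

// applyBudget keeps results in rank order until the next one would exceed
// maxTokens. A budget of 0 (or less) means no limit.
func applyBudget(ranked []result, maxTokens int) (kept []result, truncated bool) {
	if maxTokens <= 0 {
		return ranked, false
	}
	used := 0
	for i, r := range ranked {
		if used+r.Tokens > maxTokens {
			return ranked[:i], true // more results exist beyond the budget
		}
		used += r.Tokens
	}
	return ranked, false
}

func main() {
	ranked := []result{
		{Path: "config/load.go", Tokens: 400},
		{Path: "config/merge.go", Tokens: 500},
		{Path: "cmd/codamigo/main.go", Tokens: 300},
	}
	kept, truncated := applyBudget(ranked, 1000)
	fmt.Println(len(kept), truncated)
}
```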

Example agent queries (Claude Code)

# Overview first — all features enabled by default
mcp__codamigo__get_map(max_tokens=3000)

# Overview without visibility markers
mcp__codamigo__get_map(max_tokens=3000, show_visibility=false)

# Include non-code files (markdown, yaml, etc.)
mcp__codamigo__get_map(max_tokens=3000, code_only=false)

# Find all functions related to embedding
mcp__codamigo__search(query="embedding API request", package="embedder/openaicompat", node_kinds=["function_declaration"])

# Look up a specific symbol
mcp__codamigo__search(query="walk directory tree", name="Walk", metadata_only=true)

# Scan store package cheaply
mcp__codamigo__search(query="upsert chunk records", package="store", metadata_only=true, limit=20)

Node kind reference

Common values for the node_kinds filter:

Value	Matches
`function_declaration`	Go `func` at package level
`method_declaration`	Go `func` on a receiver
`type_declaration`	Go `type` block
`function_definition`	Python / C / C++ functions
`class_definition`	Python classes
`class_declaration`	TypeScript / Java classes
`method_definition`	JS / TS / Ruby methods

Run codamigo map to see which node kinds appear in your indexed codebase.
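To make the three Go node kinds concrete, the sketch below classifies top-level declarations in a small source string and reports their line ranges. It uses the standard library's go/ast parser as a stand-in for tree-sitter, so it only approximates how codamigo labels chunks. The `symbol` type, the `symbols` function, and the demo source are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// demoSrc is a small Go file with one type, one method, and one function.
const demoSrc = `package demo

type Store struct{}

func (s *Store) Upsert() {}

func Open() *Store {
	return &Store{}
}
`

// symbol describes one top-level declaration, labeled with a tree-sitter style kind.
type symbol struct {
	Kind      string
	Name      string
	StartLine int
	EndLine   int
}

// symbols parses Go source and lists its type, method, and function declarations.
func symbols(src string) ([]symbol, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []symbol
	for _, decl := range file.Decls {
		var kind, name string
		switch d := decl.(type) {
		case *ast.FuncDecl:
			kind, name = "function_declaration", d.Name.Name
			if d.Recv != nil {
				kind = "method_declaration" // func with a receiver
			}
		case *ast.GenDecl:
			if d.Tok != token.TYPE || len(d.Specs) == 0 {
				continue // skip imports, consts, and vars
			}
			kind, name = "type_declaration", d.Specs[0].(*ast.TypeSpec).Name.Name
		}
		out = append(out, symbol{
			Kind:      kind,
			Name:      name,
			StartLine: fset.Position(decl.Pos()).Line,
			EndLine:   fset.Position(decl.End()).Line,
		})
	}
	return out, nil
}

func main() {
	syms, err := symbols(demoSrc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range syms {
		fmt.Printf("%s %s %d-%d\n", s.Kind, s.Name, s.StartLine, s.EndLine)
	}
}
```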

Supported languages

Language	Extensions
Go	`.go`
Python	`.py`, `.pyw`
JavaScript	`.js`, `.mjs`, `.cjs`, `.jsx`
TypeScript	`.ts`, `.mts`
TSX	`.tsx`
Ruby	`.rb`
C	`.c`, `.h`
C++	`.cpp`, `.cc`, `.cxx`, `.hpp`
Bash	`.sh`, `.bash`
HTML	`.html`, `.htm`
CSS	`.css`
Markdown	`.md`, `.markdown`
JSON	`.json`
YAML	`.yaml`, `.yml`
Vue	`.vue`

Use include_patterns and exclude_patterns in your project config to control which files are indexed.

`.caignore`

codamigo supports a .caignore file that works exactly like .gitignore but is specific to codamigo. Files matched by either .gitignore or .caignore are excluded from indexing and file watching.

Why use `.caignore`?

Your .gitignore controls what Git tracks. Sometimes you want codamigo to skip files that Git still tracks — large generated files, vendored dependencies, test fixtures, or data files that add noise to search results. .caignore lets you tune codamigo's scope without touching .gitignore.

Syntax

.caignore uses identical syntax to .gitignore:

# Ignore all CSV data files
*.csv

# Ignore the testdata directory
testdata/

# But keep the golden files
!testdata/golden/

Behavior

Same directory scoping as .gitignore. A .caignore in src/ applies only to paths under src/, just like a nested .gitignore.
.caignore rules win on conflict. Both files are loaded per directory (.gitignore first, then .caignore). The "last matching rule wins" semantics mean .caignore takes precedence.
Negation works across files. A !pattern in .caignore can re-include a path that .gitignore excludes.
Either file is optional. A directory with only .caignore (no .gitignore) works. A directory with only .gitignore works as before.
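The precedence rules above follow from evaluating .gitignore rules first, then .caignore rules, and letting the last matching rule decide. The sketch below shows that evaluation order with a deliberately simplified matcher: a trailing-slash pattern matches paths under that root-level directory, and any other pattern is a glob on the file name. It does not implement full gitignore syntax, and all names in it are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rule is one ignore pattern; negate is true for "!pattern" lines.
type rule struct {
	pattern string
	negate  bool
}

// parseRules turns ignore-file lines into rules, skipping blanks and comments.
func parseRules(lines []string) []rule {
	var rules []rule
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		r := rule{pattern: line}
		if strings.HasPrefix(line, "!") {
			r.negate = true
			r.pattern = line[1:]
		}
		rules = append(rules, r)
	}
	return rules
}

// matches is a simplified matcher (not full gitignore semantics).
func (r rule) matches(p string) bool {
	if strings.HasSuffix(r.pattern, "/") {
		return strings.HasPrefix(p, r.pattern) // directory pattern
	}
	ok, _ := path.Match(r.pattern, path.Base(p)) // glob on the file name
	return ok
}

// ignored applies .gitignore rules, then .caignore rules; the last match wins.
func ignored(p string, gitignore, caignore []string) bool {
	rules := append(parseRules(gitignore), parseRules(caignore)...)
	result := false
	for _, r := range rules {
		if r.matches(p) {
			result = !r.negate
		}
	}
	return result
}

func main() {
	git := []string{"vendor/"}
	ca := []string{"# override .gitignore for codamigo", "!vendor/", "*.pb.go"}
	fmt.Println(ignored("vendor/lib/x.go", git, nil))
	fmt.Println(ignored("vendor/lib/x.go", git, ca))
	fmt.Println(ignored("api/service.pb.go", git, ca))
}
```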

Examples

Exclude large generated files from the index while keeping them in Git:

# .caignore
generated/
*.pb.go
*.min.js

Re-include a directory that .gitignore excludes (useful for vendored code you want searchable):

# .gitignore
vendor/

# .caignore — override .gitignore for codamigo
!vendor/

Scope exclusions to a subdirectory by placing .caignore there:

# frontend/.caignore — only affects frontend/
node_modules/
dist/
*.bundle.js

Directories

Path	Synopsis
chunker	Package chunker splits source files into semantically coherent chunks using tree-sitter ASTs.
cmd/codamigo	The codamigo command.
config	Package config owns the unified runtime configuration for codamigo.
embedder	Package embedder defines the Embedder interface for converting text into float32 embedding vectors.
embedder/openaicompat	Package openaicompat provides an OpenAI-compatible embedding API client that implements [embedder.Embedder].
indexer	Package indexer orchestrates the walk → chunk → embed → store pipeline.
langs	Package langs provides chunker.LanguageConfig definitions for the 15 languages built into codamigo.
mcp	Package mcp exposes codamigo's search capability as an MCP (Model Context Protocol) stdio server.
query	Package query implements semantic search from a caller's perspective.
store	Package store owns the persistence layer for codamigo: the Record and SearchQuery types, the Store interface, and a sqlite-vec implementation.
walker	Package walker walks a directory tree yielding file paths that pass ignore rules (.gitignore and .caignore) and include/exclude glob filters from config.Config.
watcher	Package watcher monitors a directory tree for filesystem changes and delivers batched, debounced events to callers.
