code-knowledge-vector

Module version: v0.0.0-...-78728ec
Published: Jun 16, 2026 License: AGPL-3.0


Code Knowledge Vector (CKV)

Semantic code search over a local vector index. CKV indexes a source repository as embedding vectors at function / type / heading granularity, stores them in an embedded SQLite + sqlite-vec database, and serves retrieval over a CLI, an in-process Go API, and an MCP server. The companion project code-knowledge-graph (CKG) provides symbol-graph search; the two are designed to be combined by larger systems (CKS) for hybrid retrieval.

Resuming work on a different machine or in a new session? Start with docs/session-handoff-2026-05-23.md — it carries the prereq checklist, env-var matrix, current decision state, and the next-Wave entry conditions in a single document.

Features

  • Languages: Go (go/parser), TypeScript / TSX, JavaScript / JSX / MJS / CJS, Solidity, Markdown.
  • Embedders: mock (no system dependencies, deterministic feature-hash) and bgeonnx (ONNX Runtime + HuggingFace tokenizers, BERT-class models).
  • CLI: build, query, eval, freshness, mcp, model.
  • MCP server: stdio JSON-RPC. Tools: cks.context.semantic_search, cks.ops.health, cks.ops.warmup, cks.ops.get_freshness. Every response carries a top-level schema_version.
  • Go API: import github.com/0xmhha/code-knowledge-vector/pkg/ckv for Open / SemanticSearch / Warmup / Manifest / Close in the calling process.
  • Operational: host memory pre-check + adaptive batching (CKV_MEM_GUARD), CoreML execution provider tuning on macOS (CKV_COREML_*), ORT thread overrides, panic-safe MCP middleware.
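The list above describes the mock embedder only as a deterministic feature-hash. The sketch below illustrates that general technique; it is not the code in internal/embed/mock. Each token is hashed into one of `dim` buckets with FNV-1a, the hash's top bit selects a sign, and the result is L2-normalized. The tokenizer (lowercased whitespace split), the hash function, and the name `featureHash` are all assumptions.

```go
package main

import (
	"hash/fnv"
	"math"
	"strings"
)

// featureHash is a hypothetical feature-hashing embedder (not CKV's API).
// Each lowercased, whitespace-separated token is hashed into one of dim
// buckets; the top bit of the 32-bit hash picks +1 or -1 for that bucket.
func featureHash(text string, dim int) []float32 {
	vec := make([]float32, dim)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok)) // hash.Hash.Write never returns an error
		sum := h.Sum32()
		idx := int(sum % uint32(dim))
		if sum&(1<<31) != 0 {
			vec[idx]--
		} else {
			vec[idx]++
		}
	}
	// L2-normalize so cosine similarity reduces to a dot product.
	var norm float64
	for _, v := range vec {
		norm += float64(v) * float64(v)
	}
	if norm == 0 {
		return vec // empty input (or full cancellation): all-zero vector
	}
	inv := float32(1 / math.Sqrt(norm))
	for i := range vec {
		vec[i] *= inv
	}
	return vec
}
```

Because the vector depends only on which tokens appear, identical text always yields an identical vector. That determinism is useful for tests, but the embedding carries no semantic signal beyond token overlap.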

Quickstart

CLI with the mock embedder (no system dependencies)

```sh
make build
./bin/ckv build --src /path/to/repo --out ./ckv-data
./bin/ckv query "TCP socket bind on port" --out ./ckv-data
```

CLI with bgeonnx (real semantic embeddings)

Requires libonnxruntime, libtokenizers.a, and a downloaded model. See docs/d1-installation-guide.md.

```sh
CGO_LDFLAGS="-L$HOME/lib" go build -tags bgeonnx -o ./bin/ckv ./cmd/ckv
./bin/ckv build --embedder bgeonnx --src /path/to/repo --out ./ckv-data
./bin/ckv query "..." --embedder bgeonnx --out ./ckv-data
```

In-process Go API

```go
package main

import (
	"context"
	"log"

	"github.com/0xmhha/code-knowledge-vector/pkg/ckv"
)

func search() error {
	ctx := context.Background()

	// Open the index written by `ckv build --out ./ckv-data`.
	engine, err := ckv.Open("./ckv-data", ckv.OpenOptions{
		Embedder: ckv.MockEmbedder(),
	})
	if err != nil {
		return err
	}
	defer engine.Close()

	// A warmup failure is not fatal: log it and let the first query pay the cost.
	if err := engine.Warmup(ctx); err != nil {
		log.Printf("ckv warmup: %v", err)
	}

	resp, err := engine.SemanticSearch(ctx,
		"TCP socket bind on port",
		ckv.SearchOptions{K: 5})
	if err != nil {
		return err
	}
	_ = resp.Hits // []ckv.Hit: citation, snippet, score per result
	return nil
}

func main() {
	if err := search(); err != nil {
		log.Fatal(err)
	}
}
```

See docs/embedder-integration.md for the production embedder path, environment overrides, and migration off subprocess MCP.

MCP server
```sh
./bin/ckv mcp --out ./ckv-data
```

The server speaks MCP JSON-RPC over stdio. Register it with Claude Code; everything after `--` is the server command and its arguments:

```sh
claude mcp add ckv -- "$(pwd)/bin/ckv" mcp --out="$(pwd)/ckv-data"
```
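For clients other than Claude Code, the sketch below shows what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0: a tool is invoked with the `tools/call` method, whose params carry the tool `name` and an `arguments` object, and the stdio transport sends each message as one newline-terminated line of JSON. The argument keys `query` and `k` are hypothetical; read the real input schema from the server's `tools/list` response. A real client must also complete the MCP `initialize` handshake before calling tools.

```go
package main

import "encoding/json"

// rpcRequest is a JSON-RPC 2.0 request envelope as used by MCP.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// toolCallParams mirrors MCP's tools/call params: a tool name plus arguments.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// semanticSearchLine builds one newline-terminated stdio message that calls
// cks.context.semantic_search. The "query" and "k" keys are assumptions.
func semanticSearchLine(id int, query string, k int) ([]byte, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: toolCallParams{
			Name:      "cks.context.semantic_search",
			Arguments: map[string]any{"query": query, "k": k},
		},
	}
	b, err := json.Marshal(req) // compact output contains no raw newlines
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}
```

Writing the returned bytes to the `ckv mcp` process's stdin and reading one line from its stdout gives the matching response, whose top-level `schema_version` the Features list describes.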

Supported languages

| Language | Parser | Extensions |
|---|---|---|
| Go | go/parser | .go |
| TypeScript | tree-sitter | .ts, .tsx |
| JavaScript | tree-sitter (via TS grammar) | .js, .jsx, .mjs, .cjs |
| Solidity | tree-sitter | .sol |
| Markdown | heading-section chunks | .md, .markdown |
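The table amounts to an extension-to-language dispatch. The sketch below restates it as a Go lookup; the map, the function `languageFor`, and case-insensitive matching of extensions are assumptions for illustration, not CKV's actual discover/parse code.

```go
package main

import (
	"path/filepath"
	"strings"
)

// extLanguage restates the supported-languages table as a lookup map.
var extLanguage = map[string]string{
	".go":       "Go",
	".ts":       "TypeScript",
	".tsx":      "TypeScript",
	".js":       "JavaScript",
	".jsx":      "JavaScript",
	".mjs":      "JavaScript",
	".cjs":      "JavaScript",
	".sol":      "Solidity",
	".md":       "Markdown",
	".markdown": "Markdown",
}

// languageFor reports the language for a path and whether it is indexable.
// Lowercasing the extension is an assumption of this sketch.
func languageFor(path string) (string, bool) {
	lang, ok := extLanguage[strings.ToLower(filepath.Ext(path))]
	return lang, ok
}
```

Files whose extension is missing from the map (for example a `Makefile`, which has no extension at all) are simply skipped.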

Embedders

| Backend | Build tag | System deps | Use case |
|---|---|---|---|
| mock | none (default) | none | tests, smoke checks (no semantic signal) |
| bgeonnx | -tags bgeonnx | libonnxruntime, libtokenizers.a, model files | production semantic search |

The bgeonnx registry contains two model configs: bge-large-en-v1.5 (default, BERT-class, 1024 dim) and embeddinggemma-300m (Gemma-class, 768 dim). Model files live under ~/.cache/ckv/models/<name>/. The Gemma config is registered; the weights are not bundled with this repository.
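The registry paragraph pairs each model name with a vector width and a cache directory. The sketch below shows that lookup under stated assumptions: the `modelConfig` type, the `registry` map, and `modelDir` are hypothetical names, not the API of internal/embed/registry or internal/embed/model. Only the two model names, their dimensions, and the `~/.cache/ckv/models/<name>/` layout come from this README.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// modelConfig is a hypothetical registry entry; field names are assumptions.
type modelConfig struct {
	Name string
	Dim  int // embedding width; index and query vectors must agree on it
}

// registry restates the two bgeonnx configs named in this README.
var registry = map[string]modelConfig{
	"bge-large-en-v1.5":   {Name: "bge-large-en-v1.5", Dim: 1024},
	"embeddinggemma-300m": {Name: "embeddinggemma-300m", Dim: 768},
}

// modelDir resolves <home>/.cache/ckv/models/<name> for a home directory
// (for example from os.UserHomeDir) and rejects unregistered names.
func modelDir(home, name string) (string, error) {
	if _, ok := registry[name]; !ok {
		return "", fmt.Errorf("unknown model %q", name)
	}
	return filepath.Join(home, ".cache", "ckv", "models", name), nil
}
```

Keeping the width next to the name matters because an index built with the 1024-dim model cannot be searched with 768-dim query vectors.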

Architecture

```text
ckv build   discover ── parse ── chunk ── embed ── sqlite-vec
                                                    │
                                                    └─ manifest.json
ckv query   embed(intent) ── store.Search ── citation enforce ── snippet ── top-K
ckv mcp     JSON-RPC stdio ── cks.context.* / cks.ops.*
pkg/ckv     Engine wrapper around internal/query (in-process consumers)
```
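On the `ckv query` path, the embedded intent is ranked against stored chunk vectors. In CKV that search runs inside sqlite-vec's vec0 table; the brute-force sketch below only illustrates the ranking step in plain Go. The `chunkVec` and `scored` types, the citation format in the comment, and `topK` are hypothetical. The sketch assumes all vectors are L2-normalized, so the dot product equals cosine similarity.

```go
package main

import "sort"

// chunkVec is a hypothetical stored record: a citation plus a unit-length vector.
type chunkVec struct {
	Citation string // hypothetical format, e.g. "path/to/file.go:10-42"
	Vec      []float32
}

type scored struct {
	Citation string
	Score    float32
}

// topK scores every chunk by dot product with the query vector (cosine
// similarity for unit vectors) and returns the k highest-scoring chunks.
func topK(query []float32, chunks []chunkVec, k int) []scored {
	out := make([]scored, 0, len(chunks))
	for _, c := range chunks {
		if len(c.Vec) != len(query) {
			continue // skip vectors produced by a model of a different width
		}
		var dot float32
		for i := range query {
			dot += query[i] * c.Vec[i]
		}
		out = append(out, scored{Citation: c.Citation, Score: dot})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if k >= 0 && k < len(out) {
		out = out[:k]
	}
	return out
}
```

A linear scan costs O(n·d) per query, which is exactly the work a dedicated vector index is meant to take off the Go side.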

Build requirements

  • Go 1.25+
  • CGO enabled (for sqlite-vec via mattn/go-sqlite3)
  • gcc or clang toolchain
  • libonnxruntime and libtokenizers.a only when building with -tags bgeonnx

Documentation

License

AGPL-3.0. See LICENSE.

Directories

| Path | Synopsis |
|---|---|
| cmd/ckv | ckv command |
| internal/build | Package build is the indexer orchestrator: discover → parse → chunk → embed → store. |
| internal/chunk | Package chunk turns ([]parse.SymbolSpan, source) into ([]types.Chunk): the records the embedder + vector store actually persist. |
| internal/ckgalign | Package ckgalign builds an in-memory index from a CKG SQLite store (graph.db) and resolves each CKV chunk's CKGNodeID by matching (file_path, start_line): exact start-line preferred, then smallest containing line range. |
| internal/convention | Package convention computes per-package AST statistics that describe the package's prevailing idioms: error handling style, logging library, naming patterns, concurrency primitives. |
| internal/discover | Package discover walks --src and yields the source files CKV should index. |
| internal/embed/bgeonnx | Package bgeonnx is the production Embedder backend running ONNX models locally via CGO. |
| internal/embed/cache | Package cache wraps a types.Embedder with a hot-path LRU cache so repeated embed calls on the same text return without paying the model cost. |
| internal/embed/convert | Package convert wraps external model conversion tools (optimum-cli, coremltools) as subprocess calls. |
| internal/embed/coreml | Package coreml provides an Embedder that runs models directly via Apple's CoreML framework, bypassing ONNX Runtime. |
| internal/embed/mock | Package mock is a deterministic, dependency-free Embedder used for integration tests, the dev-loop, and CI before the real ONNX adapter lands. |
| internal/embed/model | Package model manages embedding model files: download, cache directory resolution, and format conversion. |
| internal/embed/registry | Package registry holds the model configuration catalog. |
| internal/eval | Package eval scores ckv against a known-query fixture. |
| internal/eval/prregress | Package prregress implements PR-based regression evaluation: given a merged PR, check out the world *before* it landed, build a ckv index over that snapshot, hand the PR's Background to an agent, and compare the agent's plan against what the PR actually did. |
| internal/filter | Package filter implements the Sensitive Filter engine. |
| internal/filterlist | Package filterlist implements the --files-from JSON include/exclude allowlist for ckv build. |
| internal/footprint | Package footprint records structured events about every CKV operation (build, query, MCP tool call) to two sinks: |
| internal/freshness | Package freshness compares an index's manifest against the live git HEAD of its source tree. |
| internal/glossary | Package glossary auto-extracts Korean → English keyword mappings from markdown documents (typically a project's .claude/docs/ tree) and emits the AliasMap YAML that `ckv query --alias` consumes. |
| internal/invariant | Package invariant extracts policy-bearing statements from Go source files in a three-tier confidence ladder: |
| internal/manifest | Package manifest is the on-disk index metadata. |
| internal/parse | Package parse extracts symbol-level spans (functions, methods, types) from source files so the chunker can build embeddable chunks. |
| internal/parse/fuzzcheck | Package fuzzcheck provides shared invariant checks for parser fuzz tests. |
| internal/parse/golang | Package golang parses Go source via the stdlib go/parser+go/ast. |
| internal/parse/javascript | Package javascript parses .js / .jsx / .mjs / .cjs source files. |
| internal/parse/markdown | Package markdown parses `*.md` / `*.markdown` files into heading-level SymbolSpans so docs/ADR content becomes searchable alongside source code. |
| internal/parse/prdoc | Package prdoc parses PR descriptions and commit messages into chunks for the PR corpus index. |
| internal/parse/solidity | Package solidity parses .sol via the vendored tree-sitter-solidity grammar (see internal/parse/solidity/binding). |
| internal/parse/solidity/binding | Package binding wraps tree-sitter-solidity (vendored from github.com/JoranHonig/tree-sitter-solidity v1.2.13, MIT-licensed; see ./LICENSE) into a `*sitter.Language` for go-tree-sitter. |
| internal/parse/typescript | Package typescript parses .ts and .tsx source via tree-sitter. |
| internal/policy | Package policy loads project-specific category + ModificationGuidance rules from a YAML file and applies them to chunks during build/reindex. |
| internal/projectcfg | Package projectcfg loads `<src>/ckv.yaml`, the per-project hook for customizing how CKV indexes a repository. |
| internal/query | Package query is the read path: open an index built by internal/build and serve semantic_search. |
| internal/query/bm25 | Package bm25 provides candidate-set BM25 rerank for CKV's query path. |
| internal/store/sqlitevec | Package sqlitevec is the default CKV VectorStore implementation: SQLite + the sqlite-vec extension's vec0 virtual table. |
| pkg/ckv | Package ckv is the stable, in-process Go API to a ckv vector index. |
| pkg/embed/ollama | Package ollama implements the Embedder interface via Ollama's HTTP API. |
| pkg/mcp | Package mcp wraps the CKV read-only surface as an MCP server. |
| pkg/types | Package types holds the cross-package data contracts: Chunk, Hit, Filter, the Embedder and VectorStore interfaces. |
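The ckgalign synopsis states a concrete resolution rule: match a chunk by (file_path, start_line), prefer a node with the exact same start line, and otherwise take the smallest node whose line range contains that line. The sketch below implements that rule with a linear scan standing in for the package's in-memory index. The `node` type and `resolve` function are hypothetical, and breaking ties between equal-size ranges in favor of the first one seen is an assumption.

```go
package main

// node is a hypothetical CKG symbol row: an ID plus its file and line range.
type node struct {
	ID         string
	File       string
	Start, End int // inclusive line range
}

// resolve returns the CKG node ID for a chunk starting at (file, start):
// an exact start-line match wins outright; otherwise the smallest range
// containing start wins. An empty string means no node matched.
func resolve(nodes []node, file string, start int) string {
	best := ""
	bestSpan := -1
	for _, n := range nodes {
		if n.File != file {
			continue
		}
		if n.Start == start {
			return n.ID // exact start-line match is preferred
		}
		if n.Start <= start && start <= n.End {
			if span := n.End - n.Start; bestSpan < 0 || span < bestSpan {
				best, bestSpan = n.ID, span
			}
		}
	}
	return best
}
```

Preferring the smallest containing range maps a chunk to the innermost enclosing symbol (a method rather than its whole file-level declaration), which is usually the node a graph lookup wants.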
