Documentation ¶
Overview ¶
Package graph is the Go port's facade over Kuzu Embedded. It mirrors the responsibilities of the Java GraphStore: open/close an embedded database, run Cypher, bulk-load nodes and edges, and expose read helpers. Writes happen during `enrich`; the `serve`/read-side commands open the same directory in read-only mode (see OpenReadOnly) and issue queries.
Concurrency model: the Store owns one Kuzu database and one long-lived connection. All writes funnel through the Store's mutex; reads use the same lock today and may relax to a read-write lock later if profiling demands it. Kuzu's own connection layer is not thread-safe for parallel query execution, so we serialize at this layer.
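The serialization described above can be illustrated with a minimal, self-contained sketch. It does not use Kuzu; `serialStore` and `runConcurrently` are hypothetical names invented for this example, and the counter merely stands in for state reached through the single connection.

```go
package main

import (
	"fmt"
	"sync"
)

// serialStore is a hypothetical stand-in for Store: one shared resource
// guarded by one mutex, so callers never execute in parallel.
type serialStore struct {
	mu      sync.Mutex
	queries int // stands in for state touched through the single connection
}

// run executes fn while holding the mutex; reads and writes share the lock.
func (s *serialStore) run(fn func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.queries++
	fn()
}

// runConcurrently fires n goroutines at one store and returns how many
// calls were recorded once all of them have finished.
func runConcurrently(n int) int {
	s := &serialStore{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.run(func() {})
		}()
	}
	wg.Wait()
	return s.queries
}

func main() {
	fmt.Println(runConcurrently(100))
}
```

Swapping `sync.Mutex` for `sync.RWMutex` would be the natural way to relax the read path later, at the cost of reasoning about which helpers are truly read-only.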
Index ¶
- Constants
- func MutationKeyword(q string) string
- type OpenOptions
- type Store
- func (s *Store) ApplySchema() error
- func (s *Store) BulkLoadEdges(edges []*model.CodeEdge) error
- func (s *Store) BulkLoadNodes(nodes []*model.CodeNode) error
- func (s *Store) Close() error
- func (s *Store) Conn() *kuzu.Connection
- func (s *Store) Count() (int64, error)
- func (s *Store) CountEdges() (int64, error)
- func (s *Store) CountNodesByKind() (map[string]int64, error)
- func (s *Store) CountNodesByLayer() (map[string]int64, error)
- func (s *Store) CreateIndexes() error
- func (s *Store) Cypher(query string, args ...map[string]any) ([]map[string]any, error)
- func (s *Store) CypherRows(query string, args map[string]any, maxRows int) ([]map[string]any, bool, error)
- func (s *Store) FindByID(id string) (*model.CodeNode, error)
- func (s *Store) FindByKindPaginated(kind string, offset, limit int) ([]*model.CodeNode, error)
- func (s *Store) FindIncomingNeighbors(id string) ([]*model.CodeNode, error)
- func (s *Store) FindOutgoingNeighbors(id string) ([]*model.CodeNode, error)
- func (s *Store) IsReadOnly() bool
- func (s *Store) LoadAllEdges() ([]*model.CodeEdge, error)
- func (s *Store) LoadAllNodes() ([]*model.CodeNode, error)
- func (s *Store) Lock()
- func (s *Store) Path() string
- func (s *Store) SearchByLabel(q string, limit int) ([]*model.CodeNode, error)
- func (s *Store) SearchLexical(q string, limit int) ([]*model.CodeNode, error)
- func (s *Store) Unlock()
Constants ¶
const DefaultBufferPoolBytes uint64 = 2 << 30
DefaultBufferPoolBytes caps Kuzu's buffer pool at 2 GiB by default. kuzu.DefaultSystemConfig() allocates 80% of system RAM (~12 GiB on a 15 GiB host) before any Go-side enrich work runs, leaving insufficient headroom for the in-memory enricher pipeline. 2 GiB is enough for real-world graphs at ~/projects/-scale (~430k nodes / ~300k edges) while keeping host memory use well below the point at which the OOM killer would fire.
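The arithmetic behind these figures can be checked with a few lines of Go. The constant below restates the documented value under a lowercase name; `eightyPercent` is a hypothetical helper written only for this check.

```go
package main

import "fmt"

// defaultBufferPoolBytes restates the documented constant: 2 << 30 bytes.
const defaultBufferPoolBytes uint64 = 2 << 30

// eightyPercent returns 80% of total using integer arithmetic.
func eightyPercent(total uint64) uint64 {
	return total / 10 * 8
}

func main() {
	const gib = uint64(1) << 30
	fmt.Println(defaultBufferPoolBytes / gib)  // 2
	fmt.Println(eightyPercent(15*gib) / gib)   // 12
}
```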
const DefaultQueryTimeout = 30 * time.Second
DefaultQueryTimeout matches the Java side's DBMS-level cap (GraphDatabaseSettings.transaction_timeout = 30s in Neo4jConfig). Kuzu accepts the timeout in milliseconds on the Connection.
Functions ¶
func MutationKeyword ¶
MutationKeyword returns the first matched blocked keyword in q (with comments stripped), or "" if the query is read-only. Used by the run_cypher MCP tool to reject write queries before they reach Kuzu — belt-and-braces alongside the OpenReadOnly system-flag.
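A rough sketch of how such a gate could work is shown below. It is not the package's implementation: the function name `firstMutationKeyword`, the comment syntax handled, the keyword list, and the upper-cased return value are all assumptions made for illustration. It also ignores string literals, so a keyword or `//` inside a quoted string would be misread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Assumed comment syntax: /* block */ and // line comments.
	commentRe = regexp.MustCompile(`(?s)/\*.*?\*/|//[^\n]*`)
	// Assumed block list; the real list is not documented here.
	keywordRe = regexp.MustCompile(`(?i)\b(CREATE|MERGE|DELETE|SET|DROP|COPY)\b`)
)

// firstMutationKeyword strips comments from q and returns the first blocked
// keyword it finds (upper-cased), or "" when none is present.
func firstMutationKeyword(q string) string {
	stripped := commentRe.ReplaceAllString(q, " ")
	return strings.ToUpper(keywordRe.FindString(stripped))
}

func main() {
	fmt.Printf("%q\n", firstMutationKeyword("MATCH (n) RETURN n"))
	fmt.Printf("%q\n", firstMutationKeyword("match (n) delete n"))
	fmt.Printf("%q\n", firstMutationKeyword("// CREATE nothing\nMATCH (n) RETURN n"))
}
```

Stripping comments first matters: without it, a read-only query with a commented-out `CREATE` would be rejected, and a scanner that skipped comments naively could be tricked in the other direction.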
Types ¶
type OpenOptions ¶
type OpenOptions struct {
// BufferPoolBytes caps Kuzu's buffer pool in bytes. Zero -> DefaultBufferPoolBytes.
BufferPoolBytes uint64
// MaxThreads caps Kuzu's per-query parallelism. Zero -> defaultMaxThreads().
MaxThreads uint64
// ReadOnly opens the database in read-only mode.
ReadOnly bool
// QueryTimeout, if > 0, sets the per-query wall-clock timeout.
QueryTimeout time.Duration
}
OpenOptions tunes how Open and OpenReadOnly wire the underlying Kuzu SystemConfig. Zero-valued fields fall back to safe defaults documented alongside each field.
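The zero-value fallback can be sketched as follows. The struct is a trimmed, lowercase copy of OpenOptions for illustration; `withDefaults` and `assumedMaxThreads` are hypothetical, and using `runtime.NumCPU()` as the thread default is purely an assumption, since defaultMaxThreads() is unexported and not described here.

```go
package main

import (
	"fmt"
	"runtime"
)

// openOptions is a trimmed illustrative copy of OpenOptions.
type openOptions struct {
	BufferPoolBytes uint64
	MaxThreads      uint64
}

const defaultBufferPoolBytes uint64 = 2 << 30

// assumedMaxThreads is a placeholder for the undocumented defaultMaxThreads().
func assumedMaxThreads() uint64 {
	return uint64(runtime.NumCPU())
}

// withDefaults fills every zero-valued field with its default and leaves
// explicitly set fields untouched.
func withDefaults(o openOptions) openOptions {
	if o.BufferPoolBytes == 0 {
		o.BufferPoolBytes = defaultBufferPoolBytes
	}
	if o.MaxThreads == 0 {
		o.MaxThreads = assumedMaxThreads()
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", withDefaults(openOptions{}))
	fmt.Printf("%+v\n", withDefaults(openOptions{BufferPoolBytes: 512 << 20, MaxThreads: 2}))
}
```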
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store is the embedded Kuzu graph store facade. It owns one Kuzu database and a single long-lived connection. The zero value is not usable — call Open or OpenReadOnly to construct.
func Open ¶
Open creates or opens a Kuzu database with safe default OpenOptions (capped BufferPoolBytes + MaxThreads). For tuning, see OpenWithOptions.
func OpenReadOnly ¶
OpenReadOnly opens an existing Kuzu store in read-only mode and sets a wall-clock timeout on every Cypher query. queryTimeout matches the Java DBMS-level `transaction_timeout=30s` cap (Neo4jConfig). Configurable via codeiq.yml `mcp.limits.query_timeout`.
All writes from a Store opened this way are rejected at the Cypher gateway (Store.Cypher) before they hit Kuzu. The SDK-level read-only flag protects on-disk state but does not surface a Go error; it silently no-ops some statements, so the gateway check is belt-and-braces.
queryTimeout <= 0 disables the per-query timeout. Kuzu interprets the timeout in milliseconds; we accept a Go duration for ergonomics.
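The duration-to-milliseconds conversion can be sketched like this. `timeoutMillis` is a hypothetical helper, and returning 0 to mean "no timeout" is an assumption of this sketch rather than a documented Kuzu convention.

```go
package main

import (
	"fmt"
	"time"
)

// timeoutMillis converts a Go duration to whole milliseconds.
// Non-positive durations map to 0, which this sketch treats as "disabled".
func timeoutMillis(d time.Duration) uint64 {
	if d <= 0 {
		return 0
	}
	return uint64(d / time.Millisecond)
}

func main() {
	fmt.Println(timeoutMillis(30 * time.Second)) // 30000
	fmt.Println(timeoutMillis(0))                // 0
}
```

Note that sub-millisecond durations truncate to 0 here; a caller that wants a 500µs timeout would need rounding up instead.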
func OpenWithOptions ¶
func OpenWithOptions(path string, opts OpenOptions) (*Store, error)
OpenWithOptions creates or opens a Kuzu database, applying any non-zero fields of opts. Zero-valued fields fall back to safe defaults — see OpenOptions and DefaultBufferPoolBytes.
func (*Store) ApplySchema ¶
ApplySchema creates the single CodeNode node table plus one REL table per EdgeKind. Idempotent — repeated calls are no-ops via `IF NOT EXISTS`. Mirrors the implicit label-driven schema Spring Data Neo4j gives the Java side; on Kuzu the schema is explicit.
CodeNode is one table backing all 34 NodeKinds — `kind` is a column, not a label. Properties round-trip through a JSON-serialised `props` column plus a small set of first-class columns we want to index / project on.
func (*Store) BulkLoadEdges ¶
BulkLoadEdges groups edges by Kind and issues one COPY FROM per rel table. A mixed-kind batch is split internally — callers don't need to pre-partition. Empty input is a no-op.
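The internal split of a mixed-kind batch amounts to a group-by on Kind. A minimal sketch, using a hypothetical `edge` struct in place of model.CodeEdge and made-up kind names:

```go
package main

import "fmt"

// edge is a hypothetical, trimmed stand-in for model.CodeEdge.
type edge struct {
	Kind, From, To string
}

// groupByKind partitions edges by Kind, preserving input order within
// each group. Empty input yields an empty map, so callers can treat it
// as a no-op.
func groupByKind(edges []edge) map[string][]edge {
	groups := make(map[string][]edge)
	for _, e := range edges {
		groups[e.Kind] = append(groups[e.Kind], e)
	}
	return groups
}

func main() {
	g := groupByKind([]edge{
		{"CALLS", "a", "b"},
		{"IMPORTS", "a", "c"},
		{"CALLS", "b", "c"},
	})
	fmt.Println(len(g), len(g["CALLS"]), len(g["IMPORTS"]))
}
```

Each resulting group would then map to one COPY FROM against the matching rel table.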
func (*Store) BulkLoadNodes ¶
BulkLoadNodes writes nodes to one or more temporary CSV files and ingests them via Kuzu's COPY FROM, in batches of bulkLoadBatchSize. This is materially faster than per-node CREATE for the enrich-phase volumes we hit (44k files / 100k+ nodes). Empty input is a no-op (an empty CSV would still issue a COPY, which Kuzu may reject; the no-op behaviour also matches Java's bulkSave convention).
Each batch is staged + ingested + cleaned up before the next batch starts so that neither the on-disk CSV footprint nor Kuzu's ingest buffer ever holds more than bulkLoadBatchSize rows. Cypher uniqueness constraints are still enforced cross-batch, so a duplicate primary key surfaces the same Copy exception either way.
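The batching scheme can be sketched as a function that yields half-open index ranges, one per batch. `batchRanges` is a hypothetical helper; the real code stages a CSV and issues COPY FROM for each range, which is omitted here.

```go
package main

import "fmt"

// batchRanges splits total rows into consecutive [start, end) ranges of at
// most size rows each. Non-positive inputs yield no ranges.
func batchRanges(total, size int) [][2]int {
	if total <= 0 || size <= 0 {
		return nil
	}
	var out [][2]int
	for start := 0; start < total; start += size {
		end := start + size
		if end > total {
			end = total
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

func main() {
	// For each range, a loader would stage rows[start:end] to a temp CSV,
	// ingest it, and delete the file before moving to the next range.
	for _, r := range batchRanges(2500, 1000) {
		fmt.Println(r[0], r[1])
	}
}
```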
func (*Store) Close ¶
Close releases the connection and database. Safe to call multiple times; the second and subsequent calls are no-ops.
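One common way to get this "safe to call multiple times" behaviour is a closed flag checked under the lock. The sketch below uses a hypothetical `resource` type with a counter in place of the real connection and database handles.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for a Store holding native handles.
type resource struct {
	mu       sync.Mutex
	closed   bool
	releases int // counts how many times the handles were actually released
}

// Close releases the handles on the first call and is a no-op afterwards.
func (r *resource) Close() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.closed {
		return nil
	}
	r.closed = true
	r.releases++
	return nil
}

func main() {
	r := &resource{}
	_ = r.Close()
	_ = r.Close()
	fmt.Println(r.releases)
}
```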
func (*Store) Conn ¶
func (s *Store) Conn() *kuzu.Connection
Conn returns the underlying Kuzu connection. Callers that need to orchestrate multi-statement work directly against go-kuzu can take this, but they MUST hold s.Lock()/s.Unlock() around the work. For single-shot queries prefer the package helpers (Cypher, etc.) which lock for the caller.
func (*Store) CountEdges ¶
CountEdges returns the total number of edges across every rel table. The anonymous-rel pattern `()-[r]->()` unions all declared rel types in Kuzu — confirmed against the v0.7.1 binder.
func (*Store) CountNodesByKind ¶
CountNodesByKind returns {kind: count} across all 34 NodeKinds. Mirrors StatsService.getKindCounts() on the Java side.
func (*Store) CountNodesByLayer ¶
CountNodesByLayer returns {layer: count} across LayerClassifier output. Mirrors StatsService.getLayerCounts() on the Java side.
func (*Store) CreateIndexes ¶
CreateIndexes installs the fulltext-search indexes the read side relies on. Two indexes are created:
- code_node_label_fts: covers label + fqn_lower. Powers SearchByLabel and the search_graph MCP tool surface.
- code_node_lexical_fts: covers prop_lex_comment + prop_lex_config_keys. Powers LexicalQueryService's doc-comment / config-key search.
Idempotent: existing indexes are dropped before re-create. The enrich pipeline calls this once after BulkLoadNodes / BulkLoadEdges complete, so the indexes always reflect the latest snapshot.
FTS is bundled in Kuzu 0.11.3+ (no network install needed, so it is safe for air-gapped hosts).
func (*Store) Cypher ¶
Cypher runs a Cypher statement and returns rows as []map[string]any. For DDL or void queries the returned slice may be empty (or contain whatever status row Kuzu emits). If args is supplied the query is prepared and bound; otherwise it is executed directly.
The caller-supplied map is read-only — parameter values are copied through go-kuzu's Execute path.
func (*Store) CypherRows ¶
func (s *Store) CypherRows(query string, args map[string]any, maxRows int) ([]map[string]any, bool, error)
CypherRows runs query, materialises up to maxRows result rows, and reports whether the query produced more rows than the cap. Used by the run_cypher MCP tool which needs to surface a `truncated` flag without inlining `LIMIT N` into the user-supplied query string (the query may already have its own LIMIT — see the McpTools row-cap gotcha in CLAUDE.md).
The mutation gate from Cypher() applies here too: on a read-only store, any blocked-keyword query short-circuits with an error.
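Detecting truncation without rewriting the query can be done by checking whether the cursor still has rows once the cap is reached. The sketch below shows that idea against a hypothetical in-memory `rowSource`; it is not go-kuzu's result type, and `collectCapped` and `newSource` are names invented for this example.

```go
package main

import "fmt"

// rowSource is a hypothetical stand-in for a query result cursor.
type rowSource struct {
	rows []map[string]any
	pos  int
}

func (r *rowSource) HasNext() bool { return r.pos < len(r.rows) }

func (r *rowSource) Next() map[string]any {
	row := r.rows[r.pos]
	r.pos++
	return row
}

// newSource builds a source with n rows of the form {"i": index}.
func newSource(n int) *rowSource {
	rows := make([]map[string]any, n)
	for i := range rows {
		rows[i] = map[string]any{"i": i}
	}
	return &rowSource{rows: rows}
}

// collectCapped materialises at most maxRows rows and reports whether the
// source still had rows left when the cap was hit.
func collectCapped(src *rowSource, maxRows int) ([]map[string]any, bool) {
	out := make([]map[string]any, 0)
	for src.HasNext() {
		if len(out) >= maxRows {
			return out, true
		}
		out = append(out, src.Next())
	}
	return out, false
}

func main() {
	rows, truncated := collectCapped(newSource(5), 3)
	fmt.Println(len(rows), truncated)
}
```

Because the cap is enforced while draining the cursor, a query that already carries its own LIMIT is left untouched.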
func (*Store) FindByID ¶
FindByID returns the single node with primary key id, or (nil, nil) when no such node exists. Mirrors GraphRepository.findById on the Java side.
func (*Store) FindByKindPaginated ¶
FindByKindPaginated returns nodes of the given kind ordered by id with SKIP/LIMIT semantics. Mirrors GraphController's /api/kinds/{kind}. offset and limit are expected to be non-negative; negative input is coerced to 0.
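The coercion of negative paging arguments is simple enough to show directly. `clampPage` is a hypothetical helper name; the real method applies the same rule before building its SKIP/LIMIT clause.

```go
package main

import "fmt"

// clampPage coerces negative offset or limit values to 0.
func clampPage(offset, limit int) (int, int) {
	if offset < 0 {
		offset = 0
	}
	if limit < 0 {
		limit = 0
	}
	return offset, limit
}

func main() {
	fmt.Println(clampPage(-5, 20))
	fmt.Println(clampPage(40, -1))
}
```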
func (*Store) FindIncomingNeighbors ¶
FindIncomingNeighbors returns distinct nodes a where a -[*]-> n.id. Mirrors GraphController's /api/nodes/{id}/neighbors (incoming side). Note: Kuzu 0.7.1's binder drops the rel-pattern scope after `RETURN DISTINCT`, so the ORDER BY must reference the alias (`id`), not `a.id` — the SQL-standard DISTINCT scope behaviour.
func (*Store) FindOutgoingNeighbors ¶
FindOutgoingNeighbors returns distinct nodes b where n.id -[*]-> b. Mirrors GraphController's /api/nodes/{id}/neighbors (outgoing side). Same DISTINCT-scope caveat as FindIncomingNeighbors.
func (*Store) IsReadOnly ¶
IsReadOnly reports whether the store rejects mutating Cypher.
func (*Store) LoadAllEdges ¶
LoadAllEdges pulls every edge from every rel table, hydrating model.CodeEdge. Determinism: rows come out grouped by EdgeKind in declaration order, then sorted by edge id within each kind. Empty graph returns (nil, nil).
func (*Store) LoadAllNodes ¶
LoadAllNodes pulls every CodeNode row out of Kuzu in deterministic ID order and hydrates the columns + the JSON `props` blob back into model.CodeNode. Used by the stats command, which currently re-uses the in-memory StatsService.ComputeStats path rather than per-category Cypher aggregations. On large graphs this is materially heavier than the Java side's TopologyService refactor — see the gotcha in CLAUDE.md for the follow-up plan. Empty graph returns (nil, nil).
func (*Store) Lock ¶
func (s *Store) Lock()
Lock acquires the store mutex. Exposed for callers that drive the connection directly (rare — Cypher / BulkLoad / etc. lock internally).
func (*Store) SearchByLabel ¶
SearchByLabel runs a fulltext search across the label + fqn_lower index. The query is auto-suffixed with '*' to give prefix matching (so 'auth' matches 'AuthService' identifiers). Results are ranked by BM25 score. Falls back to a CONTAINS predicate when the FTS index hasn't been built (pre-enrich, or enrich aborted before CreateIndexes).
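The prefix-matching step could look roughly like the sketch below. `prefixQuery` is a hypothetical helper; trimming whitespace and leaving an already-starred or empty query alone are assumptions of this example, not documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixQuery appends '*' so the fulltext engine treats q as a prefix.
// Empty queries and queries that already end in '*' are returned as-is
// (an assumption made for this sketch).
func prefixQuery(q string) string {
	q = strings.TrimSpace(q)
	if q == "" || strings.HasSuffix(q, "*") {
		return q
	}
	return q + "*"
}

func main() {
	fmt.Println(prefixQuery("auth"))
	fmt.Println(prefixQuery("auth*"))
}
```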
func (*Store) SearchLexical ¶
SearchLexical runs a fulltext search across the prose columns (prop_lex_comment + prop_lex_config_keys). BM25 ranks results. Same CONTAINS fallback as SearchByLabel for pre-enrich graphs.