graph

package
v0.4.2 Latest
Warning

This package is not in the latest version of its module.

Published: May 14, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package graph is the Go port's facade over Kuzu Embedded. It mirrors the responsibilities of the Java GraphStore: open/close an embedded database, run Cypher, bulk-load nodes and edges, and expose read helpers. Writes happen during `enrich`; the `serve`/read-side commands open the same directory in normal (read-write) mode and issue queries.

Concurrency model: the Store owns one Kuzu database and one long-lived connection. All writes funnel through the Store's mutex; reads use the same lock today and may relax to a read-write lock later if profiling demands it. Kuzu's own connection layer is not thread-safe for parallel query execution, so we serialize at this layer.
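
The serialization described above can be sketched with a plain sync.Mutex around a single connection. This is a minimal illustration, not the package's implementation: `fakeConn` and `lockedStore` are hypothetical stand-ins for the Kuzu connection and the Store's unexported fields.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn is a hypothetical stand-in for a connection that is not safe
// for concurrent use. It only counts the queries it receives.
type fakeConn struct{ queries int }

func (c *fakeConn) run(q string) { c.queries++ }

// lockedStore funnels every call through one mutex, so the connection
// never sees two queries at once.
type lockedStore struct {
	mu   sync.Mutex
	conn *fakeConn
}

func (s *lockedStore) query(q string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.conn.run(q)
}

// runConcurrently issues n queries from n goroutines and returns how many
// the connection observed.
func runConcurrently(n int) int {
	s := &lockedStore{conn: &fakeConn{}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.query("MATCH (n) RETURN count(n)")
		}()
	}
	wg.Wait()
	return s.conn.queries
}

func main() {
	fmt.Println(runConcurrently(100))
}
```

Swapping the field for a sync.RWMutex later would only change the lock calls on the read path, which is why the relaxation can wait for profiling data.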

Constants

const DefaultBufferPoolBytes uint64 = 2 << 30

DefaultBufferPoolBytes caps Kuzu's buffer pool to 2 GiB by default. kuzu.DefaultSystemConfig() allocates 80% of system RAM (~12 GiB on a 15 GiB host) before any Go-side enrich work runs, leaving insufficient headroom for the in-memory enricher pipeline. 2 GiB is enough for real-world graphs at ~/projects/-scale (~430k nodes / ~300k edges) while keeping host memory use well below the OOM threshold.
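
The figures above can be checked with a few lines of arithmetic. The sketch below re-declares the constant locally so it runs on its own; `eightyPercent` is a hypothetical helper that models the 80% share the text attributes to kuzu.DefaultSystemConfig().

```go
package main

import "fmt"

const gib uint64 = 1 << 30

// defaultBufferPoolBytes repeats the documented value: 2 << 30 bytes.
const defaultBufferPoolBytes uint64 = 2 << 30

// eightyPercent returns 80% of total using integer arithmetic.
func eightyPercent(total uint64) uint64 { return total / 10 * 8 }

func main() {
	host := 15 * gib
	fmt.Println("capped pool (GiB):", defaultBufferPoolBytes/gib)
	fmt.Println("uncapped pool on a 15 GiB host (GiB):", eightyPercent(host)/gib)
}
```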

const DefaultQueryTimeout = 30 * time.Second

DefaultQueryTimeout matches the Java side's DBMS-level cap (GraphDatabaseSettings.transaction_timeout = 30s in Neo4jConfig). Kuzu accepts the timeout in milliseconds on the Connection.

Variables

This section is empty.

Functions

func MutationKeyword

func MutationKeyword(q string) string

MutationKeyword returns the first matched blocked keyword in q (with comments stripped), or "" if the query is read-only. Used by the run_cypher MCP tool to reject write queries before they reach Kuzu — belt-and-braces alongside the OpenReadOnly system-flag.
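
A minimal sketch of this kind of gate, assuming a regexp-based approach: strip comments, then return the first blocked keyword found. The keyword list and the name `firstMutationKeyword` are illustrative assumptions, not the package's actual list or implementation. The sketch also ignores string literals, so a keyword inside a quoted value would be flagged.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	blockComment = regexp.MustCompile(`(?s)/\*.*?\*/`)
	lineComment  = regexp.MustCompile(`//[^\n]*`)
	// blocked is an assumed keyword list chosen for illustration only.
	blocked = regexp.MustCompile(`(?i)\b(CREATE|MERGE|DELETE|SET|DROP|COPY)\b`)
)

// firstMutationKeyword strips comments from q and returns the leftmost
// blocked keyword in upper case, or "" when none is present.
func firstMutationKeyword(q string) string {
	q = blockComment.ReplaceAllString(q, " ")
	q = lineComment.ReplaceAllString(q, " ")
	return strings.ToUpper(blocked.FindString(q))
}

func main() {
	fmt.Printf("%q\n", firstMutationKeyword("MATCH (n) RETURN n"))
	fmt.Printf("%q\n", firstMutationKeyword("/* CREATE */ MATCH (n) DELETE n"))
}
```

Word boundaries matter here: a property such as `n.offset` contains the letters "set" but must not trip the gate.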

Types

type OpenOptions

type OpenOptions struct {
	// BufferPoolBytes caps Kuzu's buffer pool in bytes. Zero -> DefaultBufferPoolBytes.
	BufferPoolBytes uint64
	// MaxThreads caps Kuzu's per-query parallelism. Zero -> defaultMaxThreads().
	MaxThreads uint64
	// ReadOnly opens the database in read-only mode.
	ReadOnly bool
	// QueryTimeout, if > 0, sets the per-query wall-clock timeout.
	QueryTimeout time.Duration
}

OpenOptions tunes how Open and OpenReadOnly wire the underlying Kuzu SystemConfig. Zero-valued fields fall back to safe defaults documented alongside each field.
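
The zero-value fallback can be pictured as below. This is a sketch under assumptions: `options` is a trimmed local copy of the struct, `withDefaults` is a hypothetical helper, and the thread default is passed in as a parameter because defaultMaxThreads() is unexported and its value is not documented here.

```go
package main

import "fmt"

// options is a local, trimmed-down copy of OpenOptions for illustration.
type options struct {
	BufferPoolBytes uint64
	MaxThreads      uint64
}

// withDefaults fills zero-valued fields and leaves explicit values alone.
func withDefaults(o options, fallbackThreads uint64) options {
	if o.BufferPoolBytes == 0 {
		o.BufferPoolBytes = 2 << 30 // documented DefaultBufferPoolBytes
	}
	if o.MaxThreads == 0 {
		o.MaxThreads = fallbackThreads
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", withDefaults(options{}, 4))
	fmt.Printf("%+v\n", withDefaults(options{BufferPoolBytes: 512 << 20}, 4))
}
```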

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store is the embedded Kuzu graph store facade. It owns one Kuzu database and a single long-lived connection. The zero value is not usable — call Open or OpenReadOnly to construct.

func Open

func Open(path string) (*Store, error)

Open creates or opens a Kuzu database with safe default OpenOptions (capped BufferPoolBytes + MaxThreads). For tuning, see OpenWithOptions.

func OpenReadOnly

func OpenReadOnly(path string, queryTimeout time.Duration) (*Store, error)

OpenReadOnly opens an existing Kuzu store in read-only mode and sets a wall-clock timeout on every Cypher query. queryTimeout matches the Java DBMS-level `transaction_timeout=30s` cap (Neo4jConfig). Configurable via codeiq.yml `mcp.limits.query_timeout`.

All writes from a Store opened this way are rejected at the Cypher gateway (Store.Cypher) before they reach Kuzu. The SDK-level read-only flag protects on-disk state but does not surface a Go error; it silently no-ops some statements. The gateway check is belt-and-braces.

queryTimeout <= 0 disables the per-query timeout. Kuzu interprets the timeout in milliseconds; we accept a Go duration for ergonomics.
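
The duration-to-milliseconds conversion mentioned above might look like the following. `timeoutMillis` is a hypothetical helper; the second return value models the "non-positive disables the timeout" rule.

```go
package main

import (
	"fmt"
	"time"
)

// timeoutMillis converts d to whole milliseconds. It returns false when d
// is non-positive, meaning no per-query timeout should be applied.
func timeoutMillis(d time.Duration) (uint64, bool) {
	if d <= 0 {
		return 0, false
	}
	return uint64(d.Milliseconds()), true
}

func main() {
	ms, enabled := timeoutMillis(30 * time.Second)
	fmt.Println(ms, enabled)
	ms, enabled = timeoutMillis(0)
	fmt.Println(ms, enabled)
}
```

Note that Duration.Milliseconds truncates, so a sub-millisecond duration converts to 0 while still being reported as enabled in this sketch.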

func OpenWithOptions

func OpenWithOptions(path string, opts OpenOptions) (*Store, error)

OpenWithOptions creates or opens a Kuzu database, applying any non-zero fields of opts. Zero-valued fields fall back to safe defaults — see OpenOptions and DefaultBufferPoolBytes.

func (*Store) ApplySchema

func (s *Store) ApplySchema() error

ApplySchema creates the single CodeNode node table plus one REL table per EdgeKind. Idempotent — repeated calls are no-ops via `IF NOT EXISTS`. Mirrors the implicit label-driven schema Spring Data Neo4j gives the Java side; on Kuzu the schema is explicit.

CodeNode is one table backing all 34 NodeKinds — `kind` is a column, not a label. Properties round-trip through a JSON-serialised `props` column plus a small set of first-class columns we want to index / project on.

func (*Store) BulkLoadEdges

func (s *Store) BulkLoadEdges(edges []*model.CodeEdge) error

BulkLoadEdges groups edges by Kind and issues one COPY FROM per rel table. A mixed-kind batch is split internally — callers don't need to pre-partition. Empty input is a no-op.
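
The internal split can be sketched as a group-by on the kind field. `edge` is a hypothetical, trimmed-down stand-in for model.CodeEdge, and returning the kinds in sorted order is an assumption made here to keep the iteration order stable.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a hypothetical stand-in for model.CodeEdge.
type edge struct {
	Kind, From, To string
}

// groupByKind partitions a mixed batch so each kind can be loaded into its
// own rel table. It also returns the kinds in sorted order.
func groupByKind(edges []edge) (map[string][]edge, []string) {
	groups := make(map[string][]edge)
	for _, e := range edges {
		groups[e.Kind] = append(groups[e.Kind], e)
	}
	kinds := make([]string, 0, len(groups))
	for k := range groups {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return groups, kinds
}

func main() {
	groups, kinds := groupByKind([]edge{
		{"CALLS", "a", "b"}, {"IMPORTS", "a", "c"}, {"CALLS", "b", "c"},
	})
	for _, k := range kinds {
		// One bulk load per kind would be issued here.
		fmt.Println(k, len(groups[k]))
	}
}
```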

func (*Store) BulkLoadNodes

func (s *Store) BulkLoadNodes(nodes []*model.CodeNode) error

BulkLoadNodes writes nodes to one or more temporary CSV files and ingests them via Kuzu's COPY FROM, in batches of bulkLoadBatchSize. This is materially faster than per-node CREATE for the enrich-phase volumes we hit (44k files / 100k+ nodes). Empty input is a no-op (an empty CSV would still issue a COPY, which Kuzu may reject; the no-op behaviour also matches Java's bulkSave convention).

Each batch is staged + ingested + cleaned up before the next batch starts so that neither the on-disk CSV footprint nor Kuzu's ingest buffer ever holds more than bulkLoadBatchSize rows. Cypher uniqueness constraints are still enforced cross-batch, so a duplicate primary key surfaces the same Copy exception either way.
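
The batching step can be illustrated with a small generic helper. `batches` is a hypothetical name, and the handling of a non-positive size is a choice made for this sketch; the real bulkLoadBatchSize value is not stated here.

```go
package main

import "fmt"

// batches splits items into consecutive slices of at most size elements.
// A non-positive size yields a single batch.
func batches[T any](items []T, size int) [][]T {
	if len(items) == 0 {
		return nil
	}
	if size <= 0 {
		return [][]T{items}
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	for i, b := range batches([]int{1, 2, 3, 4, 5}, 2) {
		// Stage, ingest, and clean up batch i before the next one starts.
		fmt.Println(i, b)
	}
}
```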

func (*Store) Close

func (s *Store) Close() error

Close releases the connection and database. Safe to call multiple times; the second and subsequent calls are no-ops.
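
One common way to get this idempotent behaviour is a closed flag guarded by a mutex. The sketch below assumes that approach; `resource` and its `releases` counter are hypothetical and exist only to make the effect observable.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for a store holding a connection.
type resource struct {
	mu       sync.Mutex
	closed   bool
	releases int // how many times the underlying handles were released
}

// Close releases the handles once; later calls return nil without effect.
func (r *resource) Close() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.closed {
		return nil
	}
	r.closed = true
	r.releases++
	return nil
}

func main() {
	r := &resource{}
	fmt.Println(r.Close(), r.Close(), r.releases)
}
```

This makes `defer store.Close()` safe even when an error path has already closed the store explicitly.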

func (*Store) Conn

func (s *Store) Conn() *kuzu.Connection

Conn returns the underlying Kuzu connection. Callers that need to orchestrate multi-statement work directly against go-kuzu can take this, but they MUST hold s.Lock()/s.Unlock() around the work. For single-shot queries prefer the package helpers (Cypher, etc.) which lock for the caller.

func (*Store) Count

func (s *Store) Count() (int64, error)

Count returns the total number of CodeNode rows.

func (*Store) CountEdges

func (s *Store) CountEdges() (int64, error)

CountEdges returns the total number of edges across every rel table. The anonymous-rel pattern `()-[r]->()` unions all declared rel types in Kuzu — confirmed against the v0.7.1 binder.

func (*Store) CountNodesByKind

func (s *Store) CountNodesByKind() (map[string]int64, error)

CountNodesByKind returns {kind: count} across all 34 NodeKinds. Mirrors StatsService.getKindCounts() on the Java side.

func (*Store) CountNodesByLayer

func (s *Store) CountNodesByLayer() (map[string]int64, error)

CountNodesByLayer returns {layer: count} across LayerClassifier output. Mirrors StatsService.getLayerCounts() on the Java side.

func (*Store) CreateIndexes

func (s *Store) CreateIndexes() error

CreateIndexes installs the fulltext-search indexes the read side relies on. Two indexes are created:

  • code_node_label_fts: covers label + fqn_lower. Powers SearchByLabel and the search_graph MCP tool surface.
  • code_node_lexical_fts: covers prop_lex_comment + prop_lex_config_keys. Powers LexicalQueryService's doc-comment / config-key search.

Idempotent: existing indexes are dropped before re-create. The enrich pipeline calls this once after BulkLoadNodes / BulkLoadEdges complete, so the indexes always reflect the latest snapshot.

FTS is bundled in Kuzu 0.11.3+, so no network install is needed and air-gapped hosts are safe.

func (*Store) Cypher

func (s *Store) Cypher(query string, args ...map[string]any) ([]map[string]any, error)

Cypher runs a Cypher statement and returns rows as []map[string]any. For DDL or void queries the returned slice may be empty (or contain whatever status row Kuzu emits). If args is supplied the query is prepared and bound; otherwise it is executed directly.

The caller-supplied map is read-only — parameter values are copied through go-kuzu's Execute path.

func (*Store) CypherRows

func (s *Store) CypherRows(query string, args map[string]any, maxRows int) ([]map[string]any, bool, error)

CypherRows runs query, materialises up to maxRows result rows, and reports whether the query produced more rows than the cap. Used by the run_cypher MCP tool which needs to surface a `truncated` flag without inlining `LIMIT N` into the user-supplied query string (the query may already have its own LIMIT — see the McpTools row-cap gotcha in CLAUDE.md).

The mutation gate from Cypher() applies here too: on a read-only store, any blocked-keyword query short-circuits with an error.
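
The row cap and `truncated` flag can be sketched without touching the query text: collect up to maxRows rows, then probe for one more. `rowSource` is a hypothetical pull-style iterator standing in for a Kuzu result set, and `countingSource` exists only to drive the demo.

```go
package main

import "fmt"

// rowSource returns the next row and true, or nil and false once exhausted.
type rowSource func() (map[string]any, bool)

// countingSource yields n rows of the form {"i": k}.
func countingSource(n int) rowSource {
	i := 0
	return func() (map[string]any, bool) {
		if i >= n {
			return nil, false
		}
		i++
		return map[string]any{"i": i}, true
	}
}

// collectCapped materialises up to maxRows rows and reports whether the
// source still had at least one more row after the cap was reached.
func collectCapped(next rowSource, maxRows int) ([]map[string]any, bool) {
	var rows []map[string]any
	for len(rows) < maxRows {
		row, ok := next()
		if !ok {
			return rows, false
		}
		rows = append(rows, row)
	}
	_, more := next()
	return rows, more
}

func main() {
	rows, truncated := collectCapped(countingSource(5), 3)
	fmt.Println(len(rows), truncated)
}
```

Because the cap is applied while reading results, a LIMIT already present in the user's query keeps its meaning.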

func (*Store) FindByID

func (s *Store) FindByID(id string) (*model.CodeNode, error)

FindByID returns the single node with primary key id, or (nil, nil) when no such node exists. Mirrors GraphRepository.findById on the Java side.

func (*Store) FindByKindPaginated

func (s *Store) FindByKindPaginated(kind string, offset, limit int) ([]*model.CodeNode, error)

FindByKindPaginated returns nodes of the given kind ordered by id with SKIP/LIMIT semantics. Mirrors GraphController's /api/kinds/{kind}. offset / limit must be non-negative; negative input is coerced to 0.
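
The coercion and SKIP/LIMIT semantics can be modelled over an in-memory slice of ids. `page` is a hypothetical helper written for illustration; the real method runs the equivalent logic as a Cypher query.

```go
package main

import (
	"fmt"
	"sort"
)

// page returns ids ordered ascending, skipping offset entries and keeping
// at most limit. Negative offset or limit is coerced to 0.
func page(ids []string, offset, limit int) []string {
	if offset < 0 {
		offset = 0
	}
	if limit < 0 {
		limit = 0
	}
	sorted := append([]string(nil), ids...) // leave the caller's slice untouched
	sort.Strings(sorted)
	if offset > len(sorted) {
		offset = len(sorted)
	}
	if limit > len(sorted)-offset {
		limit = len(sorted) - offset
	}
	return sorted[offset : offset+limit]
}

func main() {
	ids := []string{"c", "a", "d", "b"}
	fmt.Println(page(ids, 1, 2))
	fmt.Println(page(ids, -3, 2))
}
```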

func (*Store) FindIncomingNeighbors

func (s *Store) FindIncomingNeighbors(id string) ([]*model.CodeNode, error)

FindIncomingNeighbors returns distinct nodes a where a -[*]-> n.id. Mirrors GraphController's /api/nodes/{id}/neighbors (incoming side). Note: Kuzu 0.7.1's binder drops the rel-pattern scope after `RETURN DISTINCT`, so the ORDER BY must reference the alias (`id`), not `a.id` — the SQL-standard DISTINCT scope behaviour.

func (*Store) FindOutgoingNeighbors

func (s *Store) FindOutgoingNeighbors(id string) ([]*model.CodeNode, error)

FindOutgoingNeighbors returns distinct nodes b where n.id -[*]-> b. Mirrors GraphController's /api/nodes/{id}/neighbors (outgoing side). Same DISTINCT-scope caveat as FindIncomingNeighbors.

func (*Store) IsReadOnly

func (s *Store) IsReadOnly() bool

IsReadOnly reports whether the store rejects mutating Cypher.

func (*Store) LoadAllEdges

func (s *Store) LoadAllEdges() ([]*model.CodeEdge, error)

LoadAllEdges pulls every edge from every rel table, hydrating model.CodeEdge. Determinism: rows come out grouped by EdgeKind in declaration order, then sorted by edge id within each kind. Empty graph returns (nil, nil).
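
The ordering guarantee amounts to a two-level sort key: position of the kind in the declaration list, then edge id. The sketch below shows that ordering on plain structs. `edgeRow`, `orderEdges`, and the kind names are hypothetical, and how the real method produces this order is not specified in this document.

```go
package main

import (
	"fmt"
	"sort"
)

// edgeRow is a hypothetical, minimal view of an edge.
type edgeRow struct{ Kind, ID string }

// orderEdges returns a copy of rows sorted by the position of Kind in
// kindOrder, then by ID. Kinds missing from kindOrder sort as position 0.
func orderEdges(rows []edgeRow, kindOrder []string) []edgeRow {
	rank := make(map[string]int, len(kindOrder))
	for i, k := range kindOrder {
		rank[k] = i
	}
	out := append([]edgeRow(nil), rows...)
	sort.Slice(out, func(i, j int) bool {
		if rank[out[i].Kind] != rank[out[j].Kind] {
			return rank[out[i].Kind] < rank[out[j].Kind]
		}
		return out[i].ID < out[j].ID
	})
	return out
}

func main() {
	rows := []edgeRow{{"CALLS", "e2"}, {"IMPORTS", "e1"}, {"CALLS", "e1"}}
	fmt.Println(orderEdges(rows, []string{"IMPORTS", "CALLS"}))
}
```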

func (*Store) LoadAllNodes

func (s *Store) LoadAllNodes() ([]*model.CodeNode, error)

LoadAllNodes pulls every CodeNode row out of Kuzu in deterministic ID order and hydrates the columns + the JSON `props` blob back into model.CodeNode. Used by the stats command, which currently re-uses the in-memory StatsService.ComputeStats path rather than per-category Cypher aggregations. On large graphs this is materially heavier than the Java side's TopologyService refactor — see the gotcha in CLAUDE.md for the follow-up plan. Empty graph returns (nil, nil).

func (*Store) Lock

func (s *Store) Lock()

Lock acquires the store mutex. Exposed for callers that drive the connection directly (rare — Cypher / BulkLoad / etc. lock internally).

func (*Store) Path

func (s *Store) Path() string

Path returns the directory the store was opened against.

func (*Store) SearchByLabel

func (s *Store) SearchByLabel(q string, limit int) ([]*model.CodeNode, error)

SearchByLabel runs a fulltext search across the label + fqn_lower index. The query is auto-suffixed with '*' to give prefix matching (so 'auth' matches 'AuthService' identifiers). Results are ranked by BM25 score. Falls back to a CONTAINS predicate when the FTS index hasn't been built (pre-enrich, or enrich aborted before CreateIndexes).

func (*Store) SearchLexical

func (s *Store) SearchLexical(q string, limit int) ([]*model.CodeNode, error)

SearchLexical runs a fulltext search across the prose columns (prop_lex_comment + prop_lex_config_keys). BM25 ranks results. Same CONTAINS fallback as SearchByLabel for pre-enrich graphs.

func (*Store) Unlock

func (s *Store) Unlock()

Unlock releases the store mutex paired with Lock.
