concepts

package
v0.0.0-...-cc2395b Latest
Published: May 15, 2026 | License: MIT | Imports: 18 | Imported by: 0

Documentation

Overview

Package concepts owns the controlled-vocabulary half of the gm-s47n two-axis planner (see docs/design/work-planning.md §6).

The package is self-contained: it does NOT depend on internal/core's WorkItem schema. Beads are surfaced through a small BeadConceptStore interface so the package compiles and tests run before gm-s47n.1.1 lands the WorkItem.concepts field.

Three interlocking surfaces:

  • Bootstrap runs N pluggable [BootstrapSource]s in parallel, unions the candidates, normalizes, dedupes, and caps. Ships three sources: Go packages, route prefixes, fixture taxonomy.
  • DetectDrift reads bead concept usage and emits suggestions for near-duplicate merges, drifter follow-ups, and singleton deletes. Idempotent, pure, never mutates state.
  • [ReviewQueue] persists suggestions and operator decisions; an approval drives historical rewrites via ApplyMerge / ApplyRename / ApplyDelete over the BeadConceptStore.

Storage lives in <workspace>/.gemba/concepts/.

Constants

const (
	StoreDirName     = "concepts"
	VocabularyFile   = "vocabulary.json"
	SuggestionsFile  = "suggestions.json"
	DecisionsLogFile = "decisions.log"
)

Store paths inside <workspace>/.gemba/concepts/. The names live as constants so the CLI, the SPA (when it lands), and any future importer all hit the same files.

Variables

var (
	ErrSuggestionNotFound = errors.New("concepts: suggestion not found")
	ErrSuggestionDecided  = errors.New("concepts: suggestion already decided")
)

ErrSuggestionNotFound / ErrSuggestionDecided are sentinel errors the CLI checks via errors.Is so it can map them to user-facing messages without string parsing.

Functions

func AppendDecision

func AppendDecision(workspace string, d Decision) error

AppendDecision adds one entry to the decisions JSONL log. The log is append-only: every approve / reject becomes a permanent line so the audit trail survives vocabulary edits.

func ApplyDelete

func ApplyDelete(ctx context.Context, store BeadConceptStore, term string) (int, error)

ApplyDelete drops `term` from every bead that has it. Returns the count of beads changed.

func ApplyMerge

func ApplyMerge(ctx context.Context, store BeadConceptStore, from, to string) (int, error)

ApplyMerge rewrites every bead whose concept set contains `from` so it now contains `to` instead. Beads already carrying both terms drop the `from` (no double entry). Returns the count of beads changed.

func ApplyRename

func ApplyRename(ctx context.Context, store BeadConceptStore, from, to string) (int, error)

ApplyRename changes every occurrence of `from` to `to` across every bead. Identical mechanics to ApplyMerge — the difference is in the vocabulary layer (Rename keeps the term as the surviving one, Merge collapses two pre-existing terms). Beads carrying both terms dedup to a single `to`.
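The per-bead rewrite that ApplyMerge and ApplyRename share might look like the sketch below. It works on a plain map instead of a BeadConceptStore, and rewrite / mergeAll are hypothetical names chosen for the illustration.

```go
package main

import "fmt"

// contains reports whether x is in xs.
func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// rewrite replaces from with to, keeping order and emitting to only
// once. It reports whether the concept set changed.
func rewrite(concepts []string, from, to string) ([]string, bool) {
	if from == to || !contains(concepts, from) {
		return concepts, false
	}
	out := make([]string, 0, len(concepts))
	seenTo := false
	for _, c := range concepts {
		if c == from {
			c = to
		}
		if c == to {
			if seenTo {
				continue // bead carried both terms: keep a single entry
			}
			seenTo = true
		}
		out = append(out, c)
	}
	return out, true
}

// mergeAll applies rewrite to every bead and returns how many changed.
func mergeAll(beads map[string][]string, from, to string) int {
	n := 0
	for id, cs := range beads {
		if out, changed := rewrite(cs, from, to); changed {
			beads[id] = out
			n++
		}
	}
	return n
}

func main() {
	beads := map[string][]string{
		"b1": {"auth", "login"},
		"b2": {"login", "grid"},
		"b3": {"grid"},
	}
	// prints: 2 [auth] [auth grid]
	fmt.Println(mergeAll(beads, "login", "auth"), beads["b1"], beads["b2"])
}
```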

func Bootstrap

func Bootstrap(ctx context.Context, root string, sources []BootstrapSource, opts BootstrapOpts) (*Vocabulary, *BootstrapResult, error)

Bootstrap runs every source in parallel, unions the candidates, normalizes, dedupes, and caps. Returns a fresh Vocabulary with stable name order. Source selection is the caller's responsibility; pass DefaultSources for the ship-with set.

Errors from individual sources are collected and surfaced via [BootstrapResult.Errors]; the vocabulary is returned even when some sources failed because the surviving sources still produce useful starter terms.

func EnsureStoreDir

func EnsureStoreDir(workspace string) (string, error)

EnsureStoreDir mkdir-p's the concepts directory. Idempotent.

func NewSuggestionID

func NewSuggestionID() string

NewSuggestionID mints a short hex id stable enough for an operator to type. Collisions inside one workspace are vanishingly unlikely (8 hex chars = 4 bytes of entropy).

func Normalize

func Normalize(name string) string

Normalize collapses a candidate name into the canonical lower-kebab-case form. Whitespace, underscores, slashes, and dots all become hyphens; runs of separators collapse to one; trailing separators are trimmed.
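The rules above can be sketched in a few lines. This version is an approximation written for illustration: it also trims leading separators and treats an existing hyphen as a separator, both of which are assumptions beyond what the documentation states.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize lowercases name, turns whitespace, '_', '/' and '.' into
// hyphens, collapses runs of separators, and trims hyphens at both ends.
func normalize(name string) string {
	var b strings.Builder
	prevHyphen := false
	for _, r := range strings.ToLower(name) {
		if unicode.IsSpace(r) || r == '_' || r == '/' || r == '.' || r == '-' {
			if !prevHyphen {
				b.WriteByte('-')
				prevHyphen = true
			}
			continue
		}
		b.WriteRune(r)
		prevHyphen = false
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	for _, in := range []string{"React-Query", "internal/core", "work  item_v2."} {
		fmt.Printf("%q -> %q\n", in, normalize(in))
	}
}
```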

func SaveSuggestions

func SaveSuggestions(workspace string, list *SuggestionList) error

func SaveVocabulary

func SaveVocabulary(workspace string, v *Vocabulary) error

SaveVocabulary writes vocabulary.json atomically (write to a sibling .tmp + rename) so a crashed process never leaves a half-written file.

func StoreDir

func StoreDir(workspace string) string

StoreDir returns the absolute concepts directory under the workspace's .gemba/. Callers usually pass `<workspace>` resolved elsewhere; the helper just composes the path.

Types

type BeadConceptStore

type BeadConceptStore interface {
	// List returns every bead's id and current concept set.
	List(ctx context.Context) ([]BeadConcepts, error)

	// Set replaces the concept set on the named bead. The slice is
	// owned by the caller after return — implementations that need to
	// retain it should copy.
	Set(ctx context.Context, beadID string, concepts []string) error
}

BeadConceptStore is the integration boundary between the concepts package and the WorkItem.concepts schema landing in gm-s47n.1.1. Production wiring (a thin adapter over WorkPlane) is scheduled for that bead; until then the in-memory implementation in this file powers tests and CLI dry-runs.

type BeadConcepts

type BeadConcepts struct {
	BeadID   string
	Concepts []string
	// CreatedAt + ClosedAt feed the singleton-decay heuristic. Both
	// optional; zero values disable the time-window filter.
	CreatedAt time.Time
	ClosedAt  *time.Time
}

BeadConcepts is the slice projection a BeadConceptStore returns for each bead — just the id and the current concept set. Both the drift detector and the historical rewrite consume this shape.

type BootstrapBucket

type BootstrapBucket struct {
	Source string
	Count  int
}

type BootstrapError

type BootstrapError struct {
	Source string
	Err    error
}

func (BootstrapError) Error

func (e BootstrapError) Error() string

type BootstrapOpts

type BootstrapOpts struct {
	// Max caps the number of terms in the resulting vocabulary. The
	// bead description targets 30-60; default is 60. Sources are
	// queried in order so an early source's candidates fill first;
	// callers wanting different priority can reorder the slice.
	Max int
}

BootstrapOpts controls collection limits.

func DefaultBootstrapOpts

func DefaultBootstrapOpts() BootstrapOpts

DefaultBootstrapOpts is the ship-with policy: at most 60 terms.

type BootstrapResult

type BootstrapResult struct {
	Total    int
	Skipped  int               // candidates dropped because Max was hit
	BySource []BootstrapBucket // count of terms attributed per source
	Errors   []BootstrapError  // per-source failures (other sources still ran)
}

BootstrapResult is the operator-visible report of one bootstrap run. Mostly diagnostic; the vocabulary itself is the load-bearing output.

type BootstrapSource

type BootstrapSource interface {
	// Name is a stable identifier the [Term.Source] field carries
	// so a future operator can tell which source proposed a term.
	// Convention: lower-kebab-case noun phrase.
	Name() string

	// Extract returns the candidates this source observed under root.
	// Implementations MUST return [] (not error) when there's nothing
	// to extract — a workspace without an `internal/` directory is a
	// legitimate state, not a failure.
	Extract(ctx context.Context, root string) ([]Candidate, error)
}

BootstrapSource extracts candidate vocabulary terms from one observable feature of the workspace. The interface stays small so adding a source (e.g. an org-internal Linear label exporter) is a one-method change.

func DefaultSources

func DefaultSources() []BootstrapSource

DefaultSources returns the bootstrap sources gemba ships. Order matters — earlier sources fill the cap first when [BootstrapOpts.Max] is small. Operators wanting a different priority compose their own slice.

type Candidate

type Candidate struct {
	Name        string
	Description string
	// Source is overwritten by Bootstrap with the originating
	// BootstrapSource.Name(); implementations can leave it empty.
	Source string
}

Candidate is one proposed vocabulary entry. Bootstrap collects, normalizes, and dedupes by Name; the first source to propose a name wins for the Source label.

type Decision

type Decision struct {
	SuggestionID string         `json:"suggestion_id"`
	Kind         SuggestionKind `json:"kind"`
	From         string         `json:"from,omitempty"`
	To           string         `json:"to,omitempty"`
	Action       string         `json:"action"` // "approved" | "rejected"
	Reason       string         `json:"reason,omitempty"`
	By           string         `json:"by"`
	BeadsChanged int            `json:"beads_changed,omitempty"`
	At           time.Time      `json:"at"`
}

Decision is one append-only entry in the decisions log.

func ApplyDecision

func ApplyDecision(
	ctx context.Context,
	v *Vocabulary,
	list *SuggestionList,
	store BeadConceptStore,
	id string,
	by string,
) (Decision, error)

ApplyDecision is the CLI-facing entry point. It looks up the suggestion, marks it approved on the in-memory list, applies the vocabulary change and the bead-side rewrite, and returns a Decision whose BeadsChanged field records the count of beads changed for the audit log. The caller is responsible for persisting the vocabulary and the suggestion list afterwards.

func ReadDecisions

func ReadDecisions(workspace string) ([]Decision, error)

ReadDecisions returns every decision in the log, in order. Used by the CLI's `concepts log` view; the file itself stays appendable for new entries.

func RejectDecision

func RejectDecision(list *SuggestionList, id, by, reason string) (Decision, error)

RejectDecision marks a suggestion rejected and returns the audit-log entry.

type Drift

type Drift struct {
	NearDuplicates []NearDuplicate `json:"near_duplicates,omitempty"`
	Singletons     []Singleton     `json:"singletons,omitempty"`
}

Drift is the detector's report.

func DetectDrift

func DetectDrift(beads []BeadConcepts, opts DriftOpts) Drift

DetectDrift reads bead concepts and returns the current drift state. Pure: same input → same output, no mutation.

Drifters (semantic neighbor walking) live in gm-s47n.3 — the source analysis abstraction is the right place for embedding-based work, not this co-occurrence-only detector. This function ships the two signal types the bead description called out as concrete (.7.2).

type DriftOpts

type DriftOpts struct {
	// NearDuplicateJaccard is the minimum Jaccard similarity a pair
	// of terms must share before the detector flags them as
	// near-duplicates. Default 0.6.
	NearDuplicateJaccard float64

	// NearDuplicateUseRatio guards against flagging a pair where
	// one term is heavily used and the other is a singleton — the
	// usage profiles must be comparable. min(|a|,|b|)/max(|a|,|b|).
	// Default 0.5.
	NearDuplicateUseRatio float64

	// SingletonDormantDays is how long after a bead's ClosedAt a
	// singleton-on-that-bead must wait before the detector emits a
	// delete suggestion. Default 90 (per spec §6.2). Set to 0 to
	// disable the dormant gate (every singleton becomes a suggestion).
	SingletonDormantDays int

	// SingletonMaxUses is the inclusive upper bound on the bead-count
	// for a term to qualify as a singleton candidate. Default 2 — the
	// spec's "fewer than 3 beads". Set to 1 for the strict "exactly
	// one bead" interpretation.
	SingletonMaxUses int

	// Now is the reference time for dormant calculations. Tests
	// inject a fixed time so cases stay deterministic; production
	// leaves it zero (defaults to time.Now().UTC()).
	Now time.Time
}

DriftOpts tunes the detector's thresholds. Defaults match the values documented in docs/design/work-planning.md §6.4.
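The two near-duplicate measures can be worked through on a small example. The sketch assumes the Jaccard similarity is taken over the sets of bead ids that carry each term; the documentation does not spell out the exact sets, so treat that as a guess. The use ratio follows the min/max formula given in the field comment.

```go
package main

import "fmt"

// set is a set of bead ids.
type set map[string]bool

// beadsByTerm inverts bead -> concepts into term -> set of bead ids.
func beadsByTerm(beads map[string][]string) map[string]set {
	out := map[string]set{}
	for id, concepts := range beads {
		for _, c := range concepts {
			if out[c] == nil {
				out[c] = set{}
			}
			out[c][id] = true
		}
	}
	return out
}

// jaccard returns |a ∩ b| / |a ∪ b|, or 0 when both sets are empty.
func jaccard(a, b set) float64 {
	inter := 0
	for id := range a {
		if b[id] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// useRatio returns min(|a|,|b|) / max(|a|,|b|), or 0 when both are empty.
func useRatio(a, b set) float64 {
	lo, hi := len(a), len(b)
	if lo > hi {
		lo, hi = hi, lo
	}
	if hi == 0 {
		return 0
	}
	return float64(lo) / float64(hi)
}

func main() {
	terms := beadsByTerm(map[string][]string{
		"b1": {"auth", "login"},
		"b2": {"auth", "login"},
		"b3": {"auth", "login"},
		"b4": {"auth"},
	})
	a, b := terms["auth"], terms["login"]
	// auth is on 4 beads, login on 3, and they share 3: prints 0.75 0.75
	fmt.Println(jaccard(a, b), useRatio(a, b))
}
```

A pair is flagged only when both values clear their thresholds, which keeps a heavily used term from being paired with a one-off.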

func DefaultDriftOpts

func DefaultDriftOpts() DriftOpts

DefaultDriftOpts is the policy that ships. Threshold values target the intent of work-planning.md §6.2 (cosine ≥ 0.85 near-dups, singletons "< 3 beads after 90 days") translated to the Jaccard + dormant-only metrics this detector ships:

  • Jaccard 0.7 lands at a similar precision to cosine 0.85 on the small-sparse-set distribution beads produce in practice.
  • Singleton dormant 90d matches the spec's "after 90 days" gate. Use-count < 3 (rather than == 1) is enforced via [SingletonMaxUses].

type FixtureTaxonomySource

type FixtureTaxonomySource struct{}

FixtureTaxonomySource emits the top-level subdirectory names of testing/e2e/specs/. The e2e library has already validated that each tier names a real surface (smoke / chrome / drawers / grid / realtime / etc.); reusing that taxonomy gives the concept set a language operators are already fluent in.

func (FixtureTaxonomySource) Extract

func (FixtureTaxonomySource) Extract(ctx context.Context, root string) ([]Candidate, error)

func (FixtureTaxonomySource) Name

func (FixtureTaxonomySource) Name() string

type GoPackagesSource

type GoPackagesSource struct{}

GoPackagesSource walks internal/ + cmd/ under root and emits a candidate per unique Go package name. Internal package names are the most stable signal of "what a contributor calls a thing" — a directory whose package is named `concepts` is observably about concepts whether or not the operator remembered to label it.

func (GoPackagesSource) Extract

func (GoPackagesSource) Extract(ctx context.Context, root string) ([]Candidate, error)

func (GoPackagesSource) Name

func (GoPackagesSource) Name() string

type MemoryStore

type MemoryStore struct {
	// contains filtered or unexported fields
}

MemoryStore is the in-memory BeadConceptStore. Production-grade — the CLI uses it for dry-runs and the test suite uses it everywhere. The historical-rewrite math is the same regardless of which store sits behind the interface.

func NewMemoryStore

func NewMemoryStore() *MemoryStore

NewMemoryStore returns an empty store. Callers seed via Set.

func (*MemoryStore) List

func (s *MemoryStore) List(_ context.Context) ([]BeadConcepts, error)

List implements BeadConceptStore.

func (*MemoryStore) Set

func (s *MemoryStore) Set(_ context.Context, beadID string, concepts []string) error

Set implements BeadConceptStore.

type NearDuplicate

type NearDuplicate struct {
	A       string  `json:"a"`
	B       string  `json:"b"`
	Jaccard float64 `json:"jaccard"`
	UsesA   int     `json:"uses_a"`
	UsesB   int     `json:"uses_b"`
}

NearDuplicate flags a pair of terms whose co-occurrence pattern suggests they're being used interchangeably.

type RoutePrefixesSource

type RoutePrefixesSource struct{}

RoutePrefixesSource extracts the top-level UI route names the SPA already exposes — every `<Route path="/foo" ...>` literal in web/src/App.tsx becomes a candidate. Routes are user-facing surfaces the operator already named, which makes them excellent concept seeds.

func (RoutePrefixesSource) Extract

func (RoutePrefixesSource) Extract(ctx context.Context, root string) ([]Candidate, error)

func (RoutePrefixesSource) Name

func (RoutePrefixesSource) Name() string

type Singleton

type Singleton struct {
	Term       string     `json:"term"`
	BeadID     string     `json:"bead_id"`
	ClosedAt   *time.Time `json:"closed_at,omitempty"`
	DormantFor int        `json:"dormant_days,omitempty"`
}

Singleton flags a term used on exactly one bead. Carries that bead's id + close timestamp so the operator can decide whether the concept ever generalized.

type Suggestion

type Suggestion struct {
	ID        string           `json:"id"`
	Kind      SuggestionKind   `json:"kind"`
	From      string           `json:"from,omitempty"`
	To        string           `json:"to,omitempty"`
	Reason    string           `json:"reason"`
	Source    string           `json:"source"` // "drift:near-duplicate" | "drift:singleton" | "operator"
	CreatedAt time.Time        `json:"created_at"`
	Status    SuggestionStatus `json:"status"`
}

Suggestion is a proposed vocabulary change. The drift detector emits these (status=pending); the operator approves or rejects; the apply path materializes approved changes through the vocabulary + the bead store.

func SuggestionsFromDrift

func SuggestionsFromDrift(d Drift, existing []Suggestion) []Suggestion

SuggestionsFromDrift converts a drift report into pending suggestions. Idempotent against the existing list: a near-duplicate that's already in the queue (same Kind + From + To, regardless of order) doesn't get a second entry.
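The order-insensitive match can be implemented with a canonical key, as in this sketch. The suggestion type is trimmed to three fields, and pairKey / addMissing are hypothetical helpers introduced for the example.

```go
package main

import "fmt"

// suggestion is a trimmed stand-in for Suggestion.
type suggestion struct {
	Kind, From, To string
}

// pairKey sorts From and To so (a, b) and (b, a) produce the same key.
// The NUL separator avoids collisions between adjacent fields.
func pairKey(s suggestion) string {
	from, to := s.From, s.To
	if to < from {
		from, to = to, from
	}
	return s.Kind + "\x00" + from + "\x00" + to
}

// addMissing appends only the proposals whose key is not already queued.
func addMissing(existing, proposed []suggestion) []suggestion {
	seen := make(map[string]bool, len(existing))
	for _, s := range existing {
		seen[pairKey(s)] = true
	}
	out := append([]suggestion(nil), existing...)
	for _, s := range proposed {
		k := pairKey(s)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, s)
	}
	return out
}

func main() {
	queue := []suggestion{{"merge", "login", "auth"}}
	proposed := []suggestion{
		{"merge", "auth", "login"}, // same pair, reversed: skipped
		{"delete", "legacy", ""},
	}
	// prints: 2
	fmt.Println(len(addMissing(queue, proposed)))
}
```

Running the conversion twice over the same drift report therefore leaves the queue unchanged after the first pass.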

type SuggestionKind

type SuggestionKind string

SuggestionKind enumerates the closed set of changes the queue can surface. To add a new kind, declare it here, handle it in ApplyDecision, and add the corresponding Vocabulary / BeadConceptStore handler.

const (
	KindMerge  SuggestionKind = "merge"
	KindRename SuggestionKind = "rename"
	KindDelete SuggestionKind = "delete"
)

type SuggestionList

type SuggestionList struct {
	Suggestions []Suggestion `json:"suggestions"`
}

LoadSuggestions / SaveSuggestions mirror the vocabulary helpers.

func LoadSuggestions

func LoadSuggestions(workspace string) (*SuggestionList, error)

func (*SuggestionList) Add

func (l *SuggestionList) Add(s Suggestion) bool

Add appends a suggestion. No-op when the (kind, from, to) tuple is already pending or approved — rejected suggestions don't block a re-proposal because the operator's earlier "no" was about that instance, not the entire idea.

func (*SuggestionList) Approved

func (l *SuggestionList) Approved() []Suggestion

Approved / Rejected accessors mirror Pending.

func (*SuggestionList) Find

func (l *SuggestionList) Find(id string) (*Suggestion, bool)

Find returns the suggestion with the given id (and a found bool).

func (*SuggestionList) Mark

func (l *SuggestionList) Mark(id string, status SuggestionStatus) error

Mark updates a suggestion's status. Returns ErrSuggestionNotFound when the id doesn't match. Only pending suggestions can transition; re-marking a decided suggestion is an error so an operator can't silently flip a historical decision.

func (*SuggestionList) Pending

func (l *SuggestionList) Pending() []Suggestion

Pending returns the slice of pending suggestions in stable order (by Kind, then From, then To). Callers wanting all statuses iterate the SuggestionList directly.

func (*SuggestionList) Rejected

func (l *SuggestionList) Rejected() []Suggestion

type SuggestionStatus

type SuggestionStatus string

SuggestionStatus tracks the operator's decision lifecycle.

const (
	StatusPending  SuggestionStatus = "pending"
	StatusApproved SuggestionStatus = "approved"
	StatusRejected SuggestionStatus = "rejected"
)

type Term

type Term struct {
	Name        string `json:"name"`
	Source      string `json:"source"`
	Description string `json:"description,omitempty"`
	// Aliases are names that merged into this term. Kept on the
	// surviving term so lookups for the retired name still resolve
	// without walking the suggestions log.
	Aliases   []string  `json:"aliases,omitempty"`
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
	// Retired terms stay in the vocabulary so historical rewrites
	// can find them; lookups for active terms filter via [Vocabulary.Active].
	Retired   bool       `json:"retired,omitempty"`
	RetiredAt *time.Time `json:"retired_at,omitempty"`
}

Term is one entry in the controlled vocabulary. Names are normalized lower-kebab-case so a bead carrying "React-Query" matches a vocabulary term "react-query".

type Vocabulary

type Vocabulary struct {
	Terms []Term `json:"terms"`
}

Vocabulary is the closed set of terms, stably ordered by Name. The in-memory shape mirrors the on-disk vocabulary.json so the file stays diff-friendly for review in beads dolt commits.

func LoadVocabulary

func LoadVocabulary(workspace string) (*Vocabulary, error)

LoadVocabulary reads vocabulary.json. Returns an empty Vocabulary (not an error) when the file doesn't exist — a fresh workspace's first read is "no terms yet", which the bootstrap path handles.

func (*Vocabulary) Active

func (v *Vocabulary) Active() []Term

Active returns just the non-retired terms, copied so callers can't mutate vocabulary state.

func (*Vocabulary) Add

func (v *Vocabulary) Add(t Term) (*Term, bool)

Add inserts a term, returning the inserted term and a bool that reports whether it was new. Re-adding an existing name is a no-op that returns the existing term — bootstrap sources can run multiple times without piling up duplicates.

func (*Vocabulary) Find

func (v *Vocabulary) Find(name string) (*Term, bool)

Find returns the term with the given canonical name (or any alias), and a bool reporting whether it was found. Includes retired terms — historical rewrites need them.

func (*Vocabulary) Merge

func (v *Vocabulary) Merge(from, to string) (*Term, error)

Merge folds the `from` term into `to`: from's name is added as an alias on to, from is retired. Both terms must already exist in the vocabulary. Returns the surviving term and an error when either name is missing.

Merge is the vocabulary-level half of the rewrite pipeline; the historical bead rewrite is ApplyMerge over a BeadConceptStore.
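The alias-and-retire bookkeeping might be pictured as below. The types are trimmed stand-ins, and resolve is a hypothetical lookup that prefers active terms; the real Find also returns retired terms, so the two are not equivalent.

```go
package main

import "fmt"

// term and vocab are trimmed stand-ins for Term and Vocabulary.
type term struct {
	Name    string
	Aliases []string
	Retired bool
}

type vocab struct{ Terms []term }

// byName returns the term with exactly this canonical name, or nil.
func (v *vocab) byName(name string) *term {
	for i := range v.Terms {
		if v.Terms[i].Name == name {
			return &v.Terms[i]
		}
	}
	return nil
}

// merge records from as an alias on to and retires from. Both terms
// must already exist.
func (v *vocab) merge(from, to string) (*term, error) {
	if from == to {
		return nil, fmt.Errorf("cannot merge %q into itself", from)
	}
	src, dst := v.byName(from), v.byName(to)
	if src == nil {
		return nil, fmt.Errorf("unknown term %q", from)
	}
	if dst == nil {
		return nil, fmt.Errorf("unknown term %q", to)
	}
	dst.Aliases = append(dst.Aliases, from)
	src.Retired = true
	return dst, nil
}

// resolve finds the active term that answers to name, by canonical
// name or alias. Hypothetical helper, not the package's Find.
func (v *vocab) resolve(name string) *term {
	for i := range v.Terms {
		t := &v.Terms[i]
		if t.Retired {
			continue
		}
		if t.Name == name {
			return t
		}
		for _, a := range t.Aliases {
			if a == name {
				return t
			}
		}
	}
	return nil
}

func main() {
	v := &vocab{Terms: []term{{Name: "auth"}, {Name: "login"}}}
	if _, err := v.merge("login", "auth"); err != nil {
		panic(err)
	}
	// prints: auth
	fmt.Println(v.resolve("login").Name)
}
```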

func (*Vocabulary) Rename

func (v *Vocabulary) Rename(from, to string) (*Term, error)

Rename swaps a term's canonical name in place. The old name is preserved as an alias so beads carrying the old name still resolve.

func (*Vocabulary) Retire

func (v *Vocabulary) Retire(name string) bool

Retire marks the named term as retired and stamps RetiredAt. No-op when the term is already retired; returns false when the name matches no term at all.

func (*Vocabulary) Sort

func (v *Vocabulary) Sort()

Sort sorts the vocabulary's terms by name in place. The storage layer calls this before serializing so the on-disk order is stable.

type VocabularyError

type VocabularyError struct {
	Reason string
	Term   string
}

VocabularyError is the typed error returned by mutator methods so callers can branch on Reason without string parsing.

func (*VocabularyError) Error

func (e *VocabularyError) Error() string
