knowledgebase

Package: knowledgebase · Version: v0.1.1
Published: Jun 25, 2026 · License: MIT · Imports: 12 · Imported by: 0

README


togo-framework/knowledge-base


A managed knowledge base for togo — ingest, chunk, embed & hybrid-search content for RAG.

Install

togo install togo-framework/knowledge-base

It turns the thin ai-firecrawl, ai-crawlee and ai-searxng data-source drivers into a full knowledge base: it ingests documents, chunks and embeds them, runs hybrid search (keyword and vector results merged with reciprocal rank fusion, RRF), and crawls sources on a schedule with content-hash change detection. Collections keep tenants and topics isolated from one another.
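Reciprocal rank fusion scores each document by summing 1/(k + rank) over every ranked list it appears in, so a document ranked well by both keyword and vector search rises to the top even though the two legs produce incomparable raw scores. The sketch below is a minimal, standalone illustration of that technique, not this package's code: `fuseRRF` is a hypothetical name, and k = 60 is the default commonly used in the RRF literature, since the plugin's own constant is not documented.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK damps the influence of top ranks. 60 is the conventional default;
// the plugin's actual value is an assumption here.
const rrfK = 60.0

// fuseRRF merges ranked lists of document IDs with reciprocal rank fusion:
// each list adds 1/(k + rank) to a document's score, with rank counted from 1.
// IDs are returned by descending fused score, ties broken by ID for determinism.
func fuseRRF(lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (rrfK + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	keyword := []string{"install", "intro", "faq"} // e.g. keyword-leg ranking
	vector := []string{"intro", "cli", "install"}  // e.g. vector-leg ranking
	fmt.Println(fuseRRF(keyword, vector))
}
```

Here "intro" (ranks 2 and 1) edges out "install" (ranks 1 and 3), because RRF rewards consistent placement across both legs.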

Usage

kb, ok := knowledgebase.FromKernel(k)
if !ok {
    log.Fatal("knowledge-base plugin is not installed")
}

// Ingest chunks and embeds the text and returns (doc, changed). Re-ingesting
// identical content is a no-op (dedupe / change detection).
doc, changed := kb.Ingest(ctx, knowledgebase.Document{
    URL: "/docs/intro", Title: "Intro", Text: "...", Collection: "docs",
})
fmt.Println(doc.ID, changed)

// Hybrid search (keyword + vector, RRF-merged).
hits := kb.Search(knowledgebase.Query{Text: "how do I install the cli", TopK: 5, Collection: "docs"})
for _, h := range hits {
    fmt.Println(h.Score, h.Title, h.Snippet)
}
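Ingest is documented as chunking the text before embedding it, but the chunking strategy is not specified. A minimal sketch of one common approach, fixed-size word windows with overlap so that context is not cut off at chunk boundaries, is shown below; `chunkWords` and the window and overlap sizes are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into windows of at most size words. Each window
// starts (size - overlap) words after the previous one, so neighbouring
// chunks share `overlap` words. Illustrative only: the plugin's real
// chunker and its sizes are not documented.
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		panic("chunkWords: need size > 0 and 0 <= overlap < size")
	}
	words := strings.Fields(text)
	var chunks []string
	step := size - overlap
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // last window reached the end; avoid a redundant tail chunk
		}
	}
	return chunks
}

func main() {
	for _, c := range chunkWords("a b c d e f g", 3, 1) {
		fmt.Println(c)
	}
}
```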

Scheduled crawl + change detection

kb.AddSource(
    knowledgebase.Source{Name: "blog", URL: "https://site.com/blog", Collection: "blog", Cron: "@daily"},
    func(ctx context.Context, url string) (knowledgebase.Document, error) {
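        // wire ai-firecrawl / ai-crawlee here; fetchMarkdown stands in for
        // your own helper and is not part of this package
        return fetchMarkdown(ctx, url)
    },
)
changed, err := kb.Crawl(ctx, "blog") // ingests only if content changed
if err != nil {
    log.Println("crawl blog:", err)
}
fmt.Println("blog changed:", changed)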

Pair with the scheduler plugin to run kb.Crawl on each source's cron.
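The change detection behind Crawl and Ingest compares a hash of the fetched content with the hash stored for the previous version (Document carries a Hash field for this). A minimal sketch, assuming SHA-256 over the text keyed by URL; the package's actual hash function and key are not documented, and `hashIndex` is a hypothetical type.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex SHA-256 digest of a document's text.
// Assumption: which hash and which fields the plugin uses are undocumented.
func contentHash(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// hashIndex remembers the last content hash seen for each URL.
type hashIndex map[string]string

// changed stores the new hash for url and reports whether it differs from
// the previous one. The first sighting of a URL counts as a change.
func (h hashIndex) changed(url, text string) bool {
	sum := contentHash(text)
	if prev, ok := h[url]; ok && prev == sum {
		return false // identical content: skip re-chunking and re-embedding
	}
	h[url] = sum
	return true
}

func main() {
	idx := hashIndex{}
	fmt.Println(idx.changed("/blog", "v1")) // first crawl
	fmt.Println(idx.changed("/blog", "v1")) // same content
	fmt.Println(idx.changed("/blog", "v2")) // edited
}
```

Hashing makes the scheduled crawl cheap when nothing changed: only the digest is compared, and the expensive chunk-and-embed step runs only on real edits.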

Embeddings

Ships a deterministic local embedder (hashing / bag-of-words), so search works offline and tests are reproducible. Swap in a real embedder for semantic quality:

kb.WithEmbedder(myAIEmbedder) // e.g. backed by the ai plugin's Embed

REST API

Method  Path                             Description
POST    /api/kb/ingest                   ingest a {url,title,text,collection} document
GET     /api/kb/search?q=&collection=    hybrid search
GET     /api/kb/documents?collection=    list documents
GET     /api/kb/sources                  list crawl sources

Configuration

No environment variables are required. The default store is a bounded in-memory index; a database or vector store can be swapped in via the storage seam. For a persistent pgvector + BM25 backend, see rag-postgres.
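The default store is described only as a bounded in-memory index. One way such a bound can work is sketched below: a fixed capacity with first-in, first-out eviction. Both the capacity and the FIFO policy are assumptions for illustration, and `boundedStore` is a hypothetical type, not the package's storage seam.

```go
package main

import "fmt"

// boundedStore is a hypothetical fixed-capacity index that evicts the
// oldest-inserted key once it holds more than limit entries (FIFO).
type boundedStore struct {
	limit int
	order []string          // keys in insertion order
	docs  map[string]string // key -> document text
}

func newBoundedStore(limit int) *boundedStore {
	return &boundedStore{limit: limit, docs: map[string]string{}}
}

// Put inserts or updates key. Updating an existing key keeps its original
// position in the eviction order.
func (s *boundedStore) Put(key, text string) {
	if _, ok := s.docs[key]; !ok {
		s.order = append(s.order, key)
	}
	s.docs[key] = text
	for len(s.order) > s.limit {
		oldest := s.order[0]
		s.order = s.order[1:]
		delete(s.docs, oldest)
	}
}

// Get returns the stored text for key and whether it is present.
func (s *boundedStore) Get(key string) (string, bool) {
	t, ok := s.docs[key]
	return t, ok
}

func main() {
	s := newBoundedStore(2)
	s.Put("a", "1")
	s.Put("b", "2")
	s.Put("c", "3") // exceeds the limit, so "a" is evicted
	_, ok := s.Get("a")
	fmt.Println(ok, len(s.docs))
}
```

A bound like this keeps memory predictable, at the cost of silently dropping old documents, which is why a persistent backend is suggested for anything beyond small or test corpora.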


Premium sponsors

ID8 Media  ·  One Studio

Support togo — become a sponsor.

Documentation

Overview

Package knowledgebase is a managed knowledge base for togo: ingest documents, chunk + embed them, and run hybrid (keyword + vector) search for RAG. It can also crawl sources on a schedule with content-hash change detection.

kb, _ := knowledgebase.FromKernel(k)
kb.Ingest(ctx, knowledgebase.Document{URL: "/docs/intro", Title: "Intro", Text: "..."})
hits := kb.Search(knowledgebase.Query{Text: "how do I install", TopK: 5})


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document struct {
	ID         string    `json:"id"`
	URL        string    `json:"url"`
	Title      string    `json:"title"`
	Text       string    `json:"text"`
	Source     string    `json:"source,omitempty"`
	Collection string    `json:"collection,omitempty"`
	Hash       string    `json:"hash"`
	UpdatedAt  time.Time `json:"updated_at"`
}

Document is a unit of source content.

type Embedder

type Embedder interface {
	Embed(text string) []float64
}

Embedder turns text into a vector. The default is a deterministic local hashing embedder (works offline / in tests); wire a real one (e.g. via the ai plugin) with Service.WithEmbedder for semantic quality.

type Fetcher

type Fetcher func(ctx context.Context, url string) (Document, error)

Fetcher retrieves a document for a source URL; wire it to an ai data-source plugin such as ai-firecrawl or ai-crawlee.

type Query

type Query struct {
	Text       string
	TopK       int
	Collection string
}

Query holds the parameters for Search.

type Result

type Result struct {
	DocID   string  `json:"doc_id"`
	URL     string  `json:"url"`
	Title   string  `json:"title"`
	Snippet string  `json:"snippet"`
	Score   float64 `json:"score"`
}

Result is a ranked search hit.

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service is the knowledge-base runtime, registered on the kernel as "knowledge-base" (k.Get("knowledge-base")).

func FromKernel

func FromKernel(k *togo.Kernel) (*Service, bool)

FromKernel returns the knowledge-base Service registered on k; the bool reports whether it was found.

func (*Service) AddSource

func (s *Service) AddSource(src Source, fetch Fetcher)

AddSource registers a crawlable source with a fetcher.

func (*Service) Crawl

func (s *Service) Crawl(ctx context.Context, name string) (bool, error)

Crawl fetches the named source and ingests the result; it returns whether the content changed.

func (*Service) Documents

func (s *Service) Documents(collection string) []*Document

Documents lists stored documents (optionally filtered by collection).

func (*Service) Ingest

func (s *Service) Ingest(ctx context.Context, d Document) (*Document, bool)

Ingest stores/updates a document: it chunks + embeds the text. It returns (doc, true) when content changed and was (re)ingested, or (doc, false) when the content hash is unchanged (change detection / dedupe).

func (*Service) Search

func (s *Service) Search(q Query) []Result

Search runs hybrid keyword + vector retrieval merged by reciprocal-rank fusion.
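The vector leg of hybrid search ranks chunks by how similar each chunk embedding is to the query embedding. A brute-force sketch using cosine similarity follows; cosine as the metric and the `topKByCosine` helper are assumptions, since the package does not document its similarity function. Its output is the kind of ranked list that reciprocal rank fusion would then merge with the keyword ranking.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors (as
// produced by one embedder), or 0 if either has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topKByCosine scores every chunk against the query and keeps the best k
// IDs, ties broken by ID. A linear scan: fine for a small in-memory index.
func topKByCosine(query []float64, chunks map[string][]float64, k int) []string {
	ids := make([]string, 0, len(chunks))
	scores := make(map[string]float64, len(chunks))
	for id, vec := range chunks {
		ids = append(ids, id)
		scores[id] = cosine(query, vec)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	if k >= 0 && k < len(ids) {
		ids = ids[:k]
	}
	return ids
}

func main() {
	chunks := map[string][]float64{"a": {1, 0}, "b": {0.6, 0.8}, "c": {0, 1}}
	fmt.Println(topKByCosine([]float64{1, 0}, chunks, 2))
}
```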

func (*Service) Sources

func (s *Service) Sources() []*Source

Sources lists registered crawl sources.

func (*Service) WithEmbedder

func (s *Service) WithEmbedder(e Embedder) *Service

WithEmbedder swaps the embedder (e.g. an ai-backed one).

type Source

type Source struct {
	Name       string `json:"name"`
	URL        string `json:"url"`
	Collection string `json:"collection,omitempty"`
	Cron       string `json:"cron,omitempty"`
	// contains filtered or unexported fields
}

Source is a crawlable origin with optional change detection.
