chisel

Version: v0.0.3
Published: Mar 19, 2026 License: MIT Imports: 2 Imported by: 0

README

AST-aware code chunking for semantic search and embeddings. Chisel parses source code into meaningful units—functions, classes, methods—preserving the context that makes code searchable.

From Syntax to Semantics

source := []byte(`
func New(cfg Config) *Handler { ... }

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }

type Config struct {
    Timeout time.Duration
    Logger  *slog.Logger
}
`)

chunks, _ := c.Chunk(ctx, chisel.Go, "api.go", source)

for _, chunk := range chunks {
    fmt.Printf("[%s] %s (lines %d-%d)\n", chunk.Kind, chunk.Symbol, chunk.StartLine, chunk.EndLine)
}
// [function] New (lines 2-2)
// [method] Handler.ServeHTTP (lines 4-4)
// [class] Config (lines 6-9)

Every chunk carries its symbol name, kind, line range, and parent context. Methods know their receiver. Nested types know their enclosing scope.

chunk := chunks[1]
// chunk.Symbol    → "Handler.ServeHTTP"
// chunk.Kind      → "method"
// chunk.Context   → ["Handler"]
// chunk.Content   → the full method source
// chunk.StartLine → 4
// chunk.EndLine   → 4

Feed chunks to an embedding model, store in a vector database, and search code by meaning rather than text.

Install

go get github.com/zoobz-io/chisel

Language providers (install only what you need):

go get github.com/zoobz-io/chisel/golang     # Go (stdlib, no deps)
go get github.com/zoobz-io/chisel/markdown   # Markdown (no deps)
go get github.com/zoobz-io/chisel/typescript # TypeScript/JavaScript (tree-sitter)
go get github.com/zoobz-io/chisel/python     # Python (tree-sitter)
go get github.com/zoobz-io/chisel/rust       # Rust (tree-sitter)

Requires Go 1.24+.

Quick Start

package main

import (
    "context"
    "fmt"

    "github.com/zoobz-io/chisel"
    "github.com/zoobz-io/chisel/golang"
    "github.com/zoobz-io/chisel/typescript"
)

func main() {
    // Create a chunker with language providers
    c := chisel.New(
        golang.New(),
        typescript.New(),
        typescript.NewJavaScript(),
    )

    source := []byte(`
package auth

// Authenticate validates user credentials.
func Authenticate(username, password string) (*User, error) {
    // ...
}

// User represents an authenticated user.
type User struct {
    ID    string
    Email string
}
`)

    chunks, err := c.Chunk(context.Background(), chisel.Go, "auth.go", source)
    if err != nil {
        panic(err)
    }

    for _, chunk := range chunks {
        fmt.Printf("[%s] %s\n", chunk.Kind, chunk.Symbol)
        fmt.Printf("  Lines: %d-%d\n", chunk.StartLine, chunk.EndLine)
        if len(chunk.Context) > 0 {
            fmt.Printf("  Context: %v\n", chunk.Context)
        }
    }
}

Output:

[function] Authenticate
  Lines: 4-6
[class] User
  Lines: 8-12

Capabilities

| Feature | Description | Docs |
|---|---|---|
| Multi-language | Go, TypeScript, JavaScript, Python, Rust, Markdown | Providers |
| Semantic extraction | Functions, methods, classes, interfaces, types, enums | Concepts |
| Context preservation | Parent chain for nested definitions | Architecture |
| Line mapping | Precise source locations for each chunk | Types |
| Zero-dependency providers | Go and Markdown use stdlib only | Architecture |

Why Chisel?

  • Semantic boundaries — Chunks split at function/class boundaries, not arbitrary line counts
  • Embedding-ready — Output designed for vector databases and semantic search
  • Isolated dependencies — Tree-sitter only where needed; Go/Markdown have zero external deps
  • Context-aware — Methods know their parent class; nested functions know their scope
  • Consistent interface — Same Provider contract across all languages

Code Intelligence Pipelines

Chisel enables a pattern: parse once, search by meaning.

Your codebase becomes a corpus of semantic units. Each function, method, and type gets embedded with its full context — symbol name, parent scope, documentation. Queries match intent, not just text.

// Chunk your codebase
chunks, _ := c.Chunk(ctx, chisel.Go, path, source)

// Embed each chunk (using your embedding provider)
for _, chunk := range chunks {
    embedding := embedder.Embed(chunk.Content)
    vectorDB.Store(embedding, chunk.Symbol, chunk.Kind, path)
}

// Search by meaning
results := vectorDB.Query("authentication middleware")
// Returns: AuthMiddleware, ValidateToken, SessionHandler
// Not just files containing the word "authentication"

Symbol names and kinds become metadata. Line ranges enable source navigation. Context chains power hierarchical search.
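
One way to put that metadata in front of the embedding model is to prepend a short header to each chunk's content before embedding it. The sketch below is an illustration, not part of chisel: `embeddingText` and its header layout are hypothetical, and the local `Chunk` type only mirrors the fields of `chisel.Chunk` so the example compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk mirrors the fields of chisel.Chunk so this sketch compiles
// without importing the module. Kind is a plain string here.
type Chunk struct {
	Content   string
	Symbol    string
	Kind      string
	StartLine int
	EndLine   int
	Context   []string
}

// embeddingText is a hypothetical helper. It prefixes the chunk content
// with kind, symbol, location, and parent chain so that an embedding
// model sees the metadata alongside the code.
func embeddingText(path string, c Chunk) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s %s (%s:%d-%d)\n", c.Kind, c.Symbol, path, c.StartLine, c.EndLine)
	if len(c.Context) > 0 {
		fmt.Fprintf(&b, "in %s\n", strings.Join(c.Context, " > "))
	}
	b.WriteString(c.Content)
	return b.String()
}

func main() {
	c := Chunk{
		Content:   "func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }",
		Symbol:    "Handler.ServeHTTP",
		Kind:      "method",
		StartLine: 4,
		EndLine:   4,
		Context:   []string{"Handler"},
	}
	fmt.Println(embeddingText("api.go", c))
}
```

Whether a header like this improves retrieval depends on the embedding model; storing the same fields as separate vector-database metadata, as in the snippet above, is an equally reasonable choice.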

Ecosystem

Chisel provides the chunking layer for code intelligence pipelines:

  • vicky — Code search and retrieval service

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

MIT — see LICENSE for details.

Documentation

Overview

Package chisel provides AST-aware code chunking for semantic search and embeddings.

Types

type Chunk

type Chunk struct {
	// Content is the actual code or text.
	Content string

	// Symbol is the name of the function, class, type, or section.
	Symbol string

	// Kind categorizes this chunk.
	Kind Kind

	// StartLine is the 1-indexed starting line number.
	StartLine int

	// EndLine is the 1-indexed ending line number.
	EndLine int

	// Context is the parent chain for this chunk.
	// Example: ["class UserService", "method getUser"]
	Context []string
}

Chunk represents a semantic unit of code or documentation.
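
To illustrate how the line fields map back to the original file, here is a minimal sketch that recovers a chunk's lines from the full source. It assumes `EndLine` is inclusive, which the `lines 2-2` output for a single-line function in the README suggests but the field comment does not state. `sliceLines` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceLines returns lines start..end of src, treating both bounds as
// 1-indexed and inclusive. It returns "" when the range is invalid.
func sliceLines(src string, start, end int) string {
	lines := strings.Split(src, "\n")
	if start < 1 || end < start || end > len(lines) {
		return ""
	}
	return strings.Join(lines[start-1:end], "\n")
}

func main() {
	// The leading newline makes "func New" line 2, as in the README example.
	src := "\nfunc New() {}\n\ntype Config struct {\n\tTimeout int\n}\n"
	fmt.Println(sliceLines(src, 2, 2))
	fmt.Println(sliceLines(src, 4, 6))
}
```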

type Chunker

type Chunker struct {
	// contains filtered or unexported fields
}

Chunker routes content to the appropriate language provider.

func New

func New(providers ...Provider) *Chunker

New creates a Chunker with the given providers.

func (*Chunker) Chunk

func (c *Chunker) Chunk(ctx context.Context, lang Language, filename string, content []byte) ([]Chunk, error)

Chunk parses content using the appropriate provider for the language.

func (*Chunker) HasProvider

func (c *Chunker) HasProvider(lang Language) bool

HasProvider returns true if a provider is registered for the language.

func (*Chunker) Languages

func (c *Chunker) Languages() []Language

Languages returns all registered languages.

func (*Chunker) Register

func (c *Chunker) Register(p Provider)

Register adds a provider to the chunker.

type Kind

type Kind string

Kind categorizes a chunk.

const (
	KindFunction  Kind = "function"
	KindMethod    Kind = "method"
	KindClass     Kind = "class"
	KindInterface Kind = "interface"
	KindType      Kind = "type"
	KindEnum      Kind = "enum"
	KindConstant  Kind = "constant"
	KindVariable  Kind = "variable"
	KindSection   Kind = "section" // For markdown headers
	KindModule    Kind = "module"  // Package/file level
)

Chunk kinds.
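
Because `Kind` is a plain string type, callers can filter chunks with ordinary comparisons, for example to embed only callable code. The following is a sketch under that assumption; `filterKinds` is a hypothetical helper, and the local `Kind` and `Chunk` declarations mirror a subset of the package's types so the example is self-contained.

```go
package main

import "fmt"

// Kind and the constants below mirror a subset of chisel's declarations.
type Kind string

const (
	KindFunction Kind = "function"
	KindMethod   Kind = "method"
	KindClass    Kind = "class"
)

// Chunk keeps only the fields this sketch needs.
type Chunk struct {
	Symbol string
	Kind   Kind
}

// filterKinds returns the chunks whose Kind is one of keep, in input order.
func filterKinds(chunks []Chunk, keep ...Kind) []Chunk {
	want := make(map[Kind]bool, len(keep))
	for _, k := range keep {
		want[k] = true
	}
	var out []Chunk
	for _, c := range chunks {
		if want[c.Kind] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	chunks := []Chunk{
		{Symbol: "New", Kind: KindFunction},
		{Symbol: "Handler.ServeHTTP", Kind: KindMethod},
		{Symbol: "Config", Kind: KindClass},
	}
	for _, c := range filterKinds(chunks, KindFunction, KindMethod) {
		fmt.Printf("[%s] %s\n", c.Kind, c.Symbol)
	}
}
```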

type Language

type Language string

Language identifies a programming language.

const (
	Go         Language = "go"
	TypeScript Language = "typescript"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Rust       Language = "rust"
	Markdown   Language = "markdown"
)

Supported languages.
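
`Chunker.Chunk` takes the language as an explicit argument, so callers that walk a directory need their own way to pick one per file. The sketch below maps file extensions to languages. The extension table and `languageFor` are assumptions of this example, not something the package documents, and the `Language` constants are redeclared locally so the code runs standalone.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Language and its constants mirror the declarations above.
type Language string

const (
	Go         Language = "go"
	TypeScript Language = "typescript"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Rust       Language = "rust"
	Markdown   Language = "markdown"
)

// byExt is this example's own mapping; adjust it to your codebase.
var byExt = map[string]Language{
	".go": Go,
	".ts": TypeScript,
	".js": JavaScript,
	".py": Python,
	".rs": Rust,
	".md": Markdown,
}

// languageFor picks a Language from the file extension, case-insensitively.
// The boolean is false when the extension is not in the table.
func languageFor(filename string) (Language, bool) {
	lang, ok := byExt[strings.ToLower(filepath.Ext(filename))]
	return lang, ok
}

func main() {
	for _, name := range []string{"api.go", "README.MD", "Makefile"} {
		lang, ok := languageFor(name)
		fmt.Printf("%s -> %q (known: %v)\n", name, lang, ok)
	}
}
```

With the real package, the result could be checked against `HasProvider` before calling `Chunk`, so that files without a registered provider are skipped.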

type Provider

type Provider interface {
	// Chunk parses content and returns semantic chunks.
	Chunk(ctx context.Context, filename string, content []byte) ([]Chunk, error)

	// Language returns the supported language.
	Language() Language
}

Provider parses a specific language into chunks.
