chisel

AST-aware code chunking for semantic search and embeddings. Chisel parses source code into meaningful units—functions, classes, methods—preserving the context that makes code searchable.
From Syntax to Semantics
source := []byte(`
func New(cfg Config) *Handler { ... }

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }

type Config struct {
	Timeout time.Duration
	Logger  *slog.Logger
}
`)

chunks, _ := c.Chunk(ctx, chisel.Go, "api.go", source)
for _, chunk := range chunks {
	fmt.Printf("[%s] %s (lines %d-%d)\n", chunk.Kind, chunk.Symbol, chunk.StartLine, chunk.EndLine)
}
// [function] New (lines 2-2)
// [method] Handler.ServeHTTP (lines 4-4)
// [class] Config (lines 6-9)
Every chunk carries its symbol name, kind, line range, and parent context. Methods know their receiver. Nested types know their enclosing scope.
chunk := chunks[1]
// chunk.Symbol → "Handler.ServeHTTP"
// chunk.Kind → "method"
// chunk.Context → ["Handler"]
// chunk.Content → the full method source
// chunk.StartLine → 4
// chunk.EndLine → 4
Feed chunks to an embedding model, store in a vector database, and search code by meaning rather than text.
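One way to prepare a chunk for an embedding model is to prepend its metadata to the source text, so the vector reflects the symbol and parent scope as well as the body. The sketch below is illustrative only: `Chunk` is a local stand-in that mirrors the fields shown above (it is not chisel's type definition), and `EmbeddingInput` and its header format are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a local stand-in mirroring the fields shown in this README.
// It is an assumption for illustration, not chisel's actual type.
type Chunk struct {
	Content   string
	Symbol    string
	Kind      string
	StartLine int
	EndLine   int
	Context   []string
}

// EmbeddingInput (hypothetical helper) builds the text handed to an
// embedding model: a short comment header with kind, symbol, location,
// and parent chain, followed by the chunk's source.
func EmbeddingInput(path string, c Chunk) string {
	var b strings.Builder
	fmt.Fprintf(&b, "// %s %s (%s:%d-%d)\n", c.Kind, c.Symbol, path, c.StartLine, c.EndLine)
	if len(c.Context) > 0 {
		fmt.Fprintf(&b, "// in %s\n", strings.Join(c.Context, " > "))
	}
	b.WriteString(c.Content)
	return b.String()
}

func main() {
	c := Chunk{
		Content:   "func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { ... }",
		Symbol:    "Handler.ServeHTTP",
		Kind:      "method",
		StartLine: 4,
		EndLine:   4,
		Context:   []string{"Handler"},
	}
	fmt.Println(EmbeddingInput("api.go", c))
}
```

Whether a header like this improves retrieval depends on the embedding model, so treat the format as a starting point to experiment with.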
Install
go get github.com/zoobz-io/chisel
Language providers (install only what you need):
go get github.com/zoobz-io/chisel/golang # Go (stdlib, no deps)
go get github.com/zoobz-io/chisel/markdown # Markdown (no deps)
go get github.com/zoobz-io/chisel/typescript # TypeScript/JavaScript (tree-sitter)
go get github.com/zoobz-io/chisel/python # Python (tree-sitter)
go get github.com/zoobz-io/chisel/rust # Rust (tree-sitter)
Requires Go 1.24+.
Quick Start
package main

import (
	"context"
	"fmt"

	"github.com/zoobz-io/chisel"
	"github.com/zoobz-io/chisel/golang"
	"github.com/zoobz-io/chisel/typescript"
)

func main() {
	// Create a chunker with language providers
	c := chisel.New(
		golang.New(),
		typescript.New(),
		typescript.NewJavaScript(),
	)

	source := []byte(`
package auth
// Authenticate validates user credentials.
func Authenticate(username, password string) (*User, error) {
	// ...
}
// User represents an authenticated user.
type User struct {
	ID    string
	Email string
}
`)

	chunks, err := c.Chunk(context.Background(), chisel.Go, "auth.go", source)
	if err != nil {
		panic(err)
	}

	for _, chunk := range chunks {
		fmt.Printf("[%s] %s\n", chunk.Kind, chunk.Symbol)
		fmt.Printf(" Lines: %d-%d\n", chunk.StartLine, chunk.EndLine)
		if len(chunk.Context) > 0 {
			fmt.Printf(" Context: %v\n", chunk.Context)
		}
	}
}
Output:
[function] Authenticate
Lines: 4-6
[class] User
Lines: 8-12
Capabilities
| Feature | Description | Docs |
|---|---|---|
| Multi-language | Go, TypeScript, JavaScript, Python, Rust, Markdown | Providers |
| Semantic extraction | Functions, methods, classes, interfaces, types, enums | Concepts |
| Context preservation | Parent chain for nested definitions | Architecture |
| Line mapping | Precise source locations for each chunk | Types |
| Zero-dependency providers | Go and Markdown use stdlib only | Architecture |
Why Chisel?
- Semantic boundaries — Chunks split at function/class boundaries, not arbitrary line counts
- Embedding-ready — Output designed for vector databases and semantic search
- Isolated dependencies — Tree-sitter only where needed; Go/Markdown have zero external deps
- Context-aware — Methods know their parent class; nested functions know their scope
- Consistent interface — Same Provider contract across all languages
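To make the first and third points in the list above concrete, here is a minimal sketch of splitting Go source at declaration boundaries using only the standard library (`go/parser`, `go/ast`, `go/token`). It illustrates the general technique and is not chisel's implementation: the `Unit` type, the `Split` function, and the kind labels are this sketch's own names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Unit is a hypothetical record for one declaration; it is not chisel's type.
type Unit struct {
	Kind      string
	Symbol    string
	StartLine int
	EndLine   int
}

// receiverName returns the receiver's base type name, e.g. "Handler"
// for (h *Handler), or "" when fn is a plain function.
func receiverName(fn *ast.FuncDecl) string {
	if fn.Recv == nil || len(fn.Recv.List) == 0 {
		return ""
	}
	t := fn.Recv.List[0].Type
	if star, ok := t.(*ast.StarExpr); ok {
		t = star.X // unwrap pointer receivers
	}
	switch x := t.(type) { // unwrap generic receivers such as List[T]
	case *ast.IndexExpr:
		t = x.X
	case *ast.IndexListExpr:
		t = x.X
	}
	if id, ok := t.(*ast.Ident); ok {
		return id.Name
	}
	return ""
}

// Split parses Go source and returns one Unit per top-level function,
// method, or type declaration, with 1-based line ranges.
func Split(filename string, src []byte) ([]Unit, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	line := func(p token.Pos) int { return fset.Position(p).Line }

	var units []Unit
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			u := Unit{Kind: "function", Symbol: d.Name.Name, StartLine: line(d.Pos()), EndLine: line(d.End())}
			if recv := receiverName(d); recv != "" {
				u.Kind, u.Symbol = "method", recv+"."+d.Name.Name
			}
			units = append(units, u)
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue // skip imports, vars, and consts
			}
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok {
					units = append(units, Unit{Kind: "type", Symbol: ts.Name.Name, StartLine: line(d.Pos()), EndLine: line(d.End())})
				}
			}
		}
	}
	return units, nil
}

func main() {
	src := []byte(`package api

func New() *Handler { return nil }

func (h *Handler) ServeHTTP() {}

type Handler struct {
	Name string
}
`)
	units, err := Split("api.go", src)
	if err != nil {
		panic(err)
	}
	for _, u := range units {
		fmt.Printf("[%s] %s (lines %d-%d)\n", u.Kind, u.Symbol, u.StartLine, u.EndLine)
	}
	// [function] New (lines 3-3)
	// [method] Handler.ServeHTTP (lines 5-5)
	// [type] Handler (lines 7-9)
}
```

Because the boundaries come from the syntax tree, a long function stays in one unit and a short one is never merged with its neighbor, which is the contrast with fixed line-count splitting.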
Code Intelligence Pipelines
Chisel enables a pattern: parse once, search by meaning.
Your codebase becomes a corpus of semantic units. Each function, method, and type gets embedded with its full context — symbol name, parent scope, documentation. Queries match intent, not just text.
// Chunk your codebase
chunks, _ := c.Chunk(ctx, chisel.Go, path, source)
// Embed each chunk (using your embedding provider)
for _, chunk := range chunks {
	embedding := embedder.Embed(chunk.Content)
	vectorDB.Store(embedding, chunk.Symbol, chunk.Kind, path)
}
// Search by meaning
results := vectorDB.Query("authentication middleware")
// Returns: AuthMiddleware, ValidateToken, SessionHandler
// Not just files containing the word "authentication"
Symbol names and kinds become metadata. Line ranges enable source navigation. Context chains power hierarchical search.
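As a rough sketch of how that metadata might be used after a vector query returns candidates, the example below filters results by kind and by context-chain prefix, then formats a line range for source navigation. Everything here is hypothetical: the `Record` type, `Filter`, and `Location` are illustration names, and the sketch assumes the context chain is ordered outermost scope first.

```go
package main

import "fmt"

// Record is a hypothetical metadata row stored next to each embedding.
// The field names follow the chunk fields shown earlier in this README.
type Record struct {
	Path      string
	Symbol    string
	Kind      string
	Context   []string // assumed to be ordered outermost scope first
	StartLine int
	EndLine   int
}

// Location formats the record as path:start-end for jumping to source.
func (r Record) Location() string {
	return fmt.Sprintf("%s:%d-%d", r.Path, r.StartLine, r.EndLine)
}

// InScope reports whether scope is a prefix of the record's context chain.
// An empty scope matches every record.
func (r Record) InScope(scope ...string) bool {
	if len(scope) > len(r.Context) {
		return false
	}
	for i, s := range scope {
		if r.Context[i] != s {
			return false
		}
	}
	return true
}

// Filter keeps records of the given kind whose context chain starts with scope.
func Filter(records []Record, kind string, scope ...string) []Record {
	var out []Record
	for _, r := range records {
		if r.Kind == kind && r.InScope(scope...) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	records := []Record{
		{Path: "api.go", Symbol: "New", Kind: "function", StartLine: 2, EndLine: 2},
		{Path: "api.go", Symbol: "Handler.ServeHTTP", Kind: "method", Context: []string{"Handler"}, StartLine: 4, EndLine: 4},
	}
	for _, r := range Filter(records, "method", "Handler") {
		fmt.Println(r.Symbol, r.Location())
	}
	// Handler.ServeHTTP api.go:4-4
}
```

In practice most vector databases can apply this kind of metadata filter server-side; the in-memory version only shows the shape of the data.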
Ecosystem
Chisel provides the chunking layer for code intelligence pipelines:
- vicky — Code search and retrieval service
Documentation
- Learn
- Guides
- Reference
  - API — Function signatures
  - Types — Type definitions
Contributing
Contributions welcome. See CONTRIBUTING.md for guidelines.
License
MIT — see LICENSE for details.