discover

package
v0.0.0-...-78728ec Latest
Warning

This package is not in the latest version of its module.

Published: Jun 16, 2026 License: AGPL-3.0 Imports: 12 Imported by: 0

Documentation

Overview

Package discover walks --src and yields the source files CKV should index. It respects a .ckvignore file (same line-based syntax as .gitignore: comments, empty lines, glob patterns; '/' suffix means directory). Symlinks, oversized files, and detected binaries are always skipped, regardless of ignore rules.

Limitations vs full gitignore semantics:

  • No negation ("!pattern" is not supported)
  • No "**" globs (matching uses filepath.Match; doublestar support is planned)
  • Patterns match against the source-relative path AND the basename

These cover the common cases (node_modules/, vendor/, *.log, build/) without pulling in a heavyweight gitignore parser.
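
The matching rules above can be sketched as follows. This is a minimal illustration, not the package's implementation: matchIgnore is a hypothetical name, path.Match stands in for filepath.Match because the paths here are forward-slash, and treating a '/'-suffixed pattern as "matches any directory component" is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchIgnore is a hypothetical stand-in for the package's matcher.
// rel is a forward-slash, source-relative file path.
func matchIgnore(rel string, patterns []string) bool {
	base := path.Base(rel)
	for _, raw := range patterns {
		p := strings.TrimSpace(raw)
		if p == "" || strings.HasPrefix(p, "#") {
			continue // empty line or comment
		}
		if strings.HasSuffix(p, "/") {
			// Directory pattern: compare against every directory
			// component of rel (assumed semantics).
			dir := strings.TrimSuffix(p, "/")
			parts := strings.Split(rel, "/")
			for _, seg := range parts[:len(parts)-1] {
				if ok, _ := path.Match(dir, seg); ok {
					return true
				}
			}
			continue
		}
		// File pattern: try the full relative path, then the basename.
		if ok, _ := path.Match(p, rel); ok {
			return true
		}
		if ok, _ := path.Match(p, base); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"# build output", "node_modules/", "*.log"}
	for _, rel := range []string{"web/node_modules/x/index.js", "logs/app.log", "cmd/main.go"} {
		fmt.Println(rel, matchIgnore(rel, patterns))
	}
}
```

Because "*" in path.Match never crosses a "/", the basename check is what lets a pattern like *.log match files in subdirectories.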

Constants

const DefaultMaxBytes = 1 << 20 // 1 MiB

DefaultMaxBytes caps individual file size to avoid running out of memory on accidentally included large blobs. 1 MiB is generous for source code (the largest typical Go file in the standard library is about 150 KB).

Variables

var DefaultIgnore = []string{
	".git/",
	"node_modules/",
	"vendor/",
	".next/",
	"out/",
	"dist/",
	"build/",
	"target/",
	".venv/",
	"__pycache__/",
}

DefaultIgnore patterns are applied on top of .ckvignore. They are the directories every realistic indexer wants to skip and are listed explicitly so users can see them in `ckv build --json` output.

var DefaultSecretPatterns = []string{
	".env",
	".env.local",
	".env.development",
	".env.development.local",
	".env.test",
	".env.test.local",
	".env.staging",
	".env.staging.local",
	".env.production",
	".env.production.local",
	"*.pem",
	"*.key",
	"*.p12",
	"*.pfx",
	"*.keystore",
	"id_rsa",
	"id_rsa.*",
	"id_ed25519",
	"id_ed25519.*",
	"id_ecdsa",
	"id_ecdsa.*",
	"id_dsa",
	"id_dsa.*",
	"credentials.json",
	"service-account*.json",
	".npmrc",
	".pypirc",
	".netrc",
	".aws/credentials",
	".aws/config",
}

DefaultSecretPatterns matches files that commonly contain credentials, private keys, or other secrets. Matches are excluded from indexing regardless of .ckvignore configuration. Embeddings persist in the sqlite-vec store, so once a secret has been embedded the only remedy is to rotate the credential and rebuild the entire index; blocking such files at discovery time is cheaper.

Opt-out (testing only): CKV_DISABLE_SECRET_FILTER=1.
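
One way such a filter could be applied is sketched below. The helper name isSecret is hypothetical, and two details are assumptions rather than documented behavior: single-segment patterns are matched against the basename, and multi-segment patterns such as ".aws/credentials" are matched against the tail of the relative path.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// isSecret is a hypothetical helper, not part of the documented API.
// rel is a forward-slash, source-relative path.
func isSecret(rel string, patterns []string) bool {
	if os.Getenv("CKV_DISABLE_SECRET_FILTER") == "1" {
		return false // testing-only opt-out
	}
	base := path.Base(rel)
	for _, p := range patterns {
		if strings.Contains(p, "/") {
			// Multi-segment pattern: compare against the path tail (assumed).
			if rel == p || strings.HasSuffix(rel, "/"+p) {
				return true
			}
			continue
		}
		// Single-segment pattern: glob against the basename (assumed).
		if ok, _ := path.Match(p, base); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{".env.production", "*.pem", ".aws/credentials"}
	for _, rel := range []string{"config/.env.production", "deploy/server.pem", "home/.aws/credentials", "cmd/main.go"} {
		fmt.Println(rel, isSecret(rel, patterns))
	}
}
```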

Functions

func IsIgnored

func IsIgnored(rel string, patterns []string) bool

IsIgnored is the exported variant of isIgnored, used by other packages (reindex) that need to apply the same ignore semantics to a list of paths rather than a tree walk.

func IsProbablyBinary

func IsProbablyBinary(path string) bool

IsProbablyBinary is the exported variant of isProbablyBinary, used by reindex to apply the same binary-detection heuristic Walk uses.
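
The documentation does not say which heuristic is used. A common approach, shown here purely as an illustration, is to look for a NUL byte in a leading sample of the file. The names looksBinary and fileLooksBinary, the 8000-byte sample size, and the "unreadable means skip" policy are all assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

const sniffLen = 8000 // assumed sample size; not taken from the package

// looksBinary reports whether the sample contains a NUL byte.
func looksBinary(sample []byte) bool {
	return bytes.IndexByte(sample, 0) >= 0
}

// fileLooksBinary reads at most sniffLen bytes from path and applies
// looksBinary. Unreadable files are reported as binary so that callers
// skip them (an assumed policy).
func fileLooksBinary(path string) bool {
	f, err := os.Open(path)
	if err != nil {
		return true
	}
	defer f.Close()

	buf := make([]byte, sniffLen)
	n, err := io.ReadFull(f, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return true
	}
	return looksBinary(buf[:n])
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))
	fmt.Println(looksBinary([]byte{0x7f, 'E', 'L', 'F', 0x00}))
}
```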

func ResolveGoBuildRoots

func ResolveGoBuildRoots(ctx context.Context, srcRoot string, entryPackages []string, opts GoListOptions) (map[string]struct{}, error)

ResolveGoBuildRoots walks the dependency closure of `entryPackages` using `go list -json -deps` and returns the absolute paths of every .go file owned by the reachable packages.

`srcRoot` is the directory `go list` runs in — it must be inside the module that owns the entry packages. The function returns a set (map[string]struct{}) instead of a slice because the walker uses O(1) lookups; the conversion is cheap.

Failures are returned wrapped so the caller can surface "go list failed at <pkg>" instead of a raw subprocess error.
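
To illustrate the idea, the sketch below turns `go list -json -deps` output into a set of absolute file paths. It is not the package's code: collectGoFiles and listedPackage are hypothetical names, and the bytes would typically come from running exec.Command("go", "list", "-json", "-deps", ...) with its Dir field set to srcRoot. The struct fields are a subset of those `go list` prints for each package.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"path/filepath"
)

// listedPackage mirrors a subset of the fields `go list -json` prints.
type listedPackage struct {
	Dir          string
	ImportPath   string
	Standard     bool
	GoFiles      []string
	TestGoFiles  []string
	XTestGoFiles []string
}

// collectGoFiles decodes the stream of concatenated JSON objects that
// `go list -json` emits and returns the absolute paths as a set.
func collectGoFiles(out []byte, includeTests, skipStd bool) (map[string]struct{}, error) {
	set := make(map[string]struct{})
	dec := json.NewDecoder(bytes.NewReader(out))
	for {
		var p listedPackage
		if err := dec.Decode(&p); err == io.EOF {
			return set, nil
		} else if err != nil {
			return nil, fmt.Errorf("decode go list output: %w", err)
		}
		if skipStd && p.Standard {
			continue
		}
		names := append([]string{}, p.GoFiles...)
		if includeTests {
			names = append(names, p.TestGoFiles...)
			names = append(names, p.XTestGoFiles...)
		}
		for _, n := range names {
			set[filepath.Join(p.Dir, n)] = struct{}{}
		}
	}
}

func main() {
	// Two concatenated objects, the shape `go list -json -deps` produces.
	out := []byte(`{"Dir":"/src/app","ImportPath":"example.com/app","GoFiles":["main.go"],"TestGoFiles":["main_test.go"]}
{"Dir":"/usr/local/go/src/fmt","ImportPath":"fmt","Standard":true,"GoFiles":["print.go"]}`)
	set, err := collectGoFiles(out, true, true)
	if err != nil {
		panic(err)
	}
	for p := range set {
		fmt.Println(p)
	}
}
```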

Types

type File

type File struct {
	AbsPath  string
	RelPath  string
	Size     int64
	Language string // "go" | "typescript" | "solidity" | "markdown" | "" (unknown)
}

File is the result record. RelPath is forward-slash, repo-relative.

func Walk

func Walk(srcRoot string, opts Options) (files []File, errs []error, err error)

Walk scans srcRoot and returns the list of files CKV should process. Per-file errors encountered during the walk are collected in errs (one per file), so a single bad file doesn't abort the whole indexing pass.
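
The collect-and-continue pattern can be sketched with the standard library alone. This is an assumption-laden illustration, not the package's Walk: walkCollect is a hypothetical name, it returns plain relative paths instead of File records, and it treats only an unreadable root as a fatal error.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walkCollect gathers regular files under root that are at most maxBytes.
// Per-file problems go into errs; only an unusable root is fatal.
func walkCollect(root string, maxBytes int64) (files []string, errs []error, err error) {
	if _, statErr := os.Stat(root); statErr != nil {
		return nil, nil, fmt.Errorf("stat root: %w", statErr)
	}
	err = filepath.WalkDir(root, func(p string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			errs = append(errs, fmt.Errorf("%s: %w", p, walkErr))
			return nil // record the error and keep walking
		}
		if d.IsDir() {
			return nil // descend into the directory
		}
		if !d.Type().IsRegular() {
			return nil // skips symlinks and other special files
		}
		info, infoErr := d.Info()
		if infoErr != nil {
			errs = append(errs, fmt.Errorf("%s: %w", p, infoErr))
			return nil
		}
		if info.Size() > maxBytes {
			return nil // oversized
		}
		rel, relErr := filepath.Rel(root, p)
		if relErr != nil {
			errs = append(errs, fmt.Errorf("%s: %w", p, relErr))
			return nil
		}
		files = append(files, filepath.ToSlash(rel)) // forward-slash relative path
		return nil
	})
	return files, errs, err
}

func main() {
	files, errs, err := walkCollect(".", 1<<20)
	if err != nil {
		fmt.Println("fatal:", err)
		return
	}
	fmt.Printf("%d files, %d per-file errors\n", len(files), len(errs))
}
```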

type GoListOptions

type GoListOptions struct {
	// IncludeTests adds *_test.go files (TestGoFiles + XTestGoFiles)
	// to the returned set. Defaults true — tests are valuable as
	// usage examples and live in the same packages as the code under
	// test, so including them mirrors how an agent would search.
	IncludeTests bool

	// SkipStandardLib drops Go's stdlib packages (fmt, os, ...) from
	// the result. Defaults true — stdlib sources live outside srcRoot
	// anyway, so including them would just inflate the set with paths
	// the walker can't reach.
	SkipStandardLib bool
}

GoListOptions tunes ResolveGoBuildRoots' behavior.

func DefaultGoListOptions

func DefaultGoListOptions() GoListOptions

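DefaultGoListOptions returns the sensible defaults: include tests, skip stdlib. Both defaults are true, so the zero value GoListOptions{} (both fields false) is not equivalent to them; callers that need to deviate can start from DefaultGoListOptions() and override individual fields, or construct a custom GoListOptions.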

type Options

type Options struct {
	MaxBytes int64    // size cap; 0 → DefaultMaxBytes
	Extra    []string // additional ignore patterns from CLI

	// GoBuildFiles, when non-nil, restricts the walk's Go-language
	// output to absolute paths that appear as keys in the map. Other
	// languages (TypeScript, Solidity, etc.) are unaffected — they
	// continue through the regular ignore-pattern path. Use this
	// to honor `build_roots` from ckv.yaml (resolved upstream via
	// ResolveGoBuildRoots). Nil/empty map means "no filter, walk
	// every Go file" — the original behavior.
	GoBuildFiles map[string]struct{}

	// AllowList, when non-nil, is applied to EVERY candidate file
	// regardless of language, BEFORE the GoBuildFiles filter and
	// before language-specific handling. A file must pass
	// AllowList.Allow(relPath) to be included in the results.
	// nil means "no allowlist — all files are eligible" (the existing
	// default). This implements the --files-from feature: the caller
	// loads a JSON include/exclude spec and passes the resulting
	// *filterlist.FilterList here.
	AllowList *filterlist.FilterList
}

Options control the walk. All fields are optional; the zero value is the documented default.
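
The filter order described in the field comments can be sketched as below. The lowercase options type, the keep function, and the allow callback (a stand-in for AllowList.Allow) are hypothetical; placing the size cap before the allowlist and detecting Go files by the ".go" suffix are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultMaxBytes = 1 << 20 // mirrors DefaultMaxBytes

// options is a local stand-in for the documented Options struct.
type options struct {
	maxBytes     int64
	goBuildFiles map[string]struct{}
	allow        func(relPath string) bool // stand-in for AllowList.Allow
}

// effectiveMaxBytes resolves the zero value to the default cap.
func effectiveMaxBytes(o options) int64 {
	if o.maxBytes == 0 {
		return defaultMaxBytes
	}
	return o.maxBytes
}

// keep reports whether a candidate file survives the filters.
func keep(o options, absPath, relPath string, size int64) bool {
	if size > effectiveMaxBytes(o) {
		return false
	}
	// The allowlist applies to every language, before GoBuildFiles.
	if o.allow != nil && !o.allow(relPath) {
		return false
	}
	// GoBuildFiles restricts Go files only; nil or empty means no filter.
	if strings.HasSuffix(relPath, ".go") && len(o.goBuildFiles) > 0 {
		_, ok := o.goBuildFiles[absPath]
		return ok
	}
	return true
}

func main() {
	o := options{goBuildFiles: map[string]struct{}{"/repo/cmd/main.go": {}}}
	fmt.Println(keep(o, "/repo/cmd/main.go", "cmd/main.go", 512))
	fmt.Println(keep(o, "/repo/tools/gen.go", "tools/gen.go", 512))
	fmt.Println(keep(o, "/repo/web/app.ts", "web/app.ts", 512))
}
```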
