Overview ¶
Package discover walks the source tree given by --src and yields the files CKV should index. It honors a .ckvignore file, which uses the same line-based syntax as .gitignore: comments, empty lines, and glob patterns, where a trailing '/' restricts a pattern to directories. Symlinks, files over the size cap, and files detected as binary are always skipped, regardless of ignore rules.
Limitations compared with full gitignore semantics:
- No negation: "!pattern" is not supported.
- No "**" globs: patterns are matched with filepath.Match, whose '*' does not match path separators. Doublestar support is planned.
- Every pattern is matched against both the source-relative path and the basename.
These cover the common cases (node_modules/, vendor/, *.log, build/) without pulling in a heavyweight gitignore parser.
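The matching rules above can be sketched as follows. This is a minimal illustration, not the package's isIgnored: the helper name matchIgnore and its isDir parameter are hypothetical, and it assumes forward-slash relative paths on a Unix-like system (where filepath.Match treats '/' as the separator).

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// matchIgnore is a HYPOTHETICAL helper, not the package's isIgnored.
// relPath is forward-slash and source-relative; isDir says whether the
// candidate is a directory.
func matchIgnore(pattern, relPath string, isDir bool) bool {
	// A trailing '/' restricts the pattern to directories.
	if strings.HasSuffix(pattern, "/") {
		if !isDir {
			return false
		}
		pattern = strings.TrimSuffix(pattern, "/")
	}
	// Try the full relative path first, then the basename.
	for _, candidate := range []string{relPath, path.Base(relPath)} {
		// filepath.Match errors only on a malformed pattern; treat that as no match.
		if ok, err := filepath.Match(pattern, candidate); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchIgnore("node_modules/", "web/node_modules", true)) // true (basename)
	fmt.Println(matchIgnore("*.log", "logs/app.log", false))            // true (basename)
	fmt.Println(matchIgnore("build/", "build", false))                  // false (file, not dir)
	fmt.Println(matchIgnore("src/**/*.go", "src/a/b/c.go", false))      // false on Unix: "**" is not recursive
}
```

The last call shows the "**" limitation: filepath.Match treats "**" like a single '*', which stops at each '/'.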
Constants ¶
const DefaultMaxBytes = 1 << 20 // 1 MiB
DefaultMaxBytes caps the size of an individual file so that an accidentally committed large blob cannot exhaust memory. 1 MiB is generous for source code (the largest typical Go file in the standard library is ~150 KB).
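A minimal sketch of how the cap might be applied, combining DefaultMaxBytes with the Options.MaxBytes rule (0 means the default). The helper names are hypothetical, and two details are assumptions the docs do not settle: negative values are treated like 0, and a file exactly at the cap is still indexed.

```go
package main

import "fmt"

const DefaultMaxBytes = 1 << 20 // 1 MiB, copied from the package docs

// effectiveMaxBytes applies the documented rule that 0 means DefaultMaxBytes.
// HYPOTHETICAL helper; treating negative values like 0 is an ASSUMPTION.
func effectiveMaxBytes(max int64) int64 {
	if max <= 0 {
		return DefaultMaxBytes
	}
	return max
}

// oversized reports whether a file should be skipped for size.
// ASSUMPTION: a file exactly at the cap is still indexed (strict >).
func oversized(size, max int64) bool {
	return size > effectiveMaxBytes(max)
}

func main() {
	fmt.Println(DefaultMaxBytes)                 // 1048576
	fmt.Println(oversized(150*1024, 0))          // false: a ~150 KB Go file fits
	fmt.Println(oversized(2*DefaultMaxBytes, 0)) // true: a 2 MiB blob is skipped
}
```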
Variables ¶
var DefaultIgnore = []string{
".git/",
"node_modules/",
"vendor/",
".next/",
"out/",
"dist/",
"build/",
"target/",
".venv/",
"__pycache__/",
}
DefaultIgnore patterns are applied in addition to .ckvignore. They name directories that nearly every indexer should skip (VCS metadata, dependency caches, virtual environments, build output) and are listed explicitly so users can see them in `ckv build --json` output.
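One way to combine the three pattern sources is sketched below: the built-in defaults, the lines of a .ckvignore file, and CLI extras (Options.Extra). The helpers parseIgnore and effectivePatterns are hypothetical; trimming whitespace and the concatenation order are assumptions. Because negation is unsupported and a path is ignored if any pattern matches, the order does not change the result.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// DefaultIgnore mirrors the package variable documented above.
var DefaultIgnore = []string{
	".git/", "node_modules/", "vendor/", ".next/", "out/",
	"dist/", "build/", "target/", ".venv/", "__pycache__/",
}

// parseIgnore reads .ckvignore-style content: one pattern per line, with
// blank lines and '#' comments skipped. HYPOTHETICAL helper; trimming
// surrounding whitespace is an ASSUMPTION.
func parseIgnore(r io.Reader) ([]string, error) {
	var patterns []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		patterns = append(patterns, line)
	}
	return patterns, sc.Err()
}

// effectivePatterns layers defaults, file patterns, and CLI extras.
// The ordering is an ASSUMPTION; with any-match semantics it is cosmetic.
func effectivePatterns(fileContent string, extra []string) ([]string, error) {
	fromFile, err := parseIgnore(strings.NewReader(fileContent))
	if err != nil {
		return nil, err
	}
	out := append([]string{}, DefaultIgnore...)
	out = append(out, fromFile...)
	return append(out, extra...), nil
}

func main() {
	pats, err := effectivePatterns("# logs\n*.log\n\ncoverage/\n", []string{"tmp/"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pats), pats[len(pats)-3:]) // 13 [*.log coverage/ tmp/]
}
```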
var DefaultSecretPatterns = []string{
".env",
".env.local",
".env.development",
".env.development.local",
".env.test",
".env.test.local",
".env.staging",
".env.staging.local",
".env.production",
".env.production.local",
"*.pem",
"*.key",
"*.p12",
"*.pfx",
"*.keystore",
"id_rsa",
"id_rsa.*",
"id_ed25519",
"id_ed25519.*",
"id_ecdsa",
"id_ecdsa.*",
"id_dsa",
"id_dsa.*",
"credentials.json",
"service-account*.json",
".npmrc",
".pypirc",
".netrc",
".aws/credentials",
".aws/config",
}
DefaultSecretPatterns matches files that commonly contain credentials, private keys, or other secrets. Matching files are excluded from indexing regardless of .ckvignore configuration. Embeddings persist in the sqlite-vec store, and once a secret has been embedded the only remedy is to rotate the credential and rebuild the entire index; blocking such files at discovery time is far cheaper.
Opt-out (testing only): CKV_DISABLE_SECRET_FILTER=1.
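The check might look roughly like the sketch below, using a subset of the patterns. It assumes, without confirmation from the docs, that secret patterns are matched the same way as ignore patterns (full relative path, then basename). The helper isSecret and its filterDisabled parameter are hypothetical; reading CKV_DISABLE_SECRET_FILTER once in main keeps the matcher easy to test.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"path/filepath"
)

// A subset of DefaultSecretPatterns, copied from the list above.
var secretPatterns = []string{
	".env", ".env.local", "*.pem", "*.key",
	"id_rsa", "id_rsa.*", "service-account*.json", ".aws/credentials",
}

// isSecret is a HYPOTHETICAL helper. It ASSUMES secret patterns are matched
// like ignore patterns: filepath.Match against the relative path, then the
// basename. filterDisabled models the CKV_DISABLE_SECRET_FILTER opt-out.
func isSecret(relPath string, filterDisabled bool) bool {
	if filterDisabled {
		return false
	}
	for _, p := range secretPatterns {
		for _, candidate := range []string{relPath, path.Base(relPath)} {
			if ok, err := filepath.Match(p, candidate); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	// Read the documented test-only opt-out once, at the edge.
	disabled := os.Getenv("CKV_DISABLE_SECRET_FILTER") == "1"
	for _, p := range []string{"config/.env", "deploy/server.key", ".aws/credentials", "main.go"} {
		fmt.Println(p, isSecret(p, disabled))
	}
}
```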
Functions ¶
func IsIgnored ¶
IsIgnored is the exported variant of isIgnored, used by other packages (such as reindex) that need to apply the same ignore semantics to a list of paths rather than to a tree walk.
func IsProbablyBinary ¶
IsProbablyBinary is the exported variant of isProbablyBinary, used by reindex to apply the same binary-detection heuristic Walk uses.
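The docs do not describe the heuristic itself. For illustration only, the sketch below swaps in the well-known check git uses: a file is treated as binary if its first 8000 bytes contain a NUL byte. The names readHead and looksBinary are hypothetical, and the real isProbablyBinary may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// sniffLen follows git's choice of inspecting the first 8000 bytes.
const sniffLen = 8000

// readHead returns up to sniffLen bytes from r; short inputs are not errors.
func readHead(r io.Reader) ([]byte, error) {
	buf := make([]byte, sniffLen)
	n, err := io.ReadFull(r, buf)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		err = nil // fewer than sniffLen bytes is normal for small files
	}
	return buf[:n], err
}

// looksBinary is a HYPOTHETICAL stand-in for isProbablyBinary: it flags any
// prefix containing a NUL byte, which source text essentially never has.
func looksBinary(head []byte) bool {
	if len(head) > sniffLen {
		head = head[:sniffLen]
	}
	return bytes.IndexByte(head, 0) >= 0
}

func main() {
	head, err := readHead(strings.NewReader("package main\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(looksBinary(head))                                   // false
	fmt.Println(looksBinary([]byte{0x7f, 'E', 'L', 'F', 0x02, 0x00})) // true
}
```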
func ResolveGoBuildRoots ¶
func ResolveGoBuildRoots(ctx context.Context, srcRoot string, entryPackages []string, opts GoListOptions) (map[string]struct{}, error)
ResolveGoBuildRoots walks the dependency closure of `entryPackages` using `go list -json -deps` and returns the absolute paths of every .go file owned by the reachable packages.
`srcRoot` is the directory in which `go list` runs; it must be inside the module that owns the entry packages. The function returns a set (map[string]struct{}) rather than a slice because the walker performs an O(1) membership check for every Go file it visits; callers that need a slice can convert it cheaply.
Failures are returned wrapped so the caller can surface "go list failed at <pkg>" instead of a raw subprocess error.
Types ¶
type File ¶
type File struct {
AbsPath string
RelPath string
Size int64
Language string // "go" | "typescript" | "solidity" | "markdown" | "" (unknown)
}
File is the record yielded for each discovered file. RelPath is repo-relative and always uses forward slashes.
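The docs list the possible Language values but not how they are chosen. The sketch below assumes an extension lookup; the table languageFor and the constructor newFile are hypothetical. It also shows the forward-slash normalization of RelPath via filepath.ToSlash.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

type File struct {
	AbsPath  string
	RelPath  string
	Size     int64
	Language string
}

// languageFor is a HYPOTHETICAL extension table; the docs list the
// Language values but not which extensions map to them.
var languageFor = map[string]string{
	".go":  "go",
	".ts":  "typescript",
	".tsx": "typescript",
	".sol": "solidity",
	".md":  "markdown",
}

// newFile builds a File from an absolute path under root, normalizing
// RelPath to forward slashes as the docs require.
func newFile(root, abs string, size int64) (File, error) {
	rel, err := filepath.Rel(root, abs)
	if err != nil {
		return File{}, err
	}
	return File{
		AbsPath:  abs,
		RelPath:  filepath.ToSlash(rel),
		Size:     size,
		Language: languageFor[strings.ToLower(filepath.Ext(abs))], // "" if unknown
	}, nil
}

func main() {
	f, err := newFile("/repo", "/repo/contracts/Token.sol", 2048)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %q\n", f.RelPath, f.Language) // contracts/Token.sol "solidity"
}
```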
type GoListOptions ¶
type GoListOptions struct {
// IncludeTests adds *_test.go files (TestGoFiles + XTestGoFiles)
// to the returned set. DefaultGoListOptions sets it to true: tests
// are valuable as usage examples and live in the same packages as
// the code under test, so including them mirrors how an agent would
// search.
IncludeTests bool
// SkipStandardLib drops Go's stdlib packages (fmt, os, ...) from
// the result. DefaultGoListOptions sets it to true: stdlib sources
// live outside srcRoot anyway, so including them would only inflate
// the set with paths the walker can't reach.
SkipStandardLib bool
}
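GoListOptions tunes ResolveGoBuildRoots' behavior. The zero value leaves both fields false; start from DefaultGoListOptions to get the recommended behavior.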
func DefaultGoListOptions ¶
func DefaultGoListOptions() GoListOptions
DefaultGoListOptions returns the recommended defaults: include tests, skip stdlib. Callers can construct a custom GoListOptions when they need to deviate.
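A short usage sketch: start from the defaults and override a single field, rather than relying on the zero value. The type and constructor below are local copies, with the constructor body assumed from the description above.

```go
package main

import "fmt"

type GoListOptions struct {
	IncludeTests    bool
	SkipStandardLib bool
}

// DefaultGoListOptions reproduces the documented defaults (body ASSUMED).
func DefaultGoListOptions() GoListOptions {
	return GoListOptions{IncludeTests: true, SkipStandardLib: true}
}

func main() {
	var zero GoListOptions         // zero value: both fields false
	opts := DefaultGoListOptions() // include tests, skip stdlib
	opts.IncludeTests = false      // deviate: production code only
	fmt.Printf("%+v\n%+v\n", zero, opts)
	// {IncludeTests:false SkipStandardLib:false}
	// {IncludeTests:false SkipStandardLib:true}
}
```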
type Options ¶
type Options struct {
MaxBytes int64 // size cap; 0 → DefaultMaxBytes
Extra []string // additional ignore patterns from CLI
// GoBuildFiles, when non-nil, restricts the walk's Go-language
// output to absolute paths that appear as keys in the map. Other
// languages (TypeScript, Solidity, etc.) are unaffected — they
// continue through the regular ignore-pattern path. Use this
// to honor `build_roots` from ckv.yaml (resolved upstream via
// ResolveGoBuildRoots). Nil/empty map means "no filter, walk
// every Go file" — the original behavior.
GoBuildFiles map[string]struct{}
// AllowList, when non-nil, is applied to EVERY candidate file
// regardless of language, BEFORE the GoBuildFiles filter and
// before language-specific handling. A file must pass
// AllowList.Allow(relPath) to be included in the results.
// nil means "no allowlist — all files are eligible" (the existing
// default). This implements the --files-from feature: the caller
// loads a JSON include/exclude spec and passes the resulting
// *filterlist.FilterList here.
AllowList *filterlist.FilterList
}
Options control the walk. All fields are optional; the zero value is the documented default.
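The per-file filtering order described in the field comments can be sketched as below: AllowList applies to every language first, then GoBuildFiles applies to Go files only, with a nil or empty map meaning no filter. The interface allower stands in for *filterlist.FilterList; only the documented Allow(relPath) call is modeled, and its bool result is an assumption. The function include and the toy prefixAllow are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// allower stands in for *filterlist.FilterList. Only the documented
// Allow(relPath) call is modeled; the bool result is an ASSUMPTION.
type allower interface {
	Allow(relPath string) bool
}

// options models the two documented Options fields this sketch needs.
type options struct {
	GoBuildFiles map[string]struct{}
	AllowList    allower
}

// include is a HYPOTHETICAL per-file check in the documented order:
// AllowList for every language first, then GoBuildFiles for Go files only.
func include(opts options, absPath, relPath, language string) bool {
	if opts.AllowList != nil && !opts.AllowList.Allow(relPath) {
		return false
	}
	// A nil or empty GoBuildFiles map means "no filter".
	if language == "go" && len(opts.GoBuildFiles) > 0 {
		_, ok := opts.GoBuildFiles[absPath]
		return ok
	}
	return true
}

// prefixAllow is a toy allowlist that admits paths under one directory.
type prefixAllow string

func (p prefixAllow) Allow(relPath string) bool {
	return strings.HasPrefix(relPath, string(p))
}

func main() {
	opts := options{
		GoBuildFiles: map[string]struct{}{"/repo/cmd/main.go": {}},
		AllowList:    prefixAllow("cmd/"),
	}
	fmt.Println(include(opts, "/repo/cmd/main.go", "cmd/main.go", "go"))           // true
	fmt.Println(include(opts, "/repo/cmd/unused.go", "cmd/unused.go", "go"))       // false: not in build set
	fmt.Println(include(opts, "/repo/docs/guide.md", "docs/guide.md", "markdown")) // false: allowlist
	fmt.Println(include(opts, "/repo/cmd/README.md", "cmd/README.md", "markdown")) // true
}
```

Since a file must pass both checks, the order mainly matters for short-circuiting: a file rejected by the allowlist never reaches the Go build-set lookup.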