package index

Version: v0.0.0-...-7ffb936 (latest) | Published: Jun 27, 2025 | License: Apache-2.0 | Imports: 50 | Imported by: 0

Repository

github.com/sourcegraph/zoekt

Documentation ¶

Overview ¶

Package index contains logic for building Zoekt indexes. NOTE: this package is not considered part of the public API, and it is not recommended to rely on it in external code.

Index ¶

Constants
Variables
func BranchNamesEqual(a, b []zoekt.RepositoryBranch) bool
func DetermineFileCategory(doc *Document)
func DetermineLanguageIfUnknown(doc *Document)
func Explode(dstDir string, inputShard string) error
func HostnameBestEffort() string
func IndexFilePaths(p string) ([]string, error)
func JsonMarshalRepoMetaTemp(shardPath string, repositoryMetadata any) (tempPath, finalPath string, err error)
func Merge(dstDir string, files ...IndexFile) (tmpName, dstName string, _ error)
func NewSearcher(r IndexFile) (zoekt.Searcher, error)
func ParseTemplate(text string) (*template.Template, error)
func PrintNgramStats(r IndexFile) error
func ReadMetadata(inf IndexFile) ([]*zoekt.Repository, *zoekt.IndexMetadata, error)
func ReadMetadataPath(p string) ([]*zoekt.Repository, *zoekt.IndexMetadata, error)
func ReadMetadataPathAlive(p string) ([]*zoekt.Repository, *zoekt.IndexMetadata, error)
func SetTombstone(shardPath string, repoID uint32) error
func SortAndTruncateFiles(files []zoekt.FileMatch, opts *zoekt.SearchOptions) []zoekt.FileMatch
func SortFiles(ms []zoekt.FileMatch)
func UnsetTombstone(shardPath string, repoID uint32) error
type Branch
type Builder
- func NewBuilder(opts Options) (*Builder, error)
- func (b *Builder) Add(doc Document) error
- func (b *Builder) AddFile(name string, content []byte) error
- func (b *Builder) CheckMemoryUsage()
- func (b *Builder) Finish() error
- func (b *Builder) MarkFileAsChangedOrRemoved(path string)
type DisplayTruncator
- func NewDisplayTruncator(opts *zoekt.SearchOptions) (_ DisplayTruncator, hasLimits bool)
type DocChecker
- func (t *DocChecker) Check(content []byte, maxTrigramCount int, allowLargeFile bool) SkipReason
type Document
type DocumentSection
type FileCategory
type HashOptions
type IndexFile
- func NewIndexFile(f *os.File) (IndexFile, error)
type IndexState
type Options
- func (o *Options) Args() []string
- func (o *Options) FindAllShards() []string
- func (o *Options) FindRepositoryMetadata() (repository *zoekt.Repository, metadata *zoekt.IndexMetadata, ok bool, ...)
- func (o *Options) Flags(fs *flag.FlagSet)
- func (o *Options) GetHash() string
- func (o *Options) HashOptions() HashOptions
- func (o *Options) IgnoreSizeMax(name string) bool
- func (o *Options) IncrementalSkipIndexing() bool
- func (o *Options) IndexState() (IndexState, string)
- func (o *Options) SetDefaults()
type ShardBuilder
- func NewShardBuilder(r *zoekt.Repository) (*ShardBuilder, error)
- func (b *ShardBuilder) Add(doc Document) error
- func (b *ShardBuilder) AddFile(name string, content []byte) error
- func (b *ShardBuilder) ContentSize() uint32
- func (b *ShardBuilder) NumFiles() int
- func (b *ShardBuilder) Write(out io.Writer) error
type SkipReason

Constants ¶

const FeatureVersion = 12

FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version.

2: Rank field for shards.
3: Rank documents within shards.
4: Dedup file bugfix.
5: Remove max line size limit.
6: Include '#' into the LineFragment template.
7: Record skip reasons in the index.
8: Record source path in the index.
9: Store ctags metadata & bump default max file size.
10: Compound shards; more flexible TOC format.
11: Bloom filters for file names & contents.
12: go-enry for identifying file languages.

const IndexFormatVersion = 16

IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed.

5: subrepositories.
6: remove size prefix for posting varint list.
7: move subrepos into Repository struct.
8: move repoMetaData out of indexMetadata.
9: use bigendian uint64 for trigrams.
10: sections for rune offsets.
11: file ends in rune offsets.
12: 64-bit branchmasks.
13: content checksums.
14: languages.
15: rune based symbol sections.
16: ctags metadata.

const NextIndexFormatVersion = 17

17: compound shard (multi repo)

const ReadMinFeatureVersion = 8

ReadMinFeatureVersion constrains backwards compatibility by refusing to load a file with a FeatureVersion below it.
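The read-side rule can be sketched as a simple comparison. This is a minimal illustration only: `checkReadable` is a hypothetical helper that is not part of the package, and the constant values are copied from the documentation above.

```go
package main

import "fmt"

// Values copied from the constants documented above.
const (
	featureVersion        = 12
	readMinFeatureVersion = 8
)

// checkReadable is a hypothetical helper, not part of the package. It mirrors
// the documented rule: a shard whose feature version is below the reader's
// minimum is refused.
func checkReadable(shardFeatureVersion int) error {
	if shardFeatureVersion < readMinFeatureVersion {
		return fmt.Errorf("shard feature version %d is below the minimum %d",
			shardFeatureVersion, readMinFeatureVersion)
	}
	return nil
}

func main() {
	for _, v := range []int{7, 8, featureVersion} {
		fmt.Printf("feature version %d: err=%v\n", v, checkReadable(v))
	}
}
```

Under this sketch, shards written at feature version 8 or later load, while older shards need to be reindexed.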

const (
	ScoreOffset = 10_000_000
)

const WriteMinFeatureVersion = 10

WriteMinFeatureVersion constrains forwards compatibility by emitting files that won't load in zoekt with a FeatureVersion below it.

Variables ¶

var DefaultDir = filepath.Join(os.Getenv("HOME"), ".zoekt")

var Version string

Filled by the linker

Functions ¶

func BranchNamesEqual ¶

func BranchNamesEqual(a, b []zoekt.RepositoryBranch) bool

BranchNamesEqual compares the given zoekt.RepositoryBranch slices, and returns true iff both slices specify the same set of branch names in the same order.
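A minimal sketch of the comparison described above, assuming only branch names matter and versions are ignored. `repositoryBranch` is a local stand-in for zoekt.RepositoryBranch (its fields here are an assumption), and `branchNamesEqual` is an illustrative reimplementation, not the package's code.

```go
package main

import "fmt"

// repositoryBranch is a local stand-in for zoekt.RepositoryBranch.
// The field set is an assumption made for this sketch.
type repositoryBranch struct {
	Name    string
	Version string
}

// branchNamesEqual reports whether a and b list the same branch names in the
// same order. Versions are deliberately not compared.
func branchNamesEqual(a, b []repositoryBranch) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i].Name != b[i].Name {
			return false
		}
	}
	return true
}

func main() {
	old := []repositoryBranch{{Name: "main", Version: "abc"}, {Name: "dev", Version: "def"}}
	updated := []repositoryBranch{{Name: "main", Version: "123"}, {Name: "dev", Version: "456"}}
	reordered := []repositoryBranch{{Name: "dev"}, {Name: "main"}}

	fmt.Println(branchNamesEqual(old, updated))   // same names, same order
	fmt.Println(branchNamesEqual(old, reordered)) // same names, different order
}
```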

func DetermineFileCategory ¶

func DetermineFileCategory(doc *Document)

func DetermineLanguageIfUnknown ¶

func DetermineLanguageIfUnknown(doc *Document)

func Explode ¶

func Explode(dstDir string, inputShard string) error

Explode takes an input shard and creates one simple shard per repository. It is a wrapper around explode that takes care of removing the input shard and renaming the temporary shards.

func HostnameBestEffort ¶

func HostnameBestEffort() string

func IndexFilePaths ¶

func IndexFilePaths(p string) ([]string, error)

IndexFilePaths returns all paths for the IndexFile at filepath p that exist. Note: if no files exist this will return an empty slice and nil error.

This is p and the ".meta" file for p.
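The behaviour described above can be sketched with os.Stat. `existingIndexPaths` is a hypothetical name, and the sketch assumes that a missing file is skipped while any other stat error is returned to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// existingIndexPaths is an illustrative stand-in for IndexFilePaths. It
// returns whichever of p and p+".meta" exist on disk.
func existingIndexPaths(p string) ([]string, error) {
	paths := make([]string, 0, 2)
	for _, candidate := range []string{p, p + ".meta"} {
		_, err := os.Stat(candidate)
		switch {
		case err == nil:
			paths = append(paths, candidate)
		case errors.Is(err, fs.ErrNotExist):
			// A missing file is not an error; it is simply left out.
		default:
			return nil, err
		}
	}
	return paths, nil
}

func main() {
	dir, err := os.MkdirTemp("", "index-paths-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// The shard file name is hypothetical.
	shard := filepath.Join(dir, "example.zoekt")
	if err := os.WriteFile(shard, []byte("shard"), 0o600); err != nil {
		panic(err)
	}

	paths, err := existingIndexPaths(shard)
	fmt.Println(len(paths), err) // only the shard exists, no .meta yet
}
```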

func JsonMarshalRepoMetaTemp ¶

func JsonMarshalRepoMetaTemp(shardPath string, repositoryMetadata any) (tempPath, finalPath string, err error)

JsonMarshalRepoMetaTemp writes the json encoding of the given repository metadata to a temporary file in the same directory as the given shard path. It returns both the path of the temporary file and the path of the final file that the caller should use.

The caller is responsible for renaming the temporary file to the final file path, or removing the temporary file if it is no longer needed. TODO: Should we stick this in a util package?

func Merge ¶

func Merge(dstDir string, files ...IndexFile) (tmpName, dstName string, _ error)

Merge merges files into a compound shard in dstDir and returns tmpName and dstName. It is the responsibility of the caller to delete the input shards and rename the temporary compound shard from tmpName to dstName.

func NewSearcher ¶

func NewSearcher(r IndexFile) (zoekt.Searcher, error)

NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e. []byte members should be copied into fresh buffers if the result is to survive closing the shard.
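The copying advice can be illustrated without a real shard. In this sketch a plain byte slice plays the role of the shard's backing memory, and `detach` is a hypothetical helper; nothing here calls the zoekt API.

```go
package main

import "fmt"

// detach returns a copy of b backed by freshly allocated memory, so the
// result no longer depends on the buffer b was sliced from.
func detach(b []byte) []byte {
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

func main() {
	// shard stands in for memory owned by an open index file.
	shard := []byte("needle in a shard")

	alias := shard[:6]    // shares memory with shard
	kept := detach(alias) // independent copy

	// Simulate the backing memory going away or being reused.
	for i := range shard {
		shard[i] = 'x'
	}

	fmt.Printf("alias=%q kept=%q\n", alias, kept)
}
```

Only the detached copy still holds the original bytes after the backing buffer changes.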

func ParseTemplate ¶

func ParseTemplate(text string) (*template.Template, error)

ParseTemplate will parse the templates for FileURLTemplate, LineFragmentTemplate and CommitURLTemplate.

It makes available the extra function UrlJoinPath.
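A minimal sketch of a template parser that exposes a URL-joining helper. It assumes text/template and net/url.JoinPath; the helper is registered under the name `UrlJoinPath` as written above, but how the real package registers it is not shown in this page. `parseURLTemplate`, `render`, and the template fields `.Version` and `.Path` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"text/template"
)

// parseURLTemplate is an illustrative stand-in for ParseTemplate. It parses
// text with one extra function available to the template.
func parseURLTemplate(text string) (*template.Template, error) {
	funcs := template.FuncMap{
		// Assumption: the helper joins path elements onto a base URL.
		"UrlJoinPath": func(base string, elem ...string) (string, error) {
			return url.JoinPath(base, elem...)
		},
	}
	return template.New("").Funcs(funcs).Parse(text)
}

// render parses text and executes it against data.
func render(text string, data any) (string, error) {
	tpl, err := parseURLTemplate(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(
		`{{UrlJoinPath "https://example.com/repo" "blob" .Version .Path}}`,
		map[string]string{"Version": "main", "Path": "dir/file.go"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```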

func PrintNgramStats ¶

func PrintNgramStats(r IndexFile) error

PrintNgramStats outputs a list of the form

n_1 trigram_1
n_2 trigram_2
...

where n_i is the length of the postings list of trigram_i stored in r.

func ReadMetadata ¶

func ReadMetadata(inf IndexFile) ([]*zoekt.Repository, *zoekt.IndexMetadata, error)

ReadMetadata returns the metadata of index shard without reading the index data. The IndexFile is not closed.

func ReadMetadataPath ¶

func ReadMetadataPath(p string) ([]*zoekt.Repository, *zoekt.IndexMetadata, error)

ReadMetadataPath returns the metadata of index shard at p without reading the index data. ReadMetadataPath is a helper for ReadMetadata which opens the IndexFile at p.

func ReadMetadataPathAlive ¶

func ReadMetadataPathAlive(p string) ([]*zoekt.Repository, *zoekt.IndexMetadata, error)

ReadMetadataPathAlive is like ReadMetadataPath except that it only returns alive repositories.

func SetTombstone ¶

func SetTombstone(shardPath string, repoID uint32) error

SetTombstone idempotently sets a tombstone for the repository identified by repoID in the .meta file of the shard at shardPath.

func SortAndTruncateFiles ¶

func SortAndTruncateFiles(files []zoekt.FileMatch, opts *zoekt.SearchOptions) []zoekt.FileMatch

SortAndTruncateFiles is a convenience wrapper around SortFiles and DisplayTruncator. Given aggregated files, it sorts them and then truncates them based on the search options.

func SortFiles ¶

func SortFiles(ms []zoekt.FileMatch)

SortFiles sorts file matches in the order we want to present results to users. The order depends on the match score, which includes both query-dependent signals like word overlap, and file-only signals like the file ranks (if file ranks are enabled).

Scores are not the only input: some results are also boosted so that files with novel extensions are presented.

func UnsetTombstone ¶

func UnsetTombstone(shardPath string, repoID uint32) error

UnsetTombstone idempotently removes the tombstone for the repository identified by repoID in the .meta file of the shard at shardPath.

Types ¶

type Branch ¶

type Branch struct {
	Name    string
	Version string
}

Branch describes a single branch version.

type Builder ¶

type Builder struct {
	// contains filtered or unexported fields
}

Builder manages (parallel) creation of uniformly sized shards. The builder buffers documents until it has collected enough, then builds a shard and writes it.

func NewBuilder ¶

func NewBuilder(opts Options) (*Builder, error)

NewBuilder creates a new Builder instance.

func (*Builder) Add ¶

func (b *Builder) Add(doc Document) error

func (*Builder) AddFile ¶

func (b *Builder) AddFile(name string, content []byte) error

AddFile is a convenience wrapper for the Add method.

func (*Builder) CheckMemoryUsage ¶

func (b *Builder) CheckMemoryUsage()

CheckMemoryUsage checks the memory usage of the process and writes a memory profile if the heap usage exceeds the configured threshold. NOTE: this method is expensive and should only be used for debugging.

func (*Builder) Finish ¶

func (b *Builder) Finish() error

Finish creates a last shard from the buffered documents, and clears stale shards from previous runs. This should always be called, even in failure cases, to ensure cleanup.

It is safe to call Finish() multiple times.

func (*Builder) MarkFileAsChangedOrRemoved ¶

func (b *Builder) MarkFileAsChangedOrRemoved(path string)

MarkFileAsChangedOrRemoved indicates that the file specified by the given path has been changed or removed since the last indexing job for this repository.

If this build is a delta build, these files will be tombstoned in the older shards for this repository.

type DisplayTruncator ¶

type DisplayTruncator func(before []zoekt.FileMatch) (after []zoekt.FileMatch, hasMore bool)

DisplayTruncator is a stateful function which enforces Document and Match display limits by truncating and mutating before. hasMore is true until the limits are exhausted. Once hasMore is false each subsequent call will return an empty after and hasMore false.

func NewDisplayTruncator ¶

func NewDisplayTruncator(opts *zoekt.SearchOptions) (_ DisplayTruncator, hasLimits bool)

NewDisplayTruncator will return a DisplayTruncator which enforces the limits in opts. If there are no limits to enforce, hasLimits is false and there is no need to call DisplayTruncator.
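The stateful contract described above can be sketched with a closure. This illustration enforces only a document limit; `fileMatch`, `displayTruncator`, and `newDocLimitTruncator` are local stand-ins, and treating a limit of zero or less as "no limit" is an assumption.

```go
package main

import "fmt"

// fileMatch is a local stand-in for zoekt.FileMatch.
type fileMatch struct{ FileName string }

// displayTruncator mirrors the shape of DisplayTruncator.
type displayTruncator func(before []fileMatch) (after []fileMatch, hasMore bool)

// newDocLimitTruncator returns a truncator that lets at most maxDocs files
// through across all calls. Assumption: maxDocs <= 0 means "no limit".
func newDocLimitTruncator(maxDocs int) (_ displayTruncator, hasLimits bool) {
	if maxDocs <= 0 {
		return func(before []fileMatch) ([]fileMatch, bool) { return before, true }, false
	}
	remaining := maxDocs // state captured by the closure
	return func(before []fileMatch) ([]fileMatch, bool) {
		if remaining <= 0 {
			return nil, false
		}
		if len(before) > remaining {
			before = before[:remaining]
		}
		remaining -= len(before)
		return before, remaining > 0
	}, true
}

func main() {
	truncate, hasLimits := newDocLimitTruncator(3)
	fmt.Println("hasLimits:", hasLimits)

	batch := []fileMatch{{"a.go"}, {"b.go"}}
	for i := 0; i < 3; i++ {
		after, hasMore := truncate(batch)
		fmt.Printf("call %d: %d files, hasMore=%v\n", i+1, len(after), hasMore)
	}
}
```

A streaming caller would typically stop forwarding results once hasMore turns false.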

type DocChecker ¶

type DocChecker struct {
	// contains filtered or unexported fields
}

func (*DocChecker) Check ¶

func (t *DocChecker) Check(content []byte, maxTrigramCount int, allowLargeFile bool) SkipReason

Check returns a reason why the given contents are probably not source texts.
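A rough sketch of what such a check might look like. The specific heuristics here (a NUL byte marks binary content, fewer than three bytes is too small, distinct rune trigrams are counted against the limit, and allowLargeFile bypasses the trigram limit) are assumptions for illustration and are not confirmed by this page. `checkContent` and the lower-case constants are local stand-ins.

```go
package main

import (
	"bytes"
	"fmt"
)

// skipReason mirrors the shape of SkipReason for this sketch.
type skipReason int

const (
	skipReasonNone skipReason = iota
	skipReasonTooLarge
	skipReasonTooSmall
	skipReasonBinary
	skipReasonTooManyTrigrams
)

const ngramSize = 3

// checkContent is a hypothetical stand-in for DocChecker.Check.
func checkContent(content []byte, maxTrigramCount int, allowLargeFile bool) skipReason {
	if len(content) == 0 {
		return skipReasonNone
	}
	if len(content) < ngramSize {
		return skipReasonTooSmall
	}
	// Assumption: a NUL byte is treated as a sign of binary content.
	if bytes.IndexByte(content, 0) >= 0 {
		return skipReasonBinary
	}
	if !allowLargeFile && countDistinctTrigrams(content) > maxTrigramCount {
		return skipReasonTooManyTrigrams
	}
	return skipReasonNone
}

// countDistinctTrigrams counts the distinct runs of three consecutive runes.
func countDistinctTrigrams(content []byte) int {
	runes := []rune(string(content))
	seen := make(map[[ngramSize]rune]struct{})
	for i := 0; i+ngramSize <= len(runes); i++ {
		seen[[ngramSize]rune{runes[i], runes[i+1], runes[i+2]}] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Println(checkContent([]byte("package main"), 100, false) == skipReasonNone)
	fmt.Println(checkContent([]byte("a\x00bc"), 100, false) == skipReasonBinary)
	fmt.Println(checkContent([]byte("abcdef"), 2, false) == skipReasonTooManyTrigrams)
}
```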

type Document ¶

type Document struct {
	Name              string
	Content           []byte
	Branches          []string
	SubRepositoryPath string
	Language          string
	Category          FileCategory

	SkipReason SkipReason

	// Document sections for symbols. Offsets should use bytes.
	Symbols         []DocumentSection
	SymbolsMetaData []*zoekt.Symbol
}

Document holds a document (file) to index.

type DocumentSection ¶

type DocumentSection struct {
	Start, End uint32
}

type FileCategory ¶

type FileCategory byte

FileCategory represents the category of a file, as determined by go-enry. It is non-exhaustive but tries to cover the major cases like whether the file is a test, generated, etc.

A file's category is used in search scoring to determine the weight of a file match.

const (
	// FileCategoryMissing is a sentinel value that indicates we never computed the file category during indexing
	// (which means we're reading from an old index version). This value can never be written to the index.
	FileCategoryMissing FileCategory = iota
	FileCategoryDefault
	FileCategoryTest
	FileCategoryVendored
	FileCategoryGenerated
	FileCategoryConfig
	FileCategoryDotFile
	FileCategoryBinary
	FileCategoryDocumentation
)

type HashOptions ¶

type HashOptions struct {
	// contains filtered or unexported fields
}

HashOptions contains only the options in Options that, when modified, lead to an IndexState of IndexStateOption during the next index build.

type IndexFile ¶

type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
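To make the interface concrete, here is a minimal in-memory implementation. It is a sketch only: `indexFile` restates the interface locally so the example is self-contained, and `memIndexFile` is a hypothetical type that is not part of the package.

```go
package main

import "fmt"

// indexFile restates the IndexFile interface from above.
type indexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

// memIndexFile serves reads from an immutable byte slice. Because the data is
// never modified after construction, concurrent reads need no locking.
type memIndexFile struct {
	name string
	data []byte
}

// Read returns sz bytes starting at off. The result aliases the underlying
// data, much like a slice of a memory-mapped file would.
func (f *memIndexFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz) // widen to avoid uint32 overflow
	if end > uint64(len(f.data)) {
		return nil, fmt.Errorf("read [%d, %d) out of bounds for %d bytes", off, end, len(f.data))
	}
	return f.data[off:end], nil
}

func (f *memIndexFile) Size() (uint32, error) { return uint32(len(f.data)), nil }
func (f *memIndexFile) Close()                {}
func (f *memIndexFile) Name() string          { return f.name }

func main() {
	var f indexFile = &memIndexFile{name: "in-memory", data: []byte("hello shard")}
	defer f.Close()

	b, err := f.Read(6, 5)
	fmt.Printf("%q %v\n", b, err)

	_, err = f.Read(6, 50)
	fmt.Println(err != nil)
}
```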

func NewIndexFile ¶

func NewIndexFile(f *os.File) (IndexFile, error)

NewIndexFile returns a new index file. The index file takes ownership of the passed in file, and may close it.

type IndexState ¶

type IndexState string

const (
	IndexStateMissing IndexState = "missing"
	IndexStateCorrupt IndexState = "corrupt"
	IndexStateVersion IndexState = "version-mismatch"
	IndexStateOption  IndexState = "option-mismatch"
	IndexStateMeta    IndexState = "meta-mismatch"
	IndexStateContent IndexState = "content-mismatch"
	IndexStateEqual   IndexState = "equal"
)

type Options ¶

type Options struct {
	// IndexDir is a directory that holds *.zoekt index files.
	IndexDir string

	// SizeMax is the maximum file size
	SizeMax int

	// Parallelism is the maximum number of shards to index in parallel
	Parallelism int

	// ShardMax sets the maximum corpus size for a single shard
	ShardMax int

	// TrigramMax sets the maximum number of distinct trigrams per document.
	TrigramMax int

	// RepositoryDescription holds names and URLs for the repository.
	RepositoryDescription zoekt.Repository

	// SubRepositories is a path => sub repository map.
	SubRepositories map[string]*zoekt.Repository

	// DisableCTags disables the generation of ctags metadata.
	DisableCTags bool

	// CTagsPath is the path to the ctags binary to run, or empty
	// if a valid binary couldn't be found.
	CTagsPath string

	// Same as CTagsPath but for scip-ctags
	ScipCTagsPath string

	// If set, ctags must succeed.
	CTagsMustSucceed bool

	// LargeFiles is a slice of glob patterns, including ** for any number
	// of directories, where matching file paths should be indexed
	// regardless of their size. The full pattern syntax is here:
	// https://github.com/bmatcuk/doublestar/tree/v1#patterns.
	LargeFiles []string

	// IsDelta is true if this run contains only the changed documents since the
	// last run.
	IsDelta bool

	LanguageMap ctags.LanguageMap

	// ShardMerging is true if builder should respect compound shards. This is a
	// Sourcegraph specific option.
	ShardMerging bool

	// HeapProfileTriggerBytes is the heap allocation in bytes that will trigger a memory profile. If 0, no memory profile
	// will be triggered. Note this trigger looks at total heap allocation (which includes both inuse and garbage objects).
	//
	// Profiles will be written to files named `index-memory.prof.n` in the index directory. No more than 10 files are written.
	//
	// Note: heap checking is "best effort", and it's possible for the process to OOM without triggering the heap profile.
	HeapProfileTriggerBytes uint64
	// contains filtered or unexported fields
}

Options sets options for the index building.

func (*Options) Args ¶

func (o *Options) Args() []string

Args generates command line arguments for o. It is the "inverse" of Flags.

func (*Options) FindAllShards ¶

func (o *Options) FindAllShards() []string

func (*Options) FindRepositoryMetadata ¶

func (o *Options) FindRepositoryMetadata() (repository *zoekt.Repository, metadata *zoekt.IndexMetadata, ok bool, err error)

FindRepositoryMetadata returns the index metadata for the repository specified in the options. 'ok' is false if the repository's metadata couldn't be found or if an error occurred.

func (*Options) Flags ¶

func (o *Options) Flags(fs *flag.FlagSet)

Flags adds flags for build options to fs. It is the "inverse" of Args.

func (*Options) GetHash ¶

func (o *Options) GetHash() string

func (*Options) HashOptions ¶

func (o *Options) HashOptions() HashOptions

func (*Options) IgnoreSizeMax ¶

func (o *Options) IgnoreSizeMax(name string) bool

IgnoreSizeMax determines whether the max size should be ignored.

func (*Options) IncrementalSkipIndexing ¶

func (o *Options) IncrementalSkipIndexing() bool

IncrementalSkipIndexing returns true if the index present on disk matches the build options.

func (*Options) IndexState ¶

func (o *Options) IndexState() (IndexState, string)

IndexState checks how the index present on disk compares to the build options and returns the IndexState and the name of the first shard.

func (*Options) SetDefaults ¶

func (o *Options) SetDefaults()

SetDefaults sets reasonable default options.

type ShardBuilder ¶

type ShardBuilder struct {

	// IndexTime will be used as the time if non-zero. Otherwise
	// time.Now(). This is useful for doing reproducible builds in tests.
	IndexTime time.Time

	// a sortable 20 chars long id.
	ID string
	// contains filtered or unexported fields
}

ShardBuilder builds a single index shard.

func NewShardBuilder ¶

func NewShardBuilder(r *zoekt.Repository) (*ShardBuilder, error)

NewShardBuilder creates a fresh ShardBuilder. The passed in Repository contains repo metadata, and may be set to nil.

func (*ShardBuilder) Add ¶

func (b *ShardBuilder) Add(doc Document) error

Add a file which only occurs in certain branches.

func (*ShardBuilder) AddFile ¶

func (b *ShardBuilder) AddFile(name string, content []byte) error

AddFile is a convenience wrapper for Add.

func (*ShardBuilder) ContentSize ¶

func (b *ShardBuilder) ContentSize() uint32

ContentSize returns the number of content bytes so far ingested.

func (*ShardBuilder) NumFiles ¶

func (b *ShardBuilder) NumFiles() int

NumFiles returns the number of files added to this builder.

func (*ShardBuilder) Write ¶

func (b *ShardBuilder) Write(out io.Writer) error

type SkipReason ¶

type SkipReason int

const (
	SkipReasonNone SkipReason = iota
	SkipReasonTooLarge
	SkipReasonTooSmall
	SkipReasonBinary
	SkipReasonTooManyTrigrams
)
