zoekt

package
v0.0.0-...-0464db2
Warning

This package is not in the latest version of its module.

Published: Aug 19, 2018 License: Apache-2.0 Imports: 21 Imported by: 0

Documentation


Constants

const FeatureVersion = 7

FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version. Version history:

2: Rank field for shards.
3: Rank documents within shards.
4: Dedup file bugfix.
5: Remove max line size limit.
6: Include '#' in the LineFragment template.
7: Record skip reasons in the index.

const IndexFormatVersion = 15

IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed. Version history:

5: subrepositories.
6: remove size prefix for posting varint list.
7: move subrepos into Repository struct.
8: move repoMetaData out of indexMetadata.
9: use big-endian uint64 for trigrams.
10: sections for rune offsets.
11: file ends in rune offsets.
12: 64-bit branchmasks.
13: content checksums.
14: languages.
15: rune-based symbol sections.
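The two version constants suggest how an indexer might decide that an existing shard must be rebuilt. This is a minimal sketch, not zoekt's actual logic: the `shardMeta` type, the `needsReindex` function, and the rule that a different format version or an older feature version makes a shard stale are assumptions. The constant values are copied from the declarations above.

```go
package main

import "fmt"

// Current versions, copied from the constants documented above.
const (
	indexFormatVersion = 15
	featureVersion     = 7
)

// shardMeta is a hypothetical stand-in for the version fields of
// IndexMetadata (IndexFormatVersion and IndexFeatureVersion).
type shardMeta struct {
	IndexFormatVersion  int
	IndexFeatureVersion int
}

// needsReindex reports whether a shard written with the given metadata
// should be rebuilt, and why. Assumed rule: a different on-disk format
// cannot be read as-is, and an older feature version lacks data that
// only reindexing provides.
func needsReindex(m shardMeta) (bool, string) {
	if m.IndexFormatVersion != indexFormatVersion {
		return true, fmt.Sprintf("format version %d, want %d", m.IndexFormatVersion, indexFormatVersion)
	}
	if m.IndexFeatureVersion < featureVersion {
		return true, fmt.Sprintf("feature version %d, want %d", m.IndexFeatureVersion, featureVersion)
	}
	return false, ""
}

func main() {
	for _, m := range []shardMeta{{15, 7}, {15, 6}, {14, 7}} {
		stale, why := needsReindex(m)
		fmt.Println(m, stale, why)
	}
}
```

Separating the two numbers lets a feature change trigger reindexing without forcing readers to understand a new on-disk layout.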

Variables

var DebugScore = false

DebugScore controls whether we collect data on how match scores are constructed. Intended for use in tests.

var Version string

Version is filled in by the linker (see ./shared/scripts/build-deploy.sh).
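The Go linker's `-X` flag is the standard way to set a package-level string variable at build time. A minimal sketch of the pattern, using a `main.Version` variable and a hypothetical `versionString` helper for illustration; the import path and value actually passed by build-deploy.sh are not shown on this page.

```go
package main

import "fmt"

// Version is empty in source and filled in at link time, for example:
//
//	go build -ldflags "-X main.Version=v1.2.3" .
var Version string

// versionString returns Version, or a placeholder when the binary was
// built without the -X linker flag.
func versionString() string {
	if Version == "" {
		return "(unknown)"
	}
	return Version
}

func main() {
	fmt.Println("version:", versionString())
}
```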

Functions

func CheckText

func CheckText(content []byte) error

CheckText returns a non-nil error describing why the given contents are probably not source text, or nil if they look like source text.
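A minimal sketch of a text check with the same shape as CheckText. The rules here (reject NUL bytes and invalid UTF-8) are illustrative assumptions, not zoekt's actual heuristics, and `checkTextSketch` is a hypothetical name.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unicode/utf8"
)

// checkTextSketch returns a reason why content is probably not source
// text, or nil if it looks like text. The rules are hypothetical.
func checkTextSketch(content []byte) error {
	if bytes.IndexByte(content, 0) >= 0 {
		// NUL bytes are a common sign of binary data.
		return errors.New("contains NUL byte")
	}
	if !utf8.Valid(content) {
		return errors.New("not valid UTF-8")
	}
	return nil
}

func main() {
	fmt.Println(checkTextSketch([]byte("package main\n")))
	fmt.Println(checkTextSketch([]byte{'a', 0, 'b'}))
}
```

Returning an error rather than a bool lets the caller record the reason, which fits Document.SkipReason.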

func ReadMetadata

func ReadMetadata(inf IndexFile) (*Repository, *IndexMetadata, error)

ReadMetadata returns the metadata of an index shard without reading the index data. The IndexFile is not closed.

func SortFilesByScore

func SortFilesByScore(ms []FileMatch)

SortFilesByScore sorts a slice of results by score.
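A minimal sketch of score-based ordering, assuming higher scores come first (consistent with the "the higher, the better" comment on FileMatch.Score). `fileMatch` mirrors two FileMatch fields, and the file-name tie-break is an assumption added to make the order deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMatch mirrors the two FileMatch fields used here.
type fileMatch struct {
	FileName string
	Score    float64
}

// sortFilesByScore orders matches from highest to lowest score, breaking
// ties by file name.
func sortFilesByScore(ms []fileMatch) {
	sort.Slice(ms, func(i, j int) bool {
		if ms[i].Score != ms[j].Score {
			return ms[i].Score > ms[j].Score
		}
		return ms[i].FileName < ms[j].FileName
	})
}

func main() {
	ms := []fileMatch{{"b.go", 1}, {"a.go", 3}, {"c.go", 3}}
	sortFilesByScore(ms)
	fmt.Println(ms)
}
```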

Types

type Document

type Document struct {
	Name              string
	Content           []byte
	Branches          []string
	SubRepositoryPath string
	Language          string

	// If set, something is wrong with the file contents, and this
	// is the reason it wasn't indexed.
	SkipReason string

	// Document sections for symbols. Offsets should use bytes.
	Symbols []DocumentSection
}

Document holds a document (file) to index.
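Document.Symbols expects byte offsets, which differ from rune offsets as soon as the file contains non-ASCII text. A minimal sketch of producing such sections, assuming a hypothetical `symbolSections` helper that finds symbol names by plain substring search (a real indexer would take offsets from a symbol tagger) and assuming End is exclusive; `documentSection` mirrors DocumentSection so the example is self-contained.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// documentSection mirrors zoekt's DocumentSection.
type documentSection struct {
	Start, End uint32
}

// symbolSections returns one section per occurrence of each symbol name
// in content, sorted by start offset. Offsets are byte offsets, and End
// is treated as exclusive.
func symbolSections(content string, symbols []string) []documentSection {
	var secs []documentSection
	for _, sym := range symbols {
		if sym == "" {
			continue // an empty name would match everywhere
		}
		off := 0
		for {
			i := strings.Index(content[off:], sym)
			if i < 0 {
				break
			}
			start := off + i
			end := start + len(sym)
			secs = append(secs, documentSection{Start: uint32(start), End: uint32(end)})
			off = end
		}
	}
	sort.Slice(secs, func(i, j int) bool { return secs[i].Start < secs[j].Start })
	return secs
}

func main() {
	// "é" is two bytes in UTF-8, so "Run" starts at byte 14 but rune 13.
	content := "// café\nfunc Run() {}\n"
	fmt.Println(symbolSections(content, []string{"Run"}))
}
```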

type DocumentSection

type DocumentSection struct {
	Start, End uint32
}

type FileMatch

type FileMatch struct {
	// Ranking; the higher, the better.
	Score float64 // TODO - hide this field?

	// For debugging. Needs DebugScore set, but public so tests in
	// other packages can print some diagnostics.
	Debug string

	FileName string

	// Repository is the globally unique name of the repo of the
	// match
	Repository  string
	Branches    []string
	LineMatches []LineMatch

	// Only set if requested
	Content []byte

	// Checksum of the content.
	Checksum []byte

	// Detected language of the result.
	Language string

	// SubRepositoryName is the globally unique name of the repo,
	// if it came from a subrepository
	SubRepositoryName string

	// SubRepositoryPath holds the prefix where the subrepository
	// was mounted.
	SubRepositoryPath string

	// Commit SHA1 (hex) of the (sub)repo holding the file.
	Version string
}

FileMatch contains all the matches within a file.

type IndexBuilder

type IndexBuilder struct {
	// contains filtered or unexported fields
}

IndexBuilder builds a single index shard.

func NewIndexBuilder

func NewIndexBuilder(r *Repository) (*IndexBuilder, error)

NewIndexBuilder creates a fresh IndexBuilder. The passed-in Repository contains repository metadata and may be nil.

func (*IndexBuilder) Add

func (b *IndexBuilder) Add(doc Document) error

Add adds a document to the shard. A document may occur only in certain branches (see Document.Branches).

func (*IndexBuilder) AddFile

func (b *IndexBuilder) AddFile(name string, content []byte) error

AddFile is a convenience wrapper for Add that takes only a file name and its content.

func (*IndexBuilder) ContentSize

func (b *IndexBuilder) ContentSize() uint32

ContentSize returns the number of content bytes ingested so far.

func (*IndexBuilder) Write

func (b *IndexBuilder) Write(out io.Writer) error

type IndexFile

type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
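An IndexFile does not have to be backed by an mmap'd file. A minimal sketch of an in-memory implementation, for example for tests; `memIndexFile` is hypothetical, and `indexFile` mirrors the interface above so the example compiles without the zoekt module.

```go
package main

import (
	"errors"
	"fmt"
)

// indexFile mirrors the IndexFile interface documented above.
type indexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

// memIndexFile is a hypothetical IndexFile backed by a byte slice. The
// data is never modified, so concurrent Reads are safe.
type memIndexFile struct {
	name string
	data []byte
}

// Read returns sz bytes starting at off. Like an mmap'd implementation,
// it returns a view into the underlying data rather than a copy.
func (f *memIndexFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz) // uint64 avoids uint32 overflow
	if end > uint64(len(f.data)) {
		return nil, errors.New("read past end of file")
	}
	return f.data[off:end:end], nil
}

func (f *memIndexFile) Size() (uint32, error) { return uint32(len(f.data)), nil }

func (f *memIndexFile) Close() {}

func (f *memIndexFile) Name() string { return f.name }

func main() {
	var f indexFile = &memIndexFile{name: "mem", data: []byte("hello, shard")}
	b, err := f.Read(7, 5)
	fmt.Println(string(b), err)
}
```

Returning a view keeps reads cheap but means callers must copy data they want to keep, the same caveat NewSearcher states for search results.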

func NewIndexFile

func NewIndexFile(f *os.File) (IndexFile, error)

NewIndexFile returns a new index file. The index file takes ownership of the passed-in file and may close it.

type IndexMetadata

type IndexMetadata struct {
	IndexFormatVersion  int
	IndexFeatureVersion int
	IndexTime           time.Time
	PlainASCII          bool
	LanguageMap         map[string]byte
	ZoektVersion        string
}

IndexMetadata holds metadata stored in the index file.

type LineFragmentMatch

type LineFragmentMatch struct {
	// Offset within the line, in bytes.
	LineOffset int

	// Offset from file start, in bytes.
	Offset uint32

	// Number of bytes that match.
	MatchLength int
}

LineFragmentMatch is a segment of matching text within a line.

type LineMatch

type LineMatch struct {
	// The line in which a match was found.
	Line       []byte
	LineStart  int
	LineEnd    int
	LineNumber int

	// If set, this was a match on the filename.
	FileName bool

	// The higher the better. Only ranks the quality of the match
	// within the file, does not take rank of file into account
	Score         float64
	LineFragments []LineFragmentMatch
}

LineMatch holds the matches within a single line in a file.
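The two offsets in LineFragmentMatch are related through LineMatch.LineStart: a fragment's offset from the file start should equal the line's start plus the fragment's offset within the line. A minimal sketch that builds such values with a plain substring search (zoekt's real matching is trigram-based and far more involved). The lowercase types mirror a subset of the documented fields, and 1-based line numbers, LineEnd pointing at the newline, and `findLineMatches` itself are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
)

type lineFragmentMatch struct {
	LineOffset  int
	Offset      uint32
	MatchLength int
}

type lineMatch struct {
	Line          []byte
	LineStart     int
	LineEnd       int
	LineNumber    int
	LineFragments []lineFragmentMatch
}

// findLineMatches returns one lineMatch per line containing pat, with one
// fragment per non-overlapping occurrence. Offset = LineStart + LineOffset.
func findLineMatches(content, pat []byte) []lineMatch {
	var out []lineMatch
	if len(pat) == 0 {
		return out
	}
	start := 0
	for n := 1; start <= len(content); n++ {
		end := bytes.IndexByte(content[start:], '\n')
		if end < 0 {
			end = len(content)
		} else {
			end += start
		}
		line := content[start:end]
		var frags []lineFragmentMatch
		off := 0
		for {
			i := bytes.Index(line[off:], pat)
			if i < 0 {
				break
			}
			lo := off + i
			frags = append(frags, lineFragmentMatch{
				LineOffset:  lo,
				Offset:      uint32(start + lo),
				MatchLength: len(pat),
			})
			off = lo + len(pat)
		}
		if len(frags) > 0 {
			out = append(out, lineMatch{
				Line: line, LineStart: start, LineEnd: end, LineNumber: n, LineFragments: frags,
			})
		}
		start = end + 1 // skip past the newline
	}
	return out
}

func main() {
	for _, m := range findLineMatches([]byte("foo bar\nbaz foo foo\n"), []byte("foo")) {
		for _, f := range m.LineFragments {
			fmt.Printf("line %d: LineOffset=%d Offset=%d\n", m.LineNumber, f.LineOffset, f.Offset)
		}
	}
}
```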

type RepoList

type RepoList struct {
	Repos   []*RepoListEntry
	Crashes int
}

RepoList holds a set of Repository metadata.

type RepoListEntry

type RepoListEntry struct {
	Repository    Repository
	IndexMetadata IndexMetadata
	Stats         RepoStats
}

type RepoStats

type RepoStats struct {
	// Repos is used for aggregating the number of repositories.
	Repos int

	// Shards is the total number of search shards.
	Shards int

	// Documents holds the number of documents or files.
	Documents int

	// IndexBytes is the amount of RAM used for index overhead.
	IndexBytes int64

	// ContentBytes is the amount of RAM used for raw content.
	ContentBytes int64
}

RepoStats holds statistics for a repository or a collection of repositories.

func (*RepoStats) Add

func (s *RepoStats) Add(o *RepoStats)
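Add has no doc comment here. A minimal sketch of the aggregation it presumably performs, assuming it simply sums every counter of o into s (the Repos comment, "used for aggregating the number of repositories", points that way). `repoStats` mirrors RepoStats so the example is self-contained.

```go
package main

import "fmt"

// repoStats mirrors zoekt's RepoStats.
type repoStats struct {
	Repos        int
	Shards       int
	Documents    int
	IndexBytes   int64
	ContentBytes int64
}

// Add accumulates o into s field by field (assumed behavior).
func (s *repoStats) Add(o *repoStats) {
	s.Repos += o.Repos
	s.Shards += o.Shards
	s.Documents += o.Documents
	s.IndexBytes += o.IndexBytes
	s.ContentBytes += o.ContentBytes
}

func main() {
	var total repoStats
	for _, st := range []repoStats{
		{Repos: 1, Shards: 1, Documents: 10, IndexBytes: 100, ContentBytes: 400},
		{Repos: 1, Shards: 2, Documents: 5, IndexBytes: 50, ContentBytes: 200},
	} {
		total.Add(&st)
	}
	fmt.Printf("%+v\n", total)
}
```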

type Repository

type Repository struct {
	// The repository name
	Name string
	// The repository URL.
	URL string

	// The branches indexed in this repo.
	Branches []RepositoryBranch

	// Nil if this is not the super project.
	SubRepoMap map[string]*Repository

	// URL template to link to the commit of a branch
	CommitURLTemplate string

	// The repository URL for getting to a file.  Has access to
	// {{Branch}}, {{Path}}
	FileURLTemplate string

	// The URL fragment to add to a file URL for line numbers. Has
	// access to {{LineNumber}}. The fragment should include the
	// separator, generally '#' or ';'.
	LineFragmentTemplate string

	// All zoekt.* configuration settings.
	RawConfig map[string]string

	// Importance of the repository, bigger is more important
	Rank uint16
}

Repository holds repository metadata.
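FileURLTemplate and LineFragmentTemplate are described as templates with access to {{Branch}}, {{Path}} and {{LineNumber}}. A sketch of expanding them with Go's text/template, under the assumption (not confirmed by this page) that they are text/template strings in which those names are template functions; the example URL and the `expand` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"
)

// expand renders tmpl, exposing branch, path and line number as the
// template functions Branch, Path and LineNumber (assumed names).
func expand(tmpl, branch, path string, line int) (string, error) {
	// Funcs must be registered before Parse so the parser knows the names.
	t, err := template.New("url").Funcs(template.FuncMap{
		"Branch":     func() string { return branch },
		"Path":       func() string { return path },
		"LineNumber": func() string { return strconv.Itoa(line) },
	}).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	file, _ := expand("https://example.com/repo/blob/{{Branch}}/{{Path}}", "master", "api.go", 0)
	frag, _ := expand("#L{{LineNumber}}", "", "", 42)
	fmt.Println(file + frag)
}
```

Because the fragment template includes its own separator ('#' here), the two expansions can simply be concatenated.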

type RepositoryBranch

type RepositoryBranch struct {
	Name    string
	Version string
}

RepositoryBranch describes an indexed branch, which is a name combined with a version.

type SearchOptions

type SearchOptions struct {
	// Return an upper-bound estimate of eligible documents in
	// stats.ShardFilesConsidered.
	EstimateDocCount bool

	// Return the whole file.
	Whole bool

	// Maximum number of matches: skip all processing of an index
	// shard after we have found this many non-overlapping matches.
	ShardMaxMatchCount int

	// Maximum number of matches: stop looking for more matches
	// once we have this many matches across shards.
	TotalMaxMatchCount int

	// Maximum number of important matches: skip processing a
	// shard after we have found this many important matches.
	ShardMaxImportantMatch int

	// Maximum number of important matches across shards.
	TotalMaxImportantMatch int

	// Abort the search after this much time has passed.
	MaxWallTime time.Duration

	// Trim the number of results after collating and sorting the
	// results
	MaxDocDisplayCount int
}

func (*SearchOptions) SetDefaults

func (o *SearchOptions) SetDefaults()

func (*SearchOptions) String

func (s *SearchOptions) String() string

type SearchResult

type SearchResult struct {
	Stats
	Files []FileMatch

	// RepoURLs holds a repo => template string map.
	RepoURLs map[string]string

	// LineFragments holds a repo => template string map, for
	// the line number fragment.
	LineFragments map[string]string
}

SearchResult contains search matches and extra data: search statistics and per-repository URL templates.

type Searcher

type Searcher interface {
	Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)

	// List lists repositories. The query `q` can only contain
	// query.Repo atoms.
	List(ctx context.Context, q query.Q) (*RepoList, error)
	Close()

	// Describe the searcher for debug messages.
	String() string
}

func NewSearcher

func NewSearcher(r IndexFile) (Searcher, error)

NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e., []byte members should be copied into fresh buffers if the result is to survive closing the shard.
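A minimal sketch of detaching a result from its shard by copying its []byte members into fresh buffers, as recommended above. `fileMatch` and `lineMatch` mirror only some of the documented fields, and `detach` is a hypothetical helper; a complete version would cover every []byte member of FileMatch and LineMatch.

```go
package main

import "fmt"

type lineMatch struct {
	Line []byte
}

type fileMatch struct {
	FileName    string
	Content     []byte
	Checksum    []byte
	LineMatches []lineMatch
}

// cloneBytes returns a copy of b that does not alias the original.
func cloneBytes(b []byte) []byte {
	if b == nil {
		return nil
	}
	return append([]byte{}, b...)
}

// detach copies the []byte members of m so it stays valid after the
// Searcher (and any mmap'd shard behind it) is closed.
func detach(m fileMatch) fileMatch {
	m.Content = cloneBytes(m.Content)
	m.Checksum = cloneBytes(m.Checksum)
	lms := make([]lineMatch, len(m.LineMatches))
	for i, lm := range m.LineMatches {
		lm.Line = cloneBytes(lm.Line)
		lms[i] = lm
	}
	m.LineMatches = lms
	return m
}

func main() {
	shard := []byte("package main")
	m := fileMatch{FileName: "main.go", Content: shard, LineMatches: []lineMatch{{Line: shard[:7]}}}
	d := detach(m)
	shard[0] = 'X' // simulate the shard's memory being reused after Close
	fmt.Println(string(m.Content), "|", string(d.Content), "|", string(d.LineMatches[0].Line))
}
```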

type Stats

type Stats struct {
	// Amount of I/O for reading contents.
	ContentBytesLoaded int64

	// Amount of I/O for reading from index.
	IndexBytesLoaded int64

	// Number of search shards that had a crash.
	Crashes int

	// Wall clock time for this search
	Duration time.Duration

	// Number of files containing a match.
	FileCount int

	// Number of files in shards that we considered.
	ShardFilesConsidered int

	// Files that we evaluated. Equivalent to files for which all
	// atom matches (including negations) evaluated to true.
	FilesConsidered int

	// Files for which we loaded file content to verify substring matches
	FilesLoaded int

	// Candidate files whose contents weren't examined because we
	// gathered enough matches.
	FilesSkipped int

	// Shards that we did not process because a query was canceled.
	ShardsSkipped int

	// Number of non-overlapping matches
	MatchCount int

	// Number of candidate matches as a result of searching ngrams.
	NgramMatches int

	// Wall clock time for queued search.
	Wait time.Duration
}

Stats contains statistics about a search.

func (*Stats) Add

func (s *Stats) Add(o Stats)

Directories

Path       Synopsis
build      Package build implements a more convenient interface for building zoekt indices.
gitindex   Package gitindex provides functions for indexing Git repositories.
