zoekt

package module

Version v0.0.0-...-4e4a529 (latest). Published: Sep 2, 2025. License: Apache-2.0. Imports: 18. Imported by: 22.

Repository

github.com/sourcegraph/zoekt

README

Zoekt: fast code search

"Zoekt, en gij zult spinazie eten" - Jan Eertink

("seek, and ye shall eat spinach" - My primary school teacher)

Zoekt is a text search engine intended for use with source code. (Pronunciation: roughly as you would pronounce "zooked" in English)

Note: This has been the maintained source for Zoekt since 2017, when it was forked from the original repository github.com/google/zoekt.

Background

Zoekt supports fast substring and regexp matching on source code, with a rich query language that includes boolean operators (and, or, not). It can search individual repositories, and search across many repositories in a large codebase. Zoekt ranks search results using a combination of code-related signals like whether the match is on a symbol. Because of its general design based on trigram indexing and syntactic parsing, it works well for a variety of programming languages.

The two main ways to use the project are:

Through individual commands, to index repositories and perform searches through Zoekt's query language
Or, through the indexserver and webserver, which support syncing repositories from a code host and searching them through a web UI or API

For more details on Zoekt's design, see the docs directory.

Usage

Installation

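go get github.com/sourcegraph/zoekt

This adds Zoekt as a library dependency of your Go module. The command-line tools are installed individually with go install, as shown below.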

Note: It is also recommended to install Universal ctags, as symbol information is a key signal in ranking search results. See ctags.md for more information.

Command-based usage

Zoekt supports indexing and searching repositories on the command line. This is most helpful for simple local usage, or for testing and development.

Indexing a local git repo

go install github.com/sourcegraph/zoekt/cmd/zoekt-git-index@latest
$(go env GOPATH)/bin/zoekt-git-index -index ~/.zoekt /path/to/repo

Indexing a local directory (not git-specific)

go install github.com/sourcegraph/zoekt/cmd/zoekt-index@latest
$(go env GOPATH)/bin/zoekt-index -index ~/.zoekt /path/to/repo

Searching an index

go install github.com/sourcegraph/zoekt/cmd/zoekt@latest
$(go env GOPATH)/bin/zoekt 'hello'
$(go env GOPATH)/bin/zoekt 'hello file:README'

Zoekt services

Zoekt also contains an index server and web server to support larger-scale indexing and searching of remote repositories. The index server can be configured to periodically fetch and reindex repositories from a code host. The webserver can be configured to serve search results through a web UI or API.

Indexing a GitHub organization

go install github.com/sourcegraph/zoekt/cmd/zoekt-indexserver@latest

echo YOUR_GITHUB_TOKEN_HERE > token.txt
echo '[{"GitHubOrg": "apache", "CredentialPath": "token.txt"}]' > config.json

$(go env GOPATH)/bin/zoekt-indexserver -mirror_config config.json -data_dir ~/.zoekt/

This will fetch all repos under 'github.com/apache', then index the repositories. The indexserver takes care of periodically fetching and indexing new data, and cleaning up logfiles. See config.go for more details on this configuration.
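If you mirror several organizations, you can generate the mirror configuration instead of writing it by hand. The sketch below is minimal. It assumes only the two keys used in the example above, GitHubOrg and CredentialPath. The real configuration type in config.go supports more options, and mirrorEntry and mirrorConfig are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// mirrorEntry is a hypothetical local struct covering only the two keys used in
// the README example; the real configuration in config.go has more fields.
type mirrorEntry struct {
	GitHubOrg      string `json:"GitHubOrg,omitempty"`
	CredentialPath string `json:"CredentialPath,omitempty"`
}

// mirrorConfig returns a JSON array with one entry per GitHub organization,
// all sharing the same credential file.
func mirrorConfig(orgs []string, credPath string) ([]byte, error) {
	entries := make([]mirrorEntry, 0, len(orgs))
	for _, org := range orgs {
		entries = append(entries, mirrorEntry{GitHubOrg: org, CredentialPath: credPath})
	}
	return json.Marshal(entries)
}

func main() {
	b, err := mirrorConfig([]string{"apache"}, "token.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(b)) // [{"GitHubOrg":"apache","CredentialPath":"token.txt"}]
}
```

Redirect the program's output to config.json and pass that file to -mirror_config.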

Starting the web server

go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver@latest
$(go env GOPATH)/bin/zoekt-webserver -index ~/.zoekt/

This will start a web server with a simple search UI at http://localhost:6070. See the query syntax docs for more details on the query language.

If you start the web server with -rpc, it exposes a simple JSON search API at http://localhost:6070/api/search.

The JSON API supports advanced features including:

Streaming search results (using the FlushWallTime option)
Alternative BM25 scoring (using the UseBM25Scoring option)
Context lines around matches (using the NumContextLines option)

Finally, the web server exposes a gRPC API that supports structured query objects and advanced search options.

Acknowledgements

Thanks to Han-Wen Nienhuys for creating Zoekt. Thanks to Alexander Neubeck for coming up with this idea, and helping Han-Wen Nienhuys flesh it out.

Documentation

Index

Variables
type ChunkMatch
- func ChunkMatchFromProto(p *webserverv1.ChunkMatch) ChunkMatch
- func (cm *ChunkMatch) ToProto() *webserverv1.ChunkMatch
type FileMatch
- func FileMatchFromProto(p *webserverv1.FileMatch) FileMatch
- func (m *FileMatch) AddScore(what string, computed float64, raw float64, debugScore bool)
- func (m *FileMatch) ToProto() *webserverv1.FileMatch
type FlushReason
- func FlushReasonFromProto(p webserverv1.FlushReason) FlushReason
- func (fr FlushReason) Generate(rand *rand.Rand, size int) reflect.Value
- func (fr FlushReason) String() string
- func (fr FlushReason) ToProto() webserverv1.FlushReason
type IndexMetadata
- func IndexMetadataFromProto(p *webserverv1.IndexMetadata) IndexMetadata
- func (m *IndexMetadata) ToProto() *webserverv1.IndexMetadata
type LineFragmentMatch
- func LineFragmentMatchFromProto(p *webserverv1.LineFragmentMatch) LineFragmentMatch
- func (lfm *LineFragmentMatch) ToProto() *webserverv1.LineFragmentMatch
type LineMatch
- func LineMatchFromProto(p *webserverv1.LineMatch) LineMatch
- func (lm *LineMatch) ToProto() *webserverv1.LineMatch
type ListOptions
- func ListOptionsFromProto(p *webserverv1.ListOptions) *ListOptions
- func (o *ListOptions) GetField() (RepoListField, error)
- func (o *ListOptions) String() string
- func (l *ListOptions) ToProto() *webserverv1.ListOptions
type Location
- func LocationFromProto(p *webserverv1.Location) Location
- func (l *Location) ToProto() *webserverv1.Location
type MinimalRepoListEntry
- func MinimalRepoListEntryFromProto(p *webserverv1.MinimalRepoListEntry) MinimalRepoListEntry
- func (m *MinimalRepoListEntry) ToProto() *webserverv1.MinimalRepoListEntry
type Progress
- func ProgressFromProto(p *webserverv1.Progress) Progress
- func (p *Progress) ToProto() *webserverv1.Progress
type Range
- func RangeFromProto(p *webserverv1.Range) Range
- func (r *Range) ToProto() *webserverv1.Range
type RepoList
- func RepoListFromProto(p *webserverv1.ListResponse) *RepoList
- func (r *RepoList) ToProto() *webserverv1.ListResponse
type RepoListEntry
- func RepoListEntryFromProto(p *webserverv1.RepoListEntry) *RepoListEntry
- func (r *RepoListEntry) ToProto() *webserverv1.RepoListEntry
type RepoListField
type RepoStats
- func RepoStatsFromProto(p *webserverv1.RepoStats) RepoStats
- func (s *RepoStats) Add(o *RepoStats)
- func (s *RepoStats) ToProto() *webserverv1.RepoStats
type ReposMap
- func (q *ReposMap) MarshalBinary() ([]byte, error)
- func (q *ReposMap) UnmarshalBinary(b []byte) error
type Repository
- func RepositoryFromProto(p *webserverv1.Repository) Repository
- func (r *Repository) GetPriority() float64
- func (r *Repository) MergeMutable(x *Repository) (mutated bool, err error)
- func (r *Repository) ToProto() *webserverv1.Repository
- func (r *Repository) UnmarshalJSON(data []byte) error
type RepositoryBranch
- func RepositoryBranchFromProto(p *webserverv1.RepositoryBranch) RepositoryBranch
- func (r RepositoryBranch) String() string
- func (r *RepositoryBranch) ToProto() *webserverv1.RepositoryBranch
type SearchOptions
- func SearchOptionsFromProto(p *webserverv1.SearchOptions) *SearchOptions
- func (o *SearchOptions) SetDefaults()
- func (s *SearchOptions) String() string
- func (s *SearchOptions) ToProto() *webserverv1.SearchOptions
type SearchResult
- func SearchResultFromProto(p *webserverv1.SearchResponse, repoURLs, lineFragments map[string]string) *SearchResult
- func SearchResultFromStreamProto(p *webserverv1.StreamSearchResponse, repoURLs, lineFragments map[string]string) *SearchResult
- func (sr *SearchResult) SizeBytes() (sz uint64)
- func (sr *SearchResult) ToProto() *webserverv1.SearchResponse
- func (sr *SearchResult) ToStreamProto() *webserverv1.StreamSearchResponse
type Searcher
type Sender
type SenderFunc
- func (f SenderFunc) Send(result *SearchResult)
type Stats
- func StatsFromProto(p *webserverv1.Stats) Stats
- func (s *Stats) Add(o Stats)
- func (s *Stats) ToProto() *webserverv1.Stats
- func (s *Stats) Zero() bool
type Streamer
type Symbol
- func SymbolFromProto(p *webserverv1.SymbolInfo) *Symbol
- func (s *Symbol) ToProto() *webserverv1.SymbolInfo

Constants

This section is empty.

Variables

var FlushReasonStrings = map[FlushReason]string{
	FlushReasonTimerExpired: "timer_expired",
	FlushReasonFinalFlush:   "final_flush",
	FlushReasonMaxSize:      "max_size_reached",
}

Functions

This section is empty.

Types

type ChunkMatch

type ChunkMatch struct {
	DebugScore string

	// Content is a contiguous range of complete lines that fully contains Ranges.
	// Lines will always include their terminating newline (if it exists).
	Content []byte

	// Ranges is a set of matching ranges within this chunk. Each range is relative
	// to the beginning of the file (not the beginning of Content).
	Ranges []Range

	// SymbolInfo is the symbol information associated with Ranges. If it is non-nil,
	// its length will equal that of Ranges. Any of its elements may be nil.
	SymbolInfo []*Symbol

	// FileName indicates whether this match is a match on the file name, in
	// which case Content will contain the file name.
	FileName bool

	// ContentStart is the location (inclusive) of the beginning of content
	// relative to the beginning of the file. It will always be at the
	// beginning of a line (Column will always be 1).
	ContentStart Location

	// Score is the overall relevance score of this chunk.
	Score float64

	// BestLineMatch is the line number of the highest-scoring line match in this chunk.
	// The line number represents the index in the full file, and is 1-based. If FileName: true,
	// this number will be 0.
	BestLineMatch uint32
}

ChunkMatch is a set of non-overlapping matches within a contiguous range of lines in the file.
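Ranges are relative to the start of the file, but Content begins at ContentStart. Highlighting a match therefore requires rebasing each offset. The sketch below uses simplified local copies of Location, Range and ChunkMatch that keep only the fields it needs; they are not imported from zoekt. It also assumes that every range lies inside Content, as the ChunkMatch documentation states.

```go
package main

import "fmt"

// Simplified local copies of the documented fields.
type Location struct {
	ByteOffset uint32 // 0-based byte offset from the beginning of the file
	LineNumber uint32 // 1-based
	Column     uint32 // 1-based, in runes
}

type Range struct {
	Start Location // inclusive
	End   Location // exclusive
}

type ChunkMatch struct {
	Content      []byte
	ContentStart Location
	Ranges       []Range
}

// matchedText returns the bytes covered by each range. Ranges are file-relative,
// so each offset is rebased onto Content by subtracting ContentStart.ByteOffset.
func matchedText(cm ChunkMatch) []string {
	out := make([]string, 0, len(cm.Ranges))
	base := cm.ContentStart.ByteOffset
	for _, r := range cm.Ranges {
		out = append(out, string(cm.Content[r.Start.ByteOffset-base:r.End.ByteOffset-base]))
	}
	return out
}

func main() {
	// Hypothetical file whose first line is "package main\n" (13 bytes), so the
	// chunk holding line 2 starts at byte offset 13.
	cm := ChunkMatch{
		Content:      []byte("func hello() {}\n"),
		ContentStart: Location{ByteOffset: 13, LineNumber: 2, Column: 1},
		Ranges: []Range{{
			Start: Location{ByteOffset: 18, LineNumber: 2, Column: 6},
			End:   Location{ByteOffset: 23, LineNumber: 2, Column: 11},
		}},
	}
	fmt.Println(matchedText(cm)) // [hello]
}
```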

func ChunkMatchFromProto

func ChunkMatchFromProto(p *webserverv1.ChunkMatch) ChunkMatch

func (*ChunkMatch) ToProto

func (cm *ChunkMatch) ToProto() *webserverv1.ChunkMatch

type FileMatch

type FileMatch struct {
	FileName string

	// Repository is the globally unique name of the repo of the
	// match
	Repository string

	// SubRepositoryName is the globally unique name of the repo,
	// if it came from a subrepository
	SubRepositoryName string `json:",omitempty"`

	// SubRepositoryPath holds the prefix where the subrepository
	// was mounted.
	SubRepositoryPath string `json:",omitempty"`

	// Commit SHA1 (hex) of the (sub)repo holding the file.
	Version string `json:",omitempty"`

	// Detected language of the result.
	Language string

	// For debugging. Needs DebugScore set, but public so tests in
	// other packages can print some diagnostics.
	Debug string `json:",omitempty"`

	Branches []string `json:",omitempty"`

	// One of LineMatches or ChunkMatches will be returned depending on whether
	// the SearchOptions.ChunkMatches is set.
	LineMatches  []LineMatch  `json:",omitempty"`
	ChunkMatches []ChunkMatch `json:",omitempty"`

	// Only set if requested
	Content []byte `json:",omitempty"`

	// Checksum of the content.
	Checksum []byte

	// Ranking; the higher, the better.
	Score float64 `json:",omitempty"`

	// RepositoryPriority is a Sourcegraph extension. It is used by Sourcegraph to
	// order results from different repositories relative to each other.
	RepositoryPriority float64 `json:",omitempty"`

	// RepositoryID is a Sourcegraph extension. This is the ID of Repository in
	// Sourcegraph.
	RepositoryID uint32 `json:",omitempty"`
}

FileMatch contains all the matches within a file.

func FileMatchFromProto

func FileMatchFromProto(p *webserverv1.FileMatch) FileMatch

func (*FileMatch) AddScore

func (m *FileMatch) AddScore(what string, computed float64, raw float64, debugScore bool)

AddScore increments the score of the FileMatch by the computed score. If debugScore is true, it also adds a debug string to the FileMatch. If raw is -1, it is ignored. Otherwise, it is added to the debug string.
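The behaviour described for AddScore can be sketched as follows. fileMatch and addScore are hypothetical stand-ins, and so are the signal names in main. The debug-string format is an assumption made for illustration; Zoekt's actual debug output may be formatted differently.

```go
package main

import "fmt"

// fileMatch keeps only the two FileMatch fields that AddScore touches.
type fileMatch struct {
	Score float64
	Debug string
}

// addScore always adds computed to Score. When debugScore is set, it also appends
// an entry naming the signal, including raw unless raw is -1.
// The entry format is hypothetical.
func (m *fileMatch) addScore(what string, computed, raw float64, debugScore bool) {
	m.Score += computed
	if !debugScore {
		return
	}
	if raw == -1 {
		m.Debug += fmt.Sprintf("%s:%.2f, ", what, computed)
	} else {
		m.Debug += fmt.Sprintf("%s:%.2f(%.2f), ", what, computed, raw)
	}
}

func main() {
	m := &fileMatch{}
	m.addScore("atom", 100, 2, true)
	m.addScore("doc-order", 0.5, -1, true)
	fmt.Printf("score=%v debug=%q\n", m.Score, m.Debug)
}
```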

func (*FileMatch) ToProto

func (m *FileMatch) ToProto() *webserverv1.FileMatch

type FlushReason

type FlushReason uint8

const (
	FlushReasonTimerExpired FlushReason = 1 << iota
	FlushReasonFinalFlush
	FlushReasonMaxSize
)

func FlushReasonFromProto

func FlushReasonFromProto(p webserverv1.FlushReason) FlushReason

func (FlushReason) Generate

func (fr FlushReason) Generate(rand *rand.Rand, size int) reflect.Value

Generate produces valid FlushReason values for property-based checks with testing/quick.

func (FlushReason) String

func (fr FlushReason) String() string

func (FlushReason) ToProto

func (fr FlushReason) ToProto() webserverv1.FlushReason
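FlushReason values are distinct bits: 1 << iota gives 1, 2 and 4. Generate exists so that testing/quick produces only valid reasons. The sketch below is self-contained and re-declares the type locally rather than importing zoekt. The body of Generate and the fallback format in String are assumptions; they are not the library's code.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"testing/quick"
)

// FlushReason mirrors the documented declaration: each constant is a distinct bit.
type FlushReason uint8

const (
	FlushReasonTimerExpired FlushReason = 1 << iota // 1
	FlushReasonFinalFlush                           // 2
	FlushReasonMaxSize                              // 4
)

var FlushReasonStrings = map[FlushReason]string{
	FlushReasonTimerExpired: "timer_expired",
	FlushReasonFinalFlush:   "final_flush",
	FlushReasonMaxSize:      "max_size_reached",
}

// String looks up the documented names; the fallback format is hypothetical.
func (fr FlushReason) String() string {
	if s, ok := FlushReasonStrings[fr]; ok {
		return s
	}
	return fmt.Sprintf("FlushReason(%d)", uint8(fr))
}

// Generate implements quick.Generator by picking one of the valid reasons,
// so property tests never receive an undefined value.
func (fr FlushReason) Generate(r *rand.Rand, size int) reflect.Value {
	valid := []FlushReason{FlushReasonTimerExpired, FlushReasonFinalFlush, FlushReasonMaxSize}
	return reflect.ValueOf(valid[r.Intn(len(valid))])
}

// everyReasonHasName checks the property "every generated reason has a name".
func everyReasonHasName() error {
	return quick.Check(func(fr FlushReason) bool {
		_, ok := FlushReasonStrings[fr]
		return ok
	}, nil)
}

func main() {
	fmt.Println(FlushReasonFinalFlush, uint8(FlushReasonMaxSize)) // final_flush 4
	fmt.Println(everyReasonHasName())                             // <nil>
}
```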

type IndexMetadata

type IndexMetadata struct {
	IndexFormatVersion    int
	IndexFeatureVersion   int
	IndexMinReaderVersion int
	IndexTime             time.Time
	PlainASCII            bool
	LanguageMap           map[string]uint16
	ZoektVersion          string
	ID                    string
}

IndexMetadata holds metadata stored in the index file. It contains data generated by the core indexing library.

func IndexMetadataFromProto

func IndexMetadataFromProto(p *webserverv1.IndexMetadata) IndexMetadata

func (*IndexMetadata) ToProto

func (m *IndexMetadata) ToProto() *webserverv1.IndexMetadata

type LineFragmentMatch

type LineFragmentMatch struct {
	// Offset within the line, in bytes.
	LineOffset int

	// Offset from file start, in bytes.
	Offset uint32

	// Number of bytes that match.
	MatchLength int

	SymbolInfo *Symbol
}

LineFragmentMatch is a segment of matching text within a line.

func LineFragmentMatchFromProto

func LineFragmentMatchFromProto(p *webserverv1.LineFragmentMatch) LineFragmentMatch

func (*LineFragmentMatch) ToProto

func (lfm *LineFragmentMatch) ToProto() *webserverv1.LineFragmentMatch

type LineMatch

type LineMatch struct {
	// The line in which a match was found.
	Line []byte
	// The byte offset of the first byte of the line.
	LineStart int
	// The byte offset of the first byte past the end of the line.
	// This is usually the byte after the terminating newline, but can also be
	// the end of the file if there is no terminating newline
	LineEnd    int
	LineNumber int

	// Before and After are only set when SearchOptions.NumContextLines is > 0
	Before []byte
	After  []byte

	// If set, this was a match on the filename.
	FileName bool

	// The higher the better. Only ranks the quality of the match
	// within the file, does not take rank of file into account
	Score      float64
	DebugScore string

	LineFragments []LineFragmentMatch
}

LineMatch holds the matches within a single line in a file.
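LineOffset and MatchLength in a LineFragmentMatch index into LineMatch.Line, so the matched text can be marked up without touching the file content. The minimal sketch below uses trimmed local copies of the two types. highlight is a hypothetical helper, and it assumes that fragments are sorted by LineOffset and do not overlap.

```go
package main

import "fmt"

// Trimmed local copies of the documented types.
type LineFragmentMatch struct {
	LineOffset  int // offset within the line, in bytes
	MatchLength int // number of bytes that match
}

type LineMatch struct {
	Line          []byte
	LineNumber    int
	LineFragments []LineFragmentMatch
}

// highlight wraps each matching fragment of lm.Line in [brackets].
func highlight(lm LineMatch) string {
	var out []byte
	prev := 0
	for _, f := range lm.LineFragments {
		end := f.LineOffset + f.MatchLength
		out = append(out, lm.Line[prev:f.LineOffset]...)
		out = append(out, '[')
		out = append(out, lm.Line[f.LineOffset:end]...)
		out = append(out, ']')
		prev = end
	}
	out = append(out, lm.Line[prev:]...)
	return string(out)
}

func main() {
	lm := LineMatch{
		Line:       []byte("say hello, hello"),
		LineNumber: 3,
		LineFragments: []LineFragmentMatch{
			{LineOffset: 4, MatchLength: 5},
			{LineOffset: 11, MatchLength: 5},
		},
	}
	fmt.Printf("%d: %s\n", lm.LineNumber, highlight(lm)) // 3: say [hello], [hello]
}
```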

func LineMatchFromProto

func LineMatchFromProto(p *webserverv1.LineMatch) LineMatch

func (*LineMatch) ToProto

func (lm *LineMatch) ToProto() *webserverv1.LineMatch

type ListOptions

type ListOptions struct {
	// Field decides which field to populate in RepoList response.
	Field RepoListField
}

func ListOptionsFromProto

func ListOptionsFromProto(p *webserverv1.ListOptions) *ListOptions

func (*ListOptions) GetField

func (o *ListOptions) GetField() (RepoListField, error)

func (*ListOptions) String

func (o *ListOptions) String() string

func (*ListOptions) ToProto

func (l *ListOptions) ToProto() *webserverv1.ListOptions

type Location

type Location struct {
	// 0-based byte offset from the beginning of the file
	ByteOffset uint32
	// 1-based line number from the beginning of the file
	LineNumber uint32
	// 1-based column number (in runes) from the beginning of line
	Column uint32
}
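These conventions are a 0-based byte offset, a 1-based line and a 1-based column counted in runes. Column and byte position therefore diverge on lines that contain multi-byte UTF-8 characters. The sketch below computes a Location for a byte offset. locationAt is a hypothetical helper, not a zoekt function, and it assumes the offset lies within the content.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// Local copy of the documented struct.
type Location struct {
	ByteOffset uint32 // 0-based byte offset from the beginning of the file
	LineNumber uint32 // 1-based line number
	Column     uint32 // 1-based column, counted in runes
}

// locationAt converts a byte offset into a Location by counting newlines before
// it and runes between the start of its line and the offset.
func locationAt(content []byte, off int) Location {
	line, lineStart := 1, 0
	for i := 0; i < off; i++ {
		if content[i] == '\n' {
			line++
			lineStart = i + 1
		}
	}
	col := utf8.RuneCount(content[lineStart:off]) + 1
	return Location{ByteOffset: uint32(off), LineNumber: uint32(line), Column: uint32(col)}
}

func main() {
	src := []byte("x := 1\nnaïve := \"ok\"\n")
	off := bytes.LastIndex(src, []byte(":="))
	fmt.Printf("%+v\n", locationAt(src, off)) // {ByteOffset:14 LineNumber:2 Column:7}
}
```

Here ï occupies two bytes. As a result, ":=" sits 7 bytes into its line (a byte-based column would be 8), but it is reported at rune column 7.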

func LocationFromProto

func LocationFromProto(p *webserverv1.Location) Location

func (*Location) ToProto

func (l *Location) ToProto() *webserverv1.Location

type MinimalRepoListEntry

type MinimalRepoListEntry struct {
	// HasSymbols is exported since Sourcegraph uses this information at search
	// planning time to decide between Zoekt and an unindexed symbol search.
	//
	// Note: it pretty much is always true in practice.
	HasSymbols bool

	// Branches is used by Sourcegraph's query planner to decide whether it can
	// use Zoekt or must go via an unindexed code path.
	Branches []RepositoryBranch

	// IndexTimeUnix is the IndexTime converted to unix time (number of seconds
	// since the epoch). This is to make it clear we are not transporting the
	// full-fidelity timestamp (i.e. with milliseconds and location). Additionally,
	// it saves 16 bytes in this struct.
	//
	// IndexTime is used as a heuristic in Sourcegraph to decide in aggregate
	// how many repositories need updating after a ranking change/etc.
	//
	// TODO(keegancsmith) audit updates to IndexTime and document how and when
	// it changes. Concerned about things like metadata updates or compound
	// shards leading to untrustworthy data here.
	IndexTimeUnix int64
}

MinimalRepoListEntry is a subset of RepoListEntry. It was added after performance profiling of sourcegraph.com revealed that querying this information from Zoekt was causing high CPU and memory usage. Note: we can revisit this; how we store and query this information has changed a lot since this was introduced.

func MinimalRepoListEntryFromProto

func MinimalRepoListEntryFromProto(p *webserverv1.MinimalRepoListEntry) MinimalRepoListEntry

func (*MinimalRepoListEntry) ToProto

func (m *MinimalRepoListEntry) ToProto() *webserverv1.MinimalRepoListEntry

type Progress

type Progress struct {
	// Priority of the shard that was searched.
	Priority float64

	// MaxPendingPriority is the maximum priority of pending results that are still being searched in parallel.
	// It is used to reorder results once they are known to be stable: when a result's Priority is greater
	// than the max(MaxPendingPriority) from the latest results of each backend, it can be returned to the user.
	//
	// MaxPendingPriority decreases monotonically in each SearchResult.
	MaxPendingPriority float64
}

Progress contains information about the global progress of the running search query. This is used by the frontend to reorder results and emit them when stable. Sourcegraph specific: this is used when querying multiple zoekt-webserver instances.

func ProgressFromProto

func ProgressFromProto(p *webserverv1.Progress) Progress

func (*Progress) ToProto

func (p *Progress) ToProto() *webserverv1.Progress

type Range

type Range struct {
	// The inclusive beginning of the range.
	Start Location
	// The exclusive end of the range.
	End Location
}

func RangeFromProto

func RangeFromProto(p *webserverv1.Range) Range

func (*Range) ToProto

func (r *Range) ToProto() *webserverv1.Range

type RepoList

type RepoList struct {
	// Returned when ListOptions.Field is RepoListFieldRepos.
	Repos []*RepoListEntry

	// ReposMap is set when ListOptions.Field is RepoListFieldReposMap.
	ReposMap ReposMap

	Crashes int

	// Stats response to a List request.
	// This is the aggregate RepoStats of all repos matching the input query.
	Stats RepoStats
}

RepoList holds a set of Repository metadata.

func RepoListFromProto

func RepoListFromProto(p *webserverv1.ListResponse) *RepoList

func (*RepoList) ToProto

func (r *RepoList) ToProto() *webserverv1.ListResponse

type RepoListEntry

type RepoListEntry struct {
	Repository    Repository
	IndexMetadata IndexMetadata
	Stats         RepoStats
}

func RepoListEntryFromProto

func RepoListEntryFromProto(p *webserverv1.RepoListEntry) *RepoListEntry

func (*RepoListEntry) ToProto

func (r *RepoListEntry) ToProto() *webserverv1.RepoListEntry

type RepoListField

type RepoListField int

const (
	RepoListFieldRepos    RepoListField = 0
	RepoListFieldReposMap               = 2
)

type RepoStats

type RepoStats struct {
	// Repos is used for aggregating the number of repositories.
	//
	// Note: This field is not populated on RepoListEntry.Stats (individual) but
	// only for RepoList.Stats (aggregate).
	Repos int

	// Shards is the total number of search shards.
	Shards int

	// Documents holds the number of documents or files.
	Documents int

	// IndexBytes is the amount of RAM used for index overhead.
	IndexBytes int64

	// ContentBytes is the amount of RAM used for raw content.
	ContentBytes int64

	// NewLinesCount is the number of newlines "\n" that appear in the zoekt
	// indexed documents. This is not exactly the same as line count, since it
	// will not include lines not terminated by "\n" (eg a file with no "\n", or
	// a final line without "\n"). Note: Zoekt deduplicates documents across
	// branches, so if a path has the same contents on multiple branches, there
	// is only one document for it, so its newlines are only counted once. See
	// DefaultBranchNewLinesCount and OtherBranchesNewLinesCount for counts which
	// do not deduplicate.
	NewLinesCount uint64

	// DefaultBranchNewLinesCount is the number of newlines "\n" in the default
	// branch.
	DefaultBranchNewLinesCount uint64

	// OtherBranchesNewLinesCount is the number of newlines "\n" in all branches
	// except the default branch.
	OtherBranchesNewLinesCount uint64
}

RepoStats holds statistics for a repository or a collection of repositories.

func RepoStatsFromProto

func RepoStatsFromProto(p *webserverv1.RepoStats) RepoStats

func (*RepoStats) Add

func (s *RepoStats) Add(o *RepoStats)

func (*RepoStats) ToProto

func (s *RepoStats) ToProto() *webserverv1.RepoStats
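The difference between NewLinesCount and the per-branch counters is easiest to see on a small example. The sketch below deduplicates documents by (path, contents) pair, which simplifies how Zoekt shares identical documents across branches. The input shape and newlineCounts are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// counts holds the three documented newline counters.
type counts struct {
	NewLinesCount              uint64
	DefaultBranchNewLinesCount uint64
	OtherBranchesNewLinesCount uint64
}

// newlineCounts takes branch -> (path -> contents). Per-branch counters count every
// copy; NewLinesCount counts each distinct (path, contents) document once.
func newlineCounts(branches map[string]map[string]string, defaultBranch string) counts {
	var c counts
	seen := map[[2]string]bool{}
	for branch, files := range branches {
		for path, content := range files {
			n := uint64(strings.Count(content, "\n")) // a final line without "\n" is not counted
			if branch == defaultBranch {
				c.DefaultBranchNewLinesCount += n
			} else {
				c.OtherBranchesNewLinesCount += n
			}
			key := [2]string{path, content}
			if !seen[key] {
				seen[key] = true
				c.NewLinesCount += n
			}
		}
	}
	return c
}

func main() {
	c := newlineCounts(map[string]map[string]string{
		"main": {"README.md": "a\nb\n", "main.go": "x\n"},
		"dev":  {"README.md": "a\nb\n", "main.go": "x\ny\nz"},
	}, "main")
	fmt.Printf("%+v\n", c) // {NewLinesCount:5 DefaultBranchNewLinesCount:3 OtherBranchesNewLinesCount:4}
}
```

README.md is identical on both branches. Its two newlines count once in NewLinesCount but appear in both per-branch totals. The trailing "z" line in dev's main.go has no "\n", so it is not counted.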

type ReposMap

type ReposMap map[uint32]MinimalRepoListEntry

func (*ReposMap) MarshalBinary

func (q *ReposMap) MarshalBinary() ([]byte, error)

MarshalBinary implements a specialized encoder for ReposMap.

func (*ReposMap) UnmarshalBinary

func (q *ReposMap) UnmarshalBinary(b []byte) error

UnmarshalBinary implements a specialized decoder for ReposMap.

type Repository

type Repository struct {
	// Sourcegraph's tenant ID
	TenantID int

	// Sourcegraph's repository ID
	ID uint32

	// The repository name
	Name string

	// The repository URL.
	URL string

	// Additional metadata about the repository.
	Metadata map[string]string

	// The physical source where this repo came from, eg. full
	// path to the zip filename or git repository directory. This
	// will not be exposed in the UI, but can be used to detect
	// orphaned index shards.
	Source string

	// The branches indexed in this repo.
	Branches []RepositoryBranch

	// Nil if this is not the super project.
	SubRepoMap map[string]*Repository

	// URL template to link to the commit of a branch
	CommitURLTemplate string

	// The repository URL for getting to a file. Has access to
	// {{.Version}}, {{.Path}}
	FileURLTemplate string

	// The URL fragment to add to a file URL for line numbers. Has
	// access to {{.LineNumber}}. The fragment should include the
	// separator, generally '#' or ';'.
	LineFragmentTemplate string

	// All zoekt.* configuration settings.
	RawConfig map[string]string

	// Importance of the repository, bigger is more important
	Rank uint16

	// IndexOptions is a hash of the options used to create the index for the
	// repo.
	IndexOptions string

	// HasSymbols is true if this repository has indexed ctags
	// output. Sourcegraph specific: This field is more appropriate for
	// IndexMetadata. However, we store it here since the Sourcegraph frontend
	// can read this structure but not IndexMetadata.
	HasSymbols bool

	// Tombstone is true if we are not allowed to search this repo.
	Tombstone bool

	// LatestCommitDate is the date of the latest commit among all indexed Branches.
	// The date might be time.Time's 0-value if the repository was last indexed
	// before this field was added.
	LatestCommitDate time.Time

	// FileTombstones is a set of file paths that should be ignored across all branches
	// in this shard.
	FileTombstones map[string]struct{} `json:",omitempty"`
	// contains filtered or unexported fields
}

Repository holds repository metadata.
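FileURLTemplate and LineFragmentTemplate use {{.Field}} placeholders. The sketch below assumes they are evaluated as Go text/template templates with the fields named in their comments; the real webserver may also supply extra template functions. Under that assumption, a file link can be rendered as follows. The repository URL and the fileURL helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// fileURL renders the file template with Version and Path, then appends the
// line-fragment template rendered with LineNumber (if a fragment template is set).
func fileURL(fileTmpl, fragTmpl, version, path string, line int) (string, error) {
	var sb strings.Builder
	ft, err := template.New("file").Parse(fileTmpl)
	if err != nil {
		return "", err
	}
	if err := ft.Execute(&sb, struct{ Version, Path string }{version, path}); err != nil {
		return "", err
	}
	if fragTmpl != "" {
		lt, err := template.New("fragment").Parse(fragTmpl)
		if err != nil {
			return "", err
		}
		if err := lt.Execute(&sb, struct{ LineNumber int }{line}); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	u, err := fileURL(
		"https://github.com/apache/example/blob/{{.Version}}/{{.Path}}",
		"#L{{.LineNumber}}",
		"abc123", "src/main.go", 42)
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // https://github.com/apache/example/blob/abc123/src/main.go#L42
}
```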

func RepositoryFromProto

func RepositoryFromProto(p *webserverv1.Repository) Repository

func (*Repository) GetPriority

func (r *Repository) GetPriority() float64

func (*Repository) MergeMutable

func (r *Repository) MergeMutable(x *Repository) (mutated bool, err error)

MergeMutable will merge x into r. mutated will be true if it made any changes. err is non-nil if we needed to mutate an immutable field.

Note: the SubRepoMap, IndexOptions and HasSymbols fields are ignored. They are computed while indexing, so they can't be synthesized from x.

Note: We ignore RawConfig fields which are duplicated into Repository: name and id.

func (*Repository) ToProto

func (r *Repository) ToProto() *webserverv1.Repository

func (*Repository) UnmarshalJSON

func (r *Repository) UnmarshalJSON(data []byte) error

type RepositoryBranch

type RepositoryBranch struct {
	Name    string
	Version string
}

RepositoryBranch describes an indexed branch, which is a name combined with a version.

func RepositoryBranchFromProto

func RepositoryBranchFromProto(p *webserverv1.RepositoryBranch) RepositoryBranch

func (RepositoryBranch) String

func (r RepositoryBranch) String() string

func (*RepositoryBranch) ToProto

func (r *RepositoryBranch) ToProto() *webserverv1.RepositoryBranch

type SearchOptions

type SearchOptions struct {
	// Return an upper-bound estimate of eligible documents in
	// stats.ShardFilesConsidered.
	EstimateDocCount bool

	// Return the whole file.
	Whole bool

	// Maximum number of matches: skip all further processing of an index
	// shard after we have found this many non-overlapping matches.
	ShardMaxMatchCount int

	// Maximum number of matches: stop looking for more matches
	// once we have this many matches across shards.
	TotalMaxMatchCount int

	// Maximum number of matches: skip processing documents for a repository in
	// a shard once we have found ShardRepoMaxMatchCount.
	//
	// A compound shard may contain multiple repositories. This will most often
	// be set to 1 to find all repositories containing a result.
	ShardRepoMaxMatchCount int

	// Abort the search after this much time has passed.
	MaxWallTime time.Duration

	// FlushWallTime if non-zero will stop streaming behaviour at first and
	// instead will collate and sort results. At FlushWallTime the results will
	// be sent and then the behaviour will revert to the normal streaming.
	FlushWallTime time.Duration

	// Truncates the number of documents (i.e. files) after collating and
	// sorting the results.
	MaxDocDisplayCount int

	// Truncates the number of matches after collating and sorting the results.
	MaxMatchDisplayCount int

	// If set to a number greater than zero, then up to this many
	// context lines will be added before and after each matched line.
	// Note that the included context lines might contain matches and
	// it's up to the consumer of the result to remove those lines.
	NumContextLines int

	// If true, ChunkMatches will be returned in each FileMatch rather than LineMatches
	// EXPERIMENTAL: the behavior of this flag may be changed in future versions.
	ChunkMatches bool

	// EXPERIMENTAL. If true, use text-search style scoring instead of the default
	// scoring formula. The scoring algorithm treats each match in a file as a term
	// and computes an approximation to BM25. When enabled, BM25 scoring is used for
	// the overall FileMatch score, as well as individual LineMatch and ChunkMatch scores.
	//
	// The calculation of IDF assumes that Zoekt visits all documents containing any
	// of the query terms during evaluation. This is true, for example, if all query
	// terms are ORed together.
	//
	// When enabled, all other scoring signals are ignored, including document ranks.
	UseBM25Scoring bool

	// Trace turns on opentracing for this request if true and if the Jaeger address was provided as
	// a command-line flag
	Trace bool

	// If set, the search results will contain debug information for scoring.
	DebugScore bool

	// SpanContext is the opentracing span context, if it exists, from the zoekt client
	SpanContext map[string]string
}

func SearchOptionsFromProto

func SearchOptionsFromProto(p *webserverv1.SearchOptions) *SearchOptions

func (*SearchOptions) SetDefaults

func (o *SearchOptions) SetDefaults()

func (*SearchOptions) String

func (s *SearchOptions) String() string

String returns a succinct representation of the options. This is meant for human consumption in logs and traces.

Note: some tracing systems limit the length of values, so we take care to keep this small and to put the important information near the front in case of truncation.

func (*SearchOptions) ToProto

func (s *SearchOptions) ToProto() *webserverv1.SearchOptions

type SearchResult

type SearchResult struct {
	Stats

	// Do not encode this as we cannot encode -Inf in JSON
	Progress `json:"-"`

	Files []FileMatch

	// RepoURLs holds a repo => template string map.
	RepoURLs map[string]string

	// LineFragments holds a repo => template string map, for
	// the line number fragment.
	LineFragments map[string]string
}

SearchResult contains search matches plus extra data: search statistics, progress information, and per-repository URL templates.

func SearchResultFromProto

func SearchResultFromProto(p *webserverv1.SearchResponse, repoURLs, lineFragments map[string]string) *SearchResult

func SearchResultFromStreamProto

func SearchResultFromStreamProto(p *webserverv1.StreamSearchResponse, repoURLs, lineFragments map[string]string) *SearchResult

func (*SearchResult) SizeBytes

func (sr *SearchResult) SizeBytes() (sz uint64)

SizeBytes is a best-effort estimate of the size of SearchResult in memory. The estimate does not take alignment into account. The result is a lower bound on the actual size in memory.

func (*SearchResult) ToProto

func (sr *SearchResult) ToProto() *webserverv1.SearchResponse

func (*SearchResult) ToStreamProto

func (sr *SearchResult) ToStreamProto() *webserverv1.StreamSearchResponse

type Searcher

type Searcher interface {
	Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)

	// List lists repositories. The query `q` can only contain
	// query.Repo atoms.
	List(ctx context.Context, q query.Q, opts *ListOptions) (*RepoList, error)
	Close()

	// Describe the searcher for debug messages.
	String() string
}

type Sender

type Sender interface {
	Send(*SearchResult)
}

Sender is the interface that wraps the basic Send method.

type SenderFunc

type SenderFunc func(result *SearchResult)

SenderFunc is an adapter to allow the use of ordinary functions as Sender. If f is a function with the appropriate signature, SenderFunc(f) is a Sender that calls f.

func (SenderFunc) Send

func (f SenderFunc) Send(result *SearchResult)
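SenderFunc lets a closure act as the Sender passed to StreamSearch. So that it runs on its own, the sketch below re-declares minimal versions of SearchResult, Sender and SenderFunc. streamBatches is a hypothetical stand-in for a Streamer that delivers partial results. The mutex is a precaution, because whether Send can be called concurrently depends on the Streamer implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Minimal local versions of the documented types.
type SearchResult struct {
	Files []string // stand-in for []FileMatch
}

type Sender interface {
	Send(*SearchResult)
}

// SenderFunc adapts an ordinary function to the Sender interface.
type SenderFunc func(result *SearchResult)

// Send calls f(result).
func (f SenderFunc) Send(result *SearchResult) { f(result) }

// streamBatches delivers each batch to s, as a Streamer would push partial results.
func streamBatches(batches [][]string, s Sender) {
	for _, b := range batches {
		s.Send(&SearchResult{Files: b})
	}
}

func main() {
	var (
		mu  sync.Mutex
		all []string
	)
	collect := SenderFunc(func(r *SearchResult) {
		mu.Lock()
		defer mu.Unlock()
		all = append(all, r.Files...)
	})
	streamBatches([][]string{{"a.go"}, {"b.go", "c.go"}}, collect)
	fmt.Println(all) // [a.go b.go c.go]
}
```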

type Stats

type Stats struct {
	// Amount of I/O for reading contents.
	ContentBytesLoaded int64

	// Amount of I/O for reading from index.
	IndexBytesLoaded int64

	// Number of search shards that had a crash.
	Crashes int

	// Wall clock time for this search
	Duration time.Duration

	// Number of files containing a match.
	FileCount int

	// Number of files in shards that we considered.
	ShardFilesConsidered int

	// Files that we evaluated. Equivalent to files for which all
	// atom matches (including negations) evaluated to true.
	FilesConsidered int

	// Files for which we loaded file content to verify substring matches
	FilesLoaded int

	// Candidate files whose contents weren't examined because we
	// gathered enough matches.
	FilesSkipped int

	// Shards that we scanned to find matches.
	ShardsScanned int

	// Shards that we did not process because a query was canceled.
	ShardsSkipped int

	// Shards that we did not process because the query was rejected by the
	// ngram filter indicating it had no matches.
	ShardsSkippedFilter int

	// Number of non-overlapping matches
	MatchCount int

	// Number of candidate matches as a result of searching ngrams.
	NgramMatches int

	// NgramLookups is the number of times we accessed an ngram in the index.
	NgramLookups int

	// Wall clock time for queued search.
	Wait time.Duration

	// Aggregate wall clock time spent constructing and pruning the match tree.
	// This accounts for time such as lookups in the trigram index.
	MatchTreeConstruction time.Duration

	// Aggregate wall clock time spent searching the match tree. This accounts
	// for the bulk of search work done looking for matches.
	MatchTreeSearch time.Duration

	// Number of times regexp was called on files that we evaluated.
	RegexpsConsidered int

	// FlushReason explains why results were flushed.
	FlushReason FlushReason
}

Stats contains counters and timings collected while executing a search.

func StatsFromProto ¶

func StatsFromProto(p *webserverv1.Stats) Stats

func (*Stats) Add ¶

func (s *Stats) Add(o Stats)

func (*Stats) ToProto ¶

func (s *Stats) ToProto() *webserverv1.Stats

func (*Stats) Zero ¶

func (s *Stats) Zero() bool

Zero returns true if stats is empty.

type Streamer ¶

type Streamer interface {
	Searcher
	StreamSearch(ctx context.Context, q query.Q, opts *SearchOptions, sender Sender) (err error)
}

Streamer adds the method StreamSearch to the Searcher interface.

type Symbol ¶

type Symbol struct {
	Sym        string
	Kind       string
	Parent     string
	ParentKind string
}

func SymbolFromProto ¶

func SymbolFromProto(p *webserverv1.SymbolInfo) *Symbol

func (*Symbol) ToProto ¶

func (s *Symbol) ToProto() *webserverv1.SymbolInfo

Directories ¶

Path	Synopsis
cmd/zoekt	The 'zoekt' command supports searching over an index directory or shard.
cmd/zoekt-archive-index	Command zoekt-archive-index indexes a git archive.
cmd/zoekt-dynamic-indexserver	Command zoekt-dynamic-indexserver starts a server to manage dynamic indexing.
cmd/zoekt-git-clone	Command zoekt-git-clone fetches all repos of a user or organization and clones them.
cmd/zoekt-git-index	Command zoekt-git-index indexes a single git repository.
cmd/zoekt-index	Command zoekt-index indexes a directory of files.
cmd/zoekt-indexserver	Command zoekt-indexserver starts a service that periodically reindexes repositories.
cmd/zoekt-merge-index	Command zoekt-merge-index merges a set of index shards into a compound shard.
cmd/zoekt-mirror-bitbucket-server	Command zoekt-mirror-bitbucket-server fetches all repos of a Bitbucket project, optionally of a specific type, and clones them.
cmd/zoekt-mirror-gerrit	Command zoekt-mirror-gerrit fetches all repos of a Gerrit host.
cmd/zoekt-mirror-gitea	Command zoekt-mirror-gitea fetches all repos of a Gitea user or organization and clones them.
cmd/zoekt-mirror-github	Command zoekt-mirror-github fetches all repos of a GitHub user or organization and clones them.
cmd/zoekt-mirror-gitiles	Command zoekt-mirror-gitiles fetches all repos of a Gitiles host.
cmd/zoekt-mirror-gitlab	Command zoekt-mirror-gitlab fetches all repos for a user from GitLab.
cmd/zoekt-repo-index	Command zoekt-repo-index indexes a repository that uses the Android 'repo' tool (https://android.googlesource.com/tools/repo).
cmd/zoekt-sourcegraph-indexserver	Command zoekt-sourcegraph-indexserver periodically reindexes repositories from a Sourcegraph instance.
cmd/zoekt-sourcegraph-indexserver/grpc/protos/sourcegraph/zoekt/configuration/v1
cmd/zoekt-sourcegraph-indexserver/grpc/protos/zoekt/indexserver/v1
cmd/zoekt-test	Command zoekt-test compares the zoekt results with raw substring search.
cmd/zoekt-webserver	Command zoekt-webserver starts a server that responds to search queries, using an index generated by another program such as zoekt-indexserver.
cmd/zoekt-webserver/grpc/server
gitindex	Package gitindex provides functions for indexing Git repositories.
grpc/chunk	Package chunk provides a utility for sending sets of protobuf messages in groups of smaller chunks.
grpc/defaults
grpc/grpcutil
grpc/internalerrs
grpc/messagesize
grpc/propagator
grpc/protos/zoekt/webserver/v1
grpc/testprotos/news/v1
ignore	Package ignore provides helpers to support ignore files similar to .gitignore.
index	Package index contains logic for building Zoekt indexes.
internal/archive	Package archive provides indexing of archives from remote URLs.
internal/ctags
internal/debugserver
internal/e2e	Package e2e contains end-to-end tests.
internal/json
internal/mockSearcher
internal/otlpenv	Package otlpenv exports getters to read OpenTelemetry protocol configuration options based on the official spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#configuration-options
internal/profiler
internal/syntaxutil
internal/tenant
internal/tenant/internal/enforcement
internal/tenant/internal/tenanttype
internal/tenant/systemtenant	Package systemtenant exports UnsafeCtx, which allows access to shards across all tenants.
internal/tenant/tenanttest
internal/trace	Package trace provides a tracing API that in turn invokes both the `golang.org/x/net/trace` API and creates an opentracing span if appropriate.
internal/tracer
languages	Package languages provides enhanced language detection capabilities on top of go-enry, with additional heuristics and mappings for better accuracy.
query	Package query contains the API for creating Zoekt queries.
search
web	Package web contains the logic for spinning up a zoekt webserver.