zoekt

package module
v0.0.0-...-6df0554
Warning

This package is not in the latest version of its module.
Published: Apr 23, 2024 License: Apache-2.0 Imports: 43 Imported by: 20

README

"Zoekt, en gij zult spinazie eten" - Jan Eertink

("seek, and ye shall eat spinach" - My primary school teacher)

This is a fast text search engine, intended for use with source code. (Pronunciation: roughly as you would pronounce "zooked" in English)

Note: This is a Sourcegraph fork of github.com/google/zoekt. It is now the main maintained source of Zoekt.

INSTRUCTIONS

Downloading

go get github.com/sourcegraph/zoekt/

Indexing

Directory
go install github.com/sourcegraph/zoekt/cmd/zoekt-index
$GOPATH/bin/zoekt-index .
Git repository
go install github.com/sourcegraph/zoekt/cmd/zoekt-git-index
$GOPATH/bin/zoekt-git-index -branches master,stable-1.4 -prefix origin/ .
Repo repositories
go install github.com/sourcegraph/zoekt/cmd/zoekt-{repo-index,mirror-gitiles}
zoekt-mirror-gitiles -dest ~/repos/ https://gfiber.googlesource.com
zoekt-repo-index \
    -name gfiber \
    -base_url https://gfiber.googlesource.com/ \
    -manifest_repo ~/repos/gfiber.googlesource.com/manifests.git \
    -repo_cache ~/repos \
    -manifest_rev_prefix=refs/heads/ --rev_prefix= \
    master:default_unrestricted.xml

Searching

Web interface
go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
$GOPATH/bin/zoekt-webserver -listen :6070
JSON API

You can retrieve search results as JSON by sending a GET request to zoekt-webserver.

curl --get \
    --url "http://localhost:6070/search" \
    --data-urlencode "q=ngram f:READ" \
    --data-urlencode "num=50" \
    --data-urlencode "format=json"

The response data is a JSON object. You can refer to web.ApiSearchResult to learn about the structure of the object.
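You can send the same request from Go using only the standard library. The sketch below assumes a zoekt-webserver listening on localhost:6070, as above. The helper names searchURL and fetchJSON are hypothetical. The sketch does not reproduce web.ApiSearchResult, so it decodes the response into a generic map.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// searchURL builds the GET URL for zoekt-webserver's /search endpoint using
// the same parameters as the curl example (q, num, format). The helper itself
// is hypothetical; only the parameter names come from the documentation.
func searchURL(base, q string, num int) string {
	v := url.Values{}
	v.Set("q", q)
	v.Set("num", strconv.Itoa(num))
	v.Set("format", "json")
	// Encode sorts keys and query-escapes values (space -> "+", ":" -> "%3A").
	return base + "/search?" + v.Encode()
}

// fetchJSON performs the request and decodes the JSON object generically,
// since this sketch does not depend on the web.ApiSearchResult type.
func fetchJSON(u string) (map[string]any, error) {
	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}
```

For example, searchURL("http://localhost:6070", "ngram f:READ", 50) yields http://localhost:6070/search?format=json&num=50&q=ngram+f%3AREAD. You can then pass that URL to fetchJSON.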

CLI
go install github.com/sourcegraph/zoekt/cmd/zoekt
$GOPATH/bin/zoekt 'ngram f:READ'

Installation

A more organized installation on a Linux server should use a systemd unit file, e.g.:

[Unit]
Description=zoekt webserver

[Service]
ExecStart=/zoekt/bin/zoekt-webserver -index /zoekt/index -listen :443  --ssl_cert /zoekt/etc/cert.pem   --ssl_key /zoekt/etc/key.pem
Restart=always

[Install]
WantedBy=default.target

SEARCH SERVICE

Zoekt comes with a small service management program:

go install github.com/sourcegraph/zoekt/cmd/zoekt-indexserver

cat << EOF > config.json
[{"GithubUser": "username"},
 {"GithubOrg": "org"},
 {"GitilesURL": "https://gerrit.googlesource.com", "Name": "zoekt" }
]
EOF

$GOPATH/bin/zoekt-indexserver -mirror_config config.json

This will mirror all repos under 'github.com/username' and 'github.com/org', as well as the 'zoekt' repository, and index them.

It takes care of fetching and indexing new data and cleaning up logfiles.

The webserver can be started from a standard service management framework, such as systemd.

It is recommended to install Universal ctags to improve ranking. See here for more information.

ACKNOWLEDGEMENTS

Thanks to Han-Wen Nienhuys for creating Zoekt. Thanks to Alexander Neubeck for coming up with this idea, and helping Han-Wen Nienhuys flesh it out.

FORK DETAILS

Originally this fork contained some changes that do not make sense to upstream and/or have not yet been upstreamed. However, this is now the de facto source for Zoekt. This section will remain for historical reasons and contains outdated information. It can be removed once the dust settles on moving from google/zoekt to sourcegraph/zoekt. Differences:

  • zoekt-sourcegraph-indexserver is a Sourcegraph-specific command that indexes all enabled repositories on Sourcegraph and keeps the indexes up to date.
  • We have exposed the API via keegancsmith/rpc (a fork of net/rpc which supports cancellation).
  • Query primitive BranchesRepos to efficiently specify a set of repositories to search.
  • Allow empty shard directories on startup. Needed when starting a fresh instance which hasn't indexed anything yet.
  • We can return symbol/ctag data in results. Additionally we can run symbol regex queries.
  • We search shards in order of repo name and ignore shard ranking.
  • Other minor changes.

Assuming you have the gerrit upstream configured, a useful way to see what we changed is:

$ git diff gerrit/master -- ':(exclude)vendor/' ':(exclude)Gopkg*'

DISCLAIMER

This is not an official Google product

Documentation

Index

Constants

const FeatureVersion = 12

FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version.

  • 2: Rank field for shards.
  • 3: Rank documents within shards.
  • 4: Dedup file bugfix.
  • 5: Remove max line size limit.
  • 6: Include '#' into the LineFragment template.
  • 7: Record skip reasons in the index.
  • 8: Record source path in the index.
  • 9: Store ctags metadata & bump default max file size.
  • 10: Compound shards; more flexible TOC format.
  • 11: Bloom filters for file names & contents.
  • 12: go-enry for identifying file languages.

const IndexFormatVersion = 16

IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed.

  • 5: subrepositories.
  • 6: remove size prefix for posting varint list.
  • 7: move subrepos into Repository struct.
  • 8: move repoMetaData out of indexMetadata.
  • 9: use bigendian uint64 for trigrams.
  • 10: sections for rune offsets.
  • 11: file ends in rune offsets.
  • 12: 64-bit branchmasks.
  • 13: content checksums.
  • 14: languages.
  • 15: rune based symbol sections.
  • 16: ctags metadata.

const NextIndexFormatVersion = 17

17: compound shard (multi repo)

const ReadMinFeatureVersion = 8

ReadMinFeatureVersion constrains backwards compatibility by refusing to load a file with a FeatureVersion below it.

const WriteMinFeatureVersion = 10

WriteMinFeatureVersion constrains forwards compatibility by emitting files that won't load in zoekt with a FeatureVersion below it.
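Taken together, these constants describe a two-sided compatibility check. The sketch below illustrates that reading of the comments. The helper checkCompatible and the shardMeta type are hypothetical. The sketch assumes the checks map onto IndexMetadata's IndexFeatureVersion and IndexMinReaderVersion fields, based only on their names; it is not a copy of zoekt's loader.

```go
package main

import "fmt"

// Values copied from the constants documented above.
const (
	FeatureVersion         = 12
	ReadMinFeatureVersion  = 8
	WriteMinFeatureVersion = 10
)

// shardMeta holds the two version fields a reader would inspect. The names
// mirror IndexMetadata.IndexFeatureVersion and IndexMinReaderVersion.
type shardMeta struct {
	IndexFeatureVersion   int
	IndexMinReaderVersion int
}

// checkCompatible (hypothetical) applies both rules: reject files older than
// ReadMinFeatureVersion, and reject files whose writer demanded a newer
// reader than this one.
func checkCompatible(m shardMeta) error {
	if m.IndexFeatureVersion < ReadMinFeatureVersion {
		return fmt.Errorf("file feature version %d is below the minimum %d this reader accepts",
			m.IndexFeatureVersion, ReadMinFeatureVersion)
	}
	if m.IndexMinReaderVersion > FeatureVersion {
		return fmt.Errorf("file requires reader feature version %d, but this reader is %d",
			m.IndexMinReaderVersion, FeatureVersion)
	}
	return nil
}

// newShardMeta shows what a writer would record: its own feature version,
// plus WriteMinFeatureVersion as the oldest reader allowed to load the file.
func newShardMeta() shardMeta {
	return shardMeta{IndexFeatureVersion: FeatureVersion, IndexMinReaderVersion: WriteMinFeatureVersion}
}
```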

Variables

var FlushReasonStrings = map[FlushReason]string{
	FlushReasonTimerExpired: "timer_expired",
	FlushReasonFinalFlush:   "final_flush",
	FlushReasonMaxSize:      "max_size_reached",
}
var Version string

Filled by the linker

Functions

func DetermineLanguageIfUnknown

func DetermineLanguageIfUnknown(doc *Document)

func Explode

func Explode(dstDir string, f IndexFile) (map[string]string, error)

Explode takes an IndexFile f and creates one simple shard per repository contained in f. Explode returns a map of tmpName -> dstName. It is the responsibility of the caller to rename the temporary shard(s) and delete the input shard.

func HostnameBestEffort

func HostnameBestEffort() string

func IndexFilePaths

func IndexFilePaths(p string) ([]string, error)

IndexFilePaths returns all paths for the IndexFile at filepath p that exist: p itself and the ".meta" file for p. Note: if no files exist, this returns an empty slice and a nil error.

func JsonMarshalRepoMetaTemp

func JsonMarshalRepoMetaTemp(shardPath string, repositoryMetadata interface{}) (tempPath, finalPath string, err error)

JsonMarshalRepoMetaTemp writes the json encoding of the given repository metadata to a temporary file in the same directory as the given shard path. It returns both the path of the temporary file and the path of the final file that the caller should use.

The caller is responsible for renaming the temporary file to the final file path, or removing the temporary file if it is no longer needed. TODO: Should we stick this in a util package?

func Merge

func Merge(dstDir string, files ...IndexFile) (tmpName, dstName string, _ error)

Merge merges files into a compound shard in dstDir. It returns tmpName and dstName. It is the responsibility of the caller to delete the input shards and rename the temporary compound shard from tmpName to dstName.

func PrintNgramStats

func PrintNgramStats(r IndexFile) error

PrintNgramStats outputs a list of the form

n_1 trigram_1
n_2 trigram_2
...

where n_i is the length of the postings list of trigram_i stored in r.

func ReadMetadata

func ReadMetadata(inf IndexFile) ([]*Repository, *IndexMetadata, error)

ReadMetadata returns the metadata of an index shard without reading the index data. The IndexFile is not closed.

func ReadMetadataPath

func ReadMetadataPath(p string) ([]*Repository, *IndexMetadata, error)

ReadMetadataPath returns the metadata of the index shard at p without reading the index data. ReadMetadataPath is a helper for ReadMetadata which opens the IndexFile at p.

func ReadMetadataPathAlive

func ReadMetadataPathAlive(p string) ([]*Repository, *IndexMetadata, error)

ReadMetadataPathAlive is like ReadMetadataPath except that it only returns alive repositories.

func SetTombstone

func SetTombstone(shardPath string, repoID uint32) error

SetTombstone idempotently sets a tombstone for the repository with ID repoID in the .meta file.

func ShardMergingEnabled

func ShardMergingEnabled() bool

ShardMergingEnabled returns true if SRC_ENABLE_SHARD_MERGING is set to true.

func SortFiles

func SortFiles(ms []FileMatch)

SortFiles sorts file matches in the order we want to present results to users. The order depends on the match score, which includes both query-dependent signals like word overlap and file-only signals like the file ranks (if file ranks are enabled).

Scores are not the only input: we also boost some results to present files with novel extensions.
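Here is a minimal sketch of score-based ordering on a pared-down match type. The fileMatch type is an assumption, and so is the file-name tie-breaker, which keeps the order deterministic. The sketch deliberately leaves out the novel-extension boosting described above.

```go
package main

import "sort"

// fileMatch is a pared-down stand-in for zoekt.FileMatch.
type fileMatch struct {
	FileName string
	Score    float64
}

// sortByScore orders matches best-first by Score. Ties fall back to file
// name, which is this sketch's choice rather than zoekt's documented rule.
func sortByScore(ms []fileMatch) {
	sort.Slice(ms, func(i, j int) bool {
		if ms[i].Score != ms[j].Score {
			return ms[i].Score > ms[j].Score
		}
		return ms[i].FileName < ms[j].FileName
	})
}
```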

func UnsetTombstone

func UnsetTombstone(shardPath string, repoID uint32) error

UnsetTombstone idempotently removes the tombstone for the repository with ID repoID from the .meta file.

Types

type ChunkMatch

type ChunkMatch struct {
	DebugScore string

	// Content is a contiguous range of complete lines that fully contains Ranges.
	// Lines will always include their terminating newline (if it exists).
	Content []byte

	// Ranges is a set of matching ranges within this chunk. Each range is relative
	// to the beginning of the file (not the beginning of Content).
	Ranges []Range

	// SymbolInfo is the symbol information associated with Ranges. If it is non-nil,
	// its length will equal that of Ranges. Any of its elements may be nil.
	SymbolInfo []*Symbol

	// FileName indicates whether this match is a match on the file name, in
	// which case Content will contain the file name.
	FileName bool

	// ContentStart is the location (inclusive) of the beginning of content
	// relative to the beginning of the file. It will always be at the
	// beginning of a line (Column will always be 1).
	ContentStart Location

	Score float64
}

ChunkMatch is a set of non-overlapping matches within a contiguous range of lines in the file.

func ChunkMatchFromProto

func ChunkMatchFromProto(p *proto.ChunkMatch) ChunkMatch

func (*ChunkMatch) ToProto

func (cm *ChunkMatch) ToProto() *proto.ChunkMatch

type DisplayTruncator

type DisplayTruncator func(before []FileMatch) (after []FileMatch, hasMore bool)

DisplayTruncator is a stateful function which enforces Document and Match display limits by truncating and mutating before. hasMore is true until the limits are exhausted. Once hasMore is false each subsequent call will return an empty after and hasMore false.

func NewDisplayTruncator

func NewDisplayTruncator(opts *SearchOptions) (_ DisplayTruncator, hasLimits bool)

NewDisplayTruncator will return a DisplayTruncator which enforces the limits in opts. If there are no limits to enforce, hasLimits is false and there is no need to call DisplayTruncator.
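The contract above can be sketched as a stateful closure whose hasMore stays true until the limits run out. This is a hypothetical illustration, not zoekt's implementation. It takes the two limits as plain integers instead of reading MaxDocDisplayCount and MaxMatchDisplayCount from *SearchOptions. It also assumes both limits are positive and counts each matched line as one match.

```go
package main

// fileMatch is a simplified stand-in for zoekt.FileMatch: a name plus the
// matched line numbers.
type fileMatch struct {
	FileName string
	Lines    []int
}

// displayTruncator has the same shape as zoekt.DisplayTruncator.
type displayTruncator func(before []fileMatch) (after []fileMatch, hasMore bool)

// newDisplayTruncator (hypothetical) enforces a document limit and a match
// limit across successive calls. Both limits must be positive.
func newDisplayTruncator(maxDocs, maxMatches int) displayTruncator {
	docsLeft, matchesLeft := maxDocs, maxMatches
	return func(before []fileMatch) ([]fileMatch, bool) {
		// Once a limit is exhausted, every call returns nothing.
		if docsLeft <= 0 || matchesLeft <= 0 {
			return nil, false
		}
		after := before
		if len(after) > docsLeft {
			after = after[:docsLeft]
		}
		for i := range after {
			if len(after[i].Lines) >= matchesLeft {
				// Truncate this file's matches in place (mutating before)
				// and drop every later file.
				after[i].Lines = after[i].Lines[:matchesLeft]
				matchesLeft = 0
				after = after[:i+1]
				break
			}
			matchesLeft -= len(after[i].Lines)
		}
		docsLeft -= len(after)
		return after, docsLeft > 0 && matchesLeft > 0
	}
}
```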

type DocChecker

type DocChecker struct {
	// contains filtered or unexported fields
}

func (*DocChecker) Check

func (t *DocChecker) Check(content []byte, maxTrigramCount int, allowLargeFile bool) error

Check returns a reason why the given contents are probably not source texts.

type Document

type Document struct {
	Name              string
	Content           []byte
	Branches          []string
	SubRepositoryPath string
	Language          string

	// If set, something is wrong with the file contents, and this
	// is the reason it wasn't indexed.
	SkipReason string

	// Document sections for symbols. Offsets should use bytes.
	Symbols         []DocumentSection
	SymbolsMetaData []*Symbol

	// Ranks is a vector of ranks for a document as provided by a DocumentRanksFile
	// file in the git repo.
	//
	// Two documents can be ordered by comparing the components of their rank
	// vectors. Bigger entries are better, as are longer vectors.
	//
	// This field is experimental and may change at any time without warning.
	Ranks []float64
}

Document holds a document (file) to index.

type DocumentSection

type DocumentSection struct {
	Start, End uint32
}

type FileMatch

type FileMatch struct {
	FileName string

	// Repository is the globally unique name of the repo of the
	// match
	Repository string

	// SubRepositoryName is the globally unique name of the repo,
	// if it came from a subrepository
	SubRepositoryName string `json:",omitempty"`

	// SubRepositoryPath holds the prefix where the subrepository
	// was mounted.
	SubRepositoryPath string `json:",omitempty"`

	// Commit SHA1 (hex) of the (sub)repo holding the file.
	Version string `json:",omitempty"`

	// Detected language of the result.
	Language string

	// For debugging. Needs DebugScore set, but public so tests in
	// other packages can print some diagnostics.
	Debug string `json:",omitempty"`

	Branches []string `json:",omitempty"`

	// One of LineMatches or ChunkMatches will be returned depending on whether
	// the SearchOptions.ChunkMatches is set.
	LineMatches  []LineMatch  `json:",omitempty"`
	ChunkMatches []ChunkMatch `json:",omitempty"`

	// Only set if requested
	Content []byte `json:",omitempty"`

	// Checksum of the content.
	Checksum []byte

	// Ranking; the higher, the better.
	Score float64 `json:",omitempty"`

	// RepositoryPriority is a Sourcegraph extension. It is used by Sourcegraph to
	// order results from different repositories relative to each other.
	RepositoryPriority float64 `json:",omitempty"`

	// RepositoryID is a Sourcegraph extension. This is the ID of Repository in
	// Sourcegraph.
	RepositoryID uint32 `json:",omitempty"`
}

FileMatch contains all the matches within a file.

func FileMatchFromProto

func FileMatchFromProto(p *proto.FileMatch) FileMatch

func SortAndTruncateFiles

func SortAndTruncateFiles(files []FileMatch, opts *SearchOptions) []FileMatch

SortAndTruncateFiles is a convenience wrapper around SortFiles and DisplayTruncator. Given aggregated files, it sorts and then truncates them based on the search options.

func (*FileMatch) ToProto

func (m *FileMatch) ToProto() *proto.FileMatch

type FlushReason

type FlushReason uint8
const (
	FlushReasonTimerExpired FlushReason = 1 << iota
	FlushReasonFinalFlush
	FlushReasonMaxSize
)
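Because the constants use 1 << iota, each FlushReason is a distinct power of two (1, 2, 4). The sketch below restates the type with lowercase, hypothetical names so it compiles on its own. It pairs the type with the same strings as FlushReasonStrings. The "unknown" fallback for values not in the table is an assumption.

```go
package main

// flushReason mirrors FlushReason: 1 << iota gives each constant its own bit.
type flushReason uint8

const (
	flushReasonTimerExpired flushReason = 1 << iota // 1
	flushReasonFinalFlush                           // 2
	flushReasonMaxSize                              // 4
)

// flushReasonStrings uses the same strings as FlushReasonStrings.
var flushReasonStrings = map[flushReason]string{
	flushReasonTimerExpired: "timer_expired",
	flushReasonFinalFlush:   "final_flush",
	flushReasonMaxSize:      "max_size_reached",
}

// String looks the reason up in the table. The fallback is this sketch's
// assumption, not zoekt's documented behaviour.
func (fr flushReason) String() string {
	if s, ok := flushReasonStrings[fr]; ok {
		return s
	}
	return "unknown"
}
```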

func FlushReasonFromProto

func FlushReasonFromProto(p proto.FlushReason) FlushReason

func (FlushReason) Generate

func (fr FlushReason) Generate(rand *rand.Rand, size int) reflect.Value

Generate returns valid FlushReason values for testing/quick property checks.

func (FlushReason) String

func (fr FlushReason) String() string

func (FlushReason) ToProto

func (fr FlushReason) ToProto() proto.FlushReason

type IndexBuilder

type IndexBuilder struct {

	// IndexTime will be used as the time if non-zero. Otherwise
	// time.Now(). This is useful for doing reproducible builds in tests.
	IndexTime time.Time

	// a sortable 20 chars long id.
	ID string
	// contains filtered or unexported fields
}

IndexBuilder builds a single index shard.

func NewIndexBuilder

func NewIndexBuilder(r *Repository) (*IndexBuilder, error)

NewIndexBuilder creates a fresh IndexBuilder. The passed in Repository contains repo metadata, and may be set to nil.

func (*IndexBuilder) Add

func (b *IndexBuilder) Add(doc Document) error

Add a file which only occurs in certain branches.

func (*IndexBuilder) AddFile

func (b *IndexBuilder) AddFile(name string, content []byte) error

AddFile is a convenience wrapper for Add

func (*IndexBuilder) ContentSize

func (b *IndexBuilder) ContentSize() uint32

ContentSize returns the number of content bytes so far ingested.

func (*IndexBuilder) NumFiles

func (b *IndexBuilder) NumFiles() int

NumFiles returns the number of files added to this builder

func (*IndexBuilder) Write

func (b *IndexBuilder) Write(out io.Writer) error

type IndexFile

type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
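Any type with these four methods satisfies IndexFile. As an illustration, here is a hypothetical in-memory implementation, memIndexFile, that could stand in for an mmap'd file in tests. The local indexFile interface copies the one above so the block compiles on its own. The bounds-checking behaviour is this sketch's choice.

```go
package main

import "fmt"

// indexFile copies the IndexFile interface documented above.
type indexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

// memIndexFile (hypothetical) serves reads from a byte slice. The data is
// never mutated, so concurrent Reads are safe.
type memIndexFile struct {
	name string
	data []byte
}

func (f *memIndexFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz)
	if end > uint64(len(f.data)) {
		return nil, fmt.Errorf("%s: read [%d, %d) out of bounds (size %d)", f.name, off, end, len(f.data))
	}
	// Like an mmap'd implementation, return a view into the backing storage
	// rather than a copy; callers must not modify it.
	return f.data[off:end], nil
}

func (f *memIndexFile) Size() (uint32, error) { return uint32(len(f.data)), nil }
func (f *memIndexFile) Close()                 {}
func (f *memIndexFile) Name() string           { return f.name }

// Compile-time check that memIndexFile satisfies the interface.
var _ indexFile = (*memIndexFile)(nil)
```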

func NewIndexFile

func NewIndexFile(f *os.File) (IndexFile, error)

NewIndexFile returns a new index file. The index file takes ownership of the passed in file, and may close it.

type IndexMetadata

type IndexMetadata struct {
	IndexFormatVersion    int
	IndexFeatureVersion   int
	IndexMinReaderVersion int
	IndexTime             time.Time
	PlainASCII            bool
	LanguageMap           map[string]uint16
	ZoektVersion          string
	ID                    string
}

IndexMetadata holds metadata stored in the index file. It contains data generated by the core indexing library.

func IndexMetadataFromProto

func IndexMetadataFromProto(p *proto.IndexMetadata) IndexMetadata

func (*IndexMetadata) ToProto

func (m *IndexMetadata) ToProto() *proto.IndexMetadata

type LineFragmentMatch

type LineFragmentMatch struct {
	// Offset within the line, in bytes.
	LineOffset int

	// Offset from file start, in bytes.
	Offset uint32

	// Number of bytes that match.
	MatchLength int

	SymbolInfo *Symbol
}

LineFragmentMatch is a segment of matching text within a line.

func LineFragmentMatchFromProto

func LineFragmentMatchFromProto(p *proto.LineFragmentMatch) LineFragmentMatch

func (*LineFragmentMatch) ToProto

func (lfm *LineFragmentMatch) ToProto() *proto.LineFragmentMatch

type LineMatch

type LineMatch struct {
	// The line in which a match was found.
	Line []byte
	// The byte offset of the first byte of the line.
	LineStart int
	// The byte offset of the first byte past the end of the line.
	// This is usually the byte after the terminating newline, but can also be
	// the end of the file if there is no terminating newline
	LineEnd    int
	LineNumber int

	// Before and After are only set when SearchOptions.NumContextLines is > 0
	Before []byte
	After  []byte

	// If set, this was a match on the filename.
	FileName bool

	// The higher the better. Only ranks the quality of the match
	// within the file, does not take rank of file into account
	Score      float64
	DebugScore string

	LineFragments []LineFragmentMatch
}

LineMatch holds the matches within a single line in a file.

func LineMatchFromProto

func LineMatchFromProto(p *proto.LineMatch) LineMatch

func (*LineMatch) ToProto

func (lm *LineMatch) ToProto() *proto.LineMatch

type ListOptions

type ListOptions struct {
	// Field decides which field to populate in RepoList response.
	Field RepoListField
}

func ListOptionsFromProto

func ListOptionsFromProto(p *proto.ListOptions) *ListOptions

func (*ListOptions) GetField

func (o *ListOptions) GetField() (RepoListField, error)

func (*ListOptions) String

func (o *ListOptions) String() string

func (*ListOptions) ToProto

func (l *ListOptions) ToProto() *proto.ListOptions

type Location

type Location struct {
	// 0-based byte offset from the beginning of the file
	ByteOffset uint32
	// 1-based line number from the beginning of the file
	LineNumber uint32
	// 1-based column number (in runes) from the beginning of line
	Column uint32
}
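The three coordinates of a Location can be derived from a byte offset into file content. The hypothetical helper locationAt below follows the conventions documented on the struct: 0-based bytes, 1-based lines, and 1-based rune columns. It assumes the offset falls on a rune boundary and does not exceed the content length.

```go
package main

import "unicode/utf8"

// location mirrors the fields of zoekt.Location.
type location struct {
	ByteOffset uint32 // 0-based byte offset from the start of the file
	LineNumber uint32 // 1-based line number
	Column     uint32 // 1-based column, counted in runes
}

// locationAt (hypothetical) computes the location of byte offset off.
func locationAt(content []byte, off uint32) location {
	line := uint32(1)
	lineStart := 0
	for i := 0; i < int(off); i++ {
		if content[i] == '\n' {
			line++
			lineStart = i + 1
		}
	}
	// Columns count runes, not bytes, from the start of the line.
	col := uint32(utf8.RuneCount(content[lineStart:off])) + 1
	return location{ByteOffset: off, LineNumber: line, Column: col}
}
```

For content "ab\nçd\n", offset 5 (the "d") maps to line 2, column 2, because "ç" occupies two bytes but is a single rune.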

func LocationFromProto

func LocationFromProto(p *proto.Location) Location

func (*Location) ToProto

func (l *Location) ToProto() *proto.Location

type MinimalRepoListEntry

type MinimalRepoListEntry struct {
	// HasSymbols is exported since Sourcegraph uses this information at search
	// planning time to decide between Zoekt and an unindexed symbol search.
	//
	// Note: it pretty much is always true in practice.
	HasSymbols bool

	// Branches is used by Sourcegraph's query planner to decide if it can use
	// zoekt or go via an unindexed code path.
	Branches []RepositoryBranch

	// IndexTimeUnix is the IndexTime converted to unix time (number of seconds
	// since the epoch). This is to make it clear we are not transporting the
	// full-fidelity timestamp (i.e. with milliseconds and location). Additionally
	// it saves 16 bytes in this struct.
	//
	// IndexTime is used as a heuristic in Sourcegraph to decide in aggregate
	// how many repositories need updating after a ranking change/etc.
	//
	// TODO(keegancsmith) audit updates to IndexTime and document how and when
	// it changes. Concerned about things like metadata updates or compound
	// shards leading to untrustworthy data here.
	IndexTimeUnix int64
}

MinimalRepoListEntry is a subset of RepoListEntry. It was added after performance profiling of sourcegraph.com revealed that querying this information from Zoekt was causing high CPU and memory usage. Note: we can revisit this, since how we store and query this information has changed a lot since it was introduced.

func (*MinimalRepoListEntry) ToProto

type Progress

type Progress struct {
	// Priority of the shard that was searched.
	Priority float64

	// MaxPendingPriority is the maximum priority of pending result that is being searched in parallel.
	// This is used to reorder results when the result set is known to be stable-- that is, when a result's
	// Priority is greater than the max(MaxPendingPriority) from the latest results of each backend, it can be returned to the user.
	//
	// MaxPendingPriority decreases monotonically in each SearchResult.
	MaxPendingPriority float64
}

Progress contains information about the global progress of the running search query. This is used by the frontend to reorder results and emit them when stable. Sourcegraph specific: this is used when querying multiple zoekt-webserver instances.

func ProgressFromProto

func ProgressFromProto(p *proto.Progress) Progress

func (*Progress) ToProto

func (p *Progress) ToProto() *proto.Progress

type Range

type Range struct {
	// The inclusive beginning of the range.
	Start Location
	// The exclusive end of the range.
	End Location
}

func RangeFromProto

func RangeFromProto(p *proto.Range) Range

func (*Range) ToProto

func (r *Range) ToProto() *proto.Range

type RepoList

type RepoList struct {
	// Returned when ListOptions.Field is RepoListFieldRepos.
	Repos []*RepoListEntry

	// ReposMap is set when ListOptions.Field is RepoListFieldReposMap.
	ReposMap ReposMap

	Crashes int

	// Stats response to a List request.
	// This is the aggregate RepoStats of all repos matching the input query.
	Stats RepoStats
}

RepoList holds a set of Repository metadata.

func RepoListFromProto

func RepoListFromProto(p *proto.ListResponse) *RepoList

func (*RepoList) ToProto

func (r *RepoList) ToProto() *proto.ListResponse

type RepoListEntry

type RepoListEntry struct {
	Repository    Repository
	IndexMetadata IndexMetadata
	Stats         RepoStats
}

func RepoListEntryFromProto

func RepoListEntryFromProto(p *proto.RepoListEntry) *RepoListEntry

func (*RepoListEntry) ToProto

func (r *RepoListEntry) ToProto() *proto.RepoListEntry

type RepoListField

type RepoListField int
const (
	RepoListFieldRepos    RepoListField = 0
	RepoListFieldReposMap               = 2
)

type RepoStats

type RepoStats struct {
	// Repos is used for aggregating the number of repositories.
	//
	// Note: This field is not populated on RepoListEntry.Stats (individual) but
	// only for RepoList.Stats (aggregate).
	Repos int

	// Shards is the total number of search shards.
	Shards int

	// Documents holds the number of documents or files.
	Documents int

	// IndexBytes is the amount of RAM used for index overhead.
	IndexBytes int64

	// ContentBytes is the amount of RAM used for raw content.
	ContentBytes int64

	// NewLinesCount is the number of newlines "\n" that appear in the zoekt
	// indexed documents. This is not exactly the same as line count, since it
	// will not include lines not terminated by "\n" (eg a file with no "\n", or
	// a final line without "\n"). Note: Zoekt deduplicates documents across
	// branches, so if a path has the same contents on multiple branches, there
	// is only one document for it. As such that document's newlines is only
	// counted once. See DefaultBranchNewLinesCount and AllBranchesNewLinesCount
	// for counts which do not deduplicate.
	NewLinesCount uint64

	// DefaultBranchNewLinesCount is the number of newlines "\n" in the default
	// branch.
	DefaultBranchNewLinesCount uint64

	// OtherBranchesNewLinesCount is the number of newlines "\n" in all branches
	// except the default branch.
	OtherBranchesNewLinesCount uint64
}

Statistics of a (collection of) repositories.

func RepoStatsFromProto

func RepoStatsFromProto(p *proto.RepoStats) RepoStats

func (*RepoStats) Add

func (s *RepoStats) Add(o *RepoStats)
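The sketch below shows how per-repository stats could be folded into an aggregate. The repoStats type copies only a subset of the fields, and aggregate is a hypothetical helper. The Repos field is documented as unpopulated on individual RepoListEntry.Stats, so the sketch counts entries itself. The real RepoStats.Add may handle individual fields differently.

```go
package main

// repoStats is a trimmed stand-in for zoekt.RepoStats.
type repoStats struct {
	Repos         int
	Shards        int
	Documents     int
	IndexBytes    int64
	ContentBytes  int64
	NewLinesCount uint64
}

// add accumulates o into s field by field.
func (s *repoStats) add(o *repoStats) {
	s.Repos += o.Repos
	s.Shards += o.Shards
	s.Documents += o.Documents
	s.IndexBytes += o.IndexBytes
	s.ContentBytes += o.ContentBytes
	s.NewLinesCount += o.NewLinesCount
}

// aggregate (hypothetical) builds a RepoList-style total. It assumes every
// entry is an individual RepoListEntry.Stats whose Repos field is unset, so
// it sets Repos from the number of entries.
func aggregate(entries []repoStats) repoStats {
	var total repoStats
	for i := range entries {
		total.add(&entries[i])
	}
	total.Repos = len(entries)
	return total
}
```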

func (*RepoStats) ToProto

func (s *RepoStats) ToProto() *proto.RepoStats

type ReposMap

type ReposMap map[uint32]MinimalRepoListEntry

func (*ReposMap) MarshalBinary

func (q *ReposMap) MarshalBinary() ([]byte, error)

MarshalBinary implements a specialized encoder for ReposMap.

func (*ReposMap) UnmarshalBinary

func (q *ReposMap) UnmarshalBinary(b []byte) error

UnmarshalBinary implements a specialized decoder for ReposMap.

type Repository

type Repository struct {
	// Sourcegraph's repository ID
	ID uint32

	// The repository name
	Name string

	// The repository URL.
	URL string

	// The physical source where this repo came from, eg. full
	// path to the zip filename or git repository directory. This
	// will not be exposed in the UI, but can be used to detect
	// orphaned index shards.
	Source string

	// The branches indexed in this repo.
	Branches []RepositoryBranch

	// Nil if this is not the super project.
	SubRepoMap map[string]*Repository

	// URL template to link to the commit of a branch
	CommitURLTemplate string

	// The repository URL for getting to a file.  Has access to
	// {{.Version}}, {{.Path}}
	FileURLTemplate string

	// The URL fragment to add to a file URL for line numbers. Has
	// access to {{.LineNumber}}. The fragment should include the
	// separator, generally '#' or ';'.
	LineFragmentTemplate string

	// All zoekt.* configuration settings.
	RawConfig map[string]string

	// Importance of the repository, bigger is more important
	Rank uint16

	// IndexOptions is a hash of the options used to create the index for the
	// repo.
	IndexOptions string

	// HasSymbols is true if this repository has indexed ctags
	// output. Sourcegraph specific: This field is more appropriate for
	// IndexMetadata. However, we store it here since the Sourcegraph frontend
	// can read this structure but not IndexMetadata.
	HasSymbols bool

	// Tombstone is true if we are not allowed to search this repo.
	Tombstone bool

	// LatestCommitDate is the date of the latest commit among all indexed Branches.
	// The date might be time.Time's 0-value if the repository was last indexed
	// before this field was added.
	LatestCommitDate time.Time

	// FileTombstones is a set of file paths that should be ignored across all branches
	// in this shard.
	FileTombstones map[string]struct{} `json:",omitempty"`
	// contains filtered or unexported fields
}

Repository holds repository metadata.
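The {{.Version}}, {{.Path}} and {{.LineNumber}} placeholders on FileURLTemplate and LineFragmentTemplate look like Go text/template syntax. The sketch below assumes that syntax and shows how a link to a specific line could be rendered. renderFileURL and the example template strings are hypothetical.

```go
package main

import (
	"strings"
	"text/template"
)

// renderFileURL (hypothetical) expands a FileURLTemplate-style string with
// Version and Path, then appends a LineFragmentTemplate-style fragment
// expanded with LineNumber. The fragment carries its own separator ('#' or ';').
func renderFileURL(fileTmpl, fragTmpl, version, path string, line int) (string, error) {
	var b strings.Builder
	ft, err := template.New("file").Parse(fileTmpl)
	if err != nil {
		return "", err
	}
	if err := ft.Execute(&b, struct{ Version, Path string }{version, path}); err != nil {
		return "", err
	}
	lt, err := template.New("line").Parse(fragTmpl)
	if err != nil {
		return "", err
	}
	if err := lt.Execute(&b, struct{ LineNumber int }{line}); err != nil {
		return "", err
	}
	return b.String(), nil
}
```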

func RepositoryFromProto

func RepositoryFromProto(p *proto.Repository) Repository

func (*Repository) MergeMutable

func (r *Repository) MergeMutable(x *Repository) (mutated bool, err error)

MergeMutable will merge x into r. mutated will be true if it made any changes. err is non-nil if we needed to mutate an immutable field.

Note: SubRepoMap, IndexOptions and HasSymbols fields are ignored. They are computed while indexing, so they can't be synthesized from x.

Note: We ignore RawConfig fields which are duplicated into Repository: name and id.

func (*Repository) ToProto

func (r *Repository) ToProto() *proto.Repository

func (*Repository) UnmarshalJSON

func (r *Repository) UnmarshalJSON(data []byte) error

type RepositoryBranch

type RepositoryBranch struct {
	Name    string
	Version string
}

RepositoryBranch describes an indexed branch, which is a name combined with a version.

func RepositoryBranchFromProto

func RepositoryBranchFromProto(p *proto.RepositoryBranch) RepositoryBranch

func (RepositoryBranch) String

func (r RepositoryBranch) String() string

func (*RepositoryBranch) ToProto

func (r *RepositoryBranch) ToProto() *proto.RepositoryBranch

type SearchOptions

type SearchOptions struct {
	// Return an upper-bound estimate of eligible documents in
	// stats.ShardFilesConsidered.
	EstimateDocCount bool

	// Return the whole file.
	Whole bool

	// Maximum number of matches: skip all processing of an index
	// shard after we have found this many non-overlapping matches.
	ShardMaxMatchCount int

	// Maximum number of matches: stop looking for more matches
	// once we have this many matches across shards.
	TotalMaxMatchCount int

	// Maximum number of matches: skip processing documents for a repository in
	// a shard once we have found ShardRepoMaxMatchCount.
	//
	// A compound shard may contain multiple repositories. This will most often
	// be set to 1 to find all repositories containing a result.
	ShardRepoMaxMatchCount int

	// Abort the search after this much time has passed.
	MaxWallTime time.Duration

	// FlushWallTime if non-zero will stop streaming behaviour at first and
	// instead will collate and sort results. At FlushWallTime the results will
	// be sent and then the behaviour will revert to the normal streaming.
	FlushWallTime time.Duration

	// Truncates the number of documents (i.e. files) after collating and
	// sorting the results.
	MaxDocDisplayCount int

	// Truncates the number of matches after collating and sorting the results.
	MaxMatchDisplayCount int

	// If set to a number greater than zero then up to this many number
	// of context lines will be added before and after each matched line.
	// Note that the included context lines might contain matches and
	// it's up to the consumer of the result to remove those lines.
	NumContextLines int

	// If true, ChunkMatches will be returned in each FileMatch rather than LineMatches
	// EXPERIMENTAL: the behavior of this flag may be changed in future versions.
	ChunkMatches bool

	// EXPERIMENTAL. If true, document ranks are used as additional input for
	// sorting matches.
	UseDocumentRanks bool

	// EXPERIMENTAL. When UseDocumentRanks is enabled, this can be optionally set to adjust
	// their weight in the file match score. If the value is <= 0.0, the default weight value
	// will be used. This option is temporary and is only exposed for testing/tuning purposes.
	DocumentRanksWeight float64

	// EXPERIMENTAL. If true, use keyword-style scoring instead of the default scoring formula.
	// Currently, this treats each match in a file as a term and computes an approximation to BM25.
	// When enabled, all other scoring signals are ignored, including document ranks.
	UseKeywordScoring bool

	// Trace turns on opentracing for this request if true and if the Jaeger address was provided as
	// a command-line flag
	Trace bool

	// If set, the search results will contain debug information for scoring.
	DebugScore bool

	// SpanContext is the opentracing span context, if it exists, from the zoekt client
	SpanContext map[string]string
}

func SearchOptionsFromProto

func SearchOptionsFromProto(p *proto.SearchOptions) *SearchOptions

func (*SearchOptions) SetDefaults

func (o *SearchOptions) SetDefaults()

func (*SearchOptions) String

func (s *SearchOptions) String() string

String returns a succinct representation of the options. This is meant for human consumption in logs and traces.

Note: some tracing systems limit the length of values, so we take care to keep this small and to put the important information near the front in case of truncation.

func (*SearchOptions) ToProto

func (s *SearchOptions) ToProto() *proto.SearchOptions

type SearchResult

type SearchResult struct {
	Stats

	// Do not encode this as we cannot encode -Inf in JSON
	Progress `json:"-"`

	Files []FileMatch

	// RepoURLs holds a repo => template string map.
	RepoURLs map[string]string

	// LineFragments holds a repo => template string map, for
	// the line number fragment.
	LineFragments map[string]string
}

SearchResult contains search matches and extra data.
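The `json:"-"` tag on the embedded Progress field exists because encoding/json refuses to encode infinite floats. The sketch below demonstrates this with a hypothetical progress stand-in whose Priority field (an assumed name) holds -Inf. Without the tag, json.Marshal fails. With the tag, the embedded struct is skipped entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// progress is a hypothetical stand-in for zoekt's Progress type; assume its
// Priority field can legitimately hold -Inf.
type progress struct {
	Priority float64
}

// encoded embeds progress without a tag, so its exported fields are promoted
// into the JSON object.
type encoded struct {
	progress
	Files []string
}

// skipped tags the embedded struct with `json:"-"`, as SearchResult does, so
// encoding/json ignores it.
type skipped struct {
	progress `json:"-"`
	Files    []string
}

// unboundedProgress returns a value that JSON cannot represent.
func unboundedProgress() progress { return progress{Priority: math.Inf(-1)} }

func marshal(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	p := unboundedProgress()
	_, err := marshal(encoded{progress: p, Files: []string{"a.go"}})
	fmt.Println(err) // json: unsupported value: -Inf
	s, _ := marshal(skipped{progress: p, Files: []string{"a.go"}})
	fmt.Println(s) // {"Files":["a.go"]}
}
```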

func SearchResultFromProto

func SearchResultFromProto(p *proto.SearchResponse, repoURLs, lineFragments map[string]string) *SearchResult

func SearchResultFromStreamProto

func SearchResultFromStreamProto(p *proto.StreamSearchResponse, repoURLs, lineFragments map[string]string) *SearchResult

func (*SearchResult) SizeBytes

func (sr *SearchResult) SizeBytes() (sz uint64)

SizeBytes is a best-effort estimate of the size of SearchResult in memory. The estimate does not take alignment into account. The result is a lower bound on the actual size in memory.
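A lower-bound estimate of this kind can be sketched by summing header sizes and the lengths of backing data while ignoring padding, allocator overhead, and map bucket internals. The fileMatch and result types below are hypothetical simplifications, not zoekt's definitions, and the accounting choices (len rather than cap, one pointer per map) are assumptions of the sketch.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical, simplified result types; not zoekt's real definitions.
type fileMatch struct {
	FileName string
	Content  []byte
	Score    float64
}

type result struct {
	Files    []fileMatch
	RepoURLs map[string]string
}

// Header sizes for the current platform (compile-time constants).
const (
	stringHeader = uint64(unsafe.Sizeof(""))
	sliceHeader  = uint64(unsafe.Sizeof([]byte(nil)))
	pointerSize  = uint64(unsafe.Sizeof(uintptr(0)))
	float64Size  = uint64(unsafe.Sizeof(float64(0)))
)

// sizeBytes adds each field's header to the length of its backing data.
// Using len instead of cap, and skipping padding, keeps this a lower bound.
func (f *fileMatch) sizeBytes() uint64 {
	return stringHeader + uint64(len(f.FileName)) +
		sliceHeader + uint64(len(f.Content)) +
		float64Size
}

// sizeBytes counts the Files slice header, the map pointer, every file, and
// every map key/value; real map buckets cost more than this.
func (r *result) sizeBytes() uint64 {
	sz := sliceHeader + pointerSize
	for i := range r.Files {
		sz += r.Files[i].sizeBytes()
	}
	for k, v := range r.RepoURLs {
		sz += 2*stringHeader + uint64(len(k)) + uint64(len(v))
	}
	return sz
}

func main() {
	r := result{
		Files:    []fileMatch{{FileName: "main.go", Content: []byte("package main")}},
		RepoURLs: map[string]string{"repo": "https://example.com/repo"},
	}
	fmt.Println(r.sizeBytes())
}
```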

func (*SearchResult) ToProto

func (sr *SearchResult) ToProto() *proto.SearchResponse

func (*SearchResult) ToStreamProto

func (sr *SearchResult) ToStreamProto() *proto.StreamSearchResponse

type Searcher

type Searcher interface {
	Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)

	// List lists repositories. The query `q` can only contain
	// query.Repo atoms.
	List(ctx context.Context, q query.Q, opts *ListOptions) (*RepoList, error)
	Close()

	// Describe the searcher for debug messages.
	String() string
}

func NewSearcher

func NewSearcher(r IndexFile) (Searcher, error)

NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e., []byte members should be copied into fresh buffers if the result is to survive closing the shard.
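The aliasing hazard described above can be illustrated with a hypothetical shard type whose results are sub-slices of its own buffer. Real index files may be memory-mapped, where touching unmapped memory would fault rather than read zeros. This sketch simulates invalidation by overwriting the buffer on close, and shows that only the copied slice survives.

```go
package main

import (
	"bytes"
	"fmt"
)

// shard is a hypothetical stand-in for an index file whose bytes become
// invalid once the shard is closed.
type shard struct{ data []byte }

// find returns a sub-slice that aliases the shard's buffer, like the []byte
// members of a search result.
func (s *shard) find(needle []byte) []byte {
	i := bytes.Index(s.data, needle)
	if i < 0 {
		return nil
	}
	return s.data[i : i+len(needle)]
}

// close simulates releasing the buffer by overwriting it.
func (s *shard) close() {
	for i := range s.data {
		s.data[i] = 0
	}
	s.data = nil
}

// searchThenClose returns one slice that aliases the shard and one copied
// into a fresh buffer before the shard was closed.
func searchThenClose() (aliased, owned []byte) {
	s := &shard{data: []byte("func main() {}")}
	aliased = s.find([]byte("main"))
	owned = append([]byte(nil), aliased...) // copy before closing
	s.close()
	return aliased, owned
}

func main() {
	aliased, owned := searchThenClose()
	fmt.Printf("%q %q\n", aliased, owned) // "\x00\x00\x00\x00" "main"
}
```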

type Sender

type Sender interface {
	Send(*SearchResult)
}

Sender is the interface that wraps the basic Send method.

type SenderFunc

type SenderFunc func(result *SearchResult)

SenderFunc is an adapter to allow the use of ordinary functions as Sender. If f is a function with the appropriate signature, SenderFunc(f) is a Sender that calls f.
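Sender and SenderFunc follow the same adapter pattern as net/http's Handler and HandlerFunc. The sketch below reproduces reduced local copies of the two types so it compiles without importing zoekt. The stream function is a hypothetical producer, and a closure is passed where a Sender is expected.

```go
package main

import "fmt"

// Local stand-ins for zoekt.SearchResult, zoekt.Sender and zoekt.SenderFunc,
// reduced so the example compiles on its own.
type SearchResult struct{ Files []string }

type Sender interface{ Send(*SearchResult) }

type SenderFunc func(result *SearchResult)

// Send calls f(result), which is what makes SenderFunc satisfy Sender.
func (f SenderFunc) Send(result *SearchResult) { f(result) }

// stream is a hypothetical producer that pushes results to any Sender.
func stream(batches [][]string, s Sender) {
	for _, b := range batches {
		s.Send(&SearchResult{Files: b})
	}
}

func main() {
	total := 0
	stream([][]string{{"a.go"}, {"b.go", "c.go"}}, SenderFunc(func(r *SearchResult) {
		total += len(r.Files)
	}))
	fmt.Println(total) // 3
}
```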

func (SenderFunc) Send

func (f SenderFunc) Send(result *SearchResult)

type Stats

type Stats struct {
	// Amount of I/O for reading contents.
	ContentBytesLoaded int64

	// Amount of I/O for reading from index.
	IndexBytesLoaded int64

	// Number of search shards that had a crash.
	Crashes int

	// Wall clock time for this search
	Duration time.Duration

	// Number of files containing a match.
	FileCount int

	// Number of files in shards that we considered.
	ShardFilesConsidered int

	// Files that we evaluated. Equivalent to files for which all
	// atom matches (including negations) evaluated to true.
	FilesConsidered int

	// Files for which we loaded file content to verify substring matches
	FilesLoaded int

	// Candidate files whose contents weren't examined because we
	// gathered enough matches.
	FilesSkipped int

	// Shards that we scanned to find matches.
	ShardsScanned int

	// Shards that we did not process because a query was canceled.
	ShardsSkipped int

	// Shards that we did not process because the query was rejected by the
	// ngram filter indicating it had no matches.
	ShardsSkippedFilter int

	// Number of non-overlapping matches
	MatchCount int

	// Number of candidate matches as a result of searching ngrams.
	NgramMatches int

	// NgramLookups is the number of times we accessed an ngram in the index.
	NgramLookups int

	// Wall clock time for queued search.
	Wait time.Duration

	// Aggregate wall clock time spent constructing and pruning the match tree.
	// This accounts for time such as lookups in the trigram index.
	MatchTreeConstruction time.Duration

	// Aggregate wall clock time spent searching the match tree. This accounts
	// for the bulk of search work done looking for matches.
	MatchTreeSearch time.Duration

	// Number of times regexp was called on files that we evaluated.
	RegexpsConsidered int

	// FlushReason explains why results were flushed.
	FlushReason FlushReason
}

Stats contains interesting numbers about the search.

func StatsFromProto

func StatsFromProto(p *proto.Stats) Stats

func (*Stats) Add

func (s *Stats) Add(o Stats)

func (*Stats) ToProto

func (s *Stats) ToProto() *proto.Stats

func (*Stats) Zero

func (s *Stats) Zero() bool

Zero returns true if stats is empty.
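Add and Zero are typically used when merging per-shard statistics into a total. The sketch below uses a hypothetical reduced stats struct. Summing every field is an assumption: the real Stats.Add may treat some fields, such as wall-clock durations or FlushReason, differently. Comparing against the zero value works here because all fields are comparable.

```go
package main

import (
	"fmt"
	"time"
)

// stats is a reduced, hypothetical stand-in for zoekt.Stats.
type stats struct {
	ContentBytesLoaded int64
	Duration           time.Duration
	FileCount          int
	MatchCount         int
	ShardsScanned      int
}

// add accumulates another shard's counters into s by summing each field.
func (s *stats) add(o stats) {
	s.ContentBytesLoaded += o.ContentBytesLoaded
	s.Duration += o.Duration
	s.FileCount += o.FileCount
	s.MatchCount += o.MatchCount
	s.ShardsScanned += o.ShardsScanned
}

// zero reports whether every field still holds its zero value.
func (s *stats) zero() bool { return *s == stats{} }

func main() {
	var total stats
	fmt.Println(total.zero()) // true
	total.add(stats{FileCount: 2, MatchCount: 5, ShardsScanned: 1})
	total.add(stats{FileCount: 1, MatchCount: 1, ShardsScanned: 1})
	fmt.Println(total.zero(), total.FileCount, total.MatchCount) // false 3 6
}
```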

type Streamer

type Streamer interface {
	Searcher
	StreamSearch(ctx context.Context, q query.Q, opts *SearchOptions, sender Sender) (err error)
}

Streamer adds the method StreamSearch to the Searcher interface.

type Symbol

type Symbol struct {
	Sym        string
	Kind       string
	Parent     string
	ParentKind string
}

func SymbolFromProto

func SymbolFromProto(p *proto.SymbolInfo) *Symbol

func (*Symbol) ToProto

func (s *Symbol) ToProto() *proto.SymbolInfo

Directories

Path Synopsis
package build implements a more convenient interface for building zoekt indices.
cmd
zoekt-archive-index
Command zoekt-archive-index indexes an archive.
zoekt-git-clone
This binary fetches all repos of a user or organization and clones them.
zoekt-mirror-bitbucket-server
This binary fetches all repos of a project, and of a specific type, in case these are specified, and clones them.
zoekt-mirror-github
This binary fetches all repos of a user or organization and clones them.
zoekt-mirror-gitiles
This binary fetches all repos of a Gitiles host.
zoekt-mirror-gitlab
This binary fetches all repos for a user from gitlab.
zoekt-repo-index
zoekt-repo-index indexes a repo-based repository.
zoekt-sourcegraph-indexserver
Command zoekt-sourcegraph-indexserver periodically reindexes enabled repositories on sourcegraph
zoekt-test
zoekt-test compares the search engine results with raw substring search
Package gitindex provides functions for indexing Git repositories.
grpc
chunk
Package chunk provides a utility for sending sets of protobuf messages in groups of smaller chunks.
package ignore provides helpers to support ignore-files similar to .gitignore
internal
archive
package archive provides indexing of archives from remote URLs.
e2e
package e2e contains end to end tests
otlpenv
Package otlpenv exports getters to read OpenTelemetry protocol configuration options based on the official spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#configuration-options
Package trace provides a tracing API that in turn invokes both the `golang.org/x/net/trace` API and creates an opentracing span if appropriate.
