package index

Version: v2.7.0-nightly.20230525
Warning

This package is not in the latest version of its module.

Published: May 24, 2023 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package index provides access to files through multilevel indexes.

A multilevel index contains one or more levels. The lowest level contains file type index entries, which directly reference file content. The levels above it contain range type index entries, which directly reference a range of index entries and indirectly reference ranges of file content. Multilevel indexes are created using writers, which can create indexes for new files or build indexes from other indexes (rooted by file or range type indexes). Reading a multilevel index requires setting up a reader, which provides various indexing strategies and filters.
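To make the level structure concrete, here is a minimal in-memory sketch of a two-level index. It is not this package's implementation: the names fileEntry, rangeEntry, buildTwoLevel, and lookup are hypothetical, entries live in slices rather than chunk storage, and each range entry is assumed to be summarized by the last path it covers (mirroring the LastPath field of Range).

```go
package main

import (
	"fmt"
	"sort"
)

// fileEntry is a hypothetical lowest-level (file type) entry; Content stands
// in for the data references that point at the file's bytes.
type fileEntry struct {
	Path    string
	Content string
}

// rangeEntry is a hypothetical range type entry covering one batch of
// entries on the level below, summarized by the last path in that batch.
type rangeEntry struct {
	LastPath string
	Batch    int // position of the referenced batch on the level below
}

// buildTwoLevel groups path-sorted file entries into batches of batchSize
// and creates one range entry per batch on the level above.
func buildTwoLevel(files []fileEntry, batchSize int) ([][]fileEntry, []rangeEntry) {
	var batches [][]fileEntry
	var top []rangeEntry
	for start := 0; start < len(files); start += batchSize {
		end := start + batchSize
		if end > len(files) {
			end = len(files)
		}
		batch := files[start:end]
		batches = append(batches, batch)
		top = append(top, rangeEntry{LastPath: batch[len(batch)-1].Path, Batch: len(batches) - 1})
	}
	return batches, top
}

// lookup uses the range entries to select the only batch that can hold path,
// then scans just that batch for a matching file entry.
func lookup(batches [][]fileEntry, top []rangeEntry, path string) (string, bool) {
	i := sort.Search(len(top), func(i int) bool { return top[i].LastPath >= path })
	if i == len(top) {
		return "", false
	}
	for _, f := range batches[top[i].Batch] {
		if f.Path == path {
			return f.Content, true
		}
	}
	return "", false
}

func main() {
	files := []fileEntry{{"/a", "A"}, {"/b", "B"}, {"/c", "C"}, {"/d", "D"}, {"/e", "E"}}
	batches, top := buildTwoLevel(files, 2)
	content, ok := lookup(batches, top, "/d")
	fmt.Println(len(top), content, ok) // 3 D true
}
```

Because each range entry records the last path of its batch, a lookup reads one batch per level instead of scanning every file entry.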

Index

Constants

const (
	// DefaultShardNumThreshold is the default for the NumFiles threshold that must
	// be met before a shard is created.
	DefaultShardNumThreshold = 1000000
	// DefaultShardSizeThreshold is the default for the SizeBytes threshold that must
	// be met before a shard is created.
	DefaultShardSizeThreshold = units.GB
)
const (
	DefaultBatchThreshold = units.MB
)

Variables

var (
	ErrInvalidLengthIndex        = fmt.Errorf("proto: negative length found during unmarshaling")
	ErrIntOverflowIndex          = fmt.Errorf("proto: integer overflow")
	ErrUnexpectedEndOfGroupIndex = fmt.Errorf("proto: unexpected end of group")
)

Functions

func Generate

func Generate(s string) []string

Generate generates the permutations of the passed-in string and returns them in sorted order.

func Merge

func Merge(ctx context.Context, storage *chunk.Storage, indexes []*Index, cb func(*Index) error) error

func Perm

func Perm(a []rune, f func([]rune))

Perm calls f with each permutation of a.
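The documented contracts of Perm and Generate can be illustrated with a short sketch. The lowercase perm and generate below are hypothetical stand-ins that follow those contracts (perm calls f once per permutation; generate returns the permutations sorted). The swap-based recursion is one common technique and may differ from the package's own. This sketch keeps duplicate permutations when the input repeats a character; the documentation does not say whether Generate does.

```go
package main

import (
	"fmt"
	"sort"
)

// perm calls f with each permutation of a, produced in place by swapping.
// f receives the same backing slice every time, so a callback that keeps a
// permutation must copy it.
func perm(a []rune, f func([]rune)) {
	permFrom(a, f, 0)
}

// permFrom fixes positions [0, i) and permutes the remaining suffix.
func permFrom(a []rune, f func([]rune), i int) {
	if i == len(a) {
		f(a)
		return
	}
	for j := i; j < len(a); j++ {
		a[i], a[j] = a[j], a[i]
		permFrom(a, f, i+1)
		a[i], a[j] = a[j], a[i] // undo the swap before trying the next choice
	}
}

// generate returns every permutation of s in sorted order.
func generate(s string) []string {
	var out []string
	perm([]rune(s), func(p []rune) {
		out = append(out, string(p)) // string(p) copies the runes
	})
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(generate("abc")) // [abc acb bac bca cab cba]
}
```

Since perm reuses one slice for every callback, generate converts each permutation to a string before storing it.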

func PointsTo

func PointsTo(idx *Index) []chunk.ID

PointsTo returns a list of all the chunks this index references.

func SizeBytes

func SizeBytes(idx *Index) int64

SizeBytes computes the size of the indexed data in bytes.
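Under a simplified model, PointsTo and SizeBytes for a single file type index can be sketched as follows. dataRef and fileIndex are hypothetical types standing in for chunk.DataRef and Index. The sketch assumes a file's size is the sum of the byte ranges its data refs cover, and it deduplicates chunk IDs; the documentation specifies neither detail.

```go
package main

import "fmt"

// dataRef is a hypothetical stand-in for chunk.DataRef: the byte range
// [Offset, Offset+Size) inside the chunk identified by ChunkID.
type dataRef struct {
	ChunkID string
	Offset  int64
	Size    int64
}

// fileIndex is a hypothetical stand-in for a file type Index.
type fileIndex struct {
	Path     string
	DataRefs []dataRef
}

// pointsTo lists the distinct chunks referenced by the file's data refs,
// in first-seen order.
func pointsTo(idx fileIndex) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, ref := range idx.DataRefs {
		if !seen[ref.ChunkID] {
			seen[ref.ChunkID] = true
			ids = append(ids, ref.ChunkID)
		}
	}
	return ids
}

// sizeBytes sums the lengths of the byte ranges the data refs cover.
func sizeBytes(idx fileIndex) int64 {
	var total int64
	for _, ref := range idx.DataRefs {
		total += ref.Size
	}
	return total
}

func main() {
	idx := fileIndex{
		Path: "/data/part-0",
		DataRefs: []dataRef{
			{ChunkID: "c1", Offset: 0, Size: 100},
			{ChunkID: "c2", Offset: 0, Size: 50},
			{ChunkID: "c1", Offset: 100, Size: 25},
		},
	}
	fmt.Println(pointsTo(idx), sizeBytes(idx)) // [c1 c2] 175
}
```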

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

func NewCache

func NewCache(storage *chunk.Storage, size int) *Cache

func (*Cache) Get

func (c *Cache) Get(ctx context.Context, chunkRef *chunk.DataRef, filter *pathFilter, w io.Writer) error

type File

type File struct {
	Datum                string           `protobuf:"bytes,1,opt,name=datum,proto3" json:"datum,omitempty"`
	DataRefs             []*chunk.DataRef `protobuf:"bytes,2,rep,name=data_refs,json=dataRefs,proto3" json:"data_refs,omitempty"`
	XXX_NoUnkeyedLiteral struct{}         `json:"-"`
	XXX_unrecognized     []byte           `json:"-"`
	XXX_sizecache        int32            `json:"-"`
}

func (*File) Descriptor

func (*File) Descriptor() ([]byte, []int)

func (*File) GetDataRefs

func (m *File) GetDataRefs() []*chunk.DataRef

func (*File) GetDatum

func (m *File) GetDatum() string

func (*File) Marshal

func (m *File) Marshal() (dAtA []byte, err error)

func (*File) MarshalLogObject

func (x *File) MarshalLogObject(enc zapcore.ObjectEncoder) error

func (*File) MarshalTo

func (m *File) MarshalTo(dAtA []byte) (int, error)

func (*File) MarshalToSizedBuffer

func (m *File) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*File) ProtoMessage

func (*File) ProtoMessage()

func (*File) Reset

func (m *File) Reset()

func (*File) Size

func (m *File) Size() (n int)

func (*File) String

func (m *File) String() string

func (*File) Unmarshal

func (m *File) Unmarshal(dAtA []byte) error

func (*File) XXX_DiscardUnknown

func (m *File) XXX_DiscardUnknown()

func (*File) XXX_Marshal

func (m *File) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*File) XXX_Merge

func (m *File) XXX_Merge(src proto.Message)

func (*File) XXX_Size

func (m *File) XXX_Size() int

func (*File) XXX_Unmarshal

func (m *File) XXX_Unmarshal(b []byte) error

type Index

type Index struct {
	Path string `protobuf:"bytes,1,opt,name=path,proto3" json:"path,omitempty"`
	// NOTE: range and file are mutually exclusive.
	Range *Range `protobuf:"bytes,2,opt,name=range,proto3" json:"range,omitempty"`
	File  *File  `protobuf:"bytes,3,opt,name=file,proto3" json:"file,omitempty"`
	// NOTE: num_files and size_bytes did not exist in older versions of 2.x, so
	// they will not be set.
	NumFiles             int64    `protobuf:"varint,4,opt,name=num_files,json=numFiles,proto3" json:"num_files,omitempty"`
	SizeBytes            int64    `protobuf:"varint,5,opt,name=size_bytes,json=sizeBytes,proto3" json:"size_bytes,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}

Index stores a reference to, and metadata about, either a single file or a range of files.
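The NOTE in the Index definition says Range and File are mutually exclusive. A minimal sketch of a check that enforces this, using hypothetical reduced types (index, rangeRef, fileRef) rather than the generated messages; rejecting an entry that sets neither field is an additional assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the generated Range and File messages, reduced
// to one field each.
type rangeRef struct{ LastPath string }

type fileRef struct{ Datum string }

// index is a hypothetical stand-in for Index with only the fields this
// check needs.
type index struct {
	Path  string
	Range *rangeRef
	File  *fileRef
}

// kind reports whether idx is a range type or file type entry, rejecting
// entries that set both fields or neither.
func kind(idx *index) (string, error) {
	switch {
	case idx.Range != nil && idx.File != nil:
		return "", errors.New("index sets both range and file")
	case idx.Range != nil:
		return "range", nil
	case idx.File != nil:
		return "file", nil
	default:
		return "", errors.New("index sets neither range nor file")
	}
}

func main() {
	k, err := kind(&index{Path: "/a", File: &fileRef{Datum: "d1"}})
	fmt.Println(k, err) // file <nil>
}
```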

func (*Index) Descriptor

func (*Index) Descriptor() ([]byte, []int)

func (*Index) GetFile

func (m *Index) GetFile() *File

func (*Index) GetNumFiles

func (m *Index) GetNumFiles() int64

func (*Index) GetPath

func (m *Index) GetPath() string

func (*Index) GetRange

func (m *Index) GetRange() *Range

func (*Index) GetSizeBytes

func (m *Index) GetSizeBytes() int64

func (*Index) Marshal

func (m *Index) Marshal() (dAtA []byte, err error)

func (*Index) MarshalLogObject

func (x *Index) MarshalLogObject(enc zapcore.ObjectEncoder) error

func (*Index) MarshalTo

func (m *Index) MarshalTo(dAtA []byte) (int, error)

func (*Index) MarshalToSizedBuffer

func (m *Index) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*Index) ProtoMessage

func (*Index) ProtoMessage()

func (*Index) Reset

func (m *Index) Reset()

func (*Index) Size

func (m *Index) Size() (n int)

func (*Index) String

func (m *Index) String() string

func (*Index) Unmarshal

func (m *Index) Unmarshal(dAtA []byte) error

func (*Index) XXX_DiscardUnknown

func (m *Index) XXX_DiscardUnknown()

func (*Index) XXX_Marshal

func (m *Index) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*Index) XXX_Merge

func (m *Index) XXX_Merge(src proto.Message)

func (*Index) XXX_Size

func (m *Index) XXX_Size() int

func (*Index) XXX_Unmarshal

func (m *Index) XXX_Unmarshal(b []byte) error

type Option

type Option func(r *Reader)

Option configures an index reader.

func WithDatum

func WithDatum(datum string) Option

WithDatum adds a datum filter that matches a single datum.

func WithPrefix

func WithPrefix(prefix string) Option

WithPrefix sets a prefix filter for the read.

func WithRange

func WithRange(pathRange *PathRange) Option

WithRange sets a range filter for the read.

func WithShardConfig

func WithShardConfig(config *ShardConfig) Option

WithShardConfig sets the sharding configuration.
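Option follows Go's functional-options pattern: WithDatum, WithPrefix, WithRange, and WithShardConfig each return a closure that NewReader applies to the reader it builds. The sketch below reproduces the pattern with hypothetical lowercase types; it is not the package's Reader. In this sketch a later option of the same kind simply overwrites an earlier one, whereas the real options may combine filters.

```go
package main

import "fmt"

// reader is a hypothetical stand-in for Reader, holding only the settings
// the options below configure.
type reader struct {
	prefix string
	datum  string
}

// option mirrors the shape of Option: a function that mutates a reader.
type option func(r *reader)

func withPrefix(prefix string) option {
	return func(r *reader) { r.prefix = prefix }
}

func withDatum(datum string) option {
	return func(r *reader) { r.datum = datum }
}

// newReader starts from defaults and applies each option in order.
func newReader(opts ...option) *reader {
	r := &reader{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func main() {
	r := newReader(withPrefix("/logs/"), withDatum("d1"))
	fmt.Println(r.prefix, r.datum) // /logs/ d1
}
```

Going by the documented signatures, a call against the real package would take the form NewReader(chunks, cache, topIdx, WithPrefix("/logs/")).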

type PathRange

type PathRange struct {
	Lower, Upper string
}

PathRange is a half-open range of paths that includes Lower and excludes Upper: [Lower, Upper).
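Because the range is half-open, adjacent ranges that share a boundary never overlap, which is convenient for describing shards. A minimal membership check, assuming paths compare with Go's ordinary byte-wise string ordering; pathRange and contains are hypothetical names, and the sketch gives empty bounds no special meaning because the documentation does not say how they are treated.

```go
package main

import "fmt"

// pathRange mirrors the documented PathRange fields.
type pathRange struct {
	Lower, Upper string
}

// contains reports whether p lies in [Lower, Upper) under byte-wise
// string ordering.
func (r *pathRange) contains(p string) bool {
	return p >= r.Lower && p < r.Upper
}

func main() {
	r := &pathRange{Lower: "/a", Upper: "/c"}
	fmt.Println(r.contains("/a"), r.contains("/b/x"), r.contains("/c")) // true true false
}
```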

func (*PathRange) String

func (r *PathRange) String() string

type Range

type Range struct {
	Offset               int64          `protobuf:"varint,1,opt,name=offset,proto3" json:"offset,omitempty"`
	LastPath             string         `protobuf:"bytes,2,opt,name=last_path,json=lastPath,proto3" json:"last_path,omitempty"`
	ChunkRef             *chunk.DataRef `protobuf:"bytes,3,opt,name=chunk_ref,json=chunkRef,proto3" json:"chunk_ref,omitempty"`
	XXX_NoUnkeyedLiteral struct{}       `json:"-"`
	XXX_unrecognized     []byte         `json:"-"`
	XXX_sizecache        int32          `json:"-"`
}

func (*Range) Descriptor

func (*Range) Descriptor() ([]byte, []int)

func (*Range) GetChunkRef

func (m *Range) GetChunkRef() *chunk.DataRef

func (*Range) GetLastPath

func (m *Range) GetLastPath() string

func (*Range) GetOffset

func (m *Range) GetOffset() int64

func (*Range) Marshal

func (m *Range) Marshal() (dAtA []byte, err error)

func (*Range) MarshalLogObject

func (x *Range) MarshalLogObject(enc zapcore.ObjectEncoder) error

func (*Range) MarshalTo

func (m *Range) MarshalTo(dAtA []byte) (int, error)

func (*Range) MarshalToSizedBuffer

func (m *Range) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*Range) ProtoMessage

func (*Range) ProtoMessage()

func (*Range) Reset

func (m *Range) Reset()

func (*Range) Size

func (m *Range) Size() (n int)

func (*Range) String

func (m *Range) String() string

func (*Range) Unmarshal

func (m *Range) Unmarshal(dAtA []byte) error

func (*Range) XXX_DiscardUnknown

func (m *Range) XXX_DiscardUnknown()

func (*Range) XXX_Marshal

func (m *Range) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*Range) XXX_Merge

func (m *Range) XXX_Merge(src proto.Message)

func (*Range) XXX_Size

func (m *Range) XXX_Size() int

func (*Range) XXX_Unmarshal

func (m *Range) XXX_Unmarshal(b []byte) error

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is used for reading a multilevel index.

func NewReader

func NewReader(chunks *chunk.Storage, cache *Cache, topIdx *Index, opts ...Option) *Reader

NewReader creates a new Reader.

func (*Reader) Iterate

func (r *Reader) Iterate(ctx context.Context, cb func(*Index) error) error

Iterate iterates over the lowest level (file type) indexes.

func (*Reader) Shards

func (r *Reader) Shards(ctx context.Context) ([]*PathRange, error)

Shards creates shards for the index based on the sharding configuration provided to the reader. Sharding uses the NumFiles and SizeBytes index metadata to traverse the multilevel index efficiently: a subtree is traversed only when a split point falls within it, which can be determined from the NumFiles and SizeBytes values at the root of each subtree.

type ShardConfig

type ShardConfig struct {
	NumFiles  int64
	SizeBytes int64
}

ShardConfig is a sharding configuration. NumFiles is the target number of files per shard, and SizeBytes is the target size, in bytes, per shard.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer is used for creating a multilevel index into a serialized file set. Each index level is a stream of byte-length-encoded index entries stored in chunk storage. Both file and range type indexes can be written to a writer. New levels are created above the written indexes when the serialized indexes reach the batching threshold.
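The following sketch models a single index level: entries are appended as length-prefixed records, and once the buffered bytes reach a threshold the batch is cut, standing in for a chunk that the level above would describe with one range entry. levelWriter and decode are hypothetical. The unsigned varint length prefix is an assumption, since the documentation does not specify the exact length encoding, and chunk storage is replaced by an in-memory slice.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// levelWriter is a hypothetical model of one index level.
type levelWriter struct {
	threshold int          // buffered bytes that trigger a cut
	buf       bytes.Buffer // length-prefixed entries not yet cut
	lastPath  string       // path of the most recently written entry
	batches   [][]byte     // cut batches, standing in for stored chunks
	above     []string     // LastPath of each batch, one range entry each
}

// write appends entry with an unsigned varint length prefix and cuts a
// batch once the buffer reaches the threshold.
func (w *levelWriter) write(path string, entry []byte) {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(entry)))
	w.buf.Write(hdr[:n])
	w.buf.Write(entry)
	w.lastPath = path
	if w.buf.Len() >= w.threshold {
		w.cut()
	}
}

// cut moves the buffered entries into a new batch and records the range
// entry that the level above would hold for it.
func (w *levelWriter) cut() {
	if w.buf.Len() == 0 {
		return
	}
	batch := append([]byte(nil), w.buf.Bytes()...)
	w.batches = append(w.batches, batch)
	w.above = append(w.above, w.lastPath)
	w.buf.Reset()
}

// decode splits a batch back into its entries.
func decode(batch []byte) ([][]byte, error) {
	var entries [][]byte
	for len(batch) > 0 {
		size, n := binary.Uvarint(batch)
		if n <= 0 || uint64(len(batch)-n) < size {
			return nil, errors.New("truncated entry")
		}
		end := n + int(size)
		entries = append(entries, batch[n:end])
		batch = batch[end:]
	}
	return entries, nil
}

func main() {
	w := &levelWriter{threshold: 16}
	for _, p := range []string{"/a", "/b", "/c", "/d"} {
		w.write(p, []byte("entry"+p)) // 7-byte entry plus a 1-byte prefix
	}
	w.cut() // flush any remaining partial batch
	fmt.Println(len(w.batches), w.above) // 2 [/b /d]
}
```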

func NewWriter

func NewWriter(ctx context.Context, chunks *chunk.Storage, tmpID string) *Writer

NewWriter creates a new Writer.

func (*Writer) Close

func (w *Writer) Close() (*Index, error)

Close finishes the index and returns the serialized top index level.

func (*Writer) WriteIndex

func (w *Writer) WriteIndex(idx *Index) error

WriteIndex writes an index entry.
