package index

Version: v2.7.0-nightly.20230525
Published: May 24, 2023
License: Apache-2.0
Imports: 20
Imported by: 0

Repository

github.com/pachyderm/pachyderm

Documentation

Overview

Package index provides access to files through multilevel indexes.

A multilevel index contains one or more levels. The lowest level contains file type index entries, which directly reference file content; the levels above it contain range type index entries, which directly reference a range of index entries in the level below and indirectly reference ranges of file content. Multilevel indexes are created with writers, which can create indexes for new files or build indexes from other indexes (rooted by file or range type indexes). Reading a multilevel index requires setting up a reader, which provides various indexing strategies and filters.
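To make the level structure concrete, here is a minimal in-memory sketch. It is not the package's implementation: `entry`, `buildLevels` and `lookup` are hypothetical names, entries are plain Go structs rather than protobuf `Index` messages, and levels are grouped by a fixed fan-out instead of by serialized size. Each range entry is keyed by the last path it covers (as `Range.LastPath` is in the real type), so a lookup descends through one range entry per level until it reaches a file entry.

```go
package main

import "fmt"

// entry is a hypothetical, simplified index entry. A file entry carries
// content; a range entry holds the group of entries one level down and is
// keyed by the last path that group covers.
type entry struct {
	Path     string
	Content  string  // set only on file entries
	Children []entry // set only on range entries
}

// buildLevels groups sorted file entries into range entries, fanout at a
// time, and repeats until a single top entry remains. files must be
// non-empty and sorted by path, and fanout must be at least 2.
func buildLevels(files []entry, fanout int) entry {
	level := files
	for len(level) > 1 {
		var next []entry
		for i := 0; i < len(level); i += fanout {
			end := i + fanout
			if end > len(level) {
				end = len(level)
			}
			group := level[i:end]
			next = append(next, entry{Path: group[len(group)-1].Path, Children: group})
		}
		level = next
	}
	return level[0]
}

// lookup descends from the top entry, at each level following the first
// child whose last covered path is >= the target path.
func lookup(top entry, path string) (string, bool) {
	e := top
	for e.Children != nil {
		found := false
		for _, c := range e.Children {
			if path <= c.Path {
				e, found = c, true
				break
			}
		}
		if !found {
			return "", false // path sorts after everything in this subtree
		}
	}
	if e.Path != path {
		return "", false
	}
	return e.Content, true
}

func main() {
	files := []entry{
		{Path: "/a", Content: "A"}, {Path: "/b", Content: "B"},
		{Path: "/c", Content: "C"}, {Path: "/d", Content: "D"},
		{Path: "/e", Content: "E"},
	}
	top := buildLevels(files, 2)
	content, ok := lookup(top, "/c")
	fmt.Println(content, ok) // C true
}
```

Keying range entries by their last path is what lets a reader skip whole subtrees: a path that sorts after a range entry's key cannot be inside it.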

Index

Constants
Variables
func Generate(s string) []string
func Merge(ctx context.Context, storage *chunk.Storage, indexes []*Index, ...) error
func Perm(a []rune, f func([]rune))
func PointsTo(idx *Index) []chunk.ID
func SizeBytes(idx *Index) int64
type Cache
- func NewCache(storage *chunk.Storage, size int) *Cache
- func (c *Cache) Get(ctx context.Context, chunkRef *chunk.DataRef, filter *pathFilter, w io.Writer) error
type File
- func (*File) Descriptor() ([]byte, []int)
- func (m *File) GetDataRefs() []*chunk.DataRef
- func (m *File) GetDatum() string
- func (m *File) Marshal() (dAtA []byte, err error)
- func (x *File) MarshalLogObject(enc zapcore.ObjectEncoder) error
- func (m *File) MarshalTo(dAtA []byte) (int, error)
- func (m *File) MarshalToSizedBuffer(dAtA []byte) (int, error)
- func (*File) ProtoMessage()
- func (m *File) Reset()
- func (m *File) Size() (n int)
- func (m *File) String() string
- func (m *File) Unmarshal(dAtA []byte) error
- func (m *File) XXX_DiscardUnknown()
- func (m *File) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)
- func (m *File) XXX_Merge(src proto.Message)
- func (m *File) XXX_Size() int
- func (m *File) XXX_Unmarshal(b []byte) error
type Index
- func (*Index) Descriptor() ([]byte, []int)
- func (m *Index) GetFile() *File
- func (m *Index) GetNumFiles() int64
- func (m *Index) GetPath() string
- func (m *Index) GetRange() *Range
- func (m *Index) GetSizeBytes() int64
- func (m *Index) Marshal() (dAtA []byte, err error)
- func (x *Index) MarshalLogObject(enc zapcore.ObjectEncoder) error
- func (m *Index) MarshalTo(dAtA []byte) (int, error)
- func (m *Index) MarshalToSizedBuffer(dAtA []byte) (int, error)
- func (*Index) ProtoMessage()
- func (m *Index) Reset()
- func (m *Index) Size() (n int)
- func (m *Index) String() string
- func (m *Index) Unmarshal(dAtA []byte) error
- func (m *Index) XXX_DiscardUnknown()
- func (m *Index) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)
- func (m *Index) XXX_Merge(src proto.Message)
- func (m *Index) XXX_Size() int
- func (m *Index) XXX_Unmarshal(b []byte) error
type Option
- func WithDatum(datum string) Option
- func WithPrefix(prefix string) Option
- func WithRange(pathRange *PathRange) Option
- func WithShardConfig(config *ShardConfig) Option
type PathRange
- func (r *PathRange) String() string
type Range
- func (*Range) Descriptor() ([]byte, []int)
- func (m *Range) GetChunkRef() *chunk.DataRef
- func (m *Range) GetLastPath() string
- func (m *Range) GetOffset() int64
- func (m *Range) Marshal() (dAtA []byte, err error)
- func (x *Range) MarshalLogObject(enc zapcore.ObjectEncoder) error
- func (m *Range) MarshalTo(dAtA []byte) (int, error)
- func (m *Range) MarshalToSizedBuffer(dAtA []byte) (int, error)
- func (*Range) ProtoMessage()
- func (m *Range) Reset()
- func (m *Range) Size() (n int)
- func (m *Range) String() string
- func (m *Range) Unmarshal(dAtA []byte) error
- func (m *Range) XXX_DiscardUnknown()
- func (m *Range) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)
- func (m *Range) XXX_Merge(src proto.Message)
- func (m *Range) XXX_Size() int
- func (m *Range) XXX_Unmarshal(b []byte) error
type Reader
- func NewReader(chunks *chunk.Storage, cache *Cache, topIdx *Index, opts ...Option) *Reader
- func (r *Reader) Iterate(ctx context.Context, cb func(*Index) error) error
- func (r *Reader) Shards(ctx context.Context) ([]*PathRange, error)
type ShardConfig
type Writer
- func NewWriter(ctx context.Context, chunks *chunk.Storage, tmpID string) *Writer
- func (w *Writer) Close() (*Index, error)
- func (w *Writer) WriteIndex(idx *Index) error

Constants

const (
	// DefaultShardNumThreshold is the default for the NumFiles threshold that must
	// be met before a shard is created.
	DefaultShardNumThreshold = 1000000
	// DefaultShardSizeThreshold is the default for the SizeBytes threshold that must
	// be met before a shard is created.
	DefaultShardSizeThreshold = units.GB
)

const (
	DefaultBatchThreshold = units.MB
)

Variables

var (
	ErrInvalidLengthIndex        = fmt.Errorf("proto: negative length found during unmarshaling")
	ErrIntOverflowIndex          = fmt.Errorf("proto: integer overflow")
	ErrUnexpectedEndOfGroupIndex = fmt.Errorf("proto: unexpected end of group")
)

Functions

func Generate

func Generate(s string) []string

Generate generates the permutations of the passed-in string and returns them sorted.

func Merge

func Merge(ctx context.Context, storage *chunk.Storage, indexes []*Index, cb func(*Index) error) error

func Perm

func Perm(a []rune, f func([]rune))

Perm calls f with each permutation of a.
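A standalone sketch of these two helpers, not the package's source: `perm` and `generate` are lowercase stand-ins for `Perm` and `Generate`, and `perm` uses the common swap-and-recurse enumeration. The documentation does not say whether `Generate` drops duplicate permutations when a character repeats, so this sketch keeps them.

```go
package main

import (
	"fmt"
	"sort"
)

// perm calls f with each permutation of a. f receives the shared buffer,
// so it must copy the slice if it needs to keep it.
func perm(a []rune, f func([]rune)) {
	permFrom(a, f, 0)
}

func permFrom(a []rune, f func([]rune), i int) {
	if i == len(a) {
		f(a)
		return
	}
	for j := i; j < len(a); j++ {
		a[i], a[j] = a[j], a[i] // place a[j] at position i
		permFrom(a, f, i+1)
		a[i], a[j] = a[j], a[i] // undo the swap before the next choice
	}
}

// generate returns every permutation of s, sorted lexicographically.
func generate(s string) []string {
	var out []string
	perm([]rune(s), func(p []rune) {
		out = append(out, string(p)) // string() copies the runes
	})
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(generate("abc")) // [abc acb bac bca cab cba]
}
```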

func PointsTo

func PointsTo(idx *Index) []chunk.ID

PointsTo returns a list of all the chunks this index references.

func SizeBytes

func SizeBytes(idx *Index) int64

SizeBytes computes the size of the indexed data in bytes.
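The documentation states only what `PointsTo` and `SizeBytes` return, so the per-type rules in this sketch are assumptions drawn from the struct fields: a file entry points at the chunks of its `DataRefs`, a range entry points at the single chunk behind its `ChunkRef`, and a range entry's size comes from its `SizeBytes` metadata. The types and the lowercase `pointsTo`/`sizeBytes` functions are hypothetical simplifications in which chunk IDs are plain strings.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for chunk.DataRef, File, Range and Index.
type dataRef struct {
	ChunkID   string
	SizeBytes int64
}

type fileEntry struct{ DataRefs []dataRef }

type rangeEntry struct{ ChunkRef dataRef }

type indexEntry struct {
	Path      string
	Range     *rangeEntry // mutually exclusive with File
	File      *fileEntry
	SizeBytes int64 // size metadata for the indexed data
}

// pointsTo lists the chunks an entry references directly: each data ref of a
// file entry, or the chunk holding the next level down for a range entry.
func pointsTo(idx *indexEntry) []string {
	var ids []string
	if idx.File != nil {
		for _, dr := range idx.File.DataRefs {
			ids = append(ids, dr.ChunkID)
		}
	}
	if idx.Range != nil {
		ids = append(ids, idx.Range.ChunkRef.ChunkID)
	}
	return ids
}

// sizeBytes sums the data refs of a file entry and, for a range entry,
// falls back to the SizeBytes metadata recorded on the entry.
func sizeBytes(idx *indexEntry) int64 {
	if idx.File == nil {
		return idx.SizeBytes
	}
	var total int64
	for _, dr := range idx.File.DataRefs {
		total += dr.SizeBytes
	}
	return total
}

func main() {
	f := &indexEntry{Path: "/a", File: &fileEntry{DataRefs: []dataRef{
		{ChunkID: "c1", SizeBytes: 10}, {ChunkID: "c2", SizeBytes: 5},
	}}}
	fmt.Println(pointsTo(f), sizeBytes(f)) // [c1 c2] 15
}
```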

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

func NewCache

func NewCache(storage *chunk.Storage, size int) *Cache

func (*Cache) Get

func (c *Cache) Get(ctx context.Context, chunkRef *chunk.DataRef, filter *pathFilter, w io.Writer) error

type File

type File struct {
	Datum                string           `protobuf:"bytes,1,opt,name=datum,proto3" json:"datum,omitempty"`
	DataRefs             []*chunk.DataRef `protobuf:"bytes,2,rep,name=data_refs,json=dataRefs,proto3" json:"data_refs,omitempty"`
	XXX_NoUnkeyedLiteral struct{}         `json:"-"`
	XXX_unrecognized     []byte           `json:"-"`
	XXX_sizecache        int32            `json:"-"`
}

func (*File) Descriptor

func (*File) Descriptor() ([]byte, []int)

func (*File) GetDataRefs

func (m *File) GetDataRefs() []*chunk.DataRef

func (*File) GetDatum

func (m *File) GetDatum() string

func (*File) Marshal

func (m *File) Marshal() (dAtA []byte, err error)

func (*File) MarshalLogObject

func (x *File) MarshalLogObject(enc zapcore.ObjectEncoder) error

func (*File) MarshalTo

func (m *File) MarshalTo(dAtA []byte) (int, error)

func (*File) MarshalToSizedBuffer

func (m *File) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*File) ProtoMessage

func (*File) ProtoMessage()

func (*File) Reset

func (m *File) Reset()

func (*File) Size

func (m *File) Size() (n int)

func (*File) String

func (m *File) String() string

func (*File) Unmarshal

func (m *File) Unmarshal(dAtA []byte) error

func (*File) XXX_DiscardUnknown

func (m *File) XXX_DiscardUnknown()

func (*File) XXX_Marshal

func (m *File) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*File) XXX_Merge

func (m *File) XXX_Merge(src proto.Message)

func (*File) XXX_Size

func (m *File) XXX_Size() int

func (*File) XXX_Unmarshal

func (m *File) XXX_Unmarshal(b []byte) error

type Index

type Index struct {
	Path string `protobuf:"bytes,1,opt,name=path,proto3" json:"path,omitempty"`
	// NOTE: range and file are mutually exclusive.
	Range *Range `protobuf:"bytes,2,opt,name=range,proto3" json:"range,omitempty"`
	File  *File  `protobuf:"bytes,3,opt,name=file,proto3" json:"file,omitempty"`
	// NOTE: num_files and size_bytes did not exist in older versions of 2.x, so
	// they will not be set.
	NumFiles             int64    `protobuf:"varint,4,opt,name=num_files,json=numFiles,proto3" json:"num_files,omitempty"`
	SizeBytes            int64    `protobuf:"varint,5,opt,name=size_bytes,json=sizeBytes,proto3" json:"size_bytes,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}

Index stores an index to, and metadata about, either a single file or a range of files.
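The `NOTE` in the struct says `Range` and `File` are mutually exclusive, so code that consumes an `Index` can branch on which one is set. A minimal sketch with hypothetical stand-in types (`indexEntry`, `fileInfo`, `rangeInfo`, `entryKind`); treating an entry with neither field set as an error is an assumption, not something the documentation states.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins; only whether each pointer is set matters here.
type fileInfo struct{}

type rangeInfo struct{ LastPath string }

type indexEntry struct {
	Path  string
	Range *rangeInfo
	File  *fileInfo
}

// entryKind reports whether e is a file or a range entry and rejects entries
// that set both (the documented NOTE) or neither (an assumption here).
func entryKind(e *indexEntry) (string, error) {
	switch {
	case e.File != nil && e.Range != nil:
		return "", errors.New("index entry sets both file and range")
	case e.File != nil:
		return "file", nil
	case e.Range != nil:
		return "range", nil
	default:
		return "", errors.New("index entry sets neither file nor range")
	}
}

func main() {
	kind, err := entryKind(&indexEntry{Path: "/a", File: &fileInfo{}})
	fmt.Println(kind, err) // file <nil>
}
```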

func (*Index) Descriptor

func (*Index) Descriptor() ([]byte, []int)

func (*Index) GetFile

func (m *Index) GetFile() *File

func (*Index) GetNumFiles

func (m *Index) GetNumFiles() int64

func (*Index) GetPath

func (m *Index) GetPath() string

func (*Index) GetRange

func (m *Index) GetRange() *Range

func (*Index) GetSizeBytes

func (m *Index) GetSizeBytes() int64

func (*Index) Marshal

func (m *Index) Marshal() (dAtA []byte, err error)

func (*Index) MarshalLogObject

func (x *Index) MarshalLogObject(enc zapcore.ObjectEncoder) error

func (*Index) MarshalTo

func (m *Index) MarshalTo(dAtA []byte) (int, error)

func (*Index) MarshalToSizedBuffer

func (m *Index) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*Index) ProtoMessage

func (*Index) ProtoMessage()

func (*Index) Reset

func (m *Index) Reset()

func (*Index) Size

func (m *Index) Size() (n int)

func (*Index) String

func (m *Index) String() string

func (*Index) Unmarshal

func (m *Index) Unmarshal(dAtA []byte) error

func (*Index) XXX_DiscardUnknown

func (m *Index) XXX_DiscardUnknown()

func (*Index) XXX_Marshal

func (m *Index) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*Index) XXX_Merge

func (m *Index) XXX_Merge(src proto.Message)

func (*Index) XXX_Size

func (m *Index) XXX_Size() int

func (*Index) XXX_Unmarshal

func (m *Index) XXX_Unmarshal(b []byte) error

type Option

type Option func(r *Reader)

Option configures an index reader.

func WithDatum

func WithDatum(datum string) Option

WithDatum adds a datum filter that matches a single datum.

func WithPrefix

func WithPrefix(prefix string) Option

WithPrefix sets a prefix filter for the read.

func WithRange

func WithRange(pathRange *PathRange) Option

WithRange sets a range filter for the read.

func WithShardConfig

func WithShardConfig(config *ShardConfig) Option

WithShardConfig sets the sharding configuration.
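`Option` follows Go's functional-options pattern: each `With...` constructor returns a closure that configures the reader it is applied to. The sketch below reproduces that shape with a hypothetical `reader` whose fields (`prefix`, `datum`, `pathRange`) and `matches` method are assumptions; the real `Reader` keeps its fields unexported, and how it combines filters is not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// pathRange mirrors the documented PathRange: [Lower, Upper).
type pathRange struct{ Lower, Upper string }

// reader is a hypothetical stand-in; the real Reader's fields are unexported.
type reader struct {
	prefix    string
	datum     string
	pathRange *pathRange
}

// option has the same shape as the documented Option type.
type option func(r *reader)

func withPrefix(prefix string) option { return func(r *reader) { r.prefix = prefix } }

func withDatum(datum string) option { return func(r *reader) { r.datum = datum } }

func withRange(pr *pathRange) option { return func(r *reader) { r.pathRange = pr } }

// newReader applies each option, in order, to a zero-valued reader.
func newReader(opts ...option) *reader {
	r := &reader{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

// matches reports whether a file entry passes every configured filter.
func (r *reader) matches(path, datum string) bool {
	if !strings.HasPrefix(path, r.prefix) {
		return false
	}
	if r.datum != "" && datum != r.datum {
		return false
	}
	if r.pathRange != nil && (path < r.pathRange.Lower || path >= r.pathRange.Upper) {
		return false
	}
	return true
}

func main() {
	r := newReader(withPrefix("/logs/"), withDatum("d1"))
	fmt.Println(r.matches("/logs/a.txt", "d1"), r.matches("/data/b.txt", "d1")) // true false
}
```

The closure-based shape lets new filters be added without changing the constructor's signature, which is the usual motivation for this pattern in Go.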

type PathRange

type PathRange struct {
	Lower, Upper string
}

PathRange is a range of paths. The range includes Lower and excludes Upper: [Lower, Upper).
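Because each range includes `Lower` and excludes `Upper`, adjacent ranges such as [a, b) and [b, c) cover every path from a up to, but not including, c exactly once, which is how a list of ranges can partition a file set without overlap. The standalone sketch below checks membership and locates a path among sorted, adjacent ranges; `contains` and `shardFor` are hypothetical helpers, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// pathRange mirrors the documented PathRange: Lower is included, Upper is not.
type pathRange struct{ Lower, Upper string }

func (r pathRange) contains(p string) bool {
	return r.Lower <= p && p < r.Upper
}

// shardFor returns the index of the range containing p among sorted,
// adjacent ranges, or -1. It binary-searches on the exclusive upper bounds.
func shardFor(ranges []pathRange, p string) int {
	i := sort.Search(len(ranges), func(i int) bool { return p < ranges[i].Upper })
	if i < len(ranges) && ranges[i].contains(p) {
		return i
	}
	return -1
}

func main() {
	shards := []pathRange{{"", "/m"}, {"/m", "/t"}, {"/t", "/z"}}
	fmt.Println(shardFor(shards, "/m"), shardFor(shards, "/l/z")) // 1 0
}
```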

func (*PathRange) String

func (r *PathRange) String() string

type Range

type Range struct {
	Offset               int64          `protobuf:"varint,1,opt,name=offset,proto3" json:"offset,omitempty"`
	LastPath             string         `protobuf:"bytes,2,opt,name=last_path,json=lastPath,proto3" json:"last_path,omitempty"`
	ChunkRef             *chunk.DataRef `protobuf:"bytes,3,opt,name=chunk_ref,json=chunkRef,proto3" json:"chunk_ref,omitempty"`
	XXX_NoUnkeyedLiteral struct{}       `json:"-"`
	XXX_unrecognized     []byte         `json:"-"`
	XXX_sizecache        int32          `json:"-"`
}

func (*Range) Descriptor

func (*Range) Descriptor() ([]byte, []int)

func (*Range) GetChunkRef

func (m *Range) GetChunkRef() *chunk.DataRef

func (*Range) GetLastPath

func (m *Range) GetLastPath() string

func (*Range) GetOffset

func (m *Range) GetOffset() int64

func (*Range) Marshal

func (m *Range) Marshal() (dAtA []byte, err error)

func (*Range) MarshalLogObject

func (x *Range) MarshalLogObject(enc zapcore.ObjectEncoder) error

func (*Range) MarshalTo

func (m *Range) MarshalTo(dAtA []byte) (int, error)

func (*Range) MarshalToSizedBuffer

func (m *Range) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*Range) ProtoMessage

func (*Range) ProtoMessage()

func (*Range) Reset

func (m *Range) Reset()

func (*Range) Size

func (m *Range) Size() (n int)

func (*Range) String

func (m *Range) String() string

func (*Range) Unmarshal

func (m *Range) Unmarshal(dAtA []byte) error

func (*Range) XXX_DiscardUnknown

func (m *Range) XXX_DiscardUnknown()

func (*Range) XXX_Marshal

func (m *Range) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*Range) XXX_Merge

func (m *Range) XXX_Merge(src proto.Message)

func (*Range) XXX_Size

func (m *Range) XXX_Size() int

func (*Range) XXX_Unmarshal

func (m *Range) XXX_Unmarshal(b []byte) error

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is used for reading a multilevel index.

func NewReader

func NewReader(chunks *chunk.Storage, cache *Cache, topIdx *Index, opts ...Option) *Reader

NewReader creates a new Reader.

func (*Reader) Iterate

func (r *Reader) Iterate(ctx context.Context, cb func(*Index) error) error

Iterate iterates over the lowest level (file type) indexes.

func (*Reader) Shards

func (r *Reader) Shards(ctx context.Context) ([]*PathRange, error)

Shards creates shards for the index, returned as path ranges, based on the sharding configuration provided to the reader. Sharding uses the NumFiles and SizeBytes index metadata to traverse the multilevel index efficiently: a subtree is traversed only when a split point falls within it, which can be determined from the NumFiles and SizeBytes values at the root of that subtree.

type ShardConfig

type ShardConfig struct {
	NumFiles  int64
	SizeBytes int64
}

ShardConfig is a sharding configuration. NumFiles is the number of files to target for each shard. SizeBytes is the size, in bytes, to target for each shard.
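The subtree-skipping idea behind `Shards`, combined with the per-shard targets in `ShardConfig`, can be sketched as a single in-order walk. Assumptions, none of which come from the package: the hypothetical `node` type stands in for `Index`, a shard is closed as soon as either the file-count or the byte-size target is reached, and each shard's exclusive upper bound is its last path followed by a NUL byte (the smallest string greater than that path). The real implementation may place shard boundaries differently.

```go
package main

import "fmt"

// node is a hypothetical index entry. Leaves are files; inner nodes are
// range entries whose NumFiles and SizeBytes summarize their whole subtree.
type node struct {
	LastPath  string
	NumFiles  int64
	SizeBytes int64
	Children  []*node // nil for file entries
}

type pathRange struct{ Lower, Upper string }

type sharder struct {
	numTarget, sizeTarget int64
	num, size             int64 // totals for the shard being built
	lower                 string
	out                   []pathRange
}

func (s *sharder) walk(n *node) {
	// If absorbing the whole subtree keeps both totals below their targets,
	// no split point can fall inside it, so it is skipped without descending.
	if n.Children == nil || (s.num+n.NumFiles < s.numTarget && s.size+n.SizeBytes < s.sizeTarget) {
		s.num += n.NumFiles
		s.size += n.SizeBytes
		if s.num >= s.numTarget || s.size >= s.sizeTarget {
			s.cut(n.LastPath)
		}
		return
	}
	for _, c := range n.Children {
		s.walk(c)
	}
}

// cut closes the current shard just after path. path+"\x00" is the smallest
// string greater than path, so the half-open range still includes path.
func (s *sharder) cut(path string) {
	upper := path + "\x00"
	s.out = append(s.out, pathRange{Lower: s.lower, Upper: upper})
	s.lower, s.num, s.size = upper, 0, 0
}

// shards splits the tree rooted at top into path ranges.
func shards(top *node, numTarget, sizeTarget int64) []pathRange {
	s := &sharder{numTarget: numTarget, sizeTarget: sizeTarget}
	s.walk(top)
	if s.num > 0 {
		s.cut(top.LastPath) // leftover files form the final shard
	}
	return s.out
}

func leaf(path string, size int64) *node {
	return &node{LastPath: path, NumFiles: 1, SizeBytes: size}
}

func inner(children ...*node) *node {
	n := &node{LastPath: children[len(children)-1].LastPath, Children: children}
	for _, c := range children {
		n.NumFiles += c.NumFiles
		n.SizeBytes += c.SizeBytes
	}
	return n
}

func main() {
	top := inner(
		inner(leaf("/a", 1), leaf("/b", 1)),
		inner(leaf("/c", 1), leaf("/d", 1)),
		inner(leaf("/e", 1), leaf("/f", 1)),
	)
	for _, r := range shards(top, 3, 1<<30) {
		fmt.Printf("[%q, %q)\n", r.Lower, r.Upper)
	}
}
```

With a target of 3 files, the walk absorbs the first subtree (2 files) without visiting its leaves, then descends into each of the other two because absorbing either whole would reach the target.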

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer is used for creating a multilevel index into a serialized file set. Each index level is a stream of byte-length-encoded index entries stored in chunk storage. Both file and range type indexes can be written to a writer. When the serialized indexes reach the batching threshold, a new level is created above the written indexes.
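A stripped-down, in-memory sketch of that batching scheme: every entry is written as a uvarint byte length followed by its payload, and once a level's buffer reaches the threshold it is stored as a chunk, and a range entry keyed by the level's last path is written one level up. Everything here is an assumption for illustration: `writer`, `write`, `flush`, `close` and `decode` are hypothetical, the payload is just a chunk ID and a path rather than a serialized `Index`, and a slice of byte slices stands in for `chunk.Storage`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// entry is a hypothetical, simplified index entry. ChunkID 0 marks a file
// entry; a range entry carries the ID (from 1) of the chunk holding the
// level below it.
type entry struct {
	Path    string
	ChunkID int
}

type level struct {
	buf      []byte // length-prefixed entries not yet flushed
	n        int    // number of entries in buf
	lastPath string
}

type writer struct {
	threshold int      // flush a level once its buffer reaches this many bytes
	chunks    [][]byte // in-memory stand-in for chunk storage
	levels    []*level
}

func putUvarint(buf []byte, x uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], x)
	return append(buf, tmp[:n]...)
}

// write appends one length-prefixed entry to level i. The level is flushed
// once it holds at least two entries and reaches the threshold, so each
// flush shrinks the next level and the recursion always ends.
func (w *writer) write(i int, e entry) {
	if i == len(w.levels) {
		w.levels = append(w.levels, &level{})
	}
	l := w.levels[i]
	payload := append(putUvarint(nil, uint64(e.ChunkID)), e.Path...)
	l.buf = append(putUvarint(l.buf, uint64(len(payload))), payload...)
	l.n++
	l.lastPath = e.Path
	if l.n > 1 && len(l.buf) >= w.threshold {
		w.flush(i)
	}
}

// flush stores level i as a chunk and writes a range entry for that chunk,
// keyed by the level's last path, into level i+1.
func (w *writer) flush(i int) {
	l := w.levels[i]
	w.chunks = append(w.chunks, l.buf)
	l.buf, l.n = nil, 0
	w.write(i+1, entry{Path: l.lastPath, ChunkID: len(w.chunks)})
}

// close flushes levels bottom-up until the highest level holds exactly one
// entry, and returns that entry as the top index.
func (w *writer) close() (entry, bool) {
	for i := 0; i < len(w.levels); i++ {
		l := w.levels[i]
		if i == len(w.levels)-1 && l.n == 1 {
			return decode(l.buf)[0], true
		}
		if l.n > 0 {
			w.flush(i) // may add a level, which this loop then visits
		}
	}
	return entry{}, false // nothing was written
}

// decode splits a serialized level back into its entries.
func decode(buf []byte) []entry {
	var out []entry
	for len(buf) > 0 {
		size, n := binary.Uvarint(buf)
		payload := buf[n : n+int(size)]
		buf = buf[n+int(size):]
		id, m := binary.Uvarint(payload)
		out = append(out, entry{Path: string(payload[m:]), ChunkID: int(id)})
	}
	return out
}

func main() {
	w := &writer{threshold: 16}
	for _, p := range []string{"/a", "/b", "/c", "/d", "/e", "/f"} {
		w.write(0, entry{Path: p})
	}
	top, _ := w.close()
	fmt.Println(top.Path, top.ChunkID, len(w.chunks)) // /f 3 3
}
```

In this sketch, finishing the writer always yields a single top entry; if only one file was written, that file entry is itself the top.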

func NewWriter

func NewWriter(ctx context.Context, chunks *chunk.Storage, tmpID string) *Writer

NewWriter creates a new Writer.

func (*Writer) Close

func (w *Writer) Close() (*Index, error)

Close finishes the index and returns the serialized top index level.

func (*Writer) WriteIndex

func (w *Writer) WriteIndex(idx *Index) error

WriteIndex writes an index entry.
