extract

package
v1.3.19

This package is not in the latest version of its module.
Published: Jun 6, 2023 License: MIT Imports: 26 Imported by: 0

Documentation

Overview

Package extract provides functions for working with compressed files

  • Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

The msgp encoding and decoding code in this package is generated by the commands below; see docs/msgp.md. DO NOT EDIT.

  msgp -file <path to dsort/extract/record_gen.go> -tests=false -marshal=false -unexported
  msgp -file <path to dsort/extract/shard.go> -tests=false -marshal=false -unexported

Constants

const (
	FormatTypeInt    = "int"
	FormatTypeFloat  = "float"
	FormatTypeString = "string"
)
const (
	// Extract methods
	ExtractToMem cos.Bits = 1 << iota
	ExtractToDisk
	ExtractToWriter
)
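
The extract methods are declared with `1 << iota`, so each one occupies its own bit and several can be combined in a single value. A minimal sketch of that pattern follows; the local `bits` type and the `has` helper are assumptions made for illustration and stand in for `cos.Bits`, whose definition is not shown on this page.

```go
package main

import "fmt"

// bits is a hypothetical stand-in for cos.Bits; its underlying type is an assumption.
type bits uint8

const (
	// Same declaration pattern as the package constants: 1, 2, 4.
	extractToMem bits = 1 << iota
	extractToDisk
	extractToWriter
)

// has reports whether every bit of flag is set in b (illustrative helper, not a package API).
func has(b, flag bits) bool { return b&flag == flag }

func main() {
	mode := extractToMem | extractToDisk
	fmt.Println(has(mode, extractToMem), has(mode, extractToWriter)) // true false
}
```

Because the values are distinct powers of two, a combined mode can be tested for each method independently.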
const (
	// Values are small to save memory.
	OffsetStoreType = "o"
	SGLStoreType    = "s"
	DiskStoreType   = "d"
)

Variables

This section is empty.

Functions

func Ext

func Ext(path string) string

Ext returns the file name extension used by path. The extension is the suffix beginning at the FIRST (not final) dot in the final element of path; it is empty if there is no dot.

NOTE: Use this function instead of `filepath.Ext` in dSort.
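
The difference from `filepath.Ext` matters for multi-part extensions such as `.tar.gz`. The sketch below reproduces the documented rule (suffix starting at the first dot of the final path element) in a hypothetical helper named `firstDotExt`; it is not the package's implementation, and it assumes `/` as the path separator.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// firstDotExt is a hypothetical helper that follows the rule documented for Ext:
// the extension starts at the FIRST dot in the final element of path.
func firstDotExt(path string) string {
	// Isolate the final path element (everything after the last '/').
	base := path[strings.LastIndexByte(path, '/')+1:]
	if i := strings.IndexByte(base, '.'); i >= 0 {
		return base[i:]
	}
	return ""
}

func main() {
	p := "shards/shard-001.tar.gz"
	fmt.Println(firstDotExt(p))  // .tar.gz
	fmt.Println(filepath.Ext(p)) // .gz
}
```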

func ValidateAlgorithmFormatType

func ValidateAlgorithmFormatType(ty string) error

Types

type Creator

type Creator interface {
	ExtractShard(lom *cluster.LOM, r cos.ReadReaderAt, extractor RecordExtractor, toDisk bool) (int64, int, error)
	CreateShard(s *Shard, w io.Writer, loadContent LoadContentFunc) (int64, error)
	UsingCompression() bool
	SupportsOffset() bool
	MetadataSize() int64
}

Creator is the interface describing the set of functions that each shard creator should implement.

func NewTarExtractCreator

func NewTarExtractCreator(t cluster.Target) Creator

func NewTargzExtractCreator

func NewTargzExtractCreator(t cluster.Target) Creator

func NewZipExtractCreator

func NewZipExtractCreator(t cluster.Target) Creator

func NopExtractCreator

func NopExtractCreator(internal Creator) Creator

type KeyExtractor

type KeyExtractor interface {
	PrepareExtractor(name string, r cos.ReadSizer, ext string) (cos.ReadSizer, *SingleKeyExtractor, bool)

	// ExtractKey extracts key from either name or reader (file/sgl)
	ExtractKey(ske *SingleKeyExtractor) (any, error)
}

func NewContentKeyExtractor

func NewContentKeyExtractor(ty, ext string) (KeyExtractor, error)
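
NewContentKeyExtractor takes a format type `ty`, which presumably corresponds to the FormatType constants listed under Constants. As a rough illustration only, the sketch below shows how raw key content could be converted to a Go value for each format type. The `parseKey` helper, its whitespace trimming, and the resulting Go types (`int64`, `float64`, `string`) are all assumptions, not the package's behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Local copies of the documented FormatType values.
const (
	formatTypeInt    = "int"
	formatTypeFloat  = "float"
	formatTypeString = "string"
)

// parseKey is a hypothetical helper: it converts raw key content into a value
// whose Go type depends on the requested format type.
func parseKey(raw, ty string) (any, error) {
	raw = strings.TrimSpace(raw)
	switch ty {
	case formatTypeInt:
		v, err := strconv.ParseInt(raw, 10, 64)
		if err != nil {
			return nil, err
		}
		return v, nil
	case formatTypeFloat:
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, err
		}
		return v, nil
	case formatTypeString:
		return raw, nil
	default:
		return nil, fmt.Errorf("unsupported format type %q", ty)
	}
}

func main() {
	key, err := parseKey(" 42\n", formatTypeInt)
	fmt.Println(key, err) // 42 <nil>
}
```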

func NewMD5KeyExtractor

func NewMD5KeyExtractor() (KeyExtractor, error)

func NewNameKeyExtractor

func NewNameKeyExtractor() (KeyExtractor, error)

type LoadContentFunc

type LoadContentFunc func(w io.Writer, rec *Record, obj *RecordObj) (int64, error)

LoadContentFunc is the type of the function that loads content from either a remote or the local target.

type Record

type Record struct {
	Key      any    `msg:"k" json:"k"` // Used to determine the sorting order.
	Name     string `msg:"n" json:"n"` // Name which uniquely identifies record across all shards.
	DaemonID string `msg:"d" json:"d"` // ID of the target which maintains the contents for this record.
	// All objects associated with given record. Record can be composed of
	// multiple objects which have the same name but different extension.
	Objects []*RecordObj `msg:"o" json:"o"`
}

Record represents the metadata corresponding to a single file from an archive file.

func (*Record) DecodeMsg

func (z *Record) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Record) EncodeMsg

func (z *Record) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Record) MakeUniqueName

func (r *Record) MakeUniqueName(obj *RecordObj) string

func (*Record) Msgsize

func (z *Record) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Record) TotalSize

func (r *Record) TotalSize() int64

type RecordExtractor

type RecordExtractor interface {
	ExtractRecordWithBuffer(args extractRecordArgs) (int64, error)
}

type RecordManager

type RecordManager struct {
	Records *Records
	// contains filtered or unexported fields
}

func NewRecordManager

func NewRecordManager(t cluster.Target, bck cmn.Bck, extension string, extractCreator Creator,
	keyExtractor KeyExtractor, onDuplicatedRecords func(string) error) *RecordManager

func (*RecordManager) ChangeStoreType

func (rm *RecordManager) ChangeStoreType(fullContentPath, newStoreType string, value any, buf []byte) (n int64)

func (*RecordManager) Cleanup

func (rm *RecordManager) Cleanup()

func (*RecordManager) EnqueueRecords

func (rm *RecordManager) EnqueueRecords(records *Records)

func (*RecordManager) ExtractRecordWithBuffer

func (rm *RecordManager) ExtractRecordWithBuffer(args extractRecordArgs) (size int64, err error)

func (*RecordManager) ExtractionPaths

func (rm *RecordManager) ExtractionPaths() *sync.Map

func (*RecordManager) FullContentPath

func (rm *RecordManager) FullContentPath(obj *RecordObj) string

func (*RecordManager) MergeEnqueuedRecords

func (rm *RecordManager) MergeEnqueuedRecords()

func (*RecordManager) RecordContents

func (rm *RecordManager) RecordContents() *sync.Map

type RecordObj

type RecordObj struct {
	// Can represent, one of the following:
	//  * Shard name - in case offset is used.
	//  * Key for extractCreator's RecordContents - records stored in SGLs.
	//  * Location (full path) on disk where extracted record has been placed.
	//
	// To get path for given object you need to use `FullContentPath` method.
	ContentPath string `msg:"p"  json:"p"`

	// Filesystem file type where the shard is stored - used to determine
	// location for content path when asking filesystem.
	ObjectFileType string `msg:"ft" json:"ft"`

	// Determines where the record has been stored, can be either: OffsetStoreType,
	// SGLStoreType, DiskStoreType.
	StoreType string `msg:"st" json:"st"`

	// If set, determines the offset in shard file where the record begins.
	Offset       int64  `msg:"f,omitempty" json:"f,string,omitempty"`
	MetadataSize int64  `msg:"ms" json:"ms,string"`
	Size         int64  `msg:"s" json:"s,string"`
	Extension    string `msg:"e" json:"e"`
}

RecordObj describes a single object of a record. Objects within a single record differ by extension.
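
The struct tags above use one- or two-letter JSON keys and encode the `int64` fields as strings. The sketch below copies only the `json` tags into local mirror types and encodes one record with the standard `encoding/json` package. The field values (`"t1"`, `"obj"`, the path) are made up for illustration, and the package itself may use a different JSON encoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recordObj mirrors the json tags of RecordObj (msg tags omitted).
type recordObj struct {
	ContentPath    string `json:"p"`
	ObjectFileType string `json:"ft"`
	StoreType      string `json:"st"`
	Offset         int64  `json:"f,string,omitempty"`
	MetadataSize   int64  `json:"ms,string"`
	Size           int64  `json:"s,string"`
	Extension      string `json:"e"`
}

// record mirrors the json tags of Record.
type record struct {
	Key      any          `json:"k"`
	Name     string       `json:"n"`
	DaemonID string       `json:"d"`
	Objects  []*recordObj `json:"o"`
}

// encode marshals a record to its JSON text.
func encode(r *record) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	r := &record{
		Key: "abc", Name: "file1", DaemonID: "t1",
		Objects: []*recordObj{{
			ContentPath: "/tmp/file1.jpg", ObjectFileType: "obj", StoreType: "d",
			MetadataSize: 512, Size: 1024, Extension: ".jpg",
		}},
	}
	s, err := encode(r)
	fmt.Println(s, err)
}
```

With a zero Offset the `f` key is omitted because of `omitempty`, and the sizes appear as `"ms":"512"` and `"s":"1024"`.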

func (*RecordObj) DecodeMsg

func (z *RecordObj) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*RecordObj) EncodeMsg

func (z *RecordObj) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*RecordObj) Msgsize

func (z *RecordObj) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

type Records

type Records struct {
	sync.RWMutex `msg:"-"`
	// contains filtered or unexported fields
}

Records abstracts an array of records. It is safe for concurrent use.

func NewRecords

func NewRecords(n int) *Records

NewRecords creates a new instance of the Records struct and allocates space for n Record entries.

func (*Records) All

func (r *Records) All() []*Record

func (*Records) DecodeMsg

func (z *Records) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Records) DeleteDup

func (r *Records) DeleteDup(name, ext string)

func (*Records) Drain

func (r *Records) Drain()

func (*Records) EncodeMsg

func (z *Records) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Records) Exists

func (r *Records) Exists(name, ext string) (exists bool)

func (*Records) Find

func (r *Records) Find(name string) (record *Record, exists bool)

NOTE: Find must be called with the lock held.
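
Since Records embeds `sync.RWMutex`, a caller can hold the read lock around Find. The sketch below shows that calling pattern on a hypothetical local `records` type; the slice-backed storage, the locking inside `insert`, and the `lookup` wrapper are assumptions and do not describe the package's internals.

```go
package main

import (
	"fmt"
	"sync"
)

type record struct{ Name string }

// records is a hypothetical stand-in for Records: an embedded RWMutex
// guarding a slice of records.
type records struct {
	sync.RWMutex
	arr []*record
}

// insert appends records under the write lock (an assumption of this sketch).
func (r *records) insert(recs ...*record) {
	r.Lock()
	r.arr = append(r.arr, recs...)
	r.Unlock()
}

// find does no locking of its own; the caller must hold the lock.
func (r *records) find(name string) (*record, bool) {
	for _, rec := range r.arr {
		if rec.Name == name {
			return rec, true
		}
	}
	return nil, false
}

// lookup shows the calling pattern: take the read lock, then call find.
func lookup(r *records, name string) (*record, bool) {
	r.RLock()
	defer r.RUnlock()
	return r.find(name)
}

func main() {
	r := &records{}
	r.insert(&record{Name: "a"}, &record{Name: "b"})
	_, ok := lookup(r, "b")
	fmt.Println(ok) // true
}
```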

func (*Records) Insert

func (r *Records) Insert(records ...*Record)

func (*Records) Len

func (r *Records) Len() int

func (*Records) Less

func (r *Records) Less(i, j int, formatType string) (bool, error)
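
Less takes a format type and can return an error, which suggests that record keys of type `any` are compared according to that format type. The following sketch illustrates one way such a comparison could work. The `lessKeys` helper and the assumed key types (`int64`, `float64`, `string`) are hypothetical and not taken from the package.

```go
package main

import "fmt"

// lessKeys is a hypothetical helper: it compares two keys of type any
// according to the given format type and fails on a type mismatch.
func lessKeys(a, b any, formatType string) (bool, error) {
	switch formatType {
	case "int":
		x, okX := a.(int64)
		y, okY := b.(int64)
		if !okX || !okY {
			return false, fmt.Errorf("keys %v and %v are not both int64", a, b)
		}
		return x < y, nil
	case "float":
		x, okX := a.(float64)
		y, okY := b.(float64)
		if !okX || !okY {
			return false, fmt.Errorf("keys %v and %v are not both float64", a, b)
		}
		return x < y, nil
	case "string":
		x, okX := a.(string)
		y, okY := b.(string)
		if !okX || !okY {
			return false, fmt.Errorf("keys %v and %v are not both string", a, b)
		}
		return x < y, nil
	default:
		return false, fmt.Errorf("unsupported format type %q", formatType)
	}
}

func main() {
	less, err := lessKeys(int64(1), int64(2), "int")
	fmt.Println(less, err) // true <nil>
}
```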

func (*Records) MarshalJSON

func (*Records) MarshalJSON() ([]byte, error)

func (*Records) Msgsize

func (z *Records) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Records) RecordMemorySize

func (r *Records) RecordMemorySize() (size uint64)

func (*Records) Slice

func (r *Records) Slice(start, end int) *Records

func (*Records) Swap

func (r *Records) Swap(i, j int)

func (*Records) TotalObjectCount

func (r *Records) TotalObjectCount() int

func (*Records) UnmarshalJSON

func (*Records) UnmarshalJSON([]byte) error

type Shard

type Shard struct {
	// Size is total size of shard to be created.
	Size int64 `msg:"s"`
	// Records contains all metadata to construct the shard.
	Records *Records `msg:"r"`
	// Name determines the output name of the shard.
	Name string `msg:"n"`
}

Shard represents the metadata required to construct a single shard (aka an archive file).

func (*Shard) DecodeMsg

func (z *Shard) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Shard) EncodeMsg

func (z *Shard) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Shard) MarshalJSON

func (*Shard) MarshalJSON() ([]byte, error)

func (*Shard) Msgsize

func (z *Shard) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Shard) UnmarshalJSON

func (*Shard) UnmarshalJSON([]byte) error

type SingleKeyExtractor

type SingleKeyExtractor struct {
	// contains filtered or unexported fields
}
