Documentation
Index
- type DiskSorter
- func (d *DiskSorter) Close() error
- func (d *DiskSorter) CloseAndCleanup() error
- func (d *DiskSorter) IsSorted() bool
- func (d *DiskSorter) NewIterator(_ context.Context) (Iterator, error)
- func (d *DiskSorter) NewWriter(_ context.Context) (Writer, error)
- func (d *DiskSorter) Sort(ctx context.Context) error
- type DiskSorterOptions
- type ExternalSorter
- type Iterator
- type Writer
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type DiskSorter
type DiskSorter struct {
// contains filtered or unexported fields
}
DiskSorter is an external sorter that sorts data on disk.
func OpenDiskSorter
func OpenDiskSorter(dirname string, opts *DiskSorterOptions) (*DiskSorter, error)
OpenDiskSorter opens a DiskSorter with the given directory.
func (*DiskSorter) Close
func (d *DiskSorter) Close() error
Close implements ExternalSorter.Close.
func (*DiskSorter) CloseAndCleanup
func (d *DiskSorter) CloseAndCleanup() error
CloseAndCleanup implements ExternalSorter.CloseAndCleanup.
func (*DiskSorter) IsSorted
func (d *DiskSorter) IsSorted() bool
IsSorted implements ExternalSorter.IsSorted.
func (*DiskSorter) NewIterator
func (d *DiskSorter) NewIterator(_ context.Context) (Iterator, error)
NewIterator implements ExternalSorter.NewIterator.
type DiskSorterOptions
type DiskSorterOptions struct {
// Concurrency is the maximum number of goroutines that can be used to
// sort data in parallel.
//
// The default value is runtime.GOMAXPROCS(0).
Concurrency int
// WriterBufferSize is the size of the buffer used by the writer.
// A larger buffer can improve write and sort performance
// and reduce the number of disk operations.
//
// The default value is 128MB.
WriterBufferSize int
// CompactionThreshold is the overlap depth at which a compaction is
// triggered. The overlap depth is the maximum number of files that overlap
// at the same interval.
//
// For example, consider the following files:
//
// file 0: a-----d
// file 1: b-----e
// file 2: c-------g
// file 3: d---f
//
// The overlap depth of these files is 3, because files 0, 1, and 2 overlap
// at the interval [c, d), and files 1, 2, and 3 overlap at the interval [d, e).
//
// If the overlap depth reaches CompactionThreshold, the sorter compacts
// files during sorting to reduce the overlap depth. The larger the overlap
// depth, the larger the read amplification during iteration, so this value
// trades read amplification against sorting cost. Setting it to math.MaxInt
// disables compaction.
//
// The default value is 16.
CompactionThreshold int
// MaxCompactionDepth is the maximum number of files involved in a single
// compaction. The minimum value is 2. Any value less than 2 is treated as not set.
//
// The default value is 64.
MaxCompactionDepth int
// MaxCompactionSize is the maximum total size of the key-value pairs
// involved in a single compaction.
//
// The default value is 512MB.
MaxCompactionSize int
// Logger is used to write log messages.
//
// The default value is log.L().
Logger *zap.Logger
}
DiskSorterOptions holds the optional parameters for DiskSorter.
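The overlap depth described for CompactionThreshold can be computed with a sweep over range endpoints. The following is a minimal sketch, not the package's implementation: `keyRange` and `overlapDepth` are hypothetical names, and each file is assumed to cover a half-open interval [start, end), matching the intervals used in the comment's example.

```go
package main

import (
	"fmt"
	"sort"
)

// keyRange is a hypothetical half-open key interval [start, end) covered by one file.
type keyRange struct {
	start, end string
}

// overlapDepth returns the largest number of ranges that cover a single key.
func overlapDepth(ranges []keyRange) int {
	type event struct {
		key   string
		delta int // +1 when a range opens, -1 when it closes
	}
	events := make([]event, 0, 2*len(ranges))
	for _, r := range ranges {
		events = append(events, event{r.start, +1}, event{r.end, -1})
	}
	// Order events by key. At equal keys, closes come before opens because
	// the ranges are half-open: a range ending at d does not cover d.
	sort.Slice(events, func(i, j int) bool {
		if events[i].key != events[j].key {
			return events[i].key < events[j].key
		}
		return events[i].delta < events[j].delta
	})
	depth, maxDepth := 0, 0
	for _, e := range events {
		depth += e.delta
		if depth > maxDepth {
			maxDepth = depth
		}
	}
	return maxDepth
}

func main() {
	// The four files from the CompactionThreshold example.
	files := []keyRange{{"a", "d"}, {"b", "e"}, {"c", "g"}, {"d", "f"}}
	fmt.Println("overlap depth:", overlapDepth(files))
}
```

For the four example files this yields a depth of 3, which is below the default CompactionThreshold of 16, so under these assumptions no compaction would be triggered.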
type ExternalSorter
type ExternalSorter interface {
// NewWriter creates a new writer for writing key-value pairs before sorting.
// If the sorter starts sorting or is already sorted, it will return an error.
//
// Multiple writers can be opened and used concurrently.
NewWriter(ctx context.Context) (Writer, error)
// Sort sorts the key-value pairs written by the writer.
// It should be called after all open writers are closed.
//
// Implementations should guarantee that Sort() is idempotent and atomic.
// If it returns an error, or the process is killed during Sort(), the sorter should be able
// to recover from the error or crash, and the external storage should not be corrupted.
Sort(ctx context.Context) error
// IsSorted reports whether the data has been sorted, which means iterators can be created.
IsSorted() bool
// NewIterator creates a new iterator for iterating over the key-value pairs after sorting.
// If the sorter is not sorted yet, it will return an error.
//
// Multiple iterators can be opened and used concurrently.
NewIterator(ctx context.Context) (Iterator, error)
// Close releases all resources held by the sorter. It will not clean up the external storage,
// so the sorter can recover from a crash.
Close() error
// CloseAndCleanup closes the external sorter and cleans up all resources created by the sorter.
CloseAndCleanup() error
}
ExternalSorter is an interface for sorting key-value pairs in external storage. The key-value pairs are sorted by key, and duplicate keys are removed automatically.
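The write, sort, iterate lifecycle implied by this interface can be illustrated with a toy in-memory stand-in. This is only a sketch: `memSorter` and its methods are hypothetical and much simpler than the real interface (no contexts, no separate Writer or Iterator values). Which of two pairs with the same key survives is not stated in the documentation, so keeping the first one written is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sort"
)

type kv struct{ key, value []byte }

// memSorter is a hypothetical in-memory stand-in for an external sorter.
type memSorter struct {
	pairs  []kv
	sorted bool
}

// Put plays the role of Writer.Put: it is rejected once sorting has happened.
func (s *memSorter) Put(key, value []byte) error {
	if s.sorted {
		return errors.New("memSorter: already sorted")
	}
	// Copy the inputs so the caller may reuse its slices.
	s.pairs = append(s.pairs, kv{
		key:   append([]byte(nil), key...),
		value: append([]byte(nil), value...),
	})
	return nil
}

// Sort orders pairs by key and drops duplicate keys. Calling it again is a no-op.
func (s *memSorter) Sort() error {
	if s.sorted {
		return nil
	}
	sort.SliceStable(s.pairs, func(i, j int) bool {
		return bytes.Compare(s.pairs[i].key, s.pairs[j].key) < 0
	})
	// Assumption: the first pair written for a key wins.
	unique := s.pairs[:0]
	for _, p := range s.pairs {
		if len(unique) > 0 && bytes.Equal(unique[len(unique)-1].key, p.key) {
			continue
		}
		unique = append(unique, p)
	}
	s.pairs = unique
	s.sorted = true
	return nil
}

// Keys plays the role of an iterator: it is rejected until Sort has succeeded.
func (s *memSorter) Keys() ([]string, error) {
	if !s.sorted {
		return nil, errors.New("memSorter: not sorted yet")
	}
	keys := make([]string, 0, len(s.pairs))
	for _, p := range s.pairs {
		keys = append(keys, string(p.key))
	}
	return keys, nil
}

func main() {
	s := &memSorter{}
	for _, k := range []string{"pear", "apple", "pear", "fig"} {
		if err := s.Put([]byte(k), []byte("v")); err != nil {
			panic(err)
		}
	}
	if _, err := s.Keys(); err != nil {
		fmt.Println("before Sort:", err)
	}
	if err := s.Sort(); err != nil {
		panic(err)
	}
	keys, _ := s.Keys()
	fmt.Println("after Sort:", keys)
}
```

The ordering of calls mirrors the documented contract: reads fail before Sort, writes fail after it, and a repeated Sort changes nothing.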
type Iterator
type Iterator interface {
// Seek moves the iterator to the first key-value pair whose key is greater
// than or equal to the given key.
Seek(key []byte) bool
// First moves the iterator to the first key-value pair.
First() bool
// Next moves the iterator to the next key-value pair.
//
// Implementations must guarantee the next key is greater than the current key.
Next() bool
// Last moves the iterator to the last key-value pair.
Last() bool
// Valid returns true if the iterator is positioned at a valid key-value pair.
Valid() bool
// Error returns the error, if any, that was encountered during iteration.
Error() error
// UnsafeKey returns the key of the current key-value pair, without copying.
// The returned slice is valid only until the next call that moves the iterator.
UnsafeKey() []byte
// UnsafeValue returns the value of the current key-value pair, without copying.
// The returned slice is valid only until the next call that moves the iterator.
UnsafeValue() []byte
// Close releases all resources held by the iterator.
Close() error
}
Iterator is an interface for iterating over the key-value pairs after sorting.
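A forward scan over such an iterator might look like the sketch below. It assumes that the bool returned by First and Next reports whether the iterator landed on a valid pair, which the documentation does not state explicitly. `iterator` is a local subset of the interface and `sliceIter` is a hypothetical implementation that reuses a single buffer, so the copy inside `collectKeys` is what keeps earlier keys intact.

```go
package main

import "fmt"

// iterator is the subset of the Iterator interface used by this sketch.
type iterator interface {
	First() bool
	Next() bool
	Error() error
	UnsafeKey() []byte
	Close() error
}

// sliceIter is a hypothetical iterator over in-memory keys. It reuses one
// buffer for every key to mimic the "valid only until the next move" rule.
type sliceIter struct {
	keys []string
	pos  int
	buf  []byte
}

// load copies the key at the current position into the shared buffer.
func (it *sliceIter) load() bool {
	if it.pos < 0 || it.pos >= len(it.keys) {
		return false
	}
	it.buf = append(it.buf[:0], it.keys[it.pos]...)
	return true
}

func (it *sliceIter) First() bool       { it.pos = 0; return it.load() }
func (it *sliceIter) Next() bool        { it.pos++; return it.load() }
func (it *sliceIter) Error() error      { return nil }
func (it *sliceIter) UnsafeKey() []byte { return it.buf }
func (it *sliceIter) Close() error      { return nil }

// collectKeys walks the iterator front to back and copies each key,
// because the slice returned by UnsafeKey may be overwritten by Next.
func collectKeys(it iterator) ([][]byte, error) {
	var keys [][]byte
	for ok := it.First(); ok; ok = it.Next() {
		keys = append(keys, append([]byte(nil), it.UnsafeKey()...))
	}
	// A false return can mean "end of data" or "failure", so check Error.
	if err := it.Error(); err != nil {
		return nil, err
	}
	return keys, nil
}

func main() {
	it := &sliceIter{keys: []string{"aa", "bb", "cc"}}
	defer it.Close()
	keys, err := collectKeys(it)
	if err != nil {
		panic(err)
	}
	for _, k := range keys {
		fmt.Printf("%s\n", k)
	}
}
```

Without the copy, every collected slice in this sketch would alias the same buffer and end up showing the last key read.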
type Writer
type Writer interface {
// Put adds a key-value pair to the writer.
//
// Implementations must not modify or retain slices passed to Put().
Put(key, value []byte) error
// Flush flushes all buffered key-value pairs to the external sorter.
// The writer can be reused after calling Flush().
Flush() error
// Close flushes all buffered key-value pairs to the external sorter,
// and releases all resources held by the writer.
Close() error
}
Writer is an interface for writing key-value pairs to the external sorter.
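One way to honor the rule that Put must not retain its arguments is to copy them into a buffer and hand over full batches on Flush. The sketch below is illustrative only: `bufWriter`, `limit`, and `flushFn` are hypothetical names, and the automatic flush at a byte limit is an assumption loosely modeled on the WriterBufferSize option, not documented behavior.

```go
package main

import "fmt"

type pair struct{ key, value []byte }

// bufWriter is a hypothetical writer that buffers pairs and passes each
// batch to flushFn.
type bufWriter struct {
	limit   int // flush once the buffered bytes reach this size
	size    int
	batch   []pair
	flushFn func([]pair) error
}

// Put copies key and value, so the caller may reuse its slices afterwards.
func (w *bufWriter) Put(key, value []byte) error {
	k := append([]byte(nil), key...)
	v := append([]byte(nil), value...)
	w.batch = append(w.batch, pair{k, v})
	w.size += len(k) + len(v)
	if w.size >= w.limit {
		return w.Flush()
	}
	return nil
}

// Flush hands the buffered pairs to flushFn and resets the buffer, leaving
// the writer ready for more Put calls.
func (w *bufWriter) Flush() error {
	if len(w.batch) == 0 {
		return nil
	}
	err := w.flushFn(w.batch)
	w.batch, w.size = nil, 0
	return err
}

// Close flushes whatever is still buffered.
func (w *bufWriter) Close() error { return w.Flush() }

func main() {
	flushes := 0
	w := &bufWriter{limit: 8, flushFn: func(batch []pair) error {
		flushes++
		fmt.Printf("flush %d: %d pairs\n", flushes, len(batch))
		return nil
	}}
	key := make([]byte, 2)
	for i := 1; i <= 3; i++ {
		// The caller reuses the same key slice across calls.
		key[0], key[1] = 'k', byte('0'+i)
		if err := w.Put(key, []byte("vv")); err != nil {
			panic(err)
		}
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
}
```

Because Put copies its inputs, the caller's reuse of `key` in the loop does not corrupt pairs that are already buffered.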