package docs

v1.5.0 (latest) · Published: Apr 7, 2022 · License: Apache-2.0 · Imports: 8 · Imported by: 4

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/m3db/m3

README

Documents

Two files are used to represent the documents in a segment. The data file contains the data for each document in the segment. The index file contains, for each document, its corresponding offset in the data file.

Data File

The data file contains the encoded ID and fields of each document. The documents are stored serially, one after another.

┌───────────────────────────┐
│ ┌───────────────────────┐ │
│ │      Document 1       │ │
│ ├───────────────────────┤ │
│ │          ...          │ │
│ ├───────────────────────┤ │
│ │      Document n       │ │
│ └───────────────────────┘ │
└───────────────────────────┘

Document

Each document consists of an ID followed by its fields. The ID is a sequence of valid UTF-8 bytes and is encoded first: its length in bytes is written as a variable-sized unsigned integer (uvarint), followed by the bytes that make up the ID. The fields follow the ID: the number of fields in the document is encoded as a uvarint, followed by the fields themselves.

┌───────────────────────────┐
│ ┌───────────────────────┐ │
│ │     Length of ID      │ │
│ │       (uvarint)       │ │
│ ├───────────────────────┤ │
│ │                       │ │
│ │          ID           │ │
│ │        (bytes)        │ │
│ │                       │ │
│ ├───────────────────────┤ │
│ │   Number of Fields    │ │
│ │       (uvarint)       │ │
│ ├───────────────────────┤ │
│ │                       │ │
│ │        Field 1        │ │
│ │                       │ │
│ ├───────────────────────┤ │
│ │                       │ │
│ │          ...          │ │
│ │                       │ │
│ ├───────────────────────┤ │
│ │                       │ │
│ │        Field n        │ │
│ │                       │ │
│ └───────────────────────┘ │
└───────────────────────────┘

Field

Each field consists of a name and a value, both sequences of valid UTF-8 bytes. The name is encoded first and the value second; each is stored as its length in bytes, encoded as a variable-sized unsigned integer (uvarint), followed by the bytes that make up the name or value.

┌───────────────────────────┐
│ ┌───────────────────────┐ │
│ │  Length of Field Name │ │
│ │       (uvarint)       │ │
│ ├───────────────────────┤ │
│ │                       │ │
│ │      Field Name       │ │
│ │        (bytes)        │ │
│ │                       │ │
│ ├───────────────────────┤ │
│ │ Length of Field Value │ │
│ │       (uvarint)       │ │
│ ├───────────────────────┤ │
│ │                       │ │
│ │      Field Value      │ │
│ │        (bytes)        │ │
│ │                       │ │
│ └───────────────────────┘ │
└───────────────────────────┘

Index File

The index file contains, for each postings ID in the segment, the offset of the corresponding document in the data file. The base postings ID is stored at the start of the file as a little-endian uint64. Following it are the actual offsets.

┌───────────────────────────┐
│            Base           │
│          (uint64)         │
├───────────────────────────┤
│                           │
│                           │
│          Offsets          │
│                           │
│                           │
└───────────────────────────┘

Offsets

The offsets are stored serially, starting with the offset for the base postings ID. Each offset is a little-endian uint64. Because every offset has the same fixed size (8 bytes), the offset for a given postings ID can be read directly by computing its index relative to the base postings ID: it starts at byte 8 + 8 * (id - base) of the index file, just after the 8-byte base. An offset equal to the maximum uint64 value (2^64 - 1) indicates that there is no corresponding document for that postings ID.

┌───────────────────────────┐
│ ┌───────────────────────┐ │
│ │       Offset 1        │ │
│ │       (uint64)        │ │
│ ├───────────────────────┤ │
│ │          ...          │ │
│ ├───────────────────────┤ │
│ │       Offset n        │ │
│ │       (uint64)        │ │
│ └───────────────────────┘ │
└───────────────────────────┘

Documentation

Index

func MetadataFromDocument(document doc.Document, reader *EncodedDocumentReader) (doc.Metadata, error)
func ReadEncodedDocumentID(encoded doc.Encoded) ([]byte, error)
func ReadIDFromDocument(document doc.Document) ([]byte, error)
type DataReader
- func NewDataReader(data []byte) *DataReader
- func (r *DataReader) Read(offset uint64) (doc.Metadata, error)
type DataWriter
- func NewDataWriter(w io.Writer) *DataWriter
- func (w *DataWriter) Reset(wr io.Writer)
- func (w *DataWriter) Write(d doc.Metadata) (int, error)
type EncodedDataReader
- func NewEncodedDataReader(data []byte) *EncodedDataReader
- func (e *EncodedDataReader) Read(offset uint64) (doc.Encoded, error)
type EncodedDocumentReader
- func NewEncodedDocumentReader() *EncodedDocumentReader
- func (r *EncodedDocumentReader) Read(encoded doc.Encoded) (doc.Metadata, error)
type IndexReader
- func NewIndexReader(data []byte) (*IndexReader, error)
- func (r *IndexReader) Base() postings.ID
- func (r *IndexReader) Len() int
- func (r *IndexReader) Read(id postings.ID) (uint64, error)
type IndexWriter
- func NewIndexWriter(w io.Writer) *IndexWriter
- func (w *IndexWriter) Reset(wr io.Writer)
- func (w *IndexWriter) Write(id postings.ID, offset uint64) error
type Reader
type SliceReader
- func NewSliceReader(docs []doc.Metadata) *SliceReader
- func (r *SliceReader) Iter() index.IDDocIterator
- func (r *SliceReader) Len() int
- func (r *SliceReader) Metadata(id postings.ID) (doc.Metadata, error)
- func (r *SliceReader) Read(id postings.ID) (doc.Metadata, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

func MetadataFromDocument (added in v1.0.1)

func MetadataFromDocument(document doc.Document, reader *EncodedDocumentReader) (doc.Metadata, error)

MetadataFromDocument retrieves a doc.Metadata from a doc.Document.

func ReadEncodedDocumentID (added in v1.0.1)

func ReadEncodedDocumentID(encoded doc.Encoded) ([]byte, error)

ReadEncodedDocumentID reads the document ID from the encoded document metadata.

func ReadIDFromDocument (added in v1.0.1)

func ReadIDFromDocument(document doc.Document) ([]byte, error)

ReadIDFromDocument reads the document ID from the document.

Types

type DataReader

type DataReader struct {
	// contains filtered or unexported fields
}

DataReader is a reader for the data file for documents.

func NewDataReader

func NewDataReader(data []byte) *DataReader

NewDataReader returns a new DataReader.

func (*DataReader) Read

func (r *DataReader) Read(offset uint64) (doc.Metadata, error)

type DataWriter

type DataWriter struct {
	// contains filtered or unexported fields
}

DataWriter writes the data file for documents.

func NewDataWriter

func NewDataWriter(w io.Writer) *DataWriter

NewDataWriter returns a new DataWriter.

func (*DataWriter) Reset

func (w *DataWriter) Reset(wr io.Writer)

Reset resets the DataWriter.

func (*DataWriter) Write

func (w *DataWriter) Write(d doc.Metadata) (int, error)

type EncodedDataReader (added in v1.0.1)

type EncodedDataReader struct {
	// contains filtered or unexported fields
}

EncodedDataReader is a reader for the data file for encoded document metadata.

func NewEncodedDataReader (added in v1.0.1)

func NewEncodedDataReader(data []byte) *EncodedDataReader

NewEncodedDataReader returns a new EncodedDataReader.

func (*EncodedDataReader) Read (added in v1.0.1)

func (e *EncodedDataReader) Read(offset uint64) (doc.Encoded, error)

Read reads a doc.Encoded from a data stream starting at the specified offset.

type EncodedDocumentReader (added in v1.0.1)

type EncodedDocumentReader struct {
	// contains filtered or unexported fields
}

EncodedDocumentReader is a reader for documents stored as encoded metadata.

func NewEncodedDocumentReader (added in v1.0.1)

func NewEncodedDocumentReader() *EncodedDocumentReader

NewEncodedDocumentReader returns a new EncodedDocumentReader.

func (*EncodedDocumentReader) Read (added in v1.0.1)

func (r *EncodedDocumentReader) Read(encoded doc.Encoded) (doc.Metadata, error)

Read reads a doc.Metadata from a doc.Encoded. The returned doc.Metadata should be fully processed before Read is called again, because the array backing its Fields slice is reused and overwritten by the next call. Unlike (*DataReader).Read, this avoids allocating a new slice with a new backing array for every document processed.
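The reuse caveat can be reproduced with plain Go slices. reusingReader, Field, Metadata and cloneMetadata are hypothetical stand-ins (using strings instead of byte slices) that only mimic the documented behaviour of refilling one backing array on every Read; they are not the package's types.

```go
package main

import "fmt"

// Field and Metadata are simplified, hypothetical stand-ins for doc.Field and doc.Metadata.
type Field struct{ Name, Value string }

type Metadata struct {
	ID     string
	Fields []Field
}

// reusingReader mimics the documented behaviour: each Read refills the same
// backing array for Fields instead of allocating a new one.
type reusingReader struct {
	fields []Field
}

func (r *reusingReader) Read(id string, fields ...Field) Metadata {
	r.fields = append(r.fields[:0], fields...)
	return Metadata{ID: id, Fields: r.fields}
}

// cloneMetadata copies Fields into a fresh slice so the result survives later Reads.
func cloneMetadata(m Metadata) Metadata {
	m.Fields = append([]Field(nil), m.Fields...)
	return m
}

func main() {
	var r reusingReader
	first := r.Read("a", Field{"host", "h1"})
	kept := cloneMetadata(first)
	_ = r.Read("b", Field{"host", "h2"})
	// first.Fields now sees the second document's data; kept does not.
	fmt.Println(first.Fields[0].Value, kept.Fields[0].Value) // h2 h1
}
```

cloneMetadata is a shallow copy of the Fields slice. With the real byte-slice fields, the name and value bytes may also need copying, depending on how long the underlying encoded data remains valid.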

type IndexReader

type IndexReader struct {
	// contains filtered or unexported fields
}

IndexReader is a reader for the index file for documents.

func NewIndexReader

func NewIndexReader(data []byte) (*IndexReader, error)

NewIndexReader returns a new IndexReader.

func (*IndexReader) Base

func (r *IndexReader) Base() postings.ID

Base returns the base postings ID.

func (*IndexReader) Len

func (r *IndexReader) Len() int

Len returns the number of postings IDs.

func (*IndexReader) Read

func (r *IndexReader) Read(id postings.ID) (uint64, error)

type IndexWriter

type IndexWriter struct {
	// contains filtered or unexported fields
}

IndexWriter is a writer for the index file for documents.

func NewIndexWriter

func NewIndexWriter(w io.Writer) *IndexWriter

NewIndexWriter returns a new IndexWriter.

func (*IndexWriter) Reset

func (w *IndexWriter) Reset(wr io.Writer)

Reset resets the IndexWriter.

func (*IndexWriter) Write

func (w *IndexWriter) Write(id postings.ID, offset uint64) error

Write writes the offset for a postings ID. IDs must be written in increasing order, but they need not be contiguous.

type Reader (added in v0.15.14)

type Reader interface {
	// Len is the number of documents contained by the reader.
	Len() int
	// Read reads a document with the given postings ID.
	Read(id postings.ID) (doc.Metadata, error)
	// Iter returns a document iterator.
	Iter() index.IDDocIterator
}

Reader is a document reader from an encoded source.

type SliceReader (added in v0.5.0)

type SliceReader struct {
	// contains filtered or unexported fields
}

SliceReader is a docs slice reader for use with documents stored in memory.

func NewSliceReader (added in v0.5.0)

func NewSliceReader(docs []doc.Metadata) *SliceReader

NewSliceReader returns a new docs slice reader.

func (*SliceReader) Iter (added in v0.15.14)

func (r *SliceReader) Iter() index.IDDocIterator

Iter returns a docs iterator.

func (*SliceReader) Len (added in v0.5.0)

func (r *SliceReader) Len() int

Len returns the number of documents in the slice reader.

func (*SliceReader) Metadata (added in v1.0.1)

func (r *SliceReader) Metadata(id postings.ID) (doc.Metadata, error)

Metadata implements MetadataRetriever and reads the document with the given postings ID.

func (*SliceReader) Read (added in v0.5.0)

func (r *SliceReader) Read(id postings.ID) (doc.Metadata, error)

Read returns a document from the docs slice reader.
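A minimal in-memory reader in the spirit of SliceReader is sketched below, assuming a document's postings ID is simply its index in the slice. sliceReader, sliceIter and Metadata are hypothetical simplifications; the real Iter returns an index.IDDocIterator, whose method set is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// Metadata is a simplified, hypothetical stand-in for doc.Metadata.
type Metadata struct{ ID string }

var errDocNotFound = errors.New("document not found")

// sliceReader is a hypothetical in-memory reader; postings ID == slice index.
type sliceReader struct{ docs []Metadata }

func (r *sliceReader) Len() int { return len(r.docs) }

func (r *sliceReader) Read(id uint64) (Metadata, error) {
	if id >= uint64(len(r.docs)) {
		return Metadata{}, errDocNotFound
	}
	return r.docs[id], nil
}

// sliceIter walks the documents in postings-ID order.
type sliceIter struct {
	r   *sliceReader
	pos int // index of the next document to return
	cur Metadata
	id  uint64
}

func (r *sliceReader) Iter() *sliceIter { return &sliceIter{r: r} }

// Next advances to the next document, reporting whether one exists.
func (it *sliceIter) Next() bool {
	if it.pos >= len(it.r.docs) {
		return false
	}
	it.cur, it.id = it.r.docs[it.pos], uint64(it.pos)
	it.pos++
	return true
}

// Current returns the document and postings ID at the iterator's position.
func (it *sliceIter) Current() (Metadata, uint64) { return it.cur, it.id }

func main() {
	r := &sliceReader{docs: []Metadata{{ID: "a"}, {ID: "b"}}}
	for it := r.Iter(); it.Next(); {
		d, id := it.Current()
		fmt.Println(id, d.ID) // 0 a, then 1 b
	}
	_, err := r.Read(5)
	fmt.Println(err) // document not found
}
```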
