Documentation ¶
Overview ¶
Package snapshot implements the snapshot protobuf serialisation code.
It implements custom serialisation and deserialisation code for performance.
## Background
Originally, this used gogo-protobuf to generate the serialisation code, but that ended up allocating a []KV slice for all DBI entries, with a []byte inside each KV for both the key and the value. Since one slice takes up 24 bytes just for its header on a 64-bit platform, that is 48 bytes of overhead per entry.
For 6 million entries, that is theoretically about 288 MB (roughly 300 MB) of overhead.
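The arithmetic above can be checked with a small sketch. The `kv` type and `headerOverhead` function below are hypothetical stand-ins for illustration, not the package's actual types; the figure counts slice headers only, not the key and value bytes themselves.

```go
package main

import (
	"fmt"
	"unsafe"
)

// kv is a hypothetical stand-in with the shape described above:
// one []byte for the key and one for the value.
type kv struct {
	Key   []byte
	Value []byte
}

// headerOverhead returns the number of bytes spent on slice headers
// alone for the given number of entries.
func headerOverhead(entries uintptr) uintptr {
	return entries * unsafe.Sizeof(kv{})
}

func main() {
	// A slice header is a pointer, a length and a capacity.
	fmt.Println("bytes per slice header:", unsafe.Sizeof([]byte(nil)))
	fmt.Println("bytes per entry:", unsafe.Sizeof(kv{}))
	fmt.Println("overhead for 6 million entries:", headerOverhead(6_000_000))
}
```

On a 64-bit platform each header is 24 bytes, so 6 million entries cost 6,000,000 × 48 = 288,000,000 bytes before any key or value data is stored.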
What we actually saw were allocations of over 2.7 GB in just the DBI code under the following circumstances:
- main was 26 MB compressed, 175 MB uncompressed.
- shard was 34 MB compressed, 265 MB uncompressed.
In total that is 440 MB of uncompressed data. It turns out that half of the allocations came from the code copying all key and value bytes.
         flat  flat%   sum%        cum   cum%
    2047.55MB 56.26% 56.26%  2722.09MB 74.80%  github.com/PowerDNS/lightningstream/snapshot.(*DBI).Unmarshal
     674.54MB 18.54% 98.45%   674.54MB 18.54%  github.com/PowerDNS/lightningstream/snapshot.(*KV).Unmarshal
Patching the generated code to not copy the data reduced the total memory use to 1.4 GB:
1378.77MB 52.20% 52.20% 1378.77MB 52.20% github.com/PowerDNS/lightningstream/snapshot.(*DBI).Unmarshal
That is still significantly more than the 440 MB we would expect. Part of it is likely because the allocated slices are up to 2x larger than needed with the append() growth algorithm, the rest is probably memory allocator overhead.
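The effect of the append() growth strategy can be observed directly. This is a minimal sketch with a hypothetical `entry` type; the exact capacities depend on the Go version and allocator size classes, so it prints them rather than assuming specific values.

```go
package main

import "fmt"

// entry is a hypothetical element type holding two slice headers,
// similar in shape to a key/value pair.
type entry struct {
	Key, Value []byte
}

// growByAppend appends n zero-valued entries one at a time and returns
// the resulting length and capacity of the slice.
func growByAppend(n int) (length, capacity int) {
	var s []entry
	for i := 0; i < n; i++ {
		s = append(s, entry{})
	}
	return len(s), cap(s)
}

func main() {
	for _, n := range []int{1_000, 100_000, 1_000_000} {
		l, c := growByAppend(n)
		unused := 100 * float64(c-l) / float64(c)
		fmt.Printf("len=%d cap=%d unused=%.0f%%\n", l, c, unused)
	}
}
```

Any gap between cap and len is memory that was allocated but holds no data, which is one plausible contributor to the difference described above.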
The solution to this was to stream the data instead of loading the full protobuf into slices.
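As an illustration of the streaming idea, the following sketch walks a buffer of length-prefixed records with a cursor and returns each payload as a sub-slice of the original buffer, so no per-entry slice of structs is built. It is not the package's implementation: the names `appendEntry`, `cursor` and `next` are hypothetical, and the protobuf field tags that precede each entry in the real format are omitted. The length prefix uses the same base-128 varint encoding that protobuf uses.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

var errCorrupt = errors.New("corrupt record")

// appendEntry appends one record (varint length followed by the payload)
// to buf and returns the extended buffer. The payload bytes are copied.
func appendEntry(buf, payload []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], uint64(len(payload)))
	buf = append(buf, tmp[:n]...)
	return append(buf, payload...)
}

// cursor reads records from data one at a time.
type cursor struct {
	data []byte
	pos  int
}

// next returns the next payload as a sub-slice of the underlying buffer
// (no copy), or io.EOF once the buffer is exhausted.
func (c *cursor) next() ([]byte, error) {
	if c.pos >= len(c.data) {
		return nil, io.EOF
	}
	size, width := binary.Uvarint(c.data[c.pos:])
	if width <= 0 {
		return nil, errCorrupt
	}
	start := c.pos + width
	if size > uint64(len(c.data)-start) {
		return nil, errCorrupt
	}
	end := start + int(size)
	c.pos = end
	return c.data[start:end], nil
}

func main() {
	var buf []byte
	for _, s := range []string{"alpha", "beta", "gamma"} {
		buf = appendEntry(buf, []byte(s))
	}
	c := &cursor{data: buf}
	for {
		payload, err := c.next()
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s\n", payload)
	}
}
```

Because the returned slices alias the buffer, they stay valid only as long as the buffer is not modified or reused.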
## Other protobuf options
At the moment of writing the status of Go protobuf libraries was as follows:
- GoGo protobuf was deprecated and no longer maintained
- Google protobuf insisted on deserializing into []*KV
- VTProtobuf used Google protobuf to generate the struct, thus ending up with []*KV
- CSProto mostly wrapped the above for easy use of mixed code in gRPC
The only library that supported the use of a custom Go type to handle DBIs lazily was GoGo, but it was deprecated, and support for this appeared to have many caveats.
A potential workaround was to use a standard lib for the outer protobuf, but then define `repeated bytes databases = 3` instead of `repeated DBI databases = 3`, and deserialize the entries on demand, since bytes and sub-messages have the same wire format.
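The reason this substitution works is that a protobuf tag encodes only the field number and the wire type, and both `bytes` fields and embedded messages use the length-delimited wire type (2). A small sketch, with the hypothetical helper `fieldTag`, shows the tag value computed for field 3:

```go
package main

import "fmt"

// wireTypeLen is the protobuf wire type for length-delimited fields,
// used by strings, bytes and embedded messages alike.
const wireTypeLen = 2

// fieldTag computes a protobuf tag value: (field number << 3) | wire type.
func fieldTag(fieldNumber, wireType uint64) uint64 {
	return fieldNumber<<3 | wireType
}

func main() {
	// The same tag is written for `repeated bytes databases = 3`
	// and for `repeated DBI databases = 3`.
	fmt.Printf("tag for field 3, length-delimited: 0x%02x\n", fieldTag(3, wireTypeLen))
}
```

A decoder that sees this tag reads a varint length followed by that many bytes, regardless of whether the schema declares them as raw bytes or as a sub-message.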
In the end we decided to just use custom code for everything.
Index ¶
- Constants
- func Name(syncerName, instanceID, generationID string, ts time.Time) string
- func NameTimestamp(ts time.Time) string
- func NameTimestampFromNano(tsNano header.Timestamp) string
- func RegisterExtension(extension, kind string)
- func ShortHash(instance, timestamp string) string
- func TransformSupported(transform string) bool
- type DBI
- func (d *DBI) Append(kv KV)
- func (d *DBI) AsInefficientKVList() ([]KV, error)
- func (d *DBI) Flags() uint64
- func (d *DBI) Map(transform string, f KVMapFunc) (*DBI, error)
- func (d *DBI) Marshal() []byte
- func (d *DBI) Name() string
- func (d *DBI) Next() (kv KV, err error)
- func (d *DBI) ResetCursor()
- func (d *DBI) SetFlags(v uint64)
- func (d *DBI) SetName(s string)
- func (d *DBI) SetTransform(s string)
- func (d *DBI) Size() (n int)
- func (d *DBI) Transform() string
- func (d *DBI) ValidateTransform(formatVersion uint32, nativeSchema bool) error
- type DumpDataStats
- type ErrUnexpectedWireType
- type KV
- type KVMapFunc
- type Meta
- type NameExtra
- type NameExtraItem
- type NameInfo
- type Snapshot
- type Update
Constants ¶
const (
	FieldDBIName      = 1
	FieldDBIEntries   = 2
	FieldDBIFlags     = 3
	FieldDBITransform = 4

	// TagSize0To15 is the number of bytes taken by a key with tag 1-15
	TagSize0To15 = 1
)
Protobuf field numbers
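The one-byte tag size follows from the protobuf encoding: the tag is a varint of (field number << 3) | wire type, and a varint fits in one byte while its value stays below 128. The sketch below, using the hypothetical helper `tagSize`, shows where the boundary lies.

```go
package main

import "fmt"

// tagSize returns the number of bytes needed to encode the tag of the
// given field number as a protobuf varint. The wire type occupies the
// low three bits and is set to its maximum here, which does not change
// the encoded size.
func tagSize(fieldNumber uint64) int {
	tag := fieldNumber<<3 | 7
	n := 1
	for tag >= 0x80 {
		tag >>= 7
		n++
	}
	return n
}

func main() {
	for _, f := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("field %d: %d byte(s)\n", f, tagSize(f))
	}
}
```

All field numbers used by this package are well below 16, so every tag it writes takes a single byte.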
const (
	// CurrentFormatVersion is the current snapshot format we write
	// Version 2 added the flags fields and Deleted flags, before this version
	// empty values indicated deleted entries.
	// Version 3 fixed the DBI flags to always represent the original DBI
	// instead of the shadow DBI, added the compatVersion field, and
	// added the per-database 'transform' field.
	CurrentFormatVersion uint32 = 3

	// CompatFormatVersion is the oldest snapshot version we can read.
	// v1 is the first version of our snapshots.
	// We will try to always support any old version, unless there is very
	// strong reason not to.
	CompatFormatVersion uint32 = 1

	// WriteCompatFormatVersion is the oldest snapshot version that snapshots
	// which were made with this program version are compatible with.
	// v1 is the first version of our snapshots, which indicates that any
	// old client is able to read these snapshots.
	// Note that there are limitations regarding support for newer features.
	// For example, during v1 an empty entry indicated deletion, while v2
	// introduced a flag for this, so an old v1 client cannot support
	// non-deleted empty entries.
	WriteCompatFormatVersion uint32 = 1
)
const (
	FieldKVKey           = 1
	FieldKVValue         = 2
	FieldKVTimestampNano = 3
	FieldKVFlags         = 4
)
Protobuf field numbers
const (
	FieldMetaGenerationID  = 1
	FieldMetaInstanceID    = 2
	FieldMetaHostname      = 3
	FieldMetaLMDBTxnID     = 4
	FieldMetaTimestampNano = 5
	FieldMetaDatabaseName  = 7
	FieldMetaFromLMDBTxnID = 8
)
Protobuf field numbers
const (
	FieldSnapshotFormatVersion = 1
	FieldSnapshotMeta          = 2
	FieldSnapshotDBI           = 3
	FieldSnapshotCompatVersion = 4
)
Protobuf field numbers
const (
	// TransformDupSortHackV1 is the 'transform' field for the current
	// dupsort_hack key-value transformation.
	TransformDupSortHackV1 = "dupsort_hack_v1"

	// TransformNone indicates no transformation
	TransformNone = ""
)
const DefaultExtension = "pb.gz"
const KindSnapshot = "snapshot"
const MB = 1024 * 1024
Variables ¶
This section is empty.
Functions ¶
func NameTimestamp ¶
NameTimestamp converts a time.Time to a string for embedding in a filename
func NameTimestampFromNano ¶
NameTimestampFromNano is NameTimestamp for LS header timestamps
func RegisterExtension ¶
func RegisterExtension(extension, kind string)
RegisterExtension registers a valid snapshot file extension with a kind name
func ShortHash ¶
ShortHash returns a short hash of name info to visually distinguish snapshots in logs
func TransformSupported ¶
TransformSupported checks if a given transform is supported.
Types ¶
type DBI ¶
type DBI struct {
	// Some statistics for logging (not persisted)
	NumWrittenEntries int64 // Only incremented when writing
	// contains filtered or unexported fields
}
DBI describes the contents of a single DBI. The top-level fields (name, flags and transform) can only be set before any KV data is written with Append(KV). If loaded from an existing protobuf, the top-level fields are read-only.
func NewDBIFromData ¶
NewDBIFromData creates a new DBI from protobuf data
func NewDBISize ¶
NewDBISize creates a new empty DBI and pre-allocates memory for the protobuf data to avoid future reallocs. The size is given in bytes.
func (*DBI) Append ¶
Append appends a new KV to the DBI protobuf. The data that KV.Key and KV.Value refer to is copied in the process, so it is also safe when they point directly into LMDB pages.
func (*DBI) AsInefficientKVList ¶
AsInefficientKVList returns all KV entries as an inefficient []KV. Only use this for tests.
func (*DBI) Marshal ¶
Marshal returns the currently written protobuf data. This implicitly calls flushFields, which will prevent further changes to the top-level fields. Careful, this does not make a copy.
func (*DBI) Next ¶
Next decodes the next KV from the data. BenchmarkDBI_Next benchmarks this; locally it takes about 40 ns per entry (or 40 ms for 1 million entries), which makes it a fine replacement for looping over a slice, given that we avoid the allocations.
func (*DBI) ResetCursor ¶
func (d *DBI) ResetCursor()
ResetCursor resets the read cursor to the beginning of the buffer
func (*DBI) SetTransform ¶
type DumpDataStats ¶
type ErrUnexpectedWireType ¶
type ErrUnexpectedWireType struct {
	Tag         int
	WireType    csproto.WireType
	ExpWireType csproto.WireType
}
func (ErrUnexpectedWireType) Error ¶
func (e ErrUnexpectedWireType) Error() string
type KV ¶
func (*KV) MaskedFlags ¶
type Meta ¶
type NameExtra ¶
type NameExtra []NameExtraItem
NameExtra are extra values added to the filename after the GenerationID field.
type NameExtraItem ¶
type NameExtraItem string
NameExtraItem represents one NameExtra value, e.g. "X1234".
Requirements for these values:
- Start with a unique capital ascii letter [A-Z] indicating the type
- 'G' is reserved to prevent confusion with the GenerationID.
- Followed by zero or more string characters for the value.
- These items are separated by "__" in the filename, so they cannot contain this substring.
- A type cannot appear more than once.
- The items SHOULD appear sorted alphabetically.
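The requirements above can be expressed as a short validation routine. This is a sketch only: `extraItem`, `validateExtras` and `isSorted` are hypothetical names introduced for illustration and are not part of the package's API.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// extraItem is a hypothetical stand-in for NameExtraItem.
type extraItem string

// validateExtras checks the hard requirements: a capital ASCII type
// letter other than 'G', no "__" inside an item, and no repeated type.
func validateExtras(items []extraItem) error {
	seen := make(map[byte]bool)
	for _, it := range items {
		s := string(it)
		if s == "" {
			return errors.New("empty item")
		}
		t := s[0]
		if t < 'A' || t > 'Z' {
			return fmt.Errorf("item %q: type must be a capital ASCII letter", s)
		}
		if t == 'G' {
			return fmt.Errorf("item %q: type 'G' is reserved", s)
		}
		if strings.Contains(s, "__") {
			return fmt.Errorf("item %q: must not contain \"__\"", s)
		}
		if seen[t] {
			return fmt.Errorf("item %q: type %q appears more than once", s, t)
		}
		seen[t] = true
	}
	return nil
}

// isSorted reports whether the items appear in alphabetical order,
// which the requirements recommend (SHOULD) but do not mandate.
func isSorted(items []extraItem) bool {
	return sort.SliceIsSorted(items, func(i, j int) bool {
		return items[i] < items[j]
	})
}

func main() {
	items := []extraItem{"A1", "X1234"}
	fmt.Println("valid:", validateExtras(items) == nil, "sorted:", isSorted(items))
	fmt.Println("error:", validateExtras([]extraItem{"G1"}))
}
```

Sorting is kept as a separate check because the requirements treat it as a recommendation rather than a rule.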
func (NameExtraItem) String ¶
func (nei NameExtraItem) String() string
String returns the whole value as is.
func (NameExtraItem) Type ¶
func (nei NameExtraItem) Type() byte
Type returns the type byte (first letter)
func (NameExtraItem) Value ¶
func (nei NameExtraItem) Value() string
Value returns the value part (after the first letter)
type NameInfo ¶
type NameInfo struct {
	FullName        string    // Full filename
	BaseName        string    // Part before the Extension
	Extension       string    // File extension
	Kind            string    // Kind of file based on extension
	SyncerName      string    // Corresponds to the database being synced
	InstanceID      string    // ID of the LS instance that generated it
	GenerationID    string    // Currently unused, for old idea that was abandoned
	TimestampString string    // Timestamp string in filename
	Timestamp       time.Time // Nanosecond precision snapshot timestamp
	Extra           NameExtra // LSE: extra values after the GenerationID
}
NameInfo breaks out all information encoded in a snapshot filename
type Snapshot ¶
type Snapshot struct {
	FormatVersion uint32 // version of this snapshot format
	CompatVersion uint32 // compatible with clients that support at least this version
	Meta          Meta
	Databases     []*DBI `json:",omitempty"`
}
Snapshot is the root object in a snapshot protobuf
Directories ¶
Path | Synopsis |
---|---|
Package storage keeps a global reference to the active simpleblob storage. |