tsd

package module
v0.0.0-...-379ee7c Latest
Published: Mar 25, 2021 License: MIT Imports: 4 Imported by: 0

README

Tiny Streaming Data (TSD) Format

TSD is a simple and extensible file/data format optimized for applications that involve binary data streaming where every byte matters. It is inspired by interchange formats such as RIFF and IFF.

TSD solves the problem of packing different types of binary data in a single file or stream without delimiters. This problem commonly arises when packing protobuf data.

Format Specification

In TSD, much like RIFF, binary data blobs are packed in 'chunks', each with an identifier and a length prefix. Unlike RIFF, TSD drops headers and padding and uses protobuf varints instead of fixed-length FourCCs for efficiency.

A TSD file is an indefinitely repeating structure:

[uvarint chunk ID][uvarint chunk length][bytes of binary data]...[ID][length][data]...
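To make the layout concrete, here is a minimal sketch of writing chunks by hand with the standard library's uvarint encoding. The helper name appendChunk is hypothetical and is not part of this package's API; it only illustrates the byte layout described above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendChunk appends one chunk, laid out as
// [uvarint ID][uvarint length][data], to dst.
// Hypothetical helper for illustration; not part of the tsd package.
func appendChunk(dst []byte, id uint64, data []byte) []byte {
	// Two uvarints of at most MaxVarintLen64 bytes each.
	var hdr [2 * binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], id)
	n += binary.PutUvarint(hdr[n:], uint64(len(data)))
	dst = append(dst, hdr[:n]...)
	return append(dst, data...)
}

func main() {
	buf := appendChunk(nil, 1, []byte("Hello world!"))
	buf = appendChunk(buf, 2, []byte("776541"))
	fmt.Printf("% x\n", buf)
}
```

Chunks are simply concatenated; there is no file header, trailer, or padding between them.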

Readers of TSD files use the chunk ID and length to iterate data and handle it specially. Consider a file type used for streaming a user's social media activity. We can link different chunk IDs to different types of activity:

  1. Text Post (ASCII text)
  2. Like (ID of liked post)
  3. Photo (JPG data)

The file will look like:

[1][12]Hello world![3][42178]d^T<DF>=s<C5>DН<DD>E>;sԚ...[2][6]776541

Because we use varints, packing data like Hello world! (with a chunk ID of 1 and a length of 12) only takes two additional bytes for a header. Chunk IDs can be any 64-bit unsigned integer, but chunks that occur more frequently should have lower ID values to take up less space.
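The size claim can be checked directly. The sketch below measures how many bytes a given chunk ID (or length) occupies as a uvarint; uvarintLen is a hypothetical helper written for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen reports how many bytes v occupies when encoded as a uvarint.
// Hypothetical helper for illustration; not part of the tsd package.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

func main() {
	for _, id := range []uint64{1, 127, 128, 16383, 16384, 1<<64 - 1} {
		fmt.Printf("ID %d -> %d byte(s)\n", id, uvarintLen(id))
	}
}
```

Each uvarint byte carries 7 bits of the value, so values up to 127 fit in one byte, values up to 16383 fit in two, and the largest 64-bit value needs ten.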

A client reading this data will be able to use a switch statement on the chunk ID (1, 2, or 3) and read back each data blob knowing what to do with it. If, in the future, we want to add stories to our social media platform, we can introduce chunk ID 4, and clients that don't yet support stories will be able to skip over that chunk.
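A rough sketch of such a client follows. It parses the raw bytes with the standard library rather than with this package's Reader, and the function name summarize and the ID constants are assumptions made for the example. Unknown IDs are skipped using the length prefix.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Chunk IDs from the social media example (names are hypothetical).
const (
	idTextPost = 1
	idLike     = 2
	idPhoto    = 3
)

// summarize walks a TSD byte stream and describes each chunk it understands.
// Chunks with unknown IDs are skipped by seeking past their data.
func summarize(stream []byte) ([]string, error) {
	r := bytes.NewReader(stream)
	var out []string
	for r.Len() > 0 {
		id, err := binary.ReadUvarint(r)
		if err != nil {
			return out, err
		}
		n, err := binary.ReadUvarint(r)
		if err != nil {
			return out, err
		}
		// Reject lengths that run past the end of the input.
		if n > uint64(r.Len()) {
			return out, io.ErrUnexpectedEOF
		}
		var label string
		switch id {
		case idTextPost:
			label = "post"
		case idLike:
			label = "like"
		case idPhoto:
			label = "photo"
		default:
			// Unknown chunk type (for example a future ID 4): skip its data.
			if _, err := r.Seek(int64(n), io.SeekCurrent); err != nil {
				return out, err
			}
			continue
		}
		data := make([]byte, n)
		if _, err := io.ReadFull(r, data); err != nil {
			return out, err
		}
		out = append(out, fmt.Sprintf("%s: %d bytes", label, len(data)))
	}
	return out, nil
}

func main() {
	// A text post, an unknown chunk with ID 4, then a like.
	stream := []byte("\x01\x0cHello world!\x04\x02hi\x02\x06776541")
	lines, err := summarize(stream)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Because every chunk carries its own length, the reader never needs to understand a chunk's contents in order to move past it.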

Chunk ID 0 is reserved as a continuation signal. This makes it possible to write data without knowing its size ahead of time, by splitting it into pieces. A TSD stream like [1][6]Hello [0][6]world! can simply be read together as Hello world! with ID 1.
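One way a reader might stitch continuation chunks back together is sketched below. The chunk type and the readMerged function are hypothetical names for this example, and treating a leading ID 0 chunk as an error is an assumption; the format description above does not say how that case should be handled.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// chunk is one logical chunk after continuations have been merged.
type chunk struct {
	ID   uint64
	Data []byte
}

// readMerged parses a TSD byte stream and appends the data of every
// ID 0 chunk to the chunk that precedes it.
func readMerged(stream []byte) ([]chunk, error) {
	r := bytes.NewReader(stream)
	var out []chunk
	for r.Len() > 0 {
		id, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		n, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		if n > uint64(r.Len()) {
			return nil, io.ErrUnexpectedEOF
		}
		data := make([]byte, n)
		if _, err := io.ReadFull(r, data); err != nil {
			return nil, err
		}
		if id == 0 {
			// Assumption: a continuation with nothing before it is invalid.
			if len(out) == 0 {
				return nil, errors.New("continuation chunk with nothing to continue")
			}
			last := &out[len(out)-1]
			last.Data = append(last.Data, data...)
			continue
		}
		out = append(out, chunk{ID: id, Data: data})
	}
	return out, nil
}

func main() {
	// [1][6]Hello [0][6]world!
	chunks, err := readMerged([]byte("\x01\x06Hello \x00\x06world!"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range chunks {
		fmt.Printf("ID %d: %q\n", c.ID, c.Data)
	}
}
```

A writer can use the same rule in reverse: emit the first piece under the real chunk ID, then emit each later piece under ID 0 as it becomes available.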

What's in the Box

This repo contains a Go library to facilitate reading, writing, and inspecting Tiny Streaming Data. In the future I may add support for more languages, but it shouldn't be difficult to build your own reader/writer in any language that is also supported by protobuf.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ByteReaderReader

type ByteReaderReader interface {
	io.ByteReader
	io.Reader
}

type ChunkEncoder

type ChunkEncoder interface {
	Encode() []byte
	ChunkID() ChunkID
}

ChunkEncoder makes it easy to extend TSD with custom chunk types: an implementation supplies its own chunk ID and its encoded bytes.
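As an illustration, the sketch below implements the interface for two of the chunk types from the README example. ChunkID and ChunkEncoder are redeclared locally so the example compiles on its own; in real use they would come from this package, and a value would presumably be handed to Writer.WriteFrom. The textPost and like types, and the choice to encode a like as decimal text, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local copies of the package's declarations, so this sketch is self-contained.
type ChunkID = uint64

type ChunkEncoder interface {
	Encode() []byte
	ChunkID() ChunkID
}

// textPost is a hypothetical chunk type for ASCII text posts (ID 1).
type textPost string

func (p textPost) Encode() []byte   { return []byte(p) }
func (p textPost) ChunkID() ChunkID { return 1 }

// like is a hypothetical chunk type holding the ID of a liked post (ID 2).
// Here it is encoded as decimal text, as in the README example.
type like uint64

func (l like) Encode() []byte   { return strconv.AppendUint(nil, uint64(l), 10) }
func (l like) ChunkID() ChunkID { return 2 }

// Compile-time checks that both types satisfy the interface.
var (
	_ ChunkEncoder = textPost("")
	_ ChunkEncoder = like(0)
)

func main() {
	for _, c := range []ChunkEncoder{textPost("Hello world!"), like(776541)} {
		fmt.Printf("ID %d: %q\n", c.ChunkID(), c.Encode())
	}
}
```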

type ChunkID

type ChunkID = uint64

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

func NewReader

func NewReader(r ByteReaderReader) *Reader

func (*Reader) Next

func (t *Reader) Next() (ChunkID, io.Reader, error)

Next gets the next chunk in the TSD stream. Per its signature, it returns the chunk's ID, an io.Reader, and an error.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer wraps an io.Writer and facilitates writing TSD chunks.

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter creates a new TSD writer to start writing in TSD format.

func (*Writer) Write

func (t *Writer) Write(id ChunkID, data []byte) error

Write writes a small chunk with the given chunk ID and data.

func (*Writer) WriteFrom

func (t *Writer) WriteFrom(c ChunkEncoder) error
