SIR
SIR is a streamable, record-oriented binary file format with a sparse index.
Records are stored in blocks, and an index table maps record indices to block offsets, so a reader can locate the block containing a specific record without scanning the whole file.
Motivation
We needed a way to stream simple log data (timestamp + message) generated by short-lived tasks to users in real time, and to efficiently retrieve specific log ranges after the task ends.
The first target platform was the Web, so a lightweight implementation was preferred.
Features
- Indexed: Uses a monotonic unsigned 64-bit index for records.
- Write Streamable: Append-only structure.
- Read Streamable: Efficiently locates blocks containing a specific index.
Layout
There are four sections: Header, Blocks, Index Table, and Footer.
Header
0 1 2 3 4 5 6 7 8
. . . . . . . . .
00 | Magic | VER | COMP | RSV |
08 | Index Table Offset |
10 | First Block Offset |
18 | Metadata |
- Magic: A fixed constant to identify the file format. The first 4 bytes must be 0x53 0x49 0x52 0x00 (SIR\0).
- VER: SIR format version. Currently, only 0x01 is supported.
- COMP: Compression algorithm used for the payload. See Compression Algorithms.
- Index Table Offset: Start position of the Index Table in the file. If 0, refer to the Footer section to find the Index Table offset.
- First Block Offset: Start position of the first Block in the file. If 0, refer to the Footer section.
- Metadata: Can be used as needed.
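The header fields above could be read with a sketch like the following. The byte order and several field widths are not stated in this document, so the code assumes little-endian integers, 1 byte each for VER and COMP, 2 bytes for RSV, and 8 bytes for Metadata. The names `parse_header` and `HEADER` are hypothetical.

```python
import struct

SIR_MAGIC = b"SIR\x00"  # 0x53 0x49 0x52 0x00

# ASSUMPTION: little-endian; Magic(4) VER(1) COMP(1) RSV(2),
# then three 8-byte fields. Only the 4-byte magic is stated in the spec.
HEADER = struct.Struct("<4sBBHQQ8s")


def parse_header(buf: bytes) -> dict:
    """Decode the fixed-size header at the start of a SIR file."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too short for a SIR header")
    magic, ver, comp, _rsv, index_off, block_off, meta = HEADER.unpack_from(buf)
    if magic != SIR_MAGIC:
        raise ValueError("not a SIR file: bad magic")
    if ver != 0x01:
        raise ValueError(f"unsupported SIR version: {ver:#04x}")
    return {
        "version": ver,
        "compression": comp,
        # A value of 0 means "look in the Footer instead".
        "index_table_offset": index_off,
        "first_block_offset": block_off,
        "metadata": meta,
    }
```

A zero `index_table_offset` is what a reader would expect to see while the file is still being streamed, since the Index Table position is not yet known at that point.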
Blocks
0 1 2 3 4 5 6 7 8
. . . . . . . . .
00 | Uncompressed Size | Payload (variable) | # Block 1
08 | CRC32 Checksum | Sync Marker |
10 | Uncompressed Size | Payload (variable) | # Block 2
18 | CRC32 Checksum | Sync Marker |
20 | ... |
- First Index: The index of the first record in the block.
- Uncompressed Size: Size of the payload before compression.
- CRC32 Checksum: CRC32 value of the payload for integrity verification.
- Sync Marker: Fixed constant to mark block boundaries, 0xDE 0xCA 0xFE 0x42.
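As a rough illustration of how the checksum and sync marker might be used, the sketch below verifies a payload and scans a buffer for candidate block boundaries. It assumes the checksum is the common CRC-32 computed by `zlib.crc32` and that it covers the payload bytes exactly as stored; the document does not say which CRC variant is used or whether it covers the compressed or uncompressed bytes. The function names are hypothetical.

```python
import zlib
from typing import List

SYNC_MARKER = bytes([0xDE, 0xCA, 0xFE, 0x42])


def payload_is_intact(payload: bytes, stored_crc: int) -> bool:
    """Compare the stored checksum with a freshly computed one.

    ASSUMPTION: the stored value is the standard CRC-32 of the payload bytes.
    """
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored_crc


def find_sync_markers(buf: bytes) -> List[int]:
    """Return every position in buf where the sync marker bytes occur."""
    positions = []
    pos = buf.find(SYNC_MARKER)
    while pos != -1:
        positions.append(pos)
        pos = buf.find(SYNC_MARKER, pos + 1)
    return positions
```

The positions returned are only candidates: the same four bytes can occur inside a payload by chance, so a reader recovering from corruption would still want to confirm each candidate with the checksum.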
Index Table
0 1 2 3 4 5 6 7 8
. . . . . . . . .
00 | First Index | # Group 1
08 | Offset |
10 | Index Delta | Offset Delta | # Delta 1
18 | Index Delta | Offset Delta | # Delta 2
20 | ... |
200 | Index Delta | Offset Delta | # Delta 63
208 | First Index | # Group 2
210 | Offset |
218 | Index Delta | Offset Delta | # Delta 1
220 | ... |
The Index Table records the location of each block in the file.
It is divided into groups, each with one absolute position and 63 delta positions.
The absolute position indicates the first index value of the block and its file offset; deltas are used to incrementally calculate the positions of subsequent blocks.
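The group decoding described above can be sketched as follows. The field widths are inferred from the 8-byte rows of the diagram (8 bytes each for First Index and Offset, 4 bytes each for Index Delta and Offset Delta), and little-endian byte order is assumed. Each delta is taken to be relative to the previous entry. How a partially filled final group is encoded is not specified here, so the hypothetical `count` parameter lets the caller say how many entries are valid.

```python
import struct
from bisect import bisect_right

GROUP_HEAD = struct.Struct("<QQ")  # First Index, Offset (absolute)
DELTA = struct.Struct("<II")       # Index Delta, Offset Delta (ASSUMED 4 + 4 bytes)
DELTAS_PER_GROUP = 63
GROUP_SIZE = GROUP_HEAD.size + DELTAS_PER_GROUP * DELTA.size  # 16 + 63 * 8 = 520


def decode_group(buf: bytes, start: int = 0, count: int = 64) -> list:
    """Expand one group into absolute (first_index, offset) pairs."""
    index, offset = GROUP_HEAD.unpack_from(buf, start)
    entries = [(index, offset)]
    pos = start + GROUP_HEAD.size
    for _ in range(count - 1):
        d_index, d_offset = DELTA.unpack_from(buf, pos)
        index += d_index    # ASSUMPTION: deltas accumulate entry to entry
        offset += d_offset
        entries.append((index, offset))
        pos += DELTA.size
    return entries


def find_block(entries: list, record_index: int) -> int:
    """Return the file offset of the block that should hold record_index."""
    firsts = [first for first, _ in entries]
    i = bisect_right(firsts, record_index) - 1
    if i < 0:
        raise KeyError(f"record {record_index} precedes the first block")
    return entries[i][1]
```

Because every group starts with an absolute entry, a reader can binary-search the group heads first and decode only the one group that covers the requested index.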
Footer
0 1 2 3 4 5 6 7 8
. . . . . . . . .
00 | Index Table Offset |
08 | Magic |
- Index Table Offset: Start position of the Index Table in the file.
- Magic: A fixed constant to identify the end of file. The last 4 bytes must be 0x53 0x49 0x52 0x00 (SIR\0).
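A reader that finds 0 in the header's Index Table Offset could fall back to the footer roughly as shown below. The sketch assumes a 16-byte footer (two 8-byte rows, as in the diagram) with a little-endian offset in the first row; only the position of the trailing 4-byte magic is stated explicitly. The function names are hypothetical.

```python
import struct

SIR_MAGIC = b"SIR\x00"
FOOTER_SIZE = 16  # ASSUMPTION: two 8-byte rows, as drawn in the diagram


def index_table_offset_from_footer(data: bytes) -> int:
    """Read the Index Table Offset stored in the footer of a finished file."""
    if len(data) < FOOTER_SIZE or data[-4:] != SIR_MAGIC:
        raise ValueError("footer magic not found; the file may be incomplete")
    (offset,) = struct.unpack_from("<Q", data, len(data) - FOOTER_SIZE)
    return offset


def resolve_index_table_offset(header_offset: int, data: bytes) -> int:
    """Prefer the header value; fall back to the footer when it is 0."""
    if header_offset != 0:
        return header_offset
    return index_table_offset_from_footer(data)
```

Keeping the offset in the footer is what lets a writer stay append-only: it never needs to seek back and patch the header once the Index Table has been written.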
Compression Algorithms
Value | Algorithm
0x00 | None
0x01 | Deflate
0x02 | Brotli
0x03 | LZ4
0x04 | Snappy
0x05 | Zstandard
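The table maps directly onto an enumeration. The sketch below also shows payload decoding for the two algorithms available in the Python standard library. It assumes that "Deflate" means a raw deflate stream without a zlib or gzip wrapper, which this document does not confirm; the remaining algorithms would need third-party libraries and are left unimplemented. `Compression` and `decompress_payload` are hypothetical names.

```python
import zlib
from enum import IntEnum


class Compression(IntEnum):
    """Values of the COMP header field."""
    NONE = 0x00
    DEFLATE = 0x01
    BROTLI = 0x02
    LZ4 = 0x03
    SNAPPY = 0x04
    ZSTANDARD = 0x05


def decompress_payload(comp: int, payload: bytes) -> bytes:
    """Decode a block payload according to the header's COMP value."""
    algo = Compression(comp)  # raises ValueError for an unknown value
    if algo is Compression.NONE:
        return payload
    if algo is Compression.DEFLATE:
        # ASSUMPTION: raw deflate stream (negative wbits = no zlib header).
        return zlib.decompress(payload, wbits=-15)
    raise NotImplementedError(f"{algo.name} is not handled in this sketch")
```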