rawlite

Version: v0.1.2
Published: Apr 1, 2025 License: Unlicense Imports: 7 Imported by: 0

README

rawlite: mapreduce into SQLite databases

rawlite exposes a multithreaded API to create SQLite databases. See the godoc for documentation.

Currently the library is alpha quality; I believe it works but it needs better documentation and tests.

Documentation

Overview

Package rawlite implements a library for writing SQLite databases. Its streaming API is designed for quickly ingesting large amounts of data into SQLite.

First, skim the description of SQLite's architecture at https://sqlite.org/arch.html.

In terms of SQLite's architecture, this library abstracts the OS interface, pager, and part of the B-tree layer. Its performance comes from using multiple threads to generate leaf nodes, only requiring a single-threaded fixup phase at the end to generate interior nodes and the SQLite file header. The tradeoff is that the user must cooperate with the library to generate those leaf nodes in such a way that it can assemble them into a valid B-tree.

Now, read the description of the SQLite file format at https://sqlite.org/fileformat2.html. This library abstracts some details of the file format while deliberately exposing others, which is the key to its performance. An understanding of the file format is essential to use the library correctly.

B-trees are abstracted into a stream of cells. A TableStream generates a stream of leaf nodes for the cells written into it. The performance comes from using many of them in parallel, each operating independently, arranging the data in such a way that these independent streams may be merged at the end. Closing a Table creates the interior nodes of the B-tree. Closing a Database creates the `sqlite_schema` table pointing to the root nodes of each table.

Page allocation, B-tree interior nodes, and overflow pages for large cells are abstracted away. The caller must still format cells correctly, and the independently generated streams of leaf nodes must combine into a valid B-tree. For TableStream, the library guarantees the latter by generating the rowid of each cell internally.
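The idea of parallel leaf generation followed by a single-threaded fixup can be illustrated with a toy model. The sketch below is an assumption for illustration only, not rawlite's actual scheme: the names node, buildLeaves, and buildInterior are hypothetical, and the block-per-leaf rowid reservation is just one way to keep leaves from overlapping. Each worker reserves a block of rowids for every leaf it emits, so sorting the leaves by first rowid afterwards always yields a valid key order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

const rowsPerLeaf = 4 // toy capacity; real SQLite pages are sized in bytes

// node is a toy stand-in for a B-tree page covering rowids first..last.
type node struct{ first, last int64 }

// buildLeaves runs `workers` goroutines that each emit leaves for
// rowsPerWorker rows. Every leaf reserves its own block of rowids from a
// shared counter, so no two leaves can overlap.
func buildLeaves(workers, rowsPerWorker int) []node {
	var next int64 // last rowid handed out so far
	var mu sync.Mutex
	var leaves []node
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := rowsPerWorker; n > 0; n -= rowsPerLeaf {
				used := rowsPerLeaf
				if n < used {
					used = n // final, partly filled leaf
				}
				end := atomic.AddInt64(&next, rowsPerLeaf)
				first := end - rowsPerLeaf + 1
				mu.Lock()
				leaves = append(leaves, node{first, first + int64(used) - 1})
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	// Single-threaded fixup begins here: order the leaves by key.
	sort.Slice(leaves, func(i, j int) bool { return leaves[i].first < leaves[j].first })
	return leaves
}

// buildInterior groups nodes `fanout` at a time until a single root is left,
// returning the root and the number of interior levels created.
// nodes must be non-empty and fanout must be at least 2.
func buildInterior(nodes []node, fanout int) (root node, levels int) {
	for len(nodes) > 1 {
		var parents []node
		for i := 0; i < len(nodes); i += fanout {
			j := i + fanout
			if j > len(nodes) {
				j = len(nodes)
			}
			parents = append(parents, node{nodes[i].first, nodes[j-1].last})
		}
		nodes, levels = parents, levels+1
	}
	return nodes[0], levels
}

func main() {
	leaves := buildLeaves(4, 10)
	root, levels := buildInterior(leaves, 4)
	fmt.Println(len(leaves), "leaves,", levels, "interior levels, root starts at rowid", root.first)
}
```

In this model the rowids a worker sees are not contiguous across leaves, which is one plausible reason for an API to return the assigned rowid from each write rather than accept one from the caller.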

Types

type Database

type Database struct {
	// contains filtered or unexported fields
}

Database represents a database file being created.

func OpenDatabase

func OpenDatabase(file io.WriterAt) *Database

OpenDatabase prepares to write a SQLite database to file.
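Because OpenDatabase accepts any io.WriterAt, an *os.File works, and so should an in-memory buffer for tests. The following is a minimal sketch of such a buffer; memFile is a hypothetical helper, not part of rawlite. It takes a mutex on every write on the assumption that a multithreaded writer may call WriteAt from several goroutines at once.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// memFile is a hypothetical in-memory io.WriterAt; it is not part of rawlite.
type memFile struct {
	mu  sync.Mutex
	buf []byte
}

// Compile-time check that *memFile satisfies io.WriterAt.
var _ io.WriterAt = (*memFile)(nil)

// WriteAt copies p into the buffer at offset off, growing the buffer with
// zero bytes if the write extends past the current end.
func (m *memFile) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("memFile: negative offset %d", off)
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	end := int(off) + len(p)
	if end > len(m.buf) {
		m.buf = append(m.buf, make([]byte, end-len(m.buf))...)
	}
	copy(m.buf[off:], p)
	return len(p), nil
}

// Bytes returns a copy of everything written so far.
func (m *memFile) Bytes() []byte {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]byte(nil), m.buf...)
}

func main() {
	f := &memFile{}
	// Writes may arrive out of order; the gap is zero-filled.
	f.WriteAt([]byte("world"), 6)
	f.WriteAt([]byte("hello"), 0)
	fmt.Printf("%q\n", f.Bytes()) // prints "hello\x00world"
}
```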

func (*Database) Close

func (db *Database) Close() error

Close writes the SQLite file header and the sqlite_schema table pointing to the root nodes of each Table and Index. It does not close the file the database was opened on.

func (*Database) OpenTable

func (db *Database) OpenTable() *Table

OpenTable records schema information for a table and prepares the database for TableStreams to begin work.

type Table

type Table struct {
	// contains filtered or unexported fields
}

Table represents a table being created.

func (*Table) Close

func (tbl *Table) Close(name, sql string) error

Close closes the B-tree and informs the Database of the root page number.

func (*Table) OpenStream

func (tbl *Table) OpenStream() *TableStream

OpenStream opens a TableStream for writing to this table.

type TableStream

type TableStream struct {
	// contains filtered or unexported fields
}

TableStream represents one stream of data being written to a Table. TableStreams are not thread-safe; open one TableStream per worker goroutine.
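The one-stream-per-goroutine rule can be sketched as a small worker pool. To stay self-contained, the example does not import rawlite: rowStream is a hypothetical interface that mirrors the documented WriteRow and Close signatures, fakeStream is a stand-in that only counts rows, and ingest is an illustrative helper. Streams are opened on the calling goroutine because the documentation does not say whether OpenStream may be called concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// rowStream mirrors the documented WriteRow and Close methods of TableStream.
type rowStream interface {
	WriteRow(row []byte) (rowid int64, err error)
	Close() error
}

// fakeStream is a hypothetical stand-in that only counts rows.
type fakeStream struct{ rows int }

func (s *fakeStream) WriteRow(row []byte) (int64, error) { s.rows++; return int64(s.rows), nil }
func (s *fakeStream) Close() error                       { return nil }

// ingest fans rows out to `workers` goroutines. open is called once per
// worker, so each stream is used by exactly one goroutine.
func ingest(rows <-chan []byte, workers int, open func() rowStream) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers)
	for i := 0; i < workers; i++ {
		s := open() // opened here, then owned by the goroutine below
		wg.Add(1)
		go func() {
			defer wg.Done()
			var firstErr error
			for row := range rows {
				if firstErr != nil {
					continue // keep draining so the producer never blocks
				}
				if _, err := s.WriteRow(row); err != nil {
					firstErr = err
				}
			}
			if err := s.Close(); firstErr == nil {
				firstErr = err
			}
			errs <- firstErr
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	rows := make(chan []byte)
	go func() {
		defer close(rows)
		for i := 0; i < 100; i++ {
			rows <- []byte{byte(i)}
		}
	}()
	var streams []*fakeStream
	err := ingest(rows, 4, func() rowStream {
		s := &fakeStream{}
		streams = append(streams, s)
		return s
	})
	total := 0
	for _, s := range streams {
		total += s.rows
	}
	fmt.Println("rows written:", total, "error:", err)
}
```

With the real library, the open callback would presumably wrap a call to OpenStream on the Table, and the Table would be closed only after ingest returns.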

func (*TableStream) Close

func (s *TableStream) Close() error

Close informs the parent Table that this TableStream is finished writing, passing it any bookkeeping information required to construct the B-tree.

func (*TableStream) Flush

func (s *TableStream) Flush() error

Flush flushes any buffered pages. WriteRow will begin with a new page if called after Flush.

func (*TableStream) WriteRow

func (s *TableStream) WriteRow(row []byte) (rowid int64, err error)

WriteRow writes one row, whose contents are row, to the table. It returns the rowid assigned to the row and any error that results from writing pages to the database.

WriteRow does not retain row.
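The documentation does not spell out the layout of row. A reasonable assumption, given that the caller is responsible for formatting cells and the library assigns the rowid, is that row holds a payload in SQLite's record format as described at https://sqlite.org/fileformat2.html. The sketch below encodes such a record under that assumption; appendVarint and encodeRecord are hypothetical helpers, not part of rawlite, and only NULL, int64, and UTF-8 text columns are handled.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// appendVarint appends v in SQLite's varint encoding: big-endian, 7 bits per
// byte with the high bit as a continuation flag, and a 9th byte (when needed)
// that carries a full 8 bits.
func appendVarint(buf []byte, v uint64) []byte {
	if v >= 1<<56 {
		var out [9]byte
		out[8] = byte(v)
		v >>= 8
		for i := 7; i >= 0; i-- {
			out[i] = byte(v&0x7f) | 0x80
			v >>= 7
		}
		return append(buf, out[:]...)
	}
	var tmp [8]byte
	n := 0
	for {
		tmp[n] = byte(v & 0x7f)
		n++
		v >>= 7
		if v == 0 {
			break
		}
	}
	for i := n - 1; i >= 0; i-- {
		b := tmp[i]
		if i > 0 {
			b |= 0x80
		}
		buf = append(buf, b)
	}
	return buf
}

// encodeRecord builds a record: a header (its own size, then one serial type
// per column) followed by the column bodies.
func encodeRecord(values ...any) ([]byte, error) {
	var types, body []byte
	for _, v := range values {
		switch x := v.(type) {
		case nil:
			types = appendVarint(types, 0) // NULL, no body
		case int64:
			switch {
			case x >= math.MinInt8 && x <= math.MaxInt8:
				types = appendVarint(types, 1) // 8-bit integer
				body = append(body, byte(x))
			case x >= math.MinInt32 && x <= math.MaxInt32:
				types = appendVarint(types, 4) // 32-bit integer
				body = binary.BigEndian.AppendUint32(body, uint32(x))
			default:
				types = appendVarint(types, 6) // 64-bit integer
				body = binary.BigEndian.AppendUint64(body, uint64(x))
			}
		case string:
			types = appendVarint(types, uint64(13+2*len(x))) // text of len(x) bytes
			body = append(body, x...)
		default:
			return nil, fmt.Errorf("unsupported column type %T", v)
		}
	}
	// The header size counts its own varint, so grow it until it is consistent.
	size := len(types) + 1
	for len(appendVarint(nil, uint64(size)))+len(types) != size {
		size++
	}
	rec := appendVarint(nil, uint64(size))
	rec = append(rec, types...)
	return append(rec, body...), nil
}

func main() {
	rec, err := encodeRecord(int64(42), "hi")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", rec) // prints 03 01 11 2a 68 69
}
```

The integer cases pick only among the 1-, 4-, and 8-byte serial types for brevity; the file format also defines 2-, 3-, and 6-byte integers and the constant types 8 and 9 for the values 0 and 1.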

Directories

Path Synopsis
internal
