sheetfile

v1.0.0
Warning

This package is not in the latest version of its module.

Published: Jul 16, 2021 License: MIT Imports: 10 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CreateCellTableIfNotExists

func CreateCellTableIfNotExists(db *gorm.DB, sheetName string) error

CreateCellTableIfNotExists queries the sqlite_master table to check whether the Cell table for sheetName already exists. If it does not, the function creates the table with create_tmpl. Calling this function inside db.Transaction fails with a 'database is locked' error, so it must not be called in a transaction.

@para

db: a gorm connection; it must not be a transaction.
sheetName: name of the file that the Cells belong to.

@return

error: any error from executing the queries. If this function is called in a transaction, a 'database is locked'
error will be returned.
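The check-then-create flow described above can be sketched with the standard database/sql package. This is a minimal sketch, not the package's implementation: the package itself goes through gorm, and createTmpl is a hypothetical stand-in for the unexported create_tmpl, with a column list assumed from gorm.Model plus the exported Cell fields. Actually running createCellTableIfNotExists also requires registering a SQLite driver, which is not shown.

```go
package main

import (
	"bytes"
	"database/sql"
	"text/template"
)

// createTmpl is a hypothetical stand-in for the package's create_tmpl.
// The columns are assumed from gorm.Model plus the exported Cell fields.
var createTmpl = template.Must(template.New("create").Parse(
	`CREATE TABLE "{{.}}" (` +
		`"id" integer PRIMARY KEY AUTOINCREMENT, ` +
		`"created_at" datetime, "updated_at" datetime, "deleted_at" datetime, ` +
		`"cell_id" integer, "offset" integer, "size" integer, "chunk_id" integer)`))

// renderCreateSQL fills the template with a concrete table name.
// text/template does no SQL escaping, so tableName must be trusted.
func renderCreateSQL(tableName string) (string, error) {
	var buf bytes.Buffer
	if err := createTmpl.Execute(&buf, tableName); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// createCellTableIfNotExists looks the table up in sqlite_master and only
// runs the templated CREATE TABLE when it is missing. Following the
// documentation above, db should not be inside a transaction.
func createCellTableIfNotExists(db *sql.DB, tableName string) error {
	var n int
	err := db.QueryRow(
		`SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?`,
		tableName,
	).Scan(&n)
	if err != nil {
		return err
	}
	if n > 0 {
		return nil // table already exists
	}
	stmt, err := renderCreateSQL(tableName)
	if err != nil {
		return err
	}
	_, err = db.Exec(stmt)
	return err
}
```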

func CreateSheetFile

func CreateSheetFile(db *gorm.DB, alloc *datanode_alloc.DataNodeAllocator, filename string) (*SheetFile, *Cell, *Chunk, error)

CreateSheetFile creates a SheetFile, along with the sqlite table that stores its Cells, its MetaCell, and the Chunk used to store the MetaCell. Strictly speaking, the metadata of a new file does not need to be flushed to disk immediately. However, because we use sqlite as a substitute for a general-purpose B-tree, we create the table here as a workaround.

@para

db: a gorm connection. It must not be a transaction (see CreateCellTableIfNotExists).
alloc: the DataNodeAllocator used to allocate the Chunk for the MetaCell.
filename: filename of the new SheetFile

@return

*SheetFile: pointer to the new SheetFile on success, or nil.
*Cell, *Chunk: the MetaCell and the Chunk that stores it.
error:
	any error that occurs while creating the cell table.
	*errors.NoDataNodeError: this function must allocate a Chunk for the MetaCell; if there
	are no DataNodes to store this Cell, it returns NoDataNodeError.

func GetCellID

func GetCellID(row uint32, col uint32) int64

GetCellID computes a CellID from a row and column number. It is almost impossible for a sheet to grow to 4294967295 rows by 4294967295 columns, so a uint32 is enough to represent a row or column number. Accordingly, a CellID is formed by putting the row number in the upper 32 bits of a 64-bit integer and the column number in the lower 32 bits. The result is returned as an int64, because sqlite does not support uint64 (see Cell).

@return

int64: CellID of the Cell located at (row, col)
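The bit layout described above can be written out as follows. This is a sketch of the described packing, not the package's source; splitCellID is a hypothetical helper added only to show that the packing is reversible.

```go
package main

// getCellID packs row into the upper 32 bits and col into the lower 32 bits.
// The packing is done in uint64 and then converted to int64, because sqlite
// has no unsigned 64-bit integer type.
func getCellID(row, col uint32) int64 {
	return int64(uint64(row)<<32 | uint64(col))
}

// splitCellID is a hypothetical inverse of getCellID.
func splitCellID(id int64) (row, col uint32) {
	u := uint64(id)
	return uint32(u >> 32), uint32(u)
}
```

Any row number of 2^31 or more sets the sign bit, so such CellIDs are negative. They remain unique, but ordering by CellID in sqlite then no longer follows row order.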

func GetCellTableName

func GetCellTableName(sheetName string) string

GetCellTableName returns the same name as Cell.TableName; it is used when creating the Cell table.

@return

name of the sqlite table that stores the Cells of sheetName

Types

type Cell

type Cell struct {
	gorm.Model
	// CellID is used to accelerate looking up cell by row and column number
	// CellID is composed of row and column number, which makes CellID a standalone sqlite index,
	// rather than maintaining a joined index on (row,col)
	// uint64 is not supported in sqlite, use int64 as a workaround
	CellID  int64 `gorm:"index"`
	Offset  uint64
	Size    uint64
	ChunkID uint64

	SheetName string `gorm:"-"`
}

Cell represents a cell of a sheet. Every Cell is stored in a Chunk, starting at a fixed offset (see SheetFile), and every Chunk contains multiple Cells. The number of Cells stored in a Chunk is given by config.MaxCellsPerChunk. Because every Chunk is divided into config.MaxCellsPerChunk slots, the Size of a Cell is config.BytesPerChunk / config.MaxCellsPerChunk, except for the special MetaCell, which stores the metadata of a SheetFile (see SheetFile too).

Cell acts as an index from (row, column) to the concrete Chunk that actually stores the data, so applications can manipulate a cell directly instead of computing its offset manually. This index is critical to the API of our filesystem and must be persistent. Cell is also a gorm model. All Cells of a SheetFile are stored in one sqlite table named 'cells_{filename}'. Cells belonging to different SheetFiles are stored in different tables, which we implement by executing templated SQL (see create_tmpl).
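The slot arithmetic described above can be sketched as below. The constants are hypothetical placeholders for config.BytesPerChunk and config.MaxCellsPerChunk, and the assumption that slots are packed back to back from offset 0 is ours; the text only says that each Cell starts at a fixed offset.

```go
package main

// Hypothetical placeholders for config.BytesPerChunk and config.MaxCellsPerChunk.
const (
	bytesPerChunk    uint64 = 4096
	maxCellsPerChunk uint64 = 16
)

// cellSize is the Size of an ordinary Cell: one slot of a Chunk.
const cellSize = bytesPerChunk / maxCellsPerChunk

// slotOffset returns the byte offset of slot i inside a Chunk, assuming
// slots are packed back to back starting at offset 0. ok is false when i
// is past the last slot.
func slotOffset(i uint64) (offset uint64, ok bool) {
	if i >= maxCellsPerChunk {
		return 0, false
	}
	return i * cellSize, true
}

// metaCellSize is the Size of the MetaCell, which occupies a whole Chunk.
const metaCellSize = bytesPerChunk
```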

func GetSheetCellsAll

func GetSheetCellsAll(db *gorm.DB, sheetName string) []*Cell

GetSheetCellsAll loads all Cells of a SheetFile from the sqlite database. This method should only be used to load checkpoints from sqlite. After loading, subsequent mutations should be performed in memory, relying on journaling to tolerate failures until the next checkpoint.

@return

[]*Cell: all Cells stored in the table corresponding to sheetName

func NewCell

func NewCell(cellID int64, offset uint64, size uint64, chunkID uint64, sheetName string) *Cell

func (*Cell) IsMeta

func (c *Cell) IsMeta() bool

IsMeta returns true if c is the MetaCell (see SheetFile).

func (*Cell) Persistent

func (c *Cell) Persistent(tx *gorm.DB)

Persistent flushes the in-memory Cell data to sqlite. This method should be used only for checkpointing, and is supposed to be called in a transaction for atomicity.

func (*Cell) Snapshot

func (c *Cell) Snapshot() *Cell

Snapshot returns a *Cell pointing to a copy of c. See SheetFile for why snapshots are necessary.

@return

*Cell pointing to a copy of c.

func (*Cell) TableName

func (c *Cell) TableName() string

TableName returns the name of the table that contains the Cells of the SheetFile c belongs to.

type Chunk

type Chunk struct {
	model.Model
	DataNode string
	Version  uint64
	Cells    []*Cell
}

Chunk represents a fixed-size block of data stored on some DataNode. The size of a Chunk is given by config.BytesPerChunk.

A Version is maintained separately by the MasterNode and the DataNode. The latest Version of a Chunk is stored on the MasterNode, and the actual Version is stored on the DataNode. Version is needed to serialize write operations to a Chunk. When a client issues a write, the MasterNode increments the Version by 1 and returns it to the client. The client must send both the data to write and this Version to the DataNode that stores the Chunk. The write succeeds if and only if the Version in the request equals the DataNode's Version plus 1, which serializes writes. Version can also be used to select the correct replica of a Chunk if quorums are introduced.

As with other metadata data structures, Chunk should be maintained in memory, with journaling to tolerate faults, and flushed to sqlite only during checkpointing.
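The Version rule above can be modelled in a few lines. This is a toy, single-process sketch: masterChunk, dataNodeChunk, and errStaleVersion are hypothetical names, there is no networking, and a write simply replaces the Chunk's bytes instead of targeting a Cell's offset.

```go
package main

import "errors"

// errStaleVersion is a hypothetical error used only in this sketch.
var errStaleVersion = errors.New("version mismatch")

// masterChunk models the latest Version kept by the MasterNode.
type masterChunk struct{ Version uint64 }

// nextVersion is what the MasterNode does when a client asks to write:
// increment the Version and hand it to the client.
func (m *masterChunk) nextVersion() uint64 {
	m.Version++
	return m.Version
}

// dataNodeChunk models the Version actually stored on the DataNode.
type dataNodeChunk struct {
	Version uint64
	Data    []byte
}

// write applies data only if reqVersion is exactly one greater than the
// DataNode's current Version, which serializes writes to the Chunk.
func (d *dataNodeChunk) write(reqVersion uint64, data []byte) error {
	if reqVersion != d.Version+1 {
		return errStaleVersion
	}
	d.Data = append([]byte(nil), data...)
	d.Version = reqVersion
	return nil
}
```

Because the DataNode accepts only Version+1, a request carrying a later Version is rejected until the earlier one has been applied, and a replayed request is rejected once its Version has been consumed.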

func (*Chunk) Persistent

func (c *Chunk) Persistent(tx *gorm.DB)

Persistent flushes the in-memory Chunk data to sqlite. Chunk.Cells is not included, because Cells are stored in tables with dynamic names; they must be persisted manually. This method should be used only for checkpointing, and is supposed to be called in a transaction for atomicity.

func (*Chunk) Snapshot

func (c *Chunk) Snapshot() *Chunk

Snapshot returns a *Chunk pointing to a copy of c. See SheetFile for why snapshots are necessary.

@return

*Chunk pointing to a copy of c.

type SheetFile

type SheetFile struct {

	// All Chunks containing all Cells of the SheetFile.
	// Maps ChunkID to *Chunk.
	Chunks map[uint64]*Chunk
	// All Cells in the sheet.
	// Maps CellID to *Cell.
	Cells map[int64]*Cell
	// Keeps track of latest Chunk whose remaining space is capable of storing a new Cell.
	LastAvailableChunk *Chunk
	// contains filtered or unexported fields
}

SheetFile represents a file containing a sheet. Every SheetFile is made up of many Cells. Almost every Cell has config.MaxBytesPerCell bytes of storage for its data; the exception is a special Cell called the MetaCell, whose size is config.BytesPerChunk. Applications should consider using the MetaCell to store data related to the whole sheet. The MetaCell can be accessed at (config.SheetMetaCellRow, config.SheetMetaCellCol).

SheetFile is composed of Cells and provides a row/column-oriented API to applications, with Cells serving as the index behind that API. Therefore Cells must be persistent, and so must the Chunks storing them. As a logical collection of Cells and Chunks, SheetFile should be used as a helper to persist them. However, SheetFile itself does not need to be persisted: all of its data can be reconstructed by scanning the Cells loaded from the database.

Chunks and Cells are held as pointers. When this data is returned to callers, SheetFile.mu can no longer guard access through those pointers, so such access would not be goroutine-safe. Therefore every returned pointer points to a copy, or snapshot, of a Chunk or Cell. In other words, it is a goroutine-safe view of the Chunk or Cell at some point in time.
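The copy-on-return rule above can be sketched as follows. The lowercase types, getCell, and errCellNotFound are hypothetical simplifications of SheetFile, Cell, GetCellChunk, and *errors.CellNotFoundError.

```go
package main

import (
	"errors"
	"sync"
)

type cell struct {
	CellID  int64
	Offset  uint64
	Size    uint64
	ChunkID uint64
}

// snapshot returns a pointer to a copy of c.
func (c *cell) snapshot() *cell {
	cp := *c
	return &cp
}

// errCellNotFound is a hypothetical stand-in for *errors.CellNotFoundError.
var errCellNotFound = errors.New("cell not found")

type sheetFile struct {
	mu    sync.Mutex
	cells map[int64]*cell
}

// getCell reads under the lock and hands out a copy, so callers never hold
// a pointer into the sheetFile's internal state.
func (s *sheetFile) getCell(id int64) (*cell, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.cells[id]
	if !ok {
		return nil, errCellNotFound
	}
	return c.snapshot(), nil
}
```

A plain struct copy is enough for a flat struct like this cell. A struct that holds slices or maps, such as Chunk with its Cells field, also needs those copied, or the snapshot will still share memory with the original.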

func LoadSheetFile

func LoadSheetFile(db *gorm.DB, alloc *datanode_alloc.DataNodeAllocator, filename string) *SheetFile

LoadSheetFile loads a SheetFile from the database. As mentioned above, SheetFile itself is not persisted. Instead, this function loads all Cells of the given filename from the database, then scans over them, adding each Cell to SheetFile.Cells and its Chunk to SheetFile.Chunks. It also sets SheetFile.LastAvailableChunk to the first Chunk whose isAvailable() returns true.

This method should only be used to load checkpoints in sqlite. (See GetSheetCellsAll)

@para

db: a gorm connection. It can be a transaction.
filename: the validity of filename is not checked. The caller must guarantee that
a valid filename is passed in.

@return

*SheetFile: pointer to the loaded SheetFile.
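The rebuild step described above can be sketched over already-loaded records. Everything here is hypothetical: the lowercase types are simplified stand-ins, isMeta replaces Cell.IsMeta(), maxCellsPerChunk replaces config.MaxCellsPerChunk, and the rule used for isAvailable (the Chunk does not hold the MetaCell and has a free slot) is an assumption about the unexported method.

```go
package main

// Hypothetical placeholder for config.MaxCellsPerChunk.
const maxCellsPerChunk = 4

type cell struct {
	CellID  int64
	ChunkID uint64
	isMeta  bool // stand-in for Cell.IsMeta()
}

type chunk struct {
	ID    uint64
	Cells []*cell
}

// isAvailable is an assumed rule: the Chunk does not hold the MetaCell and
// still has a free slot.
func (c *chunk) isAvailable() bool {
	for _, cl := range c.Cells {
		if cl.isMeta {
			return false
		}
	}
	return len(c.Cells) < maxCellsPerChunk
}

type sheetFile struct {
	Chunks             map[uint64]*chunk
	Cells              map[int64]*cell
	LastAvailableChunk *chunk
}

// buildSheetFile indexes Cells loaded from sqlite. chunks maps ChunkID to
// Chunk records, assumed to be loaded already with empty Cells, since
// Chunk.Persistent does not store them.
func buildSheetFile(cells []*cell, chunks map[uint64]*chunk) *sheetFile {
	s := &sheetFile{Chunks: map[uint64]*chunk{}, Cells: map[int64]*cell{}}
	var order []uint64 // first-seen order keeps "first available" deterministic
	for _, c := range cells {
		s.Cells[c.CellID] = c
		ch, ok := chunks[c.ChunkID]
		if !ok {
			continue // dangling ChunkID; skipped in this sketch
		}
		if _, seen := s.Chunks[ch.ID]; !seen {
			s.Chunks[ch.ID] = ch
			order = append(order, ch.ID)
		}
		ch.Cells = append(ch.Cells, c)
	}
	for _, id := range order {
		if s.Chunks[id].isAvailable() {
			s.LastAvailableChunk = s.Chunks[id]
			break
		}
	}
	return s
}
```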

func (*SheetFile) GetAllChunks

func (s *SheetFile) GetAllChunks() []*Chunk

GetAllChunks returns snapshots of all Chunks.

@return

[]*Chunk: slice of pointers pointing to snapshots of s.Chunks.

func (*SheetFile) GetCellChunk

func (s *SheetFile) GetCellChunk(row, col uint32) (*Cell, *Chunk, error)

GetCellChunk looks up the Cell located at (row, col) and its Chunk.

@para

row: row number of Cell
col: column number of Cell

@return

*Cell, *Chunk: corresponding snapshots if no error
error:
	*errors.CellNotFoundError if the given row and col are invalid.

func (*SheetFile) Persistent

func (s *SheetFile) Persistent(tx *gorm.DB) error

Persistent flushes the Cell and Chunk data stored in a SheetFile to sqlite.

@para

tx: a gorm connection. It is supposed to be a transaction.

@return

error: currently always nil, though errors may be returned
in the future.

func (*SheetFile) WriteCellChunk

func (s *SheetFile) WriteCellChunk(row, col uint32, tx *gorm.DB) (*Cell, *Chunk, error)

WriteCellChunk performs the metadata mutations needed to handle a write to a Cell.

@para

row, col: row and column number of the Cell to write
tx: a gorm connection; it can be a transaction

@return

*Cell, *Chunk: snapshots of the Cell and its Chunk to be written.
error:
	*errors.NoDataNodeError if there is no DataNode registered.
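One plausible shape for these metadata mutations is sketched below: reuse an existing Cell; otherwise place the new Cell in LastAvailableChunk, allocating a fresh Chunk when that one is missing or full. This is an assumption-based sketch, not the package's logic. allocator, errNoDataNode, nextChunkID, and the constants are hypothetical stand-ins for DataNodeAllocator, *errors.NoDataNodeError, and the config values, and the tx parameter is omitted.

```go
package main

import "errors"

// Hypothetical placeholders for config.BytesPerChunk and config.MaxCellsPerChunk.
const (
	bytesPerChunk    uint64 = 4096
	maxCellsPerChunk uint64 = 4
)

// errNoDataNode stands in for *errors.NoDataNodeError.
var errNoDataNode = errors.New("no DataNode registered")

type cell struct {
	CellID                int64
	Offset, Size, ChunkID uint64
}

type chunk struct {
	ID       uint64
	DataNode string
	Cells    []*cell
}

// allocator is a hypothetical round-robin stand-in for DataNodeAllocator.
type allocator struct {
	nodes []string
	next  int
}

func (a *allocator) allocate() (string, error) {
	if len(a.nodes) == 0 {
		return "", errNoDataNode
	}
	n := a.nodes[a.next%len(a.nodes)]
	a.next++
	return n, nil
}

type sheetFile struct {
	alloc              *allocator
	nextChunkID        uint64 // hypothetical ID source
	Chunks             map[uint64]*chunk
	Cells              map[int64]*cell
	LastAvailableChunk *chunk
}

// writeCell returns the Cell at (row, col) and its Chunk, creating the Cell
// (and a new Chunk if needed) on first write. The documented method returns
// snapshots; this sketch returns internal pointers for brevity.
func (s *sheetFile) writeCell(row, col uint32) (*cell, *chunk, error) {
	id := int64(uint64(row)<<32 | uint64(col))
	if c, ok := s.Cells[id]; ok {
		return c, s.Chunks[c.ChunkID], nil
	}
	ch := s.LastAvailableChunk
	if ch == nil || uint64(len(ch.Cells)) >= maxCellsPerChunk {
		node, err := s.alloc.allocate()
		if err != nil {
			return nil, nil, err
		}
		s.nextChunkID++
		ch = &chunk{ID: s.nextChunkID, DataNode: node}
		s.Chunks[ch.ID] = ch
		s.LastAvailableChunk = ch
	}
	size := bytesPerChunk / maxCellsPerChunk
	c := &cell{CellID: id, Offset: uint64(len(ch.Cells)) * size, Size: size, ChunkID: ch.ID}
	ch.Cells = append(ch.Cells, c)
	s.Cells[id] = c
	return c, ch, nil
}
```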
