tdb

v0.9.10
Published: Jun 21, 2023 License: Apache-2.0 Imports: 10 Imported by: 0

README

Tdb Overview

Tdb “Text DataBase” format is a plain text human readable typed database storage format.

Tdb is an ideal alternative to CSV. A Tdb file can store any number of tables. Every table is named, and every field has a name and a type. Types are not-null by default, but can be nullable if required. The seven supported types include strings which respect all whitespace (including newlines), and which may contain any UTF-8 characters (using XML-escaping conventions), binary (e.g., for images), Booleans, numbers (integer and real), and dates and datetimes.

Tdb libraries are available in Go and Python with a Rust library in development. The Tdb format is designed to be very easy to parse, so creating a Tdb library in virtually any language should be straightforward.

Datatypes

Tdb supports the following seven built-in datatypes.

| Type     | Example(s)                              | Notes |
|----------|-----------------------------------------|-------|
| bool     | F                                       | A Tdb reader should also accept 'f', 'N', 'n', 't', 'Y', 'y', '0', '1' |
| bytes    | (20AC 65 66 48)                         | There must be an even number of case-insensitive hex digits; whitespace (spaces, newlines, etc.) optional. |
| date     | 2022-04-01                              | Basic ISO8601 YYYY-MM-DD format. |
| datetime | 2022-04-01T16:11:51                     | ISO8601 YYYY-MM-DDTHH[:MM[:SS]] format; 1-sec resolution no timezone support. |
| int      | -192 234 7891409                        | Standard integers. |
| real     | 0.15 0.7e-9 2245.389                    | Standard and scientific notation. |
| str      | <Some text which may include newlines>  | For &, <, >, use &amp;, &lt;, &gt; respectively. |

All fields are not null by default and must contain a valid value of the field's type. To make a field nullable, append ? to its typename, e.g., int?.

Strings may not include &, < or >, so if they are needed, they must be replaced by the XML/HTML escapes &amp;, &lt;, and &gt; respectively. Strings respect any whitespace they contain, including newlines.

Each field value is separated from its neighbor by whitespace, and conventionally records are separated by newlines. However, in practice, since every field in every record must be present (even if only a null value or an empty bytes or string), records may be laid out however you like.

Where whitespace is allowed (or required) it may consist of one or more spaces, tabs, or newlines in any combination.
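
The bytes rule above (an even number of case-insensitive hex digits, with optional whitespace) can be sketched with the Go standard library alone. The function name decodeBytes is hypothetical and is not part of any Tdb library; it only illustrates the rule as described here.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeBytes (a hypothetical helper) converts a Tdb bytes value such as
// "(20AC 65 66 48)" into raw bytes. Whitespace between the hex digits is
// ignored; an odd number of digits or a non-hex character is an error.
func decodeBytes(value string) ([]byte, error) {
	value = strings.TrimSpace(value)
	if len(value) < 2 || value[0] != '(' || value[len(value)-1] != ')' {
		return nil, fmt.Errorf("bytes value must be enclosed in parentheses: %q", value)
	}
	// strings.Fields splits on any run of spaces, tabs, or newlines.
	digits := strings.Join(strings.Fields(value[1:len(value)-1]), "")
	return hex.DecodeString(digits)
}

func main() {
	raw, err := decodeBytes("(20AC 65 66 48)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: % X\n", len(raw), raw)
}
```

An empty value, (), decodes to zero bytes, which matches the statement that an empty bytes value is still a present field.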

Examples

CSV

Although widely used, the CSV format is not standardized and has a number of problems. Tdb is a standardized alternative that can distinguish fieldnames from data records, can handle multiline text (including text with commas and quotes) without formality, and can store one—or more—tables in a single Tdb file.

Here's a simple CSV file:

Date,Price,Quantity,ID,Description
"2022-09-21",3.99,2,"CH1-A2","Chisels (pair), 1in & 1¼in"
"2022-10-02",4.49,1,"HV2-K9","Hammer, 2lb"
"2022-10-02",5.89,1,"SX4-D1","Eversure Sealant, 13-floz"

Here's a Tdb equivalent:

[PriceList Date date Price real Quantity int ID str Description str
%
2022-09-21 3.99 2 <CH1-A2> <Chisels (pair), 1in &amp; 1¼in> 
2022-10-02 4.49 1 <HV2-K9> <Hammer, 2lb> 
2022-10-02 5.89 1 <SX4-D1> <Eversure Sealant, 13-floz> 
]

Every table starts with a tablename followed by one or more fields. Each field consists of a fieldname and a type.

Superficially this may not seem much of an improvement on CSV (apart from Tdb's superior string handling and strong typing), but as the next example shows, a Tdb file can contain one or more tables, not just one like CSV.
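
To make the CSV comparison concrete, here is a minimal sketch of converting CSV text into a single Tdb table using only the Go standard library. The function csvToTdb and its parameters are hypothetical; the column types must be supplied by the caller because CSV carries no type information, and non-string values are copied through without validation.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// xmlEscape applies the three escapes that Tdb strings require.
var xmlEscape = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")

// csvToTdb (a hypothetical helper) converts CSV text whose first row holds
// the fieldnames into one Tdb table. types holds one Tdb typename per column.
func csvToTdb(tableName string, types []string, csvText string) (string, error) {
	rows, err := csv.NewReader(strings.NewReader(csvText)).ReadAll()
	if err != nil {
		return "", err
	}
	if len(rows) == 0 || len(rows[0]) != len(types) {
		return "", fmt.Errorf("need a header row with %d columns", len(types))
	}
	var out strings.Builder
	out.WriteString("[" + tableName)
	for i, name := range rows[0] {
		out.WriteString(" " + name + " " + types[i])
	}
	out.WriteString("\n%\n")
	for _, row := range rows[1:] {
		for i, value := range row {
			if i > 0 {
				out.WriteByte(' ')
			}
			if strings.HasPrefix(types[i], "str") {
				// Strings are delimited by < and > and XML-escaped.
				out.WriteString("<" + xmlEscape.Replace(value) + ">")
			} else {
				out.WriteString(value) // copied as-is; not validated here
			}
		}
		out.WriteByte('\n')
	}
	out.WriteString("]\n")
	return out.String(), nil
}

func main() {
	csvText := `Date,Price,Quantity,ID,Description
"2022-09-21",3.99,2,"CH1-A2","Chisels (pair), 1in & 1¼in"
"2022-10-02",4.49,1,"HV2-K9","Hammer, 2lb"
`
	types := []string{"date", "real", "int", "str", "str"}
	text, err := csvToTdb("PriceList", types, csvText)
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```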

Database

Database files aren't normally human readable and usually require specialized tools to read and modify their contents. Yet many databases are relatively small (both in size and number of tables), and would be more convenient to work with if human readable. For these, Tdb format provides a viable alternative. For example:

[Customers CID int Company str Address str? Contact str Email str
%
50 <Best People> <123 Somewhere> <John Doe> <j@doe.com> 
19 <Supersuppliers> ? <Jane Doe> <jane@super.com> 
]
[Invoices INUM int CID int Raised_Date date Due_Date date Paid bool Description str?
%
152 50 2022-01-17 2022-02-17 F <COD>
153 19 2022-01-19 2022-02-19 T ?
]
[Items IID int INUM int Delivery_Date date Unit_Price real Quantity int Description str
%
1839 152 2022-01-16 29.99 2 <Bales of hay> 
1840 152 2022-01-16 5.98 3 <Straps> 
1620 153 2022-01-19 11.5 1 <Washers (1-in)> 
]

In the Customers table, the second customer's Address is null, and in the Invoices table, the second invoice's Description is null. No other fields may hold nulls, since only these two fields are nullable.

Config

Configuration files often consist of key–value pairs or grouped key–value pairs. For example, a .ini file like this:

symbols=latin
[Window]
x=32
y=28
[Colors]
foreground=lightyellow
background=#FFE7FF

could be represented by a .tdb like this:

[config_int key str value int
%
<x> 32
<y> 28
]
[config_str key str value str
%
<foreground> <lightyellow>
<background> <#FFE7FF>
<symbols> <latin>
]

And if grouping were required, like this:

[config_int group str? key str value int
%
<Window> <x> 32
<Window> <y> 28
]
[config_str group str? key str value str
%
<Colors> <foreground> <lightyellow>
<Colors> <background> <#FFE7FF>
? <symbols> <latin>
]

Here, we've allowed group to be null (equivalent to the .ini "General" group), but we could easily have made it not-null and required a group name for all groups.

Minimal Tdb Files

[T f int
%
]

This file has a single table called T which has a single field called f of type int, and no records.

[T f int
%
0
]

This is like the previous table but now with one record containing the value 0.

[T f int?
%
0
?
]

Again like the previous table, but now with two records, the first containing the value 0, and the second containing null which is permitted since the field's type is nullable.
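
The table definitions in these minimal files are simple enough to split on whitespace. The following sketch parses the text before the % marker into a table name and a list of fields. The names fieldDef and parseTableDef are hypothetical, and the sketch checks only the typenames, not the identifier rules.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldDef (hypothetical) describes one field of a table definition.
type fieldDef struct {
	Name     string
	Type     string
	Nullable bool
}

// The seven built-in typenames.
var typeNames = map[string]bool{"bool": true, "bytes": true, "date": true,
	"datetime": true, "int": true, "real": true, "str": true}

// parseTableDef (hypothetical) parses the text that precedes the '%' marker,
// e.g. "[T f int?", into a tablename and its field definitions.
func parseTableDef(header string) (string, []fieldDef, error) {
	header = strings.TrimSpace(header)
	if !strings.HasPrefix(header, "[") {
		return "", nil, fmt.Errorf("table definition must start with '['")
	}
	words := strings.Fields(header[1:])
	// One tablename followed by fieldname/type pairs gives an odd count >= 3.
	if len(words) < 3 || len(words)%2 == 0 {
		return "", nil, fmt.Errorf("expected a tablename and fieldname/type pairs")
	}
	fields := make([]fieldDef, 0, len(words)/2)
	for i := 1; i < len(words); i += 2 {
		typeName := strings.TrimSuffix(words[i+1], "?")
		if !typeNames[typeName] {
			return "", nil, fmt.Errorf("unknown type %q", words[i+1])
		}
		nullable := typeName != words[i+1] // a trailing '?' was removed
		fields = append(fields, fieldDef{words[i], typeName, nullable})
	}
	return words[0], fields, nil
}

func main() {
	name, fields, err := parseTableDef("[T f int?")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, fields)
}
```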

Timezones and Metadata

Tdb does not have direct timezone support. There are three simple solutions for this.

If all the dates in the database are in the same timezone, then one approach is to store all the dates as UTC. Alternatively, add a tiny configuration table with the timezone data, for example:

[Config key str value str?
%
<timezone> <+02:30>
]

If, however, the dates being stored have varying timezones, then add another column specifically for the timezone. Something along these lines:

[Readings meter str reading real when date timezone str
%
<EX194B4> 1932.49 2024-11-17 <-03:00>
<V1938DX> 8492.1 2024-10-30 <+02:30>
]
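
When the offset is stored in its own column, as in the Readings table, an application can recombine the two values after reading them. This sketch uses only Go's time package; the function withOffset is hypothetical, and treating a bare date as midnight at the given offset is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// withOffset (a hypothetical helper) interprets a Tdb date (YYYY-MM-DD) as
// midnight at the given UTC offset, written as in the Readings table,
// e.g. "-03:00" or "+02:30".
func withOffset(date, offset string) (time.Time, error) {
	return time.Parse("2006-01-02 -07:00", date+" "+offset)
}

func main() {
	when, err := withOffset("2024-11-17", "-03:00")
	if err != nil {
		panic(err)
	}
	// Show the same instant in UTC.
	fmt.Println(when.UTC().Format(time.RFC3339))
}
```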

If comments or metadata are required, simply create an additional table to store this data and add it to the Tdb. For example, use a Config table as shown above.

Libraries

| Library | Language | Homepage |
|---------|----------|----------|
| tdb-go  | Go       | https://pkg.go.dev/github.com/mark-summerfield/tdb-go |
| tdb-py  | Python   | https://pypi.org/project/tdb-py |
| tdb-rs  | Rust     | https://crates.io/crates/tdb-rs (in development) |

We will happily add links to implementations in other languages.

BNF

Tdb files use the UTF-8 encoding. Tdb syntactical elements are all ASCII, so it is possible to read Tdb files as bytes (as the Go library does) or as Unicode characters (as the Python library does). Each Tdb file consists of one or more tables.

TDB         ::= TABLE+
TABLE       ::= OWS '[' OWS TABLEDEF OWS '%' OWS RECORD* OWS ']' OWS
TABLEDEF    ::= IDENTIFIER (RWS FIELDDEF)+ # IDENTIFIER is the tablename
FIELDDEF    ::= IDENTIFIER RWS FIELDTYPE # IDENTIFIER is the fieldname
FIELDTYPE   ::= ('bool' | 'bytes' | 'date' | 'datetime' | 'int' | 'real' | 'str') NULL?
RECORD      ::= OWS VALUE (RWS VALUE)*
VALUE       ::= BOOL | BYTES | DATE | DATETIME | INT | REAL | STR | NULL # NULL is only allowed for nullable field types
BOOL        ::= /[FfTtYyNn01]/
BYTES       ::= '(' (OWS [A-Fa-f0-9]{2})* OWS ')'
DATE        ::= /\d\d\d\d-\d\d-\d\d/  # basic ISO8601 YYYY-MM-DD format
DATETIME    ::= /\d\d\d\d-\d\d-\d\dT\d\d(:\d\d(:\d\d)?)?/ # ISO8601 YYYY-MM-DDTHH[:MM[:SS]] format
INT         ::= /[-+]?\d+/ 
REAL        ::= ... # standard or scientific notation
STR         ::= /[<][^<>]*?[>]/ # newlines allowed, and &amp; &lt; &gt; supported i.e., XML
NULL        ::= '?'
IDENTIFIER  ::= /[_\p{L}]\w{0,31}/ # Must start with a letter or underscore; may not be a built-in constant
OWS         ::= /[\s\n]*/
RWS         ::= /[\s\n]+/ # in some cases RWS is actually optional
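
The identifier rule in the grammar, together with the reserved names listed in the Notes, can be sketched as a small validator. The name validIdentifier is hypothetical. In Go's regexp package \w matches ASCII word characters only, so this sketch may be stricter than a library whose \w is Unicode-aware. Read literally, the reserved list also rejects one-letter names such as T and f; how strictly each library enforces this is not stated here.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierRx follows the identifier rule: a letter or underscore followed
// by up to 31 word characters. Go's \w is ASCII-only.
var identifierRx = regexp.MustCompile(`^[_\p{L}]\w{0,31}$`)

// reserved holds the built-in constants and bool values from the Notes.
var reserved = map[string]bool{
	"bool": true, "bytes": true, "date": true, "datetime": true,
	"int": true, "real": true, "str": true,
	"f": true, "F": true, "n": true, "N": true,
	"t": true, "T": true, "y": true, "Y": true,
}

// validIdentifier (a hypothetical helper) reports whether name could be used
// as a tablename or fieldname under the rules as written in this document.
func validIdentifier(name string) bool {
	return identifierRx.MatchString(name) && !reserved[name]
}

func main() {
	for _, name := range []string{"Raised_Date", "_x", "int", "9lives"} {
		fmt.Printf("%-12s %v\n", name, validIdentifier(name))
	}
}
```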

Notes

  • Every field is not null by default and must contain a valid value of the field's type. To make a field nullable, append ? to its typename, e.g., str?; for nullable fields the value must either be one of the field's type (e.g., str) or null ?.
  • A Tdb file must contain at least one table even if it is empty, i.e., has no records.
  • A Tdb writer should always write bools as F or T; but a Tdb reader should accept any of F, f, N, n, 0, for false, and any of T, t, Y, y, 1, for true.
  • Within any .tdb file each tablename must be unique, and within each table each fieldname must be unique.
  • No tablename or fieldname (i.e., no identifier) may be the same as a built-in constant or bool value:
    bool, bytes, date, datetime, f, F, int, n, N, real, str, t, T, y, Y
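
The bool rule in these notes (readers are lenient, writers emit only F or T) can be sketched as a pair of helpers. The names parseBool and formatBool are hypothetical and are not taken from any Tdb library.

```go
package main

import "fmt"

// parseBool (a hypothetical helper) accepts every bool token a Tdb reader
// should accept, as listed in the notes above.
func parseBool(token string) (bool, error) {
	switch token {
	case "T", "t", "Y", "y", "1":
		return true, nil
	case "F", "f", "N", "n", "0":
		return false, nil
	}
	return false, fmt.Errorf("invalid bool value %q", token)
}

// formatBool (a hypothetical helper) writes a bool the way a Tdb writer
// should: always F or T.
func formatBool(value bool) string {
	if value {
		return "T"
	}
	return "F"
}

func main() {
	for _, token := range []string{"y", "0", "yes"} {
		value, err := parseBool(token)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(token, "is written as", formatBool(value))
	}
}
```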

Supplementary

Vim Support

If you use the vim editor, simple color syntax highlighting is available. Copy tdb.vim into your $VIM/syntax/ folder and add this line (or similar) to your .vimrc or .gvimrc file:

au BufRead,BufNewFile,BufEnter *.tdb set ft=tdb|set expandtab|set textwidth=80


Documentation

Overview

Tdb provides the Parse and Unmarshal functions for reading []byte slices of text in Tdb “Text DataBase” format, and the Tdb.Write and Marshal functions for writing to Tdb format.

The Parse function creates a Tdb object which stores values as type `any`, so is useful for applications that need to process generic Tdb files. Tdb data is written in Tdb format using the Tdb.Write method. However, if the Tdb file format is known, then it is best to use Marshal and Unmarshal since these use the appropriate concrete types (`bool`, `int`, `string`, and so on).

To use the Marshal and Unmarshal functions you must provide a populated (for Marshal) or unpopulated (for Unmarshal) struct. This outer struct represents a text database. The outer struct must contain one or more public (inner) fields, each of type slice of struct. Each inner field represents a database table, and each record is represented by an inner field struct.

Tdb format

Tdb “Text DataBase” format is a plain text human readable typed database storage format.

Tdb provides a superior alternative to CSV. In particular, Tdb tables are named and Tdb fields are strictly typed. Also, there is a clear distinction between field names and data values, and strings respect whitespace (including newlines) and have no problems with commas, quotes, etc. Perhaps best of all, a single Tdb file may contain one—or more—tables.

See README.md at https://github.com/mark-summerfield/tdb-go for more about the Tdb format.

Using the tdb package

Import using:

import tdb "github.com/mark-summerfield/tdb-go"

Types:

| Tdb Type |  Go Types                  |
|----------|----------------------------|
| bool     | bool                       |
| bytes    | []byte                     |
| date     | time.Time                  |
| datetime | time.Time                  |
| int      | int uint int32 uint32 etc. |
| real     | float64 float32            |
| str      | string                     |

Note that for nullable types (e.g., `bool?`, `str?`, etc.) the corresponding Go type must be a pointer (e.g., `*bool`, `*string`, etc.).
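
The pointer convention for nullable types maps naturally onto the ? null marker: a nil pointer stands for null and a non-nil pointer carries the value. The sketch below illustrates that correspondence for int? with hypothetical helper names; it is not the package's own conversion code.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatNullableInt (hypothetical) renders an int? value: nil becomes the
// Tdb null marker "?", anything else is written as a plain integer.
func formatNullableInt(value *int) string {
	if value == nil {
		return "?"
	}
	return strconv.Itoa(*value)
}

// parseNullableInt (hypothetical) is the inverse: "?" yields a nil pointer,
// any other token must be a valid integer.
func parseNullableInt(token string) (*int, error) {
	if token == "?" {
		return nil, nil
	}
	n, err := strconv.Atoi(token)
	if err != nil {
		return nil, err
	}
	return &n, nil
}

func main() {
	mgr := 7698
	fmt.Println(formatNullableInt(&mgr), formatNullableInt(nil))
}
```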

The Marshal and Unmarshal examples use these structs:

type classicDatabase struct {
	Employees   []Employee   `tdb:"emp"`
	Departments []Department `tdb:"dept"`
}

type Employee struct {
	EID        int       `tdb:"empno"`
	Name       string    `tdb:"ename"`
	Job        string    `tdb:"job"`
	ManagerID  *int      `tdb:"mgr"` // The boss doesn't have a mgr
	HireDate   time.Time `tdb:"hiredate:date"`
	Salary     float64   `tdb:"sal"`
	Commission *float64  `tdb:"comm"` // Most don't get commission
	DeptID     int       `tdb:"deptno"`
}

type Department struct {
	DID      int    `tdb:"deptno"`
	Name     string `tdb:"dname"`
	Location string `tdb:"loc"`
}

Although struct tags are used extensively here, they are only actually required for two purposes. A tag is needed if a Tdb file's table or field name is different from the corresponding struct name. And a tag is needed for time.Time fields if the field is a Tdb `date` field (since the default is `datetime`). For example, see `db1_test.go` and `csv_test.go` for structs which work fine despite having few tags.

The order of tables in a Tdb file in relation to the outer struct doesn't matter. However, the order of fields within a table must match between the Tdb file's table definition and the corresponding struct.

Naturally, you can use any structs you like that meet tdb's minimum requirements.

Constants

const (
	DateFormat     = "2006-01-02"
	DateTimeFormat = "2006-01-02T15:04:05"
)
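
These constants are ordinary Go reference-time layouts, so they work directly with time.Parse and Time.Format. The sketch below redeclares them locally so that it needs only the standard library. The function parseDateTime is hypothetical and shows one way to accept the optional minutes and seconds allowed by the datetime format; it is not necessarily how the package does it.

```go
package main

import (
	"fmt"
	"time"
)

// Local copies of the package's layout constants.
const (
	dateFormat     = "2006-01-02"
	dateTimeFormat = "2006-01-02T15:04:05"
)

// parseDateTime (a hypothetical helper) accepts YYYY-MM-DDTHH[:MM[:SS]] by
// trying progressively shorter layouts; omitted parts default to zero.
func parseDateTime(text string) (time.Time, error) {
	var lastErr error
	layouts := []string{dateTimeFormat, "2006-01-02T15:04", "2006-01-02T15"}
	for _, layout := range layouts {
		parsed, err := time.Parse(layout, text)
		if err == nil {
			return parsed, nil
		}
		lastErr = err
	}
	return time.Time{}, lastErr
}

func main() {
	day, err := time.Parse(dateFormat, "2022-04-01")
	if err != nil {
		panic(err)
	}
	moment, err := parseDateTime("2022-04-01T16:11")
	if err != nil {
		panic(err)
	}
	fmt.Println(day.Format(dateFormat), moment.Format(dateTimeFormat))
}
```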

Variables

var Version string // This tdb package's version.

Functions

func Escape

func Escape(s string) string

Escape returns an XML-escaped string, i.e., where runes are replaced as follows: & → &amp;, < → &lt;, > → &gt;. See also Unescape.

func Marshal

func Marshal(db any) ([]byte, error)

Marshal converts the given struct of slices of structs to a string (as raw UTF-8-encoded bytes) in Tdb format if possible.

Each tablename is taken from the outer struct's fieldname, but this can be overridden using a tag, e.g., `tdb:"MyTableName"`. For time.Time fields use a tag of either `tdb:"date"` or `tdb:"datetime"` to specify the Tdb field type; for all other types, the Tdb type is inferred. However, if fieldnames in the Tdb text are to be different from the struct fieldnames, use tags, with the required name, e.g., `tdb:"MyFieldName"`, and for dates and datetimes with the type too, e.g., `tdb:"MyDateField:date"`, etc.

See also Tdb.Write and MarshalDecimals and Unmarshal.

Example
package main

import (
	"fmt"

	_ "embed"
	tdb "github.com/mark-summerfield/tdb-go"
	"time"
)

func main() {
	db := classicDatabase{
		Employees: []Employee{
			{7844, "TURNER", "SALESMAN", nil,
				date(1981, time.September, 8), 1500.0, nil, 30},
			{7876, "ADAMS", "CLERK", nil, date(1983, time.January, 12),
				1100.0, nil, 20},
			{7839, "KING", "PRESIDENT", nil,
				date(1981, time.November, 17), 5000.0, nil, 10},
			{7902, "FORD", "ANALYST", nil, date(1981, time.December, 3),
				3000.0, nil, 20},
		},
		Departments: []Department{
			{10, "ACCOUNTING", "NEW YORK"},
			{20, "RESEARCH", "DALLAS"},
			{30, "SALES", "CHICAGO"},
		},
	}
	c0 := 0.0
	db.Employees[0].Commission = &c0
	m0 := 7698
	db.Employees[0].ManagerID = &m0
	m1 := 7788
	db.Employees[1].ManagerID = &m1
	m3 := 7566
	db.Employees[3].ManagerID = &m3
	raw, err := tdb.Marshal(db)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}

type classicDatabase struct {
	Employees   []Employee   `tdb:"emp"`
	Departments []Department `tdb:"dept"`
}

type Employee struct {
	EID        int       `tdb:"empno"`
	Name       string    `tdb:"ename"`
	Job        string    `tdb:"job"`
	ManagerID  *int      `tdb:"mgr"`
	HireDate   time.Time `tdb:"hiredate:date"`
	Salary     float64   `tdb:"sal"`
	Commission *float64  `tdb:"comm"`
	DeptID     int       `tdb:"deptno"`
}

type Department struct {
	DID      int    `tdb:"deptno"`
	Name     string `tdb:"dname"`
	Location string `tdb:"loc"`
}

func date(year int, month time.Month, day int) time.Time {
	return time.Date(year, month, day, 0, 0, 0, 0, time.UTC)
}
Output:

[emp empno int ename str job str mgr int? hiredate date sal real comm real? deptno int
%
7844 <TURNER> <SALESMAN> 7698 1981-09-08 1500 0 30
7876 <ADAMS> <CLERK> 7788 1983-01-12 1100 ? 20
7839 <KING> <PRESIDENT> ? 1981-11-17 5000 ? 10
7902 <FORD> <ANALYST> 7566 1981-12-03 3000 ? 20
]
[dept deptno int dname str loc str
%
10 <ACCOUNTING> <NEW YORK>
20 <RESEARCH> <DALLAS>
30 <SALES> <CHICAGO>
]

func MarshalDecimals added in v0.4.0

func MarshalDecimals(db any, decimals int) ([]byte, error)

MarshalDecimals is a refinement of the Marshal function.

By default for real numbers the Marshal function outputs them using the fewest number of decimal digits necessary. In particular this means that numbers whose fractional part is 0 are output like ints (e.g., 2.0 → 2).

To control decimal output use this function. Pass a decimals value of 1-19 to use exactly that number of decimal digits; any other value means use the minimum number of decimal digits necessary (which may be none for numbers whose fractional part is 0).

See also Marshal and Unmarshal.
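
The decimals rule described above can be mirrored with strconv.FormatFloat. This is a sketch of the documented behavior, not the package's code; the function formatReal is hypothetical, and its use of fixed-point output in the default case is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatReal (a hypothetical helper) mirrors the documented rule: a decimals
// value of 1-19 gives exactly that many decimal digits; any other value gives
// the minimum number of digits needed to represent x.
func formatReal(x float64, decimals int) string {
	if decimals >= 1 && decimals <= 19 {
		return strconv.FormatFloat(x, 'f', decimals, 64)
	}
	// Precision -1 asks for the shortest representation that round-trips.
	return strconv.FormatFloat(x, 'f', -1, 64)
}

func main() {
	fmt.Println(formatReal(2.0, 0), formatReal(2.0, 2), formatReal(0.15, 0))
}
```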

func Unescape

func Unescape(s string) string

Unescape accepts an XML-escaped string and returns a plain text string with no escapes, i.e., where substrings are replaced with runes as follows: &amp; → &, &lt; → <, &gt; → >. See also Escape.
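
For readers without the package to hand, the three replacements performed by Escape and Unescape can be reproduced with strings.NewReplacer. The lowercase escape and unescape functions below are stand-ins written for illustration, not the package's implementation. A single-pass replacer is used so that already-unescaped text is never unescaped a second time.

```go
package main

import (
	"fmt"
	"strings"
)

// Each replacer works in a single left-to-right pass, so "&amp;lt;"
// unescapes to "&lt;" rather than to "<".
var (
	escaper   = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")
	unescaper = strings.NewReplacer("&amp;", "&", "&lt;", "<", "&gt;", ">")
)

// escape is a stand-in for the documented behavior of Escape.
func escape(s string) string { return escaper.Replace(s) }

// unescape is a stand-in for the documented behavior of Unescape.
func unescape(s string) string { return unescaper.Replace(s) }

func main() {
	original := "Chisels (pair), 1in & 1¼in <boxed>"
	escaped := escape(original)
	fmt.Println(escaped)
	fmt.Println(unescape(escaped) == original)
}
```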

func Unmarshal

func Unmarshal(data []byte, db any) error

Unmarshal reads the data from the given string (as raw UTF-8-encoded bytes) into a (pointer to a) database struct.

See also Parse and Marshal and MarshalDecimals.

Example
package main

import (
	"fmt"

	_ "embed"
	tdb "github.com/mark-summerfield/tdb-go"
	"time"
)

func main() {
	tdbText := `[emp empno int ename str job str mgr int? hiredate date sal real comm real? deptno int
%
7844 <TURNER> <SALESMAN> 7698 1981-09-08 1500 0 30
7876 <ADAMS> <CLERK> 7788 1983-01-12 1100 ? 20
7839 <KING> <PRESIDENT> ? 1981-11-17 5000 ? 10
7902 <FORD> <ANALYST> 7566 1981-12-03 3000 ? 20
]
[dept deptno int dname str loc str
%
10 <ACCOUNTING> <NEW YORK>
20 <RESEARCH> <DALLAS>
30 <SALES> <CHICAGO>
]`
	db := classicDatabase{}
	if err := tdb.Unmarshal([]byte(tdbText), &db); err != nil {
		panic(err)
	}
	fmt.Printf("%d Employees\n", len(db.Employees))
	fmt.Printf("%d Departments\n", len(db.Departments))
	president := db.Employees[2]
	managerID := -1
	if president.ManagerID != nil {
		managerID = *president.ManagerID
	}
	commission := 0.0
	if president.Commission != nil {
		commission = *president.Commission
	}
	fmt.Printf("%d %q %q %d %s %g %g %d\n", president.EID, president.Name,
		president.Job, managerID, president.HireDate.Format(tdb.DateFormat),
		president.Salary, commission, president.DeptID)
	research := db.Departments[1]
	fmt.Printf("%d %q %q\n", research.DID, research.Name, research.Location)
}

type classicDatabase struct {
	Employees   []Employee   `tdb:"emp"`
	Departments []Department `tdb:"dept"`
}

type Employee struct {
	EID        int       `tdb:"empno"`
	Name       string    `tdb:"ename"`
	Job        string    `tdb:"job"`
	ManagerID  *int      `tdb:"mgr"`
	HireDate   time.Time `tdb:"hiredate:date"`
	Salary     float64   `tdb:"sal"`
	Commission *float64  `tdb:"comm"`
	DeptID     int       `tdb:"deptno"`
}

type Department struct {
	DID      int    `tdb:"deptno"`
	Name     string `tdb:"dname"`
	Location string `tdb:"loc"`
}
Output:

4 Employees
3 Departments
7839 "KING" "PRESIDENT" -1 1981-11-17 5000 0 10
20 "RESEARCH" "DALLAS"

Types

type FieldKind added in v0.8.0

type FieldKind uint8
const (
	BoolField FieldKind = 1 << iota
	BytesField
	DateField
	DateTimeField
	IntField
	RealField
	StrField
)

func (FieldKind) String added in v0.8.0

func (me FieldKind) String() string

type MetaFieldType added in v0.8.0

type MetaFieldType struct {
	Name      string
	Kind      FieldKind
	AllowNull bool
}

type MetaTableType added in v0.8.0

type MetaTableType struct {
	Name   string
	Fields []*MetaFieldType
}

MetaTableType holds the name of a table and a slice of its fields (names and kinds).

func (*MetaTableType) AddField added in v0.8.0

func (me *MetaTableType) AddField(fieldName, typeName string) bool

func (*MetaTableType) Field added in v0.8.0

func (me *MetaTableType) Field(index int) *MetaFieldType

func (MetaTableType) Len added in v0.8.0

func (me MetaTableType) Len() int

func (MetaTableType) String added in v0.8.0

func (me MetaTableType) String() string

type Record added in v0.8.0

type Record []any

type Table added in v0.8.0

type Table struct {
	MetaTableType // table name and field names and kinds
	Records       []Record
}

func NewTable added in v0.8.0

func NewTable() Table

type Tdb added in v0.8.0

type Tdb struct {
	TableNames []string          // order of reading & writing from/to file
	Tables     map[string]*Table // key is tablename
}

func NewTdb added in v0.8.0

func NewTdb() Tdb

func Parse added in v0.8.0

func Parse(data []byte) (*Tdb, error)

Parse reads the data from the given string (as raw UTF-8-encoded bytes) and returns a Tdb object that holds all the tables and values (the values as type `any`).

See also Tdb.Write and Marshal and MarshalDecimals.

func (*Tdb) AddTable added in v0.8.0

func (me *Tdb) AddTable(table *Table)

func (*Tdb) Write added in v0.8.0

func (me *Tdb) Write(out io.Writer) error

Write writes the Tdb's tables and values to the given writer in Tdb format.

See also Tdb.WriteDecimals and Parse.

func (*Tdb) WriteDecimals added in v0.8.0

func (me *Tdb) WriteDecimals(out io.Writer, decimals int) error

WriteDecimals is a refinement of the Tdb.Write method that writes the Tdb's tables and values to the given writer in Tdb format.

By default for real numbers the Tdb.Write method outputs them using the fewest number of decimal digits necessary. In particular this means that numbers whose fractional part is 0 are output like ints (e.g., 2.0 → 2).

To control decimal output use this method. Pass a decimals value of 1-19 to use exactly that number of decimal digits; any other value means use the minimum number of decimal digits necessary (which may be none for numbers whose fractional part is 0).

See also Tdb.Write and Parse.
