schema

package
v0.1.1 Latest
Published: May 26, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package schema reads Paimon table schema metadata.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ToArrowField

func ToArrowField(f DataField) (arrow.Field, error)

ToArrowField converts a DataField to an Arrow field.

func ToArrowSchema

func ToArrowSchema(fields []DataField) (*arrow.Schema, error)

ToArrowSchema converts a slice of DataFields to an Arrow schema.

func TypeTag

func TypeTag(dt DataType) string

TypeTag returns a simple uppercase tag for a DataType (used with binaryrow.GetField).
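As a rough illustration of what a "simple uppercase tag" could look like, the sketch below reduces a raw type string to its bare name. The helper typeTag is hypothetical: it takes a string rather than a DataType, and it is an assumption about the behaviour, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// typeTag is a hypothetical stand-in for TypeTag. It reduces a raw type
// string such as "varchar(255) NOT NULL" to the bare uppercase name.
func typeTag(raw string) string {
	s := strings.ToUpper(strings.TrimSpace(raw))
	// Drop the nullability suffix if present.
	s = strings.TrimSpace(strings.TrimSuffix(s, "NOT NULL"))
	// Drop any parameter list, e.g. "(255)" or "(10, 2)".
	if i := strings.IndexByte(s, '('); i >= 0 {
		s = s[:i]
	}
	return strings.TrimSpace(s)
}

func main() {
	fmt.Println(typeTag("VARCHAR(255) NOT NULL"))
	fmt.Println(typeTag("DECIMAL(10, 2)"))
	fmt.Println(typeTag("INT"))
}
```

Cutting at the first parenthesis also discards any qualifier that follows the parameter list, so this sketch would need extending for such types.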

Types

type DataField

type DataField struct {
	ID          int      `json:"id"`
	Name        string   `json:"name"`
	Type        DataType `json:"type"`
	Description string   `json:"description,omitempty"`
}

DataField is a named, typed column in a table schema.

ID is the stable field identifier used for schema evolution — it never changes even if the field is renamed or reordered. Name is the column name as visible to callers.

type DataType

type DataType struct {
	// Type is the base type name in uppercase, e.g. "INT", "BIGINT", "ARRAY", "MAP".
	// For parameterised types the raw string is preserved here, e.g. "VARCHAR(255)".
	Type     string
	Nullable bool
	// For DECIMAL
	Precision int
	Scale     int
	// For CHAR / VARCHAR / BINARY / VARBINARY — extracted from e.g. "VARCHAR(255)"
	Length int
	// For ARRAY / MULTISET
	ElementType *DataType
	// For MAP
	KeyType   *DataType
	ValueType *DataType
	// For ROW
	Fields []DataField
}

DataType represents a Paimon field type.

Paimon serialises simple types as a plain JSON string (e.g. "INT NOT NULL", "VARCHAR(255)") and complex types as a JSON object with a "type" key (e.g. {"type":"ARRAY NOT NULL","element":"INT"}). Nullability is encoded in the type string itself: a type is nullable unless the string contains "NOT NULL". The wire format has no separate "nullable" boolean field.
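The string form described above can be split into a base name, numeric parameters, and a nullability flag. The following is a minimal sketch under that reading; parsedType and parseTypeString are hypothetical names introduced for illustration and are not part of this package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsedType holds what this sketch extracts from a type string.
type parsedType struct {
	Base     string // e.g. "VARCHAR"
	Args     []int  // e.g. [255] for VARCHAR(255), [10 2] for DECIMAL(10, 2)
	Nullable bool   // false only when the string ends in "NOT NULL"
}

// parseTypeString is a hypothetical helper, not the package's parser.
func parseTypeString(raw string) (parsedType, error) {
	s := strings.TrimSpace(raw)
	p := parsedType{Nullable: true}
	if strings.HasSuffix(s, "NOT NULL") {
		p.Nullable = false
		s = strings.TrimSpace(strings.TrimSuffix(s, "NOT NULL"))
	}
	open := strings.IndexByte(s, '(')
	if open < 0 {
		p.Base = s // no parameter list
		return p, nil
	}
	end := strings.IndexByte(s, ')')
	if end < open {
		return p, fmt.Errorf("unbalanced parentheses in %q", raw)
	}
	p.Base = strings.TrimSpace(s[:open])
	for _, part := range strings.Split(s[open+1:end], ",") {
		n, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return p, fmt.Errorf("bad type argument in %q: %w", raw, err)
		}
		p.Args = append(p.Args, n)
	}
	return p, nil
}

func main() {
	for _, raw := range []string{"INT NOT NULL", "VARCHAR(255)", "DECIMAL(10, 2) NOT NULL"} {
		p, err := parseTypeString(raw)
		fmt.Printf("%q -> %+v (err: %v)\n", raw, p, err)
	}
}
```

In terms of the DataType struct, a single argument would map to Length and a pair to Precision and Scale. Text after the closing parenthesis is ignored by this sketch.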

func (*DataType) UnmarshalJSON

func (dt *DataType) UnmarshalJSON(data []byte) error

UnmarshalJSON implements json.Unmarshaler. It handles both the string form ("INT NOT NULL") and the object form ({"type":"ARRAY","element":"BIGINT"}).
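One common way to accept two JSON shapes in a single Unmarshaler is to inspect the first byte and branch on it. The sketch below shows that pattern on a cut-down type; sketchType and setTypeString are hypothetical, and only the "element" key from the example above is handled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// sketchType is a cut-down, hypothetical mirror of DataType.
type sketchType struct {
	Type        string
	Nullable    bool
	ElementType *sketchType
}

// setTypeString stores the type name and derives nullability from it.
func (t *sketchType) setTypeString(s string) {
	s = strings.TrimSpace(s)
	t.Nullable = !strings.HasSuffix(s, "NOT NULL")
	t.Type = strings.TrimSpace(strings.TrimSuffix(s, "NOT NULL"))
}

// UnmarshalJSON accepts either a JSON string or a JSON object.
func (t *sketchType) UnmarshalJSON(data []byte) error {
	data = bytes.TrimSpace(data)
	if len(data) > 0 && data[0] == '"' {
		// String form, e.g. "INT NOT NULL".
		var s string
		if err := json.Unmarshal(data, &s); err != nil {
			return err
		}
		t.setTypeString(s)
		return nil
	}
	// Object form. "element" is decoded recursively through this same
	// method, so it may itself be a string or an object.
	var obj struct {
		Type    string      `json:"type"`
		Element *sketchType `json:"element"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return err
	}
	t.setTypeString(obj.Type)
	t.ElementType = obj.Element
	return nil
}

func main() {
	for _, in := range []string{`"INT NOT NULL"`, `{"type":"ARRAY","element":"BIGINT"}`} {
		var t sketchType
		if err := json.Unmarshal([]byte(in), &t); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s nullable=%v hasElement=%v\n", t.Type, t.Nullable, t.ElementType != nil)
	}
}
```

Decoding the object form into an anonymous struct rather than into sketchType itself avoids re-entering UnmarshalJSON on the same value.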

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager reads schema files from a table's schema/ directory.

func NewManager

func NewManager(tableRoot string, io fileio.FileIO) *Manager

NewManager creates a Manager for the given table root path.

func (*Manager) Latest

func (m *Manager) Latest(ctx context.Context) (*TableSchema, error)

Latest returns the schema with the highest ID.
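Selecting the highest ID from a directory listing might look like the sketch below. It assumes the files are named "schema-<id>", which is not stated on this page, and the helper latestSchemaID is hypothetical. How Manager actually lists and parses the directory is not documented here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// latestSchemaID returns the largest ID among names of the assumed form
// "schema-<id>". Names that do not match are skipped; ok is false when
// no name matched.
func latestSchemaID(names []string) (id int64, ok bool) {
	const prefix = "schema-" // assumed naming convention
	for _, name := range names {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimPrefix(name, prefix), 10, 64)
		if err != nil || n < 0 {
			continue
		}
		// Compare numerically so that schema-10 ranks above schema-9.
		if !ok || n > id {
			id, ok = n, true
		}
	}
	return id, ok
}

func main() {
	id, ok := latestSchemaID([]string{"schema-0", "schema-9", "schema-10", "README"})
	fmt.Println(id, ok)
}
```

The comparison is done on parsed integers because sorting the names as strings would place "schema-9" after "schema-10".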

func (*Manager) Read

func (m *Manager) Read(ctx context.Context, id int64) (*TableSchema, error)

Read loads a specific schema by ID.

type TableSchema

type TableSchema struct {
	// Version is the schema-file format version (0, 1, or 2).
	Version int `json:"version"`
	// ID is the schema's monotonically increasing identifier.
	ID int64 `json:"id"`
	// Fields lists all columns in declaration order.
	Fields []DataField `json:"fields"`
	// HighestFieldID is the largest field ID ever assigned; used for schema evolution.
	HighestFieldID int `json:"highestFieldId"`
	// PartitionKeys are the names of the partition columns (empty for unpartitioned tables).
	PartitionKeys []string `json:"partitionKeys"`
	// PrimaryKeys are the names of the primary key columns (empty for append-only tables).
	PrimaryKeys []string `json:"primaryKeys"`
	// Options holds the table's WITH(...) properties, e.g. "merge-engine": "deduplicate".
	Options map[string]string `json:"options"`
	// Comment is an optional human-readable table description.
	Comment string `json:"comment,omitempty"`
	// TimeMillis is the epoch-millisecond timestamp when this schema was created.
	TimeMillis int64 `json:"timeMillis,omitempty"`
}

TableSchema is the JSON representation of a Paimon schema file (stored at schema/<id> inside the table directory).

Version indicates the Paimon schema-file format version; defaults that depend on version (e.g. file format) are applied automatically when the schema is loaded. ID is the monotonically increasing schema ID; it advances each time the schema is altered via ALTER TABLE.

Fields lists all columns in declaration order. PartitionKeys and PrimaryKeys are field names (not IDs); use TableSchema.PartitionFields and TableSchema.PrimaryKeyFields for typed access.

Options mirrors the table's WITH(...) properties and is keyed by the Paimon option name (e.g. "merge-engine", "file.format").
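To make the file layout concrete, the sketch below decodes a small schema document using the JSON tags listed in the struct above. The types sketchField and sketchSchema are local, hypothetical mirrors, the sample document is made up for illustration, and field types are left as raw JSON rather than decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sketchField mirrors DataField, keeping the type as undecoded JSON.
type sketchField struct {
	ID   int             `json:"id"`
	Name string          `json:"name"`
	Type json.RawMessage `json:"type"`
}

// sketchSchema mirrors a subset of TableSchema with the same JSON tags.
type sketchSchema struct {
	Version        int               `json:"version"`
	ID             int64             `json:"id"`
	Fields         []sketchField     `json:"fields"`
	HighestFieldID int               `json:"highestFieldId"`
	PartitionKeys  []string          `json:"partitionKeys"`
	PrimaryKeys    []string          `json:"primaryKeys"`
	Options        map[string]string `json:"options"`
}

// sample is invented illustrative data, not a real table's schema file.
const sample = `{
  "version": 2,
  "id": 0,
  "fields": [
    {"id": 0, "name": "order_id", "type": "BIGINT NOT NULL"},
    {"id": 1, "name": "dt", "type": "STRING NOT NULL"},
    {"id": 2, "name": "tags", "type": {"type": "ARRAY", "element": "STRING"}}
  ],
  "highestFieldId": 2,
  "partitionKeys": ["dt"],
  "primaryKeys": ["dt", "order_id"],
  "options": {"merge-engine": "deduplicate"}
}`

// decodeSchema parses a schema document into the sketch struct.
func decodeSchema(data []byte) (sketchSchema, error) {
	var s sketchSchema
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	s, err := decodeSchema([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(len(s.Fields), s.PartitionKeys, s.PrimaryKeys, s.Options["merge-engine"])
}
```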

func (*TableSchema) FieldByName

func (s *TableSchema) FieldByName(name string) (DataField, bool)

FieldByName returns the DataField with the given name, or (DataField{}, false) if no field has that name.
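The (value, bool) return shape follows Go's comma-ok idiom. A minimal sketch of such a lookup, written against a local hypothetical field type instead of the real DataField, could be:

```go
package main

import "fmt"

// field is a hypothetical, reduced stand-in for DataField.
type field struct {
	ID   int
	Name string
}

// fieldByName scans the fields in order and returns the first match.
// The zero value and false are returned when nothing matches.
func fieldByName(fields []field, name string) (field, bool) {
	for _, f := range fields {
		if f.Name == name {
			return f, true
		}
	}
	return field{}, false
}

func main() {
	fields := []field{{ID: 0, Name: "order_id"}, {ID: 1, Name: "dt"}}
	if f, ok := fieldByName(fields, "dt"); ok {
		fmt.Println("found field with ID", f.ID)
	}
	if _, ok := fieldByName(fields, "missing"); !ok {
		fmt.Println("no such field")
	}
}
```

Checking the boolean matters because the zero-value field has ID 0, which is also a legitimate field ID.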

func (*TableSchema) FileFormat

func (s *TableSchema) FileFormat() string

FileFormat returns the data file format, defaulting to "orc" for schemas with Version <= 2.

func (*TableSchema) IsPrimaryKeyTable

func (s *TableSchema) IsPrimaryKeyTable() bool

IsPrimaryKeyTable reports whether the table has primary keys defined.

func (*TableSchema) MergeEngine

func (s *TableSchema) MergeEngine() string

MergeEngine returns the merge engine name from table options, defaulting to "deduplicate" if not set.
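Accessors like MergeEngine and FileFormat amount to an option lookup with a fallback. The sketch below shows that pattern with a hypothetical helper, optionOr. Treating an empty value the same as a missing key is an assumption, and the version-dependent part of FileFormat is not modelled.

```go
package main

import "fmt"

// optionOr returns options[key] when it is set to a non-empty value,
// and def otherwise. Reading from a nil map is safe in Go.
func optionOr(options map[string]string, key, def string) string {
	if v, ok := options[key]; ok && v != "" {
		return v
	}
	return def
}

func main() {
	explicit := map[string]string{"merge-engine": "partial-update"}
	fmt.Println(optionOr(explicit, "merge-engine", "deduplicate"))
	fmt.Println(optionOr(nil, "merge-engine", "deduplicate"))
	fmt.Println(optionOr(nil, "file.format", "orc"))
}
```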

func (*TableSchema) PartitionFields

func (s *TableSchema) PartitionFields() []DataField

PartitionFields returns the DataFields for the partition keys in order.

func (*TableSchema) PrimaryKeyFields

func (s *TableSchema) PrimaryKeyFields() []DataField

PrimaryKeyFields returns the DataFields for the primary keys in order.

func (*TableSchema) PrimaryKeyIndices

func (s *TableSchema) PrimaryKeyIndices() []int

PrimaryKeyIndices returns the integer positions of each primary key field within the TableSchema.Fields slice.
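Because PrimaryKeys stores names, turning them into positions requires a name-to-index lookup over Fields. The sketch below is one way to do that with local hypothetical types. Skipping names that match no field is an assumption, since this page does not say how such a case is handled.

```go
package main

import "fmt"

// field is a hypothetical, reduced stand-in for DataField.
type field struct {
	ID   int
	Name string
}

// primaryKeyIndices maps each key name to its position in fields,
// preserving the order of keys. Unknown names are skipped.
func primaryKeyIndices(fields []field, keys []string) []int {
	pos := make(map[string]int, len(fields))
	for i, f := range fields {
		pos[f.Name] = i
	}
	out := make([]int, 0, len(keys))
	for _, k := range keys {
		if i, ok := pos[k]; ok {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	fields := []field{{0, "order_id"}, {1, "dt"}, {2, "amount"}}
	idx := primaryKeyIndices(fields, []string{"dt", "order_id"})
	fmt.Println(idx)
	// The same indices give typed access to the key fields in key order.
	for _, i := range idx {
		fmt.Println(fields[i].Name)
	}
}
```

The result follows the order of the key list rather than the declaration order of the fields, which is consistent with the "in order" wording used for PartitionFields and PrimaryKeyFields.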
