Overview ¶
Package schema reads Paimon table schema metadata.
Index ¶
- func ToArrowField(f DataField) (arrow.Field, error)
- func ToArrowSchema(fields []DataField) (*arrow.Schema, error)
- func TypeTag(dt DataType) string
- type DataField
- type DataType
- type Manager
- type TableSchema
- func (s *TableSchema) FieldByName(name string) (DataField, bool)
- func (s *TableSchema) FileFormat() string
- func (s *TableSchema) IsPrimaryKeyTable() bool
- func (s *TableSchema) MergeEngine() string
- func (s *TableSchema) PartitionFields() []DataField
- func (s *TableSchema) PrimaryKeyFields() []DataField
- func (s *TableSchema) PrimaryKeyIndices() []int
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ToArrowField ¶
func ToArrowField(f DataField) (arrow.Field, error)
ToArrowField converts a DataField to an Arrow field.
func ToArrowSchema ¶
func ToArrowSchema(fields []DataField) (*arrow.Schema, error)
ToArrowSchema converts a slice of DataFields to an Arrow schema.
Types ¶
type DataField ¶
type DataField struct {
ID int `json:"id"`
Name string `json:"name"`
Type DataType `json:"type"`
Description string `json:"description,omitempty"`
}
DataField is a named, typed column in a table schema.
ID is the stable field identifier used for schema evolution — it never changes even if the field is renamed or reordered. Name is the column name as visible to callers.
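Because IDs are stable while names are not, two versions of a schema can be compared by joining on ID. The following is a minimal sketch of that idea, not part of this package: Field is a reduced local stand-in for DataField, and renamedFields is a hypothetical helper.

```go
package main

import "fmt"

// Field is a local stand-in for DataField, reduced to the two members used here.
type Field struct {
	ID   int
	Name string
}

// renamedFields (hypothetical helper) returns old-name -> new-name for every
// field whose ID appears in both schemas under a different name. Fields that
// were added or dropped are ignored, and field order does not matter.
func renamedFields(oldFields, newFields []Field) map[string]string {
	oldNames := make(map[int]string, len(oldFields))
	for _, f := range oldFields {
		oldNames[f.ID] = f.Name
	}
	renames := make(map[string]string)
	for _, f := range newFields {
		if prev, ok := oldNames[f.ID]; ok && prev != f.Name {
			renames[prev] = f.Name
		}
	}
	return renames
}

func main() {
	// Invented sample data: field 1 is renamed and moved, field 2 is new.
	v0 := []Field{{ID: 0, Name: "user_id"}, {ID: 1, Name: "ts"}}
	v1 := []Field{{ID: 1, Name: "event_time"}, {ID: 0, Name: "user_id"}, {ID: 2, Name: "country"}}
	fmt.Println(renamedFields(v0, v1))
}
```

Matching by name instead would treat the renamed column as one dropped field plus one new field, which is the situation the stable ID is meant to avoid.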
type DataType ¶
type DataType struct {
// Type is the base type name in uppercase, e.g. "INT", "BIGINT", "ARRAY", "MAP".
// For parameterised types the raw string is preserved here, e.g. "VARCHAR(255)".
Type string
Nullable bool
// For DECIMAL
Precision int
Scale int
// For CHAR / VARCHAR / BINARY / VARBINARY — extracted from e.g. "VARCHAR(255)"
Length int
// For ARRAY / MULTISET
ElementType *DataType
// For MAP
KeyType *DataType
ValueType *DataType
// For ROW
Fields []DataField
}
DataType represents a Paimon field type.
Paimon serialises simple types as a plain JSON string (e.g. "INT NOT NULL", "VARCHAR(255)") and complex types as a JSON object with a "type" key (e.g. {"type":"ARRAY NOT NULL","element":"INT"}). Nullability is encoded via the absence/presence of "NOT NULL" in the type string — there is no separate "nullable" boolean field in the wire format.
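To illustrate the string form, here is a rough sketch of how a type string could be split into a base name, a nullability flag, and numeric parameters. It is an assumption about one possible approach, not this package's parser: parsedType and parseTypeString are hypothetical names, and unlike DataType the sketch stores the bare base name rather than the raw string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsedType is a hypothetical, reduced stand-in for DataType.
type parsedType struct {
	Base     string // e.g. "INT", "VARCHAR", "DECIMAL"
	Nullable bool   // true unless the string ends in "NOT NULL"
	Params   []int  // e.g. [255] for VARCHAR(255), [10 2] for DECIMAL(10, 2)
}

// parseTypeString handles simple forms such as "INT NOT NULL" or
// "DECIMAL(10, 2)". Any text after the closing parenthesis is dropped,
// so suffixed types are out of scope for this sketch.
func parseTypeString(raw string) (parsedType, error) {
	s := strings.ToUpper(strings.TrimSpace(raw))
	pt := parsedType{Nullable: true}
	if strings.HasSuffix(s, "NOT NULL") {
		pt.Nullable = false
		s = strings.TrimSpace(strings.TrimSuffix(s, "NOT NULL"))
	}
	open := strings.IndexByte(s, '(')
	if open < 0 {
		pt.Base = s
		return pt, nil
	}
	end := strings.LastIndexByte(s, ')')
	if end < open {
		return parsedType{}, fmt.Errorf("unbalanced parentheses in %q", raw)
	}
	pt.Base = strings.TrimSpace(s[:open])
	for _, p := range strings.Split(s[open+1:end], ",") {
		n, err := strconv.Atoi(strings.TrimSpace(p))
		if err != nil {
			return parsedType{}, fmt.Errorf("bad parameter in %q: %w", raw, err)
		}
		pt.Params = append(pt.Params, n)
	}
	return pt, nil
}

func main() {
	for _, s := range []string{"INT NOT NULL", "VARCHAR(255)", "DECIMAL(10, 2) NOT NULL"} {
		pt, err := parseTypeString(s)
		fmt.Printf("%q -> %+v (err: %v)\n", s, pt, err)
	}
}
```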
func (*DataType) UnmarshalJSON ¶
UnmarshalJSON implements json.Unmarshaler. It handles both the string form ("INT NOT NULL") and the object form ({"type":"ARRAY","element":"BIGINT"}).
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager reads schema files from a table's schema/ directory.
func NewManager ¶
NewManager creates a Manager for the given table root path.
type TableSchema ¶
type TableSchema struct {
// Version is the schema-file format version (0, 1, or 2).
Version int `json:"version"`
// ID is the schema's monotonically increasing identifier.
ID int64 `json:"id"`
// Fields lists all columns in declaration order.
Fields []DataField `json:"fields"`
// HighestFieldID is the largest field ID ever assigned; used for schema evolution.
HighestFieldID int `json:"highestFieldId"`
// PartitionKeys are the names of the partition columns (empty for unpartitioned tables).
PartitionKeys []string `json:"partitionKeys"`
// PrimaryKeys are the names of the primary key columns (empty for append-only tables).
PrimaryKeys []string `json:"primaryKeys"`
// Options holds the table's WITH(...) properties, e.g. "merge-engine": "deduplicate".
Options map[string]string `json:"options"`
// Comment is an optional human-readable table description.
Comment string `json:"comment,omitempty"`
// TimeMillis is the epoch-millisecond timestamp when this schema was created.
TimeMillis int64 `json:"timeMillis,omitempty"`
}
TableSchema is the JSON representation of a Paimon schema file (stored at schema/<id> inside the table directory).
Version indicates the Paimon schema-file format version; defaults that depend on version (e.g. file format) are applied automatically when the schema is loaded. ID is the monotonically increasing schema ID; it advances each time the schema is altered via ALTER TABLE.
Fields lists all columns in declaration order. PartitionKeys and PrimaryKeys are field names (not IDs); use TableSchema.PartitionFields and TableSchema.PrimaryKeyFields for typed access.
Options mirrors the table's WITH(...) properties and is keyed by the Paimon option name (e.g. "merge-engine", "file.format").
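As an illustration of the layout described above, the sketch below decodes a small schema document with encoding/json. The struct types are local mirrors that reuse the JSON tags shown for TableSchema and DataField, the sample document is invented for this example, and decodeSchema is a hypothetical helper. Field types are kept as raw JSON so the sketch does not depend on DataType.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// column mirrors DataField; Type is left undecoded in this sketch.
type column struct {
	ID   int             `json:"id"`
	Name string          `json:"name"`
	Type json.RawMessage `json:"type"`
}

// tableSchema mirrors a subset of TableSchema.
type tableSchema struct {
	Version        int               `json:"version"`
	ID             int64             `json:"id"`
	Fields         []column          `json:"fields"`
	HighestFieldID int               `json:"highestFieldId"`
	PartitionKeys  []string          `json:"partitionKeys"`
	PrimaryKeys    []string          `json:"primaryKeys"`
	Options        map[string]string `json:"options"`
}

// sample is an invented schema document, not taken from a real table.
const sample = `{
  "version": 2,
  "id": 0,
  "fields": [
    {"id": 0, "name": "dt", "type": "STRING NOT NULL"},
    {"id": 1, "name": "user_id", "type": "BIGINT NOT NULL"},
    {"id": 2, "name": "tags", "type": {"type": "ARRAY", "element": "STRING"}}
  ],
  "highestFieldId": 2,
  "partitionKeys": ["dt"],
  "primaryKeys": ["dt", "user_id"],
  "options": {"merge-engine": "deduplicate"}
}`

// decodeSchema (hypothetical helper) parses one schema document.
func decodeSchema(data []byte) (*tableSchema, error) {
	var s tableSchema
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("decode schema: %w", err)
	}
	return &s, nil
}

func main() {
	s, err := decodeSchema([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range s.Fields {
		fmt.Printf("field %d %s: %s\n", f.ID, f.Name, f.Type)
	}
	fmt.Println("primary keys:", s.PrimaryKeys, "options:", s.Options)
}
```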
func (*TableSchema) FieldByName ¶
func (s *TableSchema) FieldByName(name string) (DataField, bool)
FieldByName returns the DataField with the given name, or (DataField{}, false) if no field has that name.
func (*TableSchema) FileFormat ¶
func (s *TableSchema) FileFormat() string
FileFormat returns the data file format, defaulting to "orc" for schemas whose Version is 2 or lower.
func (*TableSchema) IsPrimaryKeyTable ¶
func (s *TableSchema) IsPrimaryKeyTable() bool
IsPrimaryKeyTable reports whether the table has primary keys defined.
func (*TableSchema) MergeEngine ¶
func (s *TableSchema) MergeEngine() string
MergeEngine returns the merge engine name from table options, defaulting to "deduplicate" if not set.
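FileFormat and MergeEngine both follow an option-with-default pattern over the Options map. A minimal sketch of that pattern follows, assuming the option keys "file.format" and "merge-engine" mentioned in the TableSchema description; schemaOptions, optionOr, fileFormat and mergeEngine are hypothetical names. The source does not state a default file format for versions above 2, so the sketch returns an empty string in that case instead of guessing one.

```go
package main

import "fmt"

// schemaOptions is a reduced local stand-in for TableSchema.
type schemaOptions struct {
	Version int
	Options map[string]string
}

// optionOr returns the option value, or fallback when the key is missing or
// empty. Reading from a nil map is safe in Go.
func optionOr(opts map[string]string, key, fallback string) string {
	if v, ok := opts[key]; ok && v != "" {
		return v
	}
	return fallback
}

// mergeEngine falls back to "deduplicate" when the option is not set.
func mergeEngine(s schemaOptions) string {
	return optionOr(s.Options, "merge-engine", "deduplicate")
}

// fileFormat falls back to "orc" only for schema versions 0 through 2.
// For later versions the fallback is left empty (assumption of this sketch).
func fileFormat(s schemaOptions) string {
	fallback := ""
	if s.Version <= 2 {
		fallback = "orc"
	}
	return optionOr(s.Options, "file.format", fallback)
}

func main() {
	legacy := schemaOptions{Version: 2}
	custom := schemaOptions{
		Version: 2,
		Options: map[string]string{"merge-engine": "partial-update", "file.format": "parquet"},
	}
	fmt.Println(mergeEngine(legacy), fileFormat(legacy))
	fmt.Println(mergeEngine(custom), fileFormat(custom))
}
```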
func (*TableSchema) PartitionFields ¶
func (s *TableSchema) PartitionFields() []DataField
PartitionFields returns the DataFields for the partition keys in order.
func (*TableSchema) PrimaryKeyFields ¶
func (s *TableSchema) PrimaryKeyFields() []DataField
PrimaryKeyFields returns the DataFields for the primary keys in order.
func (*TableSchema) PrimaryKeyIndices ¶
func (s *TableSchema) PrimaryKeyIndices() []int
PrimaryKeyIndices returns the integer positions of each primary key field within the TableSchema.Fields slice.
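Resolving key names to positions can be sketched as a name-to-index lookup over the field list. The example below is an illustration only: field is a reduced stand-in for DataField, indicesOf is a hypothetical helper, and two behaviours are assumptions of the sketch rather than documented facts, namely that results follow the order of the key list and that unknown names are skipped.

```go
package main

import "fmt"

// field is a local stand-in for DataField.
type field struct {
	ID   int
	Name string
}

// indicesOf (hypothetical helper) returns the position in fields of each
// name in names, in the order the names are given. Names that match no
// field are skipped (assumption of this sketch).
func indicesOf(fields []field, names []string) []int {
	pos := make(map[string]int, len(fields))
	for i, f := range fields {
		pos[f.Name] = i
	}
	out := make([]int, 0, len(names))
	for _, n := range names {
		if i, ok := pos[n]; ok {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Invented sample data.
	fields := []field{{ID: 0, Name: "dt"}, {ID: 1, Name: "user_id"}, {ID: 2, Name: "amount"}}
	primaryKeys := []string{"user_id", "dt"}
	for _, i := range indicesOf(fields, primaryKeys) {
		fmt.Printf("key %q is at Fields[%d]\n", fields[i].Name, i)
	}
}
```

Positions are useful when rows are handled as positional tuples, for example when extracting key columns from a record batch without a name lookup per row.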