parquet

package

v0.3.2 Latest Latest Go to latest Published: Apr 25, 2025 License: AGPL-3.0 Imports: 25 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/turbot/tailpipe

Links

Open Source Insights

Documentation ¶

Index ¶

func CompactDataFiles(ctx context.Context, updateFunc func(CompactionStatus), ...) error
func DeleteParquetFiles(partition *config.Partition, from time.Time) (rowCount int, err error)
func PartitionMatchesPatterns(table, partition string, patterns []PartitionPattern) bool
type ColumnSchemaChange
type CompactionStatus
type ConversionError
- func NewConversionError(msg string, rowsAffected int64, path string) *ConversionError
- func (c *ConversionError) Error() string
- func (c *ConversionError) Merge(err error)
type Converter
- func NewParquetConverter(ctx context.Context, cancel context.CancelFunc, executionId string, ...) (*Converter, error)
type FileRootProvider
- func (p *FileRootProvider) GetFileRoot() string
type PartitionPattern
- func NewPartitionPattern(partition *config.Partition) PartitionPattern
type SchemaChangeError
- func (e *SchemaChangeError) Error() string

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CompactDataFiles ¶

func CompactDataFiles(ctx context.Context, updateFunc func(CompactionStatus), patterns ...PartitionPattern) error

func DeleteParquetFiles ¶

func DeleteParquetFiles(partition *config.Partition, from time.Time) (rowCount int, err error)

func PartitionMatchesPatterns ¶ added in v0.2.0

func PartitionMatchesPatterns(table, partition string, patterns []PartitionPattern) bool

Types ¶

type ColumnSchemaChange ¶ added in v0.3.0

type ColumnSchemaChange struct {
	Name    string
	OldType string
	NewType string
}

type CompactionStatus ¶

type CompactionStatus struct {
	Source      int
	Dest        int
	Uncompacted int
}

func (*CompactionStatus) BriefString ¶

func (total *CompactionStatus) BriefString() string

func (*CompactionStatus) Update ¶

func (total *CompactionStatus) Update(counts CompactionStatus)

func (*CompactionStatus) VerboseString ¶

func (total *CompactionStatus) VerboseString() string

type ConversionError ¶ added in v0.2.0

type ConversionError struct {
	SourceFile   string
	BaseError    error
	RowsAffected int64
	// contains filtered or unexported fields
}

func NewConversionError ¶ added in v0.2.0

func NewConversionError(msg string, rowsAffected int64, path string) *ConversionError

func (*ConversionError) Error ¶ added in v0.2.0

func (c *ConversionError) Error() string

func (*ConversionError) Merge ¶ added in v0.3.0

func (c *ConversionError) Merge(err error)

Merge adds a second error to the conversion error message.

type Converter ¶ added in v0.2.0

type Converter struct {

	// the partition being collected
	Partition *config.Partition
	// contains filtered or unexported fields
}

Converter struct executes all the conversions for a single collection it therefore has a unique execution id, and will potentially convert of multiple JSONL files each file is assumed to have the filename format <execution_id>_<chunkNumber>.jsonl so when new input files are available, we simply store the chunk number

func NewParquetConverter ¶ added in v0.2.0

func NewParquetConverter(ctx context.Context, cancel context.CancelFunc, executionId string, partition *config.Partition, sourceDir string, tableSchema *schema.TableSchema, statusFunc func(int64, int64, ...error)) (*Converter, error)

func (*Converter) AddChunk ¶ added in v0.2.0

func (w *Converter) AddChunk(executionId string, chunk int32) error

AddChunk adds a new chunk to the list of chunks to be processed if this is the first chunk, determine if we have a full conversionSchema yet and if not infer from the chunk signal the scheduler that `chunks are available

func (*Converter) Close ¶ added in v0.2.0

func (w *Converter) Close()

func (*Converter) InferSchemaForJSONLFile ¶ added in v0.3.0

func (w *Converter) InferSchemaForJSONLFile(filePath string) (*schema.TableSchema, error)

func (*Converter) WaitForConversions ¶ added in v0.2.0

func (w *Converter) WaitForConversions(ctx context.Context)

WaitForConversions waits for all jobs to be processed or for the context to be cancelled

type FileRootProvider ¶

type FileRootProvider struct {
	// contains filtered or unexported fields
}

FileRootProvider provides a unique file root for parquet files based on the current time to the nanosecond. If multiple files are created in the same nanosecond, the provider will increment the time by a nanosecond to ensure the file root is unique.

func (*FileRootProvider) GetFileRoot ¶

func (p *FileRootProvider) GetFileRoot() string

GetFileRoot returns a unique file root for a parquet file format is "data_<timestamp>_<microseconds>"

type PartitionPattern ¶

type PartitionPattern struct {
	Table     string
	Partition string
}

PartitionPattern represents a pattern used to match partitions. It consists of a table pattern and a partition pattern, both of which are used to match a given table and partition name.

func NewPartitionPattern ¶

func NewPartitionPattern(partition *config.Partition) PartitionPattern

type SchemaChangeError ¶ added in v0.3.0

type SchemaChangeError struct {
	ChangedColumns []ColumnSchemaChange
}

func (*SchemaChangeError) Error ¶ added in v0.3.0

func (e *SchemaChangeError) Error() string

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL