Documentation ¶
Index ¶
- func CompactDataFiles(ctx context.Context, updateFunc func(CompactionStatus), ...) error
- func DeleteParquetFiles(partition *config.Partition, from time.Time) (rowCount int, err error)
- func PartitionMatchesPatterns(table, partition string, patterns []PartitionPattern) bool
- type ColumnSchemaChange
- type CompactionStatus
- type ConversionError
- type Converter
- type FileRootProvider
- type PartitionPattern
- type SchemaChangeError
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CompactDataFiles ¶
func CompactDataFiles(ctx context.Context, updateFunc func(CompactionStatus), patterns ...PartitionPattern) error
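A minimal usage sketch, assuming this package is imported as parquet and that omitting patterns compacts all partitions (the empty-pattern semantics are an assumption):

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	// receive progress updates as compaction proceeds
	updateFunc := func(status parquet.CompactionStatus) {
		fmt.Println(status.BriefString())
	}

	if err := parquet.CompactDataFiles(ctx, updateFunc); err != nil {
		log.Fatal(err)
	}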
func DeleteParquetFiles ¶
func DeleteParquetFiles(partition *config.Partition, from time.Time) (rowCount int, err error)
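A sketch, assuming p is a *config.Partition loaded elsewhere and that from marks the start of the deletion window (the exact semantics of from are an assumption):

	// delete parquet data for partition p collected in the last 7 days
	rowCount, err := parquet.DeleteParquetFiles(p, time.Now().AddDate(0, 0, -7))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("deleted %d rows\n", rowCount)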
func PartitionMatchesPatterns ¶ added in v0.2.0
func PartitionMatchesPatterns(table, partition string, patterns []PartitionPattern) bool
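A sketch of matching, assuming PartitionPattern exposes exported Table and Partition fields (the struct body is not shown above, so the field names are an assumption):

	patterns := []parquet.PartitionPattern{
		{Table: "aws_cloudtrail_log", Partition: "*"}, // assumed field names
	}
	if parquet.PartitionMatchesPatterns("aws_cloudtrail_log", "prod", patterns) {
		fmt.Println("partition matches")
	}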
Types ¶
type ColumnSchemaChange ¶ added in v0.3.0
type CompactionStatus ¶
func (*CompactionStatus) BriefString ¶
func (total *CompactionStatus) BriefString() string
func (*CompactionStatus) Update ¶
func (total *CompactionStatus) Update(counts CompactionStatus)
func (*CompactionStatus) VerboseString ¶
func (total *CompactionStatus) VerboseString() string
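A sketch of aggregating per-run statuses into a running total, assuming the zero value of CompactionStatus is a usable starting point:

	var total parquet.CompactionStatus
	// runStatuses is a hypothetical []parquet.CompactionStatus collected elsewhere
	for _, s := range runStatuses {
		total.Update(s)
	}
	fmt.Println(total.BriefString())   // one-line summary
	fmt.Println(total.VerboseString()) // full detail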
type ConversionError ¶ added in v0.2.0
type ConversionError struct {
	SourceFile   string
	BaseError    error
	RowsAffected int64
	// contains filtered or unexported fields
}
func NewConversionError ¶ added in v0.2.0
func NewConversionError(msg string, rowsAffected int64, path string) *ConversionError
func (*ConversionError) Error ¶ added in v0.2.0
func (c *ConversionError) Error() string
func (*ConversionError) Merge ¶ added in v0.3.0
func (c *ConversionError) Merge(err error)
Merge adds a second error to the conversion error message.
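A sketch of construction and merging, using only the signatures shown above; the message, row count, and path are illustrative values:

	convErr := parquet.NewConversionError("failed to parse row", 42, "/tmp/exec_abc_1.jsonl")
	// fold a later failure for the same source file into the same error
	convErr.Merge(errors.New("failed to write parquet file"))
	fmt.Println(convErr.Error())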
type Converter ¶ added in v0.2.0
type Converter struct {
	// the partition being collected
	Partition *config.Partition
	// contains filtered or unexported fields
}
Converter struct executes all the conversions for a single collection. It therefore has a unique execution id, and will potentially convert multiple JSONL files. Each file is assumed to have the filename format <execution_id>_<chunkNumber>.jsonl, so when new input files are available, we simply store the chunk number.
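For illustration, the expected input filename can be built as below; executionId and chunkNumber are hypothetical values:

	// e.g. executionId = "exec_abc", chunkNumber = 3 -> "exec_abc_3.jsonl"
	filename := fmt.Sprintf("%s_%d.jsonl", executionId, chunkNumber)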
func NewParquetConverter ¶ added in v0.2.0
func (*Converter) AddChunk ¶ added in v0.2.0
AddChunk adds a new chunk to the list of chunks to be processed. If this is the first chunk, determine whether we have a full conversionSchema yet and, if not, infer it from the chunk. Signal the scheduler that chunks are available.
func (*Converter) InferSchemaForJSONLFile ¶ added in v0.3.0
func (w *Converter) InferSchemaForJSONLFile(filePath string) (*schema.TableSchema, error)
func (*Converter) WaitForConversions ¶ added in v0.2.0
WaitForConversions waits for all jobs to be processed or for the context to be cancelled.
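A sketch of the wait-with-cancellation pattern; the method's parameters and return value are not shown above, so the context argument and error return are assumptions:

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	// converter is a hypothetical *parquet.Converter; signature assumed as
	// WaitForConversions(ctx context.Context) error
	if err := converter.WaitForConversions(ctx); err != nil {
		log.Fatal(err)
	}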
type FileRootProvider ¶
type FileRootProvider struct {
// contains filtered or unexported fields
}
FileRootProvider provides a unique file root for parquet files based on the current time to the nanosecond. If multiple files are created in the same nanosecond, the provider will increment the time by a nanosecond to ensure the file root is unique.
func (*FileRootProvider) GetFileRoot ¶
func (p *FileRootProvider) GetFileRoot() string
GetFileRoot returns a unique file root for a parquet file. The format is "data_<timestamp>_<microseconds>".
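A sketch, assuming the zero value of FileRootProvider is ready to use; the rendered timestamp in the comment is illustrative only:

	var provider parquet.FileRootProvider
	root := provider.GetFileRoot() // e.g. "data_20240101150405_000001" (illustrative)
	parquetFile := root + ".parquet"
	fmt.Println(parquetFile)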
type PartitionPattern ¶
PartitionPattern represents a pattern used to match partitions. It consists of a table pattern and a partition pattern, both of which are used to match a given table and partition name.
func NewPartitionPattern ¶
func NewPartitionPattern(partition *config.Partition) PartitionPattern
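A sketch deriving patterns from already-loaded partition configs and feeding them to CompactDataFiles; loadedPartitions, ctx, and updateFunc are hypothetical values established elsewhere:

	var patterns []parquet.PartitionPattern
	// loadedPartitions is a hypothetical []*config.Partition
	for _, p := range loadedPartitions {
		patterns = append(patterns, parquet.NewPartitionPattern(p))
	}
	if err := parquet.CompactDataFiles(ctx, updateFunc, patterns...); err != nil {
		log.Fatal(err)
	}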
type SchemaChangeError ¶ added in v0.3.0
type SchemaChangeError struct {
ChangedColumns []ColumnSchemaChange
}
func (*SchemaChangeError) Error ¶ added in v0.3.0
func (e *SchemaChangeError) Error() string
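A sketch of detecting a schema change with errors.As, using the exported ChangedColumns field; err is a hypothetical error returned by a conversion:

	var schemaErr *parquet.SchemaChangeError
	if errors.As(err, &schemaErr) {
		fmt.Printf("schema changed: %d column(s) affected\n", len(schemaErr.ChangedColumns))
	}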