parquet

package
v0.10.3-0...-2c55aae

Published: Apr 28, 2025 License: Apache-2.0 Imports: 23 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	BitMask        = [8]byte{1, 2, 4, 8, 16, 32, 64, 128}
	FlippedBitMask = [8]byte{254, 253, 251, 247, 239, 223, 191, 127}
)
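
The two tables look like per-bit masks for a bitmap: BitMask[i] has only bit i set, and FlippedBitMask[i] is its complement. A minimal sketch of how such masks are typically used, assuming an LSB-first bitmap layout (an assumption; the package does not document this). The lower-case names and the helpers setBit, clearBit and bitIsSet are hypothetical and not part of the package:

```go
package main

import "fmt"

// Local copies of the package-level tables so the sketch compiles on its own.
var (
	bitMask        = [8]byte{1, 2, 4, 8, 16, 32, 64, 128}
	flippedBitMask = [8]byte{254, 253, 251, 247, 239, 223, 191, 127}
)

// setBit turns on bit i of the bitmap (hypothetical helper, LSB-first layout assumed).
func setBit(bitmap []byte, i int) { bitmap[i/8] |= bitMask[i%8] }

// clearBit turns off bit i by AND-ing with the complemented mask.
func clearBit(bitmap []byte, i int) { bitmap[i/8] &= flippedBitMask[i%8] }

// bitIsSet reports whether bit i is on.
func bitIsSet(bitmap []byte, i int) bool { return bitmap[i/8]&bitMask[i%8] != 0 }

func main() {
	bitmap := make([]byte, 2)
	setBit(bitmap, 0)
	setBit(bitmap, 9)
	fmt.Println(bitmap, bitIsSet(bitmap, 9)) // [1 2] true
	clearBit(bitmap, 0)
	fmt.Println(bitmap, bitIsSet(bitmap, 0)) // [0 2] false
}
```

Keeping both tables avoids computing a shift and a complement on every access.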

Functions

func ConvertToArrowSchema

func ConvertToArrowSchema(schema *schemapb.CollectionSchema, useNullType bool) (*arrow.Schema, error)

This method is used only by the import util and related tests. The returned arrow.Schema doesn't include function output fields.

func CreateFieldReaders

func CreateFieldReaders(ctx context.Context, fileReader *pqarrow.FileReader, schema *schemapb.CollectionSchema) (map[int64]*FieldReader, error)

func IsValidSparseVectorSchema

func IsValidSparseVectorSchema(arrowType arrow.DataType) (bool, bool)

This method returns two booleans. The first reports whether arrowType is a valid sparse vector schema. The second is true when the sparse vector is stored as a JSON-format string and false when it is stored as a parquet struct.
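
A caller usually has to collapse the two booleans into one of three cases. The sketch below shows one way to do that. The sparseStorage type and the classify helper are hypothetical names for illustration, and treating the second boolean as meaningful only when the first is true is an assumption, not documented behavior:

```go
package main

import "fmt"

// sparseStorage is a hypothetical enum naming the three outcomes.
type sparseStorage int

const (
	notSparse sparseStorage = iota
	sparseAsJSON
	sparseAsStruct
)

func (s sparseStorage) String() string {
	switch s {
	case sparseAsJSON:
		return "JSON string"
	case sparseAsStruct:
		return "parquet struct"
	default:
		return "not a sparse vector"
	}
}

// classify folds a (valid, isJSON) pair, shaped like the return values of
// IsValidSparseVectorSchema, into one value. Assumption: isJSON is ignored
// when valid is false.
func classify(valid, isJSON bool) sparseStorage {
	if !valid {
		return notSparse
	}
	if isJSON {
		return sparseAsJSON
	}
	return sparseAsStruct
}

func main() {
	fmt.Println(classify(true, true))   // JSON string
	fmt.Println(classify(true, false))  // parquet struct
	fmt.Println(classify(false, false)) // not a sparse vector
}
```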

func NewReader

func NewReader(ctx context.Context, cm storage.ChunkManager, schema *schemapb.CollectionSchema, path string, bufferSize int) (*reader, error)

func ReadArrayData

func ReadArrayData(pcr *FieldReader, count int64) (any, error)

func ReadBinaryData

func ReadBinaryData(pcr *FieldReader, count int64) (any, error)

func ReadBoolArrayData

func ReadBoolArrayData(pcr *FieldReader, count int64) (any, error)

func ReadBoolData

func ReadBoolData(pcr *FieldReader, count int64) (any, error)

func ReadIntegerOrFloatArrayData

func ReadIntegerOrFloatArrayData[T constraints.Integer | constraints.Float](pcr *FieldReader, count int64) (any, error)

func ReadIntegerOrFloatData

func ReadIntegerOrFloatData[T constraints.Integer | constraints.Float](pcr *FieldReader, count int64) (any, error)

func ReadJSONData

func ReadJSONData(pcr *FieldReader, count int64) (any, error)

func ReadNullableArrayData

func ReadNullableArrayData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableBoolArrayData

func ReadNullableBoolArrayData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableBoolData

func ReadNullableBoolData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableIntegerOrFloatArrayData

func ReadNullableIntegerOrFloatArrayData[T constraints.Integer | constraints.Float](pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableIntegerOrFloatData

func ReadNullableIntegerOrFloatData[T constraints.Integer | constraints.Float](pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableJSONData

func ReadNullableJSONData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableStringArrayData

func ReadNullableStringArrayData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableStringData

func ReadNullableStringData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadNullableVarcharData

func ReadNullableVarcharData(pcr *FieldReader, count int64) (any, []bool, error)

func ReadSparseFloatVectorData

func ReadSparseFloatVectorData(pcr *FieldReader, count int64) (any, error)

func ReadStringArrayData

func ReadStringArrayData(pcr *FieldReader, count int64) (any, error)

func ReadStringData

func ReadStringData(pcr *FieldReader, count int64) (any, error)

func ReadStructData

func ReadStructData(pcr *FieldReader, count int64) ([]map[string]arrow.Array, error)

This method returns a []map[string]arrow.Array. Each map[string]arrow.Array represents a struct, keyed by field name. Example 1:

  struct {
      name string
      age  int
  }

ReadStructData() will return a list like:

  [
      {"name": ["a", "b", "c"], "age": [4, 5, 6]},
      {"name": ["e", "f"], "age": [7, 8]}
  ]

The value type of "name" is array.String and the value type of "age" is array.Int32. The length of the list equals the length of chunked.Chunks().

For a sparse vector, the map[string]arrow.Array looks like {"indices": array.List, "values": array.List}. Example 2:

  struct {
      indices []uint32
      values  []float32
  }

ReadStructData() will return a list like:

  [
      {"indices": [[1, 2, 3], [4, 5], [6, 7]], "values": [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6, 0.7]]},
      {"indices": [[8], [9, 10]], "values": [[0.8], [0.9, 1.0]]}
  ]

The value type of "indices" is array.List with element type array.Uint32. The value type of "values" is array.List with element type array.Float32. The length of the list equals the length of chunked.Chunks().

Note: ReadStructData() is currently used by the SparseVector type, and SparseVector is not nullable. Create a new method ReadNullableStructData() if a nullable struct type is added in the future.
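
The chunked, column-wise shape described above usually has to be turned into one sparse vector per row. A minimal sketch of that step follows, using plain Go slices as stand-ins for the arrow arrays so that it runs without the Arrow dependency. The chunk and sparseRow types and the flattenChunks helper are hypothetical and do not exist in the package:

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for one map[string]arrow.Array entry:
// each field holds one list per row of the chunk.
type chunk struct {
	indices [][]uint32
	values  [][]float32
}

// sparseRow is a hypothetical row-wise view mapping index to value.
type sparseRow map[uint32]float32

// flattenChunks walks the chunks in order and zips indices with values,
// producing one sparseRow per input row.
func flattenChunks(chunks []chunk) ([]sparseRow, error) {
	var rows []sparseRow
	for ci, c := range chunks {
		if len(c.indices) != len(c.values) {
			return nil, fmt.Errorf("chunk %d: %d index lists but %d value lists",
				ci, len(c.indices), len(c.values))
		}
		for ri := range c.indices {
			if len(c.indices[ri]) != len(c.values[ri]) {
				return nil, fmt.Errorf("chunk %d row %d: length mismatch", ci, ri)
			}
			row := make(sparseRow, len(c.indices[ri]))
			for k, idx := range c.indices[ri] {
				row[idx] = c.values[ri][k]
			}
			rows = append(rows, row)
		}
	}
	return rows, nil
}

func main() {
	// Same data as Example 2.
	chunks := []chunk{
		{
			indices: [][]uint32{{1, 2, 3}, {4, 5}, {6, 7}},
			values:  [][]float32{{0.1, 0.2, 0.3}, {0.4, 0.5}, {0.6, 0.7}},
		},
		{
			indices: [][]uint32{{8}, {9, 10}},
			values:  [][]float32{{0.8}, {0.9, 1.0}},
		},
	}
	rows, err := flattenChunks(chunks)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows)) // 5
}
```

The two chunks of Example 2 hold three and two rows, so the flattened result has five rows. The length checks reflect the expectation that "indices" and "values" are parallel lists.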

func ReadVarcharData

func ReadVarcharData(pcr *FieldReader, count int64) (any, error)

func WrapTypeErr

func WrapTypeErr(expect string, actual string, field *schemapb.FieldSchema) error

Types

type FieldReader

type FieldReader struct {
	// contains filtered or unexported fields
}

func NewFieldReader

func NewFieldReader(ctx context.Context, reader *pqarrow.FileReader, columnIndex int, field *schemapb.FieldSchema) (*FieldReader, error)

func (*FieldReader) Close

func (c *FieldReader) Close()

func (*FieldReader) Next

func (c *FieldReader) Next(count int64) (any, any, error)
