Documentation ¶
Index ¶
- func MakeDocPageLocations(b *flatbuffers.Builder, ppos []OffsetBBox) []byte
- func MakeDocPositions(b *flatbuffers.Builder, doc DocPositions) []byte
- func MakeSerialBlevePdf(b *flatbuffers.Builder, spi SerialBlevePdf) []byte
- func MakeTextLocation(b *flatbuffers.Builder, loc OffsetBBox) []byte
- func WriteSerialBlevePdf(spi SerialBlevePdf) []byte
- type DocPositions
- type HashIndexPathDoc
- type OffsetBBox
- type SerialBlevePdf
Functions ¶
func MakeDocPageLocations ¶
func MakeDocPageLocations(b *flatbuffers.Builder, ppos []OffsetBBox) []byte
MakeDocPageLocations returns a flatbuffers serialized byte array for `ppos`.
func MakeDocPositions ¶
func MakeDocPositions(b *flatbuffers.Builder, doc DocPositions) []byte
MakeDocPositions returns a flatbuffers serialized byte array for `doc`.
func MakeSerialBlevePdf ¶
func MakeSerialBlevePdf(b *flatbuffers.Builder, spi SerialBlevePdf) []byte
MakeSerialBlevePdf returns a flatbuffers serialized byte array for `spi`.
func MakeTextLocation ¶
func MakeTextLocation(b *flatbuffers.Builder, loc OffsetBBox) []byte
MakeTextLocation returns a flatbuffers serialized byte array for `loc`.
func WriteSerialBlevePdf ¶
func WriteSerialBlevePdf(spi SerialBlevePdf) []byte
WriteSerialBlevePdf converts `spi` into a byte array.
Types ¶
type DocPositions ¶
type DocPositions struct {
    Path          string         // Path of input PDF file.
    DocIdx        uint64         // Index into blevePdf.fileList.
    PagePositions [][]OffsetBBox // PagePositions[i] = doc.pagePositions[doc.pageNums[i]].offsetBBoxes
    PageNums      []uint32       // 1-offset page numbers of entries.
    PageTexts     []string       // Extracted page text of entries.
}
DocPositions is used to serialize a doclib.DocPositions. It corresponds to the following flatbuffers schema.
table DocPositions {
    path: string;
    doc_idx: uint64;
    page_dpl: [locations.PagePositions];
    page_nums: [uint32];
    page_texts: [string];
}
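The field comments suggest that PagePositions, PageNums and PageTexts are parallel slices, with index i describing the same page in each. The sketch below illustrates that reading. It declares local copies of the documented types so that it compiles on its own, and `pageEntry` is a hypothetical helper that is not part of this package.

```go
package main

import "fmt"

// OffsetBBox and DocPositions are local copies of the documented types,
// declared here only so that the sketch is self-contained.
type OffsetBBox struct {
	Offset             uint32
	Llx, Lly, Urx, Ury float32
}

type DocPositions struct {
	Path          string
	DocIdx        uint64
	PagePositions [][]OffsetBBox
	PageNums      []uint32
	PageTexts     []string
}

// pageEntry is a hypothetical helper (not part of the package). It returns the
// text and positions stored for the 1-offset page number pageNum, assuming the
// three slices are indexed in parallel.
func pageEntry(doc DocPositions, pageNum uint32) (string, []OffsetBBox, bool) {
	for i, n := range doc.PageNums {
		if n != pageNum {
			continue
		}
		// Guard against parallel slices of unequal length.
		if i >= len(doc.PageTexts) || i >= len(doc.PagePositions) {
			return "", nil, false
		}
		return doc.PageTexts[i], doc.PagePositions[i], true
	}
	return "", nil, false
}

func main() {
	doc := DocPositions{
		Path:      "example.pdf",
		PageNums:  []uint32{2, 5},
		PageTexts: []string{"text of page 2", "text of page 5"},
		PagePositions: [][]OffsetBBox{
			{{Offset: 0, Llx: 72, Lly: 700, Urx: 300, Ury: 712}},
			{{Offset: 0, Llx: 72, Lly: 650, Urx: 280, Ury: 662}},
		},
	}
	text, positions, ok := pageEntry(doc, 5)
	fmt.Println(text, len(positions), ok)
}
```

A linear scan keeps the sketch short. If the parallel-slice assumption does not hold for a given index file, the length guard makes the helper report a miss instead of panicking.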
func (DocPositions) String ¶
func (doc DocPositions) String() string
String returns a text description of `doc`.
type HashIndexPathDoc ¶
type HashIndexPathDoc struct { Hash string Index uint64 Path string Doc DocPositions }
HashIndexPathDoc is used for serializing a doclib.BlevePdf. The keys and values of the maps in the BlevePdf are stored in a []HashIndexPathDoc.
table HashIndexPathDoc {
    hash: string;
    index: uint64;
    path: string;
    doc: DocPositions;
}
type OffsetBBox ¶
type OffsetBBox struct {
    Offset             uint32  // Offset of the text fragment in extracted page text.
    Llx, Lly, Urx, Ury float32 // Bounding box of fragment on PDF page.
}
OffsetBBox provides a mapping between the location of a piece of text on a PDF page and the offset of that piece of text in the text extracted from the PDF page. The text extracted from PDF pages is sent to bleve for indexing. BBox() is used to map the results of bleve searches (offsets in the extracted text) back to PDF contents. (Members need to be public because they are accessed by the doclib package.)
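As a rough illustration of mapping a search offset back to a page location, the sketch below looks up the fragment that covers a given text offset. It rests on two assumptions that this page does not confirm: the slice is sorted by ascending Offset, and each entry marks where a fragment starts in the extracted page text. The type is a local copy and `fragmentAt` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"sort"
)

// OffsetBBox is a local copy of the documented type, declared here only so
// that the sketch is self-contained.
type OffsetBBox struct {
	Offset             uint32
	Llx, Lly, Urx, Ury float32
}

// fragmentAt is a hypothetical helper (not part of the package). It assumes
// locs is sorted by ascending Offset and returns the last entry whose Offset
// is less than or equal to offset.
func fragmentAt(locs []OffsetBBox, offset uint32) (OffsetBBox, bool) {
	// Binary search for the first entry that starts after offset.
	i := sort.Search(len(locs), func(k int) bool { return locs[k].Offset > offset })
	if i == 0 {
		// Either locs is empty or every fragment starts after offset.
		return OffsetBBox{}, false
	}
	return locs[i-1], true
}

func main() {
	locs := []OffsetBBox{
		{Offset: 0, Llx: 72, Lly: 700, Urx: 110, Ury: 712},
		{Offset: 6, Llx: 115, Lly: 700, Urx: 160, Ury: 712},
		{Offset: 12, Llx: 72, Lly: 686, Urx: 130, Ury: 698},
	}
	// A search hit at text offset 8 falls inside the fragment starting at 6.
	loc, ok := fragmentAt(locs, 8)
	fmt.Println(loc.Offset, ok)
}
```

The returned entry's corner coordinates could then be used to draw a rectangle around the match, which is the role this page describes for BBox().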
func ReadDocPageLocations ¶
func ReadDocPageLocations(buf []byte) ([]OffsetBBox, error)
func ReadTextLocation ¶
func ReadTextLocation(buf []byte) OffsetBBox
func (OffsetBBox) BBox ¶
func (t OffsetBBox) BBox() model.PdfRectangle
BBox returns `t` as a UniDoc rectangle. This is convenient for drawing bounding rectangles around text in a PDF file.
func (OffsetBBox) Equals ¶
func (t OffsetBBox) Equals(u OffsetBBox) bool
Equals returns true if `t` has the same text interval and bounding box as `u`.
type SerialBlevePdf ¶
type SerialBlevePdf struct {
    NumFiles uint32
    NumPages uint32
    HIPDs    []HashIndexPathDoc
}
SerialBlevePdf is for serializing and deserializing doclib.BlevePdf. It corresponds to the following flatbuffers schema.
table PdfIndex {
    num_files: uint32;
    num_pages: uint32;
    hipd: [HashIndexPathDoc];
}
func ReadSerialBlevePdf ¶
func ReadSerialBlevePdf(buf []byte) (SerialBlevePdf, error)
ReadSerialBlevePdf converts byte array `buf` into a SerialBlevePdf. TODO: Write round-trip tests.
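A round-trip check for this type might take roughly the following shape. Because this page does not show the package's import path, the sketch uses local copies of the documented types, and `encode` and `decode` are stand-ins built on encoding/gob rather than the package's flatbuffers code. In a real test they would be replaced by WriteSerialBlevePdf and ReadSerialBlevePdf. The names `encode`, `decode`, `roundTrip` and `sample` are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"reflect"
)

// Local copies of the documented types, so that the sketch is self-contained.
type OffsetBBox struct {
	Offset             uint32
	Llx, Lly, Urx, Ury float32
}

type DocPositions struct {
	Path          string
	DocIdx        uint64
	PagePositions [][]OffsetBBox
	PageNums      []uint32
	PageTexts     []string
}

type HashIndexPathDoc struct {
	Hash  string
	Index uint64
	Path  string
	Doc   DocPositions
}

type SerialBlevePdf struct {
	NumFiles uint32
	NumPages uint32
	HIPDs    []HashIndexPathDoc
}

// encode is a stand-in for WriteSerialBlevePdf. It uses gob, not flatbuffers.
func encode(spi SerialBlevePdf) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(spi); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode is a stand-in for ReadSerialBlevePdf. It uses gob, not flatbuffers.
func decode(buf []byte) (SerialBlevePdf, error) {
	var spi SerialBlevePdf
	err := gob.NewDecoder(bytes.NewReader(buf)).Decode(&spi)
	return spi, err
}

// roundTrip reports whether spi is unchanged after encode followed by decode.
func roundTrip(spi SerialBlevePdf) (bool, error) {
	buf, err := encode(spi)
	if err != nil {
		return false, err
	}
	got, err := decode(buf)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(spi, got), nil
}

// sample builds a small value. It avoids empty slices because the gob stand-in
// decodes them as nil, which reflect.DeepEqual treats as a difference.
func sample() SerialBlevePdf {
	doc := DocPositions{
		Path:          "example.pdf",
		DocIdx:        1,
		PagePositions: [][]OffsetBBox{{{Offset: 3, Llx: 72, Lly: 700, Urx: 300, Ury: 712}}},
		PageNums:      []uint32{1},
		PageTexts:     []string{"extracted text"},
	}
	return SerialBlevePdf{
		NumFiles: 1,
		NumPages: 1,
		HIPDs:    []HashIndexPathDoc{{Hash: "abc123", Index: 1, Path: "example.pdf", Doc: doc}},
	}
}

func main() {
	ok, err := roundTrip(sample())
	fmt.Println(ok, err)
}
```

Comparing with reflect.DeepEqual checks every nested field at once. A test against the real functions would need to confirm separately how empty slices and zero values come back from the flatbuffers encoding.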