Documentation
Index
- type Direction
- type Extractor
- func (e *Extractor) Extract(operations []contentstream.Operation) ([]TextFragment, error)
- func (e *Extractor) ExtractFromBytes(data []byte) ([]TextFragment, error)
- func (e *Extractor) GetFonts() map[string]*font.Font
- func (e *Extractor) GetFragments() []TextFragment
- func (e *Extractor) GetText() string
- func (e *Extractor) RegisterFont(name, baseFont, subtype string)
- func (e *Extractor) RegisterFontsFromPage(page *pages.Page, resolver func(core.IndirectRef) (core.Object, error)) error
- func (e *Extractor) RegisterFontsFromResources(resources core.Dict, resolver func(core.IndirectRef) (core.Object, error)) error
- func (e *Extractor) RegisterParsedFont(name string, f *font.Font)
- type TextFragment
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Direction
type Direction int
Direction represents a text direction.
func DetectDirection
DetectDirection detects the dominant text direction of a string based on Unicode character properties.
func GetCharDirection
GetCharDirection returns the direction of a single character based on Unicode character properties.
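The Direction constants and the exact signatures of DetectDirection and GetCharDirection are not shown in this documentation, so the following is only a sketch of the general technique: classify each rune by script with Go's standard unicode package, then take the majority. The constant names (Neutral, LTR, RTL) and the helpers charDirection and detectDirection are hypothetical, and checking a handful of right-to-left scripts is a simplification of the Bidi_Class property used by the Unicode Bidirectional Algorithm (UAX #9).

```go
package main

import (
	"fmt"
	"unicode"
)

// Direction mirrors the package's Direction type; the constant names
// below are hypothetical because the documentation does not list them.
type Direction int

const (
	Neutral Direction = iota // digits, punctuation, whitespace
	LTR                      // left-to-right letters (Latin, Greek, CJK, ...)
	RTL                      // right-to-left letters (Hebrew, Arabic, ...)
)

func (d Direction) String() string { return [...]string{"Neutral", "LTR", "RTL"}[d] }

// rtlScripts is a simplified set of right-to-left scripts; a complete
// implementation would use the Unicode Bidi_Class property (UAX #9).
var rtlScripts = []*unicode.RangeTable{unicode.Hebrew, unicode.Arabic, unicode.Syriac, unicode.Thaana}

// charDirection is a hypothetical stand-in for GetCharDirection.
func charDirection(r rune) Direction {
	switch {
	case unicode.In(r, rtlScripts...):
		return RTL
	case unicode.IsLetter(r):
		return LTR
	default:
		return Neutral
	}
}

// detectDirection is a hypothetical stand-in for DetectDirection: it counts
// strongly directional runes and returns the majority (ties go to LTR).
func detectDirection(s string) Direction {
	ltr, rtl := 0, 0
	for _, r := range s {
		switch charDirection(r) {
		case LTR:
			ltr++
		case RTL:
			rtl++
		}
	}
	switch {
	case rtl > ltr:
		return RTL
	case ltr > 0:
		return LTR
	default:
		return Neutral
	}
}

func main() {
	for _, s := range []string{"Hello, world", "שלום עולם", "12:30"} {
		fmt.Println(s, detectDirection(s))
	}
}
```

In this sketch, a string with no letters at all (such as "12:30") comes out as Neutral, which is why a dominant-direction check usually needs a fallback for digit-only fragments.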
type Extractor
type Extractor struct {
	// contains filtered or unexported fields
}
Extractor extracts text from PDF content streams.
func (*Extractor) Extract
func (e *Extractor) Extract(operations []contentstream.Operation) ([]TextFragment, error)
Extract extracts text fragments from parsed content stream operations.
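The fields of contentstream.Operation are not documented here, so the following hypothetical sketch uses a simplified Operation type to show the core idea of operation-based extraction: remember the font selected by the Tf operator, and collect the strings shown by Tj, ', " and TJ. The fragment type, the extract function, and the -200 word-gap threshold for TJ adjustments are assumptions. Real extraction must also decode each string's bytes through the font's encoding or ToUnicode CMap; this sketch skips that step by treating operands as ready-made Go strings.

```go
package main

import "fmt"

// Operation is a hypothetical stand-in for contentstream.Operation: an
// operator plus already-parsed operands (string, float64, or []any).
type Operation struct {
	Operator string
	Operands []any
}

// fragment is a hypothetical, much-simplified TextFragment.
type fragment struct {
	Font string
	Text string
}

// wordGap is an assumed threshold. TJ adjustments are in thousandths of a
// unit of text space, and a large negative one usually marks a visible gap.
const wordGap = -200.0

// extract tracks the font selected by Tf and collects the text shown by
// Tj, ', " and TJ. Operands are assumed to be decoded Go strings already.
func extract(ops []Operation) []fragment {
	var out []fragment
	font := ""
	for _, op := range ops {
		if len(op.Operands) == 0 {
			continue // BT, ET and similar operators carry no text
		}
		last := op.Operands[len(op.Operands)-1]
		switch op.Operator {
		case "Tf": // /F1 12 Tf: font resource name, then size
			if name, ok := op.Operands[0].(string); ok {
				font = name
			}
		case "Tj", "'", "\"": // the shown string is the last operand
			if s, ok := last.(string); ok {
				out = append(out, fragment{Font: font, Text: s})
			}
		case "TJ": // [(Wor) -15 (ld)] TJ: strings mixed with adjustments
			arr, _ := last.([]any)
			text := ""
			for _, el := range arr {
				switch v := el.(type) {
				case string:
					text += v
				case float64:
					if v < wordGap {
						text += " "
					}
				}
			}
			out = append(out, fragment{Font: font, Text: text})
		}
	}
	return out
}

func main() {
	ops := []Operation{
		{Operator: "BT"},
		{Operator: "Tf", Operands: []any{"F1", 12.0}},
		{Operator: "Tj", Operands: []any{"Hello"}},
		{Operator: "TJ", Operands: []any{[]any{"Wor", -15.0, "ld", -300.0, "again"}}},
		{Operator: "ET"},
	}
	for _, f := range extract(ops) {
		fmt.Printf("%s: %q\n", f.Font, f.Text)
	}
}
```

The small kerning adjustment (-15) joins "Wor" and "ld" into one word, while the large one (-300) becomes a space.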
func (*Extractor) ExtractFromBytes
func (e *Extractor) ExtractFromBytes(data []byte) ([]TextFragment, error)
ExtractFromBytes parses raw content stream data and extracts text fragments from it.
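Before any text can be extracted from raw bytes, the content stream has to be tokenized. As a rough illustration of that step only, the sketch below turns bytes such as BT /F1 12 Tf (Hello) Tj ET into operator/operand groups: operands accumulate until a bare keyword that is not a number appears, and that keyword becomes the operator. Name, Operation, parse and readLiteral are hypothetical; hex strings, dictionaries, comments and inline images are not handled (the first three are rejected with an error), so this is far from a complete content-stream parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type Name string // a PDF name such as /F1, stored without the slash

// Operation is a hypothetical stand-in for contentstream.Operation.
type Operation struct {
	Operator string
	Operands []any // float64, Name, string, or []any
}

// isDelim reports whether c ends a bare token (bytes <= ' ' count as whitespace).
func isDelim(c byte) bool { return c <= ' ' || strings.IndexByte("()[]<>/%", c) >= 0 }

// parse splits content-stream bytes into operations; hex strings, dictionaries and comments are rejected.
func parse(data []byte) ([]Operation, error) {
	var ops []Operation
	var operands []any
	var arrays [][]any // stack of open arrays, innermost last
	push := func(v any) {
		if n := len(arrays); n > 0 {
			arrays[n-1] = append(arrays[n-1], v)
		} else {
			operands = append(operands, v)
		}
	}
	for i := 0; i < len(data); i++ {
		c := data[i]
		switch {
		case c <= ' ': // whitespace
		case c == '[':
			arrays = append(arrays, []any{})
		case c == ']' && len(arrays) > 0:
			arr := arrays[len(arrays)-1]
			arrays = arrays[:len(arrays)-1]
			push(arr)
		case c == '(':
			s, end, err := readLiteral(data, i)
			if err != nil {
				return nil, err
			}
			push(s)
			i = end // index of the closing ')'
		case c != '/' && isDelim(c):
			return nil, fmt.Errorf("unsupported %q at offset %d", c, i)
		default: // name, number, or operator keyword
			j := i + 1
			for j < len(data) && !isDelim(data[j]) {
				j++
			}
			tok := string(data[i:j])
			i = j - 1
			if c == '/' {
				push(Name(tok[1:]))
			} else if f, err := strconv.ParseFloat(tok, 64); err == nil {
				push(f)
			} else {
				ops = append(ops, Operation{Operator: tok, Operands: operands})
				operands = nil
			}
		}
	}
	return ops, nil
}

// readLiteral reads the (...) string at data[start]; it returns the value and the index of the closing ')'.
func readLiteral(data []byte, start int) (string, int, error) {
	var buf []byte
	depth := 0
	for i := start; i < len(data); i++ {
		c := data[i]
		switch {
		case c == '\\' && i+1 < len(data):
			i++
			c = data[i] // right for \( \) \\; \n and octal escapes stay undecoded
		case c == '(':
			if depth++; depth == 1 {
				continue // the outer '(' is not part of the value
			}
		case c == ')':
			if depth--; depth == 0 {
				return string(buf), i, nil
			}
		}
		buf = append(buf, c)
	}
	return "", 0, fmt.Errorf("unterminated string at offset %d", start)
}

func main() {
	ops, err := parse([]byte(`BT /F1 12 Tf (Hello \(PDF\)) Tj [(Wor) -15 (ld)] TJ ET`))
	fmt.Println(ops, err)
}
```

Feeding the resulting operations to an operation-based extractor is then the second half of what a bytes-based entry point has to do.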
func (*Extractor) GetFonts
func (e *Extractor) GetFonts() map[string]*font.Font
GetFonts returns the fonts registered in this extractor. It is useful for debugging font loading and ToUnicode CMap issues.
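When debugging font loading, it helps to print the registered fonts in a stable order, because Go randomizes map iteration order. The sketch below assumes nothing about font.Font beyond the map holding pointers: the generic helper sortedNames works for any map[string]*T, and fontInfo with its BaseFont and HasToUnicode fields is a hypothetical stand-in used only to make the example runnable.

```go
package main

import (
	"fmt"
	"sort"
)

// fontInfo is a hypothetical stand-in for font.Font; its fields are
// invented for this example.
type fontInfo struct {
	BaseFont     string
	HasToUnicode bool
}

// sortedNames returns the keys of a font map in sorted order, so debug
// output is stable even though Go map iteration order is randomized.
func sortedNames[T any](fonts map[string]*T) []string {
	names := make([]string, 0, len(fonts))
	for name := range fonts {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fonts := map[string]*fontInfo{ // stands in for the map from e.GetFonts()
		"F2": {BaseFont: "Helvetica"},
		"F1": {BaseFont: "ABCDEF+Calibri", HasToUnicode: true},
	}
	for _, name := range sortedNames(fonts) {
		f := fonts[name]
		fmt.Printf("%s: %s (ToUnicode: %t)\n", name, f.BaseFont, f.HasToUnicode)
	}
}
```

Because sortedNames is generic over the pointed-to type, the same helper should also accept the map[string]*font.Font that GetFonts returns.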
func (*Extractor) GetFragments
func (e *Extractor) GetFragments() []TextFragment
GetFragments returns all text fragments.
func (*Extractor) GetText
func (e *Extractor) GetText() string
GetText returns all extracted text as a single string, concatenated with smart spacing and right-to-left (RTL) support.
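The documentation does not define "smart spacing". One common approach, shown here only as a hypothetical sketch, decides between nothing, a space and a newline from the geometry of consecutive fragments: a baseline change starts a new line, and a horizontal gap larger than a fraction of the font size becomes a space. The frag type, its fields, joinFragments, and the 0.5 and 0.2 ratios are all assumptions; fragments are assumed to arrive in reading order, and the RTL handling that GetText also mentions is left out.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// frag is a hypothetical, simplified text fragment: the start of its
// baseline in user space (y grows upward), its advance width, and font size.
type frag struct {
	Text        string
	X, Y, Width float64
	FontSize    float64
}

// joinFragments concatenates fragments in the order given, inserting a
// newline when the baseline moves by more than half the font size and a
// space when the horizontal gap exceeds a fifth of it. Both ratios are
// illustrative guesses, not values taken from the package.
func joinFragments(frags []frag) string {
	var b strings.Builder
	for i, f := range frags {
		if i > 0 {
			prev := frags[i-1]
			gap := f.X - (prev.X + prev.Width)
			switch {
			case math.Abs(f.Y-prev.Y) > 0.5*prev.FontSize:
				b.WriteByte('\n')
			case gap > 0.2*prev.FontSize &&
				!strings.HasSuffix(prev.Text, " ") && !strings.HasPrefix(f.Text, " "):
				b.WriteByte(' ')
			}
		}
		b.WriteString(f.Text)
	}
	return b.String()
}

func main() {
	fmt.Println(joinFragments([]frag{
		{Text: "Hello", X: 72, Y: 700, Width: 27, FontSize: 10},
		{Text: "world", X: 102, Y: 700, Width: 26, FontSize: 10}, // 3-unit gap: space
		{Text: "!", X: 128, Y: 700, Width: 3, FontSize: 10},     // touching: no space
		{Text: "Next line", X: 72, Y: 688, Width: 40, FontSize: 10},
	}))
	// Output:
	// Hello world!
	// Next line
}
```

Scaling the thresholds by font size, rather than using fixed distances, keeps the heuristic usable for both small body text and large headings.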
func (*Extractor) RegisterFont
func (e *Extractor) RegisterFont(name, baseFont, subtype string)
RegisterFont registers a font for use during extraction.
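The three string parameters of RegisterFont presumably mirror a PDF font resource: name is the key that the Tf operator uses (the F1 in /F1 12 Tf), baseFont the /BaseFont value, and subtype the /Subtype value. This is an inference from the parameter names, not documented behavior. The sketch below models such a registry with hypothetical types (registeredFont, fontRegistry, bytesPerCode) and shows one reason the subtype matters during extraction: simple fonts (Type1, MMType1, TrueType, Type3) use one byte per character code, while composite Type0 fonts take their code lengths from a CMap, commonly two bytes as with Identity-H.

```go
package main

import "fmt"

// registeredFont is a hypothetical record of what RegisterFont receives.
type registeredFont struct {
	BaseFont string // /BaseFont, e.g. "Helvetica"
	Subtype  string // /Subtype, e.g. "Type1", "TrueType" or "Type0"
}

// fontRegistry is a hypothetical registry keyed by resource name,
// i.e. the F1 that the Tf operator refers to in "/F1 12 Tf".
type fontRegistry map[string]registeredFont

func (r fontRegistry) register(name, baseFont, subtype string) {
	r[name] = registeredFont{BaseFont: baseFont, Subtype: subtype}
}

// bytesPerCode reports how many bytes of a shown string form one character
// code for the named font. Simple fonts use one byte; Type0 fonts take their
// code lengths from a CMap, simplified here to the common two-byte case.
func (r fontRegistry) bytesPerCode(name string) (int, bool) {
	f, ok := r[name]
	if !ok {
		return 0, false // Tf referenced a name that was never registered
	}
	if f.Subtype == "Type0" {
		return 2, true
	}
	return 1, true
}

func main() {
	reg := fontRegistry{}
	reg.register("F1", "Helvetica", "Type1")
	reg.register("F2", "ABCDEF+NotoSansCJK", "Type0")
	for _, name := range []string{"F1", "F2", "F9"} {
		n, ok := reg.bytesPerCode(name)
		fmt.Println(name, n, ok)
	}
}
```

A real decoder would read code lengths from the Type0 font's CMap instead of assuming two bytes.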
func (*Extractor) RegisterFontsFromPage
func (e *Extractor) RegisterFontsFromPage(page *pages.Page, resolver func(core.IndirectRef) (core.Object, error)) error
RegisterFontsFromPage parses and registers all fonts from a page's resources. This is the recommended way to prepare the extractor for extracting text from a page.
func (*Extractor) RegisterFontsFromResources
func (e *Extractor) RegisterFontsFromResources(resources core.Dict, resolver func(core.IndirectRef) (core.Object, error)) error
RegisterFontsFromResources parses and registers all fonts from a resources dictionary. This is useful when working with page resources directly.