package text

v1.4.0
Warning: This package is not in the latest version of its module.

Published: Nov 29, 2025 | License: MIT | Imports: 11 | Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Direction

type Direction int

Direction represents the direction of text: left-to-right, right-to-left, or neutral.

const (
	// LTR (Left-to-Right) for Latin, Cyrillic, etc.
	LTR Direction = iota
	// RTL (Right-to-Left) for Arabic, Hebrew, etc.
	RTL
	// Neutral for numbers, punctuation, etc.
	Neutral
)

func DetectDirection

func DetectDirection(text string) Direction

DetectDirection detects the dominant text direction of a string based on the Unicode properties of its characters.

func GetCharDirection

func GetCharDirection(r rune) Direction

GetCharDirection returns the direction of a single character based on its Unicode properties.
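To make DetectDirection and GetCharDirection concrete, here is a minimal sketch of one plausible heuristic. It is not the package's implementation: the names `charDirection` and `detectDirection`, the list of right-to-left scripts, and the tie-breaking rule are all assumptions. Letters from an RTL script count as RTL, all other letters count as LTR, and everything else is Neutral; a string's direction is whichever strong class has more characters.

```go
package main

import (
	"fmt"
	"unicode"
)

// Direction mirrors the package's type so this sketch compiles on its own.
type Direction int

const (
	LTR Direction = iota
	RTL
	Neutral
)

// rtlScripts is an assumed, non-exhaustive list of right-to-left scripts.
var rtlScripts = []*unicode.RangeTable{unicode.Arabic, unicode.Hebrew, unicode.Syriac, unicode.Thaana}

// charDirection (hypothetical) classifies one rune: letters from an RTL
// script are RTL, any other letter is LTR, and digits, spaces, punctuation
// and symbols are Neutral.
func charDirection(r rune) Direction {
	if !unicode.IsLetter(r) {
		return Neutral
	}
	if unicode.In(r, rtlScripts...) {
		return RTL
	}
	return LTR
}

// detectDirection (hypothetical) counts LTR and RTL runes and returns the
// majority. Strings with no letters, or an exact tie, are reported as Neutral.
func detectDirection(text string) Direction {
	counts := map[Direction]int{}
	for _, r := range text {
		counts[charDirection(r)]++
	}
	switch {
	case counts[RTL] > counts[LTR]:
		return RTL
	case counts[LTR] > counts[RTL]:
		return LTR
	default:
		return Neutral
	}
}

func main() {
	fmt.Println(charDirection('A') == LTR, charDirection('א') == RTL, charDirection('7') == Neutral)
	fmt.Println(detectDirection("abc שלום") == RTL)
}
```

Under this sketch, digits and punctuation never decide the direction on their own, so a string such as a date comes back Neutral.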

func (Direction) String

func (d Direction) String() string

String returns the string representation of the direction.

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor extracts text from content streams.

func NewExtractor

func NewExtractor() *Extractor

NewExtractor creates a new text extractor.

func (*Extractor) Extract

func (e *Extractor) Extract(operations []contentstream.Operation) ([]TextFragment, error)

Extract extracts text from content stream operations.
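Extract consumes already-parsed operations. The sketch below illustrates, under heavy simplification, how text-showing operators can be turned into positioned fragments. `operation`, `fragment`, and `extract` are hypothetical stand-ins (the fields of `contentstream.Operation` are not documented on this page), and only the `BT`, `Tf`, `Td`, and `Tj` operators are handled.

```go
package main

import "fmt"

// operation is a hypothetical stand-in for contentstream.Operation.
type operation struct {
	Operator string
	Operands []any
}

// fragment is a trimmed-down, hypothetical analogue of TextFragment.
type fragment struct {
	Text     string
	X, Y     float64
	FontName string
	FontSize float64
}

// extract handles a small subset of PDF text operators:
//
//	BT          begin a text object (reset the line start to 0, 0)
//	/F1 12 Tf   select font F1 at size 12
//	tx ty Td    move the line start by (tx, ty)
//	(str) Tj    show a string at the current line start
//
// It ignores Tm, TJ, T*, the CTM, and the horizontal advance after Tj, all of
// which a real extractor has to track.
func extract(ops []operation) ([]fragment, error) {
	var (
		frags        []fragment
		font         string
		size         float64
		lineX, lineY float64
	)
	for _, op := range ops {
		switch op.Operator {
		case "BT":
			lineX, lineY = 0, 0
		case "Tf":
			if len(op.Operands) != 2 {
				return nil, fmt.Errorf("Tf: want 2 operands, got %d", len(op.Operands))
			}
			name, ok1 := op.Operands[0].(string)
			sz, ok2 := op.Operands[1].(float64)
			if !ok1 || !ok2 {
				return nil, fmt.Errorf("Tf: want a name and a number")
			}
			font, size = name, sz
		case "Td":
			if len(op.Operands) != 2 {
				return nil, fmt.Errorf("Td: want 2 operands, got %d", len(op.Operands))
			}
			tx, ok1 := op.Operands[0].(float64)
			ty, ok2 := op.Operands[1].(float64)
			if !ok1 || !ok2 {
				return nil, fmt.Errorf("Td: want two numbers")
			}
			lineX, lineY = lineX+tx, lineY+ty
		case "Tj":
			if len(op.Operands) != 1 {
				return nil, fmt.Errorf("Tj: want 1 operand, got %d", len(op.Operands))
			}
			s, ok := op.Operands[0].(string)
			if !ok {
				return nil, fmt.Errorf("Tj: operand is %T, want a string", op.Operands[0])
			}
			frags = append(frags, fragment{Text: s, X: lineX, Y: lineY, FontName: font, FontSize: size})
		}
	}
	return frags, nil
}

func main() {
	frags, err := extract([]operation{
		{Operator: "BT"},
		{Operator: "Tf", Operands: []any{"F1", 12.0}},
		{Operator: "Td", Operands: []any{72.0, 712.0}},
		{Operator: "Tj", Operands: []any{"Hello"}},
		{Operator: "ET"},
	})
	fmt.Println(frags, err)
}
```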

func (*Extractor) ExtractFromBytes

func (e *Extractor) ExtractFromBytes(data []byte) ([]TextFragment, error)

ExtractFromBytes parses raw content stream data and extracts text from it.

func (*Extractor) GetFonts

func (e *Extractor) GetFonts() map[string]*font.Font

GetFonts returns the fonts registered in this extractor. It is useful for debugging font loading and ToUnicode CMap issues.

func (*Extractor) GetFragments

func (e *Extractor) GetFragments() []TextFragment

GetFragments returns all text fragments.

func (*Extractor) GetText

func (e *Extractor) GetText() string

GetText returns all extracted text concatenated, with smart spacing and right-to-left (RTL) support.

func (*Extractor) RegisterFont

func (e *Extractor) RegisterFont(name, baseFont, subtype string)

RegisterFont registers a font for use during extraction.

func (*Extractor) RegisterFontsFromPage

func (e *Extractor) RegisterFontsFromPage(page *pages.Page, resolver func(core.IndirectRef) (core.Object, error)) error

RegisterFontsFromPage parses and registers all fonts from a page's resources. It is the recommended way to prepare the extractor for extracting text from a page.

func (*Extractor) RegisterFontsFromResources

func (e *Extractor) RegisterFontsFromResources(resources core.Dict, resolver func(core.IndirectRef) (core.Object, error)) error

RegisterFontsFromResources parses and registers all fonts from a resources dictionary. It is useful when working with page resources directly.

func (*Extractor) RegisterParsedFont

func (e *Extractor) RegisterParsedFont(name string, f *font.Font)

RegisterParsedFont registers a pre-parsed font for use during extraction. It is useful when you have already parsed the font together with its ToUnicode CMap.

type TextFragment

type TextFragment struct {
	Text      string
	X, Y      float64
	Width     float64
	Height    float64
	FontName  string
	FontSize  float64
	Direction Direction // Text direction (LTR, RTL, Neutral)
}

TextFragment represents a piece of extracted text together with its position.
