package extract

v1.1.64 (latest) | Published: May 1, 2026 | License: MIT | Imports: 19 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/topcheer/ggcode

Documentation

Overview

Package extract provides text extraction from binary document formats. Supported formats: PDF, Office (docx/xlsx/pptx), OpenDocument (odt/ods/odp), EPUB, RTF, archives (zip/tar/tar.gz/tar.bz2/tar.xz), iWork (pages/numbers/key), SVG.

Index

func IsDocumentFile(path string) bool
type Extractor
type Registry
- func (r *Registry) Get(ext string) Extractor
- func (r *Registry) Register(ext string, e Extractor)
type TextResult
- func Extract(path string, data []byte) (TextResult, error)

Functions

func IsDocumentFile

func IsDocumentFile(path string) bool

IsDocumentFile reports whether path appears to refer to a file in one of the supported document formats.
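The documentation does not say how IsDocumentFile makes its decision. A minimal sketch, assuming it is a case-insensitive match on the file name's extension against the formats listed in the overview; the helper `isDocumentFile` and its suffix list are hypothetical, not the package's code:

```go
package main

import (
	"fmt"
	"strings"
)

// documentSuffixes is a hypothetical list built from the formats named in
// the package overview; the real package's list may differ.
var documentSuffixes = []string{
	".pdf",
	".docx", ".xlsx", ".pptx",
	".odt", ".ods", ".odp",
	".epub", ".rtf",
	".zip", ".tar", ".tar.gz", ".tar.bz2", ".tar.xz",
	".pages", ".numbers", ".key",
	".svg",
}

// isDocumentFile is a sketch of an extension-based check (assumed, not
// documented). It matches on suffix rather than filepath.Ext so that compound
// extensions such as ".tar.gz" are recognized; filepath.Ext would return ".gz".
func isDocumentFile(path string) bool {
	lower := strings.ToLower(path) // case-insensitivity is an assumption
	for _, s := range documentSuffixes {
		if strings.HasSuffix(lower, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"report.PDF", "backup.tar.gz", "main.go"} {
		fmt.Println(p, isDocumentFile(p))
	}
}
```

A suffix check like this also accepts unrelated files that happen to share an extension (for example, a PEM private key saved as `.key`), which is why such a check can only say that a path "looks like" a document.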

Types

type Extractor

type Extractor interface {
	Extract(data []byte) (TextResult, error)
	Format() string
}

Extractor extracts text from a binary document format.
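Any type with these two methods satisfies Extractor. The sketch below is a hypothetical SVG extractor (`svgText`, not the package's own SVG support) that collects the character data of `<text>` elements, including nested `<tspan>` content, using the standard library's encoding/xml decoder. `TextResult` and `Extractor` are redeclared locally so the example compiles on its own; in real code they would be `extract.TextResult` and `extract.Extractor`.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Local copies of the documented types, so this sketch is self-contained.
type TextResult struct {
	Text   string
	Pages  int
	Format string
}

type Extractor interface {
	Extract(data []byte) (TextResult, error)
	Format() string
}

// svgText is a hypothetical Extractor for SVG text content.
type svgText struct{}

func (svgText) Format() string { return "svg" }

// Extract walks the XML token stream and keeps non-blank character data that
// appears anywhere inside a <text> element, joined with single spaces.
func (svgText) Extract(data []byte) (TextResult, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	var parts []string
	depth := 0 // > 0 while inside a <text> element
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return TextResult{}, fmt.Errorf("svg: %w", err)
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "text" {
				depth++
			}
		case xml.EndElement:
			if t.Name.Local == "text" && depth > 0 {
				depth--
			}
		case xml.CharData:
			if depth > 0 {
				if s := strings.TrimSpace(string(t)); s != "" {
					parts = append(parts, s)
				}
			}
		}
	}
	return TextResult{Text: strings.Join(parts, " "), Format: "svg"}, nil
}

// Compile-time check that svgText satisfies Extractor.
var _ Extractor = svgText{}

func main() {
	svg := `<svg xmlns="http://www.w3.org/2000/svg"><text x="0" y="10">Hello <tspan>world</tspan></text></svg>`
	r, err := svgText{}.Extract([]byte(svg))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q (%s)\n", r.Text, r.Format)
}
```

Pages is left at 0 because an SVG file has no page or slide count.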

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry maps file extensions to Extractor instances.

func (*Registry) Get

func (r *Registry) Get(ext string) Extractor

Get returns the extractor registered for the given extension, or nil if none is registered.

func (*Registry) Register

func (r *Registry) Register(ext string, e Extractor)

Register adds an extractor for the given extension (e.g. ".pdf").
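Because Registry's fields are unexported, its internals are not documented. One plausible shape, assumed here purely for illustration, is a map from extension to Extractor. The names `registry` and `toy` are hypothetical, and the types are redeclared locally:

```go
package main

import "fmt"

type TextResult struct {
	Text   string
	Pages  int
	Format string
}

type Extractor interface {
	Extract(data []byte) (TextResult, error)
	Format() string
}

// registry is a hypothetical stand-in for extract.Registry.
type registry struct {
	byExt map[string]Extractor
}

// Register stores e under ext, replacing any previous entry. The map is
// created lazily so the zero value of registry is usable.
func (r *registry) Register(ext string, e Extractor) {
	if r.byExt == nil {
		r.byExt = make(map[string]Extractor)
	}
	r.byExt[ext] = e
}

// Get returns the Extractor registered for ext, or nil if there is none.
// Indexing a nil map is safe in Go and yields the zero value (a nil interface).
func (r *registry) Get(ext string) Extractor {
	return r.byExt[ext]
}

// toy is a trivial Extractor used only to exercise the registry.
type toy struct{}

func (toy) Format() string { return "toy" }
func (toy) Extract(data []byte) (TextResult, error) {
	return TextResult{Text: string(data), Format: "toy"}, nil
}

func main() {
	var r registry
	r.Register(".toy", toy{})
	if e := r.Get(".toy"); e != nil {
		fmt.Println("found", e.Format())
	}
	fmt.Println(r.Get(".pdf") == nil)
}
```

Whether the real Register normalizes case or requires the leading dot is not documented; this sketch stores keys exactly as given, in the ".pdf" form used in the Register doc comment.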

type TextResult

type TextResult struct {
	Text   string // extracted plain text
	Pages  int    // page/slide count (0 if not applicable)
	Format string // format name: "pdf", "docx", "zip", etc.
}

TextResult holds extracted text and metadata about the source document.
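Since Pages is 0 when a page or slide count does not apply, callers that display it should treat 0 as "not applicable" rather than as an empty document. A minimal sketch with a hypothetical `summary` helper and a local copy of TextResult:

```go
package main

import "fmt"

type TextResult struct {
	Text   string // extracted plain text
	Pages  int    // page/slide count (0 if not applicable)
	Format string // format name: "pdf", "docx", "zip", etc.
}

// summary renders a one-line description. It counts runes rather than bytes
// and omits the page count when Pages is 0.
func summary(r TextResult) string {
	s := fmt.Sprintf("%s: %d chars", r.Format, len([]rune(r.Text)))
	if r.Pages > 0 {
		s += fmt.Sprintf(", %d pages", r.Pages)
	}
	return s
}

func main() {
	fmt.Println(summary(TextResult{Text: "héllo", Pages: 2, Format: "pdf"}))
	fmt.Println(summary(TextResult{Text: "abc", Format: "zip"}))
}
```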

func Extract

func Extract(path string, data []byte) (TextResult, error)

Extract extracts text from data, choosing an extractor based on the file extension of path. It returns an error if the format is not supported or if extraction fails.
