poppler

Version: v1.0.0
Published: Jul 9, 2023 License: GPL-2.0 Imports: 6 Imported by: 3

README

go-poppler

A simple Go wrapper for the Poppler PDF library.

This Go module exports a limited subset of Poppler's functions, specifically only those needed to extract text from PDFs. It drops image support, which in turn removes the libcairo dependency.

Install dependencies

To use this module you need to install the Poppler GLib development package. On Debian/Ubuntu this is done by:

# Debian-based OS
apt-get install libpoppler-glib-dev

Usage

go get github.com/johbar/go-poppler
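The following is a minimal sketch of a text-extraction loop, not an excerpt from this module. To stay runnable without Poppler installed, it works against a small interface, textPage, that mirrors the Text() and Close() methods documented for Page below. The names textPage, extractText and fakePage are made up for illustration. With the real library the call would presumably look like extractText(doc.GetNPages(), doc.GetPage), assuming page indices start at 0, which the documentation here does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// textPage mirrors the two Page methods this sketch needs.
// It is a hypothetical interface, not part of go-poppler.
type textPage interface {
	Text() string
	Close()
}

// extractText concatenates the text of pages 0..nPages-1, one page per line
// group, and closes each page as soon as its text has been read.
// Assumption: page indices are zero-based.
func extractText[P textPage](nPages int, getPage func(i int) P) string {
	var sb strings.Builder
	for i := 0; i < nPages; i++ {
		page := getPage(i)
		sb.WriteString(page.Text())
		sb.WriteString("\n")
		page.Close() // release the page explicitly instead of waiting for the GC
	}
	return sb.String()
}

// fakePage is a stand-in used only to make the sketch runnable.
type fakePage struct{ text string }

func (p fakePage) Text() string { return p.text }
func (p fakePage) Close()       {}

func main() {
	pages := []fakePage{{"first page"}, {"second page"}}
	text := extractText(len(pages), func(i int) fakePage { return pages[i] })
	fmt.Print(text)
}
```

Closing each page inside the loop keeps the number of live Poppler objects small, which matters for the memory considerations discussed in the next section.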

Improvements and performance considerations

This is a fork of timsat/go-poppler, which is derived from cheggaaa/go-poppler.

This fork fixes a couple of memory leaks present in the upstream libraries. In addition, it uses finalizers as a safeguard against memory leaks. This means you do not have to call document.Close() and page.Close(); you can let the GC do the clean-up work instead.
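The general pattern of "explicit Close plus finalizer as a safety net" can be sketched as follows. This is not the code of this module. nativeHandle, newNativeHandle and releaseCount are hypothetical names, and a counter stands in for the C-side free call. The sketch only shows how runtime.SetFinalizer and an idempotent Close can be combined.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// released counts simulated frees; it stands in for freeing off-heap memory.
var released atomic.Int32

// releaseCount reports how many handles have been released so far.
func releaseCount() int { return int(released.Load()) }

// nativeHandle is a hypothetical stand-in for an object that owns
// off-heap memory (for example a document or a page).
type nativeHandle struct {
	name string
	once sync.Once
}

// newNativeHandle "allocates" the resource and registers a finalizer as a
// safety net for callers that never call Close.
func newNativeHandle(name string) *nativeHandle {
	h := &nativeHandle{name: name}
	runtime.SetFinalizer(h, (*nativeHandle).release)
	return h
}

// release frees the resource at most once, whoever calls it first.
func (h *nativeHandle) release() {
	h.once.Do(func() { released.Add(1) })
}

// Close releases the resource immediately and removes the finalizer,
// so the GC has nothing left to do for this handle.
func (h *nativeHandle) Close() {
	h.release()
	runtime.SetFinalizer(h, nil)
}

func main() {
	h := newNativeHandle("doc")
	h.Close()
	h.Close() // safe: the second call is a no-op
	fmt.Println("released:", releaseCount())
}
```

Note that the Go runtime gives no guarantee about when, or even whether, a finalizer runs before the program exits, which is why an explicit Close remains the predictable path.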

Relying on this might add significant memory overhead in long-running, high-throughput processes (such as a web service or batch processor) that create many Poppler objects in unmanaged (off-heap) memory. The finalizers do free these objects, but the GC might run too late to prevent an OOM, because it does not take off-heap memory into account when it schedules the next cycle.

One advantage of relying solely on finalizers might be improved CPU utilization, because they run on their own goroutine, so your main goroutine doesn't need to handle the clean-up. (But you can achieve this in many other ways, I guess, e.g. by writing your own goroutine and using channels.)
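The "own goroutine and channels" idea mentioned above could look roughly like this. It is a sketch under assumptions, not something this module provides. closer, startCloser and fakeResource are hypothetical names, and closer simply mirrors the Close() method documented for Document and Page. Whether Poppler objects may safely be closed from a different goroutine than the one that used them is not stated here, so treat that as something to verify.

```go
package main

import "fmt"

// closer matches the Close() method documented for Document and Page.
// It is a hypothetical helper interface, not part of go-poppler.
type closer interface{ Close() }

// startCloser launches a goroutine that closes everything sent on queue.
// done is closed once queue has been closed and fully drained.
func startCloser(buffer int) (queue chan<- closer, done <-chan struct{}) {
	q := make(chan closer, buffer)
	d := make(chan struct{})
	go func() {
		defer close(d)
		for c := range q {
			c.Close() // clean-up happens here, off the caller's goroutine
		}
	}()
	return q, d
}

// fakeResource is a stand-in used only to make the sketch runnable.
type fakeResource struct{ closed bool }

func (f *fakeResource) Close() { f.closed = true }

func main() {
	queue, done := startCloser(8)
	resources := []*fakeResource{{}, {}, {}}
	for _, r := range resources {
		queue <- r // hand the object over instead of closing it inline
	}
	close(queue) // no more work
	<-done       // wait until everything has been closed
	for i, r := range resources {
		fmt.Println(i, r.closed)
	}
}
```

Unlike finalizers, this keeps the release time under your control: the buffered channel bounds how many unreleased objects can pile up before the sender blocks.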

TL;DR: Be careful when relying on finalizers; do some (load) tests and watch your RAM!

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Version

func Version() string

Types

type Document

type Document struct {
	// contains filtered or unexported fields
}

func Load

func Load(data []byte) (doc *Document, err error)

func Open

func Open(filename string) (doc *Document, err error)

func (*Document) Close

func (d *Document) Close()

Close releases memory allocated by Poppler

func (*Document) GetNAttachments

func (d *Document) GetNAttachments() int

func (*Document) GetNPages

func (d *Document) GetNPages() int

func (*Document) GetPage

func (d *Document) GetPage(i int) (page *Page)

func (*Document) HasAttachments

func (d *Document) HasAttachments() bool

func (*Document) Info

func (d *Document) Info() DocumentInfo

type DocumentInfo

type DocumentInfo struct {
	PdfVersion, Title, Author, Subject, KeyWords, Creator, Producer, Metadata string
	CreationDate, ModificationDate, Pages                                     int
	IsLinearized                                                              bool
}
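CreationDate and ModificationDate are plain int values. The sketch below assumes they hold Unix timestamps in seconds, which is what the underlying Poppler GLib getters return as far as I know, but this page does not confirm it. unixToTime is a hypothetical helper, and treating zero or negative values as "not set" is likewise an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// unixToTime converts a timestamp given in seconds since the Unix epoch to a
// UTC time. ok is false for zero or negative values, which this sketch
// treats as "date not set" (an assumption, not documented behaviour).
func unixToTime(sec int) (t time.Time, ok bool) {
	if sec <= 0 {
		return time.Time{}, false
	}
	return time.Unix(int64(sec), 0).UTC(), true
}

func main() {
	// With the real library the argument would be doc.Info().CreationDate.
	if t, ok := unixToTime(1688860800); ok {
		fmt.Println(t.Format(time.RFC3339))
	}
	if _, ok := unixToTime(-1); !ok {
		fmt.Println("date not set")
	}
}
```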

type Page

type Page struct {
	// contains filtered or unexported fields
}

func (*Page) Close

func (p *Page) Close()

Close frees memory allocated when Poppler opened the page

func (*Page) Duration

func (p *Page) Duration() float64

func (*Page) Index

func (p *Page) Index() int

func (*Page) Label

func (p *Page) Label() string

func (*Page) Size

func (p *Page) Size() (width, height float64)

func (*Page) Text

func (p *Page) Text() string
