reader

package
v1.1.0 Latest
Published: May 8, 2026 | License: MIT | Imports: 14 | Imported by: 0

Documentation

Overview

Package reader provides functionality for reading and parsing existing PDF files.

It implements a PDF parser that can extract the object structure, page tree, and text content from PDF documents conforming to the PDF specification (ISO 32000).

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Array

type Array []Object

Array represents a PDF array of objects.

func (Array) String

func (a Array) String() string

type Boolean

type Boolean bool

Boolean represents a PDF boolean value.

func (Boolean) String

func (b Boolean) String() string

type Dict

type Dict map[Name]Object

Dict represents a PDF dictionary mapping names to objects.

func (Dict) GetArray

func (d Dict) GetArray(key Name) Array

GetArray returns an array entry, or nil if not found.

func (Dict) GetDict

func (d Dict) GetDict(key Name) Dict

GetDict returns a sub-dictionary, or nil if not found.

func (Dict) GetInt

func (d Dict) GetInt(key Name) (int64, bool)

GetInt returns the value of an integer entry and true, or 0 and false if the entry is not found.
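
The (int64, bool) result follows Go's comma-ok idiom, which lets a caller tell a missing key apart from a stored zero. The following is a minimal sketch of that idea only. The types dict, name, and integer and the method getInt are local stand-ins invented for illustration; they are not the package's types, and the package's handling of non-integer values may differ.

```go
package main

import "fmt"

// Local stand-ins for illustration only; they are not the reader package's types.
type (
	name    string
	integer int64
	dict    map[name]any
)

// getInt returns the integer stored under key and true, or 0 and false when
// the key is absent or holds a non-integer value.
func (d dict) getInt(key name) (int64, bool) {
	v, ok := d[key].(integer)
	return int64(v), ok
}

func main() {
	page := dict{"Rotate": integer(0), "Type": name("Page")}

	// Present, and legitimately zero.
	if v, ok := page.getInt("Rotate"); ok {
		fmt.Println("Rotate =", v)
	}
	// Absent: the zero value alone would be ambiguous, the bool is not.
	if _, ok := page.getInt("Count"); !ok {
		fmt.Println("Count is missing")
	}
}
```

Checking the boolean rather than comparing against 0 matters for entries such as /Rotate, where 0 is a valid value.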

func (Dict) GetName

func (d Dict) GetName(key Name) Name

GetName returns the value of a name entry, or an empty string if not found.

func (Dict) GetString

func (d Dict) GetString(key Name) string

GetString returns the string value for a dictionary key, resolving references.

func (Dict) String

func (d Dict) String() string

type Document

type Document struct {
	Version string // PDF version from file header (e.g., "1.7")
	// contains filtered or unexported fields
}

Document represents a parsed PDF document.

func Open

func Open(filename string) (*Document, error)

Open opens and parses a PDF file from disk.

func OpenWithPassword

func OpenWithPassword(filename, password string) (*Document, error)

OpenWithPassword opens and parses an encrypted PDF file using the given password.

func ReadFrom

func ReadFrom(r io.Reader) (*Document, error)

ReadFrom parses a PDF document from a reader. The reader content is read entirely into memory for random access.

Example

This example builds a small PDF in memory, reads it back with ReadFrom, and prints its page count and metadata.

package main

import (
	"bytes"
	"fmt"

	gofpdf "github.com/lvillar/gofpdf"
	"github.com/lvillar/gofpdf/reader"
)

func main() {
	// Build a small in-memory PDF so the example is self-contained.
	pdf := gofpdf.New("P", "mm", "A4", "")
	pdf.SetTitle("Quarterly Report", true)
	pdf.SetAuthor("Acme Analytics", true)
	pdf.SetFont("Helvetica", "B", 16)

	pdf.AddPage()
	pdf.Cell(0, 10, "Page 1: Summary")
	pdf.AddPage()
	pdf.Cell(0, 10, "Page 2: Details")

	var buf bytes.Buffer
	if err := pdf.Output(&buf); err != nil {
		fmt.Println(err)
		return
	}

	// Now read it back.
	doc, err := reader.ReadFrom(&buf)
	if err != nil {
		fmt.Println(err)
		return
	}

	meta := doc.Metadata()
	fmt.Printf("Pages: %d\n", doc.NumPages())
	fmt.Printf("Title: %s\n", meta["Title"])
	fmt.Printf("Author: %s\n", meta["Author"])
}
Output:
Pages: 2
Title: Quarterly Report
Author: Acme Analytics

func ReadFromWithPassword

func ReadFromWithPassword(r io.Reader, password string) (*Document, error)

ReadFromWithPassword parses an encrypted PDF from a reader using the given password.

func (*Document) Catalog

func (d *Document) Catalog() (Dict, error)

Catalog returns the document's catalog dictionary (the /Root object).

func (*Document) FormField

func (d *Document) FormField(name string) (*FormField, error)

FormField returns the form field with the given fully qualified name. Returns nil if the field is not found.

func (*Document) FormFields

func (d *Document) FormFields() ([]*FormField, error)

FormFields returns all form fields found in the document's AcroForm. Returns an empty slice (not nil) if no AcroForm is present.

func (*Document) Metadata

func (d *Document) Metadata() map[string]string

Metadata returns document metadata from the /Info dictionary.

func (*Document) NumPages

func (d *Document) NumPages() int

NumPages returns the total number of pages in the document.

func (*Document) Page

func (d *Document) Page(n int) (*Page, error)

Page returns the page at the given 1-based index.

func (*Document) Pages

func (d *Document) Pages() iter.Seq2[int, *Page]

Pages returns an iterator over all pages. Index is 1-based.

func (*Document) ResolveReference

func (d *Document) ResolveReference(ref Reference) (Object, error)

ResolveReference resolves an indirect reference (such as "10 0 R") to the object it refers to. It is the exported entry point for resolving references.

type FormField

type FormField struct {
	Name     string       // partial field name (/T)
	FullName string       // fully qualified dotted name
	Type     string       // field type: "Tx", "Btn", "Ch", "Sig"
	Value    string       // current value (/V)
	Default  string       // default value (/DV)
	Flags    int          // field flags (/Ff)
	Rect     Rectangle    // widget annotation rectangle
	Options  []string     // choice options (/Opt) for "Ch" fields
	Kids     []*FormField // child fields in hierarchy
	ObjNum   int          // object number if from an indirect object
	// contains filtered or unexported fields
}

FormField represents a form field parsed from a PDF's AcroForm dictionary.

func (*FormField) IsReadOnly

func (f *FormField) IsReadOnly() bool

IsReadOnly returns true if the field has the ReadOnly flag set (bit 1).

func (*FormField) IsRequired

func (f *FormField) IsRequired() bool

IsRequired returns true if the field has the Required flag set (bit 2).
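
Bit positions in the PDF specification are numbered from 1, so "bit 1" corresponds to the mask 1<<0 and "bit 2" to 1<<1. The sketch below shows how such checks can be written against an integer like the Flags field. The constant names and the hasFlag helper are hypothetical and are not exported by this package.

```go
package main

import "fmt"

// Masks for the /Ff field flags. PDF bit positions are 1-based, so bit n
// maps to the mask 1 << (n - 1). These names are illustrative only.
const (
	flagReadOnly = 1 << 0 // bit 1
	flagRequired = 1 << 1 // bit 2
)

// hasFlag reports whether every bit of mask is set in flags.
func hasFlag(flags, mask int) bool {
	return flags&mask == mask
}

func main() {
	flags := 2 // only bit 2 (Required) is set
	fmt.Println("read-only:", hasFlag(flags, flagReadOnly))
	fmt.Println("required: ", hasFlag(flags, flagRequired))
}
```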

type IndirectObject

type IndirectObject struct {
	Reference
	Value Object
}

IndirectObject represents a PDF indirect object definition (e.g., "10 0 obj ... endobj").

func (IndirectObject) String

func (o IndirectObject) String() string

type Integer

type Integer int64

Integer represents a PDF integer value.

func (Integer) String

func (i Integer) String() string

type Name

type Name string

Name represents a PDF name object (e.g., /Type, /Pages).

func (Name) String

func (n Name) String() string

type Null

type Null struct{}

Null represents the PDF null object.

func (Null) String

func (Null) String() string

type Object

type Object interface {
	String() string
	// contains filtered or unexported methods
}

Object is the interface satisfied by all PDF object types. The unexported method prevents external types from implementing it.

type Page

type Page struct {
	Number    int
	MediaBox  Rectangle
	CropBox   *Rectangle
	Resources Dict
	Contents  []Stream
	Rotate    int
	// contains filtered or unexported fields
}

Page represents a single page in a PDF document.

func (*Page) ContentStream

func (p *Page) ContentStream() ([]byte, error)

ContentStream returns the decompressed content stream data for this page. If the page has multiple content streams, they are concatenated.
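
For streams that use the common FlateDecode filter, decompression amounts to zlib inflation. The sketch below inflates several streams and joins them, assuming every stream is Flate-compressed and using a newline as the separator so operators at stream boundaries do not run together. The helpers inflate, joinStreams, and deflate are hypothetical; the package's own filter handling and separator are not documented here and may differ.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// inflate decodes FlateDecode data, which is zlib-wrapped deflate.
func inflate(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// joinStreams inflates each stream and joins the results with a newline.
func joinStreams(streams [][]byte) ([]byte, error) {
	var out bytes.Buffer
	for i, s := range streams {
		raw, err := inflate(s)
		if err != nil {
			return nil, fmt.Errorf("stream %d: %w", i, err)
		}
		if i > 0 {
			out.WriteByte('\n')
		}
		out.Write(raw)
	}
	return out.Bytes(), nil
}

// deflate exists only to build demo input; it compresses data with zlib.
func deflate(data []byte) []byte {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	zw.Write(data) // writes to a bytes.Buffer do not fail
	zw.Close()
	return buf.Bytes()
}

func main() {
	streams := [][]byte{
		deflate([]byte("BT (Hello) Tj ET")),
		deflate([]byte("BT (World) Tj ET")),
	}
	joined, err := joinStreams(streams)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s\n", joined)
}
```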

func (*Page) ExtractText

func (p *Page) ExtractText() (string, error)

ExtractText extracts the text content from this page. It parses the content stream and extracts text from BT/ET blocks using the Tj, TJ, ', and " operators.

Note: This is a basic extraction that handles common cases. Complex text with custom encodings, CIDFonts, or ToUnicode CMaps may not be fully supported.
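
To make the idea of operator-based extraction concrete, the sketch below pulls literal strings that are shown with the Tj operator out of a content stream. It is a deliberately naive illustration and not this package's parser: it ignores escape sequences, nested parentheses, hex strings, the TJ, ', and " operators, and font encodings. The name extractTj is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tjLiteral matches a literal string operand followed by the Tj operator.
// It does not handle backslash escapes or nested parentheses.
var tjLiteral = regexp.MustCompile(`\(([^()\\]*)\)\s*Tj`)

// extractTj returns the text of every simple "(...) Tj" operation, one per line.
func extractTj(content []byte) string {
	var parts []string
	for _, m := range tjLiteral.FindAllSubmatch(content, -1) {
		parts = append(parts, string(m[1]))
	}
	return strings.Join(parts, "\n")
}

func main() {
	stream := []byte("BT /F1 16 Tf 72 720 Td (Page 1: Summary) Tj ET")
	fmt.Println(extractTj(stream))
}
```

A real extractor has to tokenize the stream and map character codes through the active font, which is where the limitations noted above come from.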

type Real

type Real float64

Real represents a PDF real (floating-point) value.

func (Real) String

func (r Real) String() string

type Rectangle

type Rectangle struct {
	LLX, LLY, URX, URY float64
}

Rectangle represents a PDF rectangle (typically [llx lly urx ury]).

func (Rectangle) Height

func (r Rectangle) Height() float64

Height returns the height of the rectangle.

func (Rectangle) Width

func (r Rectangle) Width() float64

Width returns the width of the rectangle.
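
Rectangle coordinates are expressed in PDF user space, whose default unit is 1/72 inch. The sketch below shows the usual arithmetic with a local stand-in type, assuming width and height are simple coordinate differences. Whether the package's methods also normalize swapped corners is not stated here. The rect type and its methods are hypothetical.

```go
package main

import "fmt"

// rect is a local stand-in for illustration, not the package's Rectangle.
type rect struct{ llx, lly, urx, ury float64 }

// width and height are plain differences between opposite corners.
func (r rect) width() float64  { return r.urx - r.llx }
func (r rect) height() float64 { return r.ury - r.lly }

func main() {
	// US Letter expressed in points: 8.5 x 11 inches at 72 points per inch.
	letter := rect{llx: 0, lly: 0, urx: 612, ury: 792}
	fmt.Printf("%.1f x %.1f pt\n", letter.width(), letter.height())
	fmt.Printf("%.1f x %.1f in\n", letter.width()/72, letter.height()/72)
}
```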

type Reference

type Reference struct {
	Number     int
	Generation int
}

Reference represents an indirect object reference (e.g., "10 0 R").

func (Reference) String

func (r Reference) String() string

type Stream

type Stream struct {
	Dict Dict
	Data []byte // raw data (may be compressed)
}

Stream represents a PDF stream object (dictionary + encoded data).

func (Stream) String

func (s Stream) String() string

type String

type String struct {
	Value []byte
	IsHex bool
}

String represents a PDF string (literal or hexadecimal).

func (String) String

func (s String) String() string
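
A hexadecimal string is written between angle brackets, for example <48656C6C6F>. Per the PDF specification, whitespace between digits is ignored and a missing final digit is assumed to be 0. The sketch below decodes the digits of such a string into bytes like those held in Value. The function decodeHexString is a hypothetical helper and not this package's parser.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeHexString decodes the digits found between < and > in a PDF
// hexadecimal string.
func decodeHexString(digits string) ([]byte, error) {
	// Drop PDF whitespace characters, which may appear between digits.
	clean := strings.Map(func(r rune) rune {
		switch r {
		case 0x00, '\t', '\n', '\f', '\r', ' ':
			return -1
		}
		return r
	}, digits)
	// An odd number of digits means the final digit is assumed to be 0.
	if len(clean)%2 == 1 {
		clean += "0"
	}
	return hex.DecodeString(clean)
}

func main() {
	b, err := decodeHexString("48656C6C 6F")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s\n", b) // Hello

	b, err = decodeHexString("901FA")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% X\n", b) // 90 1F A0
}
```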
