Documentation ¶
Overview ¶
Package reader provides functionality for reading and parsing existing PDF files.
It implements a PDF parser that can extract the object structure, page tree, and text content from PDF documents conforming to the PDF specification (ISO 32000).
Index ¶
- type Array
- type Boolean
- type Dict
- type Document
- func (d *Document) Catalog() (Dict, error)
- func (d *Document) FormField(name string) (*FormField, error)
- func (d *Document) FormFields() ([]*FormField, error)
- func (d *Document) Metadata() map[string]string
- func (d *Document) NumPages() int
- func (d *Document) Page(n int) (*Page, error)
- func (d *Document) Pages() iter.Seq2[int, *Page]
- func (d *Document) ResolveReference(ref Reference) (Object, error)
- type FormField
- type IndirectObject
- type Integer
- type Name
- type Null
- type Object
- type Page
- type Real
- type Rectangle
- type Reference
- type Stream
- type String
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Dict ¶
Dict represents a PDF dictionary mapping names to objects.
type Document ¶
type Document struct {
	Version string // PDF version from file header (e.g., "1.7")
	// contains filtered or unexported fields
}
Document represents a parsed PDF document.
func OpenWithPassword ¶
OpenWithPassword opens and parses an encrypted PDF file using the given password.
func ReadFrom ¶
ReadFrom parses a PDF document from a reader. The reader's entire content is read into memory so that the parser has random access to it.
Example ¶
This example demonstrates reading a PDF, inspecting its metadata, and counting its pages.
package main

import (
	"bytes"
	"fmt"

	gofpdf "github.com/lvillar/gofpdf"
	"github.com/lvillar/gofpdf/reader"
)

func main() {
	// Build a small in-memory PDF so the example is self-contained.
	pdf := gofpdf.New("P", "mm", "A4", "")
	pdf.SetTitle("Quarterly Report", true)
	pdf.SetAuthor("Acme Analytics", true)
	pdf.SetFont("Helvetica", "B", 16)
	pdf.AddPage()
	pdf.Cell(0, 10, "Page 1: Summary")
	pdf.AddPage()
	pdf.Cell(0, 10, "Page 2: Details")

	var buf bytes.Buffer
	if err := pdf.Output(&buf); err != nil {
		fmt.Println(err)
		return
	}

	// Now read it back.
	doc, err := reader.ReadFrom(&buf)
	if err != nil {
		fmt.Println(err)
		return
	}

	meta := doc.Metadata()
	fmt.Printf("Pages: %d\n", doc.NumPages())
	fmt.Printf("Title: %s\n", meta["Title"])
	fmt.Printf("Author: %s\n", meta["Author"])
}
Output:
Pages: 2
Title: Quarterly Report
Author: Acme Analytics
func ReadFromWithPassword ¶
ReadFromWithPassword parses an encrypted PDF from a reader using the given password.
func (*Document) FormField ¶
FormField returns the form field with the given fully qualified name. It returns nil if the field is not found.
func (*Document) FormFields ¶
FormFields returns all form fields found in the document's AcroForm. It returns an empty slice (not nil) if no AcroForm is present.
type FormField ¶
type FormField struct {
	Name     string       // partial field name (/T)
	FullName string       // fully qualified dotted name
	Type     string       // field type: "Tx", "Btn", "Ch", "Sig"
	Value    string       // current value (/V)
	Default  string       // default value (/DV)
	Flags    int          // field flags (/Ff)
	Rect     Rectangle    // widget annotation rectangle
	Options  []string     // choice options (/Opt) for "Ch" fields
	Kids     []*FormField // child fields in hierarchy
	ObjNum   int          // object number if from an indirect object
	// contains filtered or unexported fields
}
FormField represents a form field parsed from a PDF's AcroForm dictionary.
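The relationship between Name, FullName, and Kids can be illustrated with a minimal sketch. It uses a trimmed local copy of the struct and a hypothetical helper, setFullNames, that joins partial names with periods as the PDF specification describes for fully qualified field names. How the package itself populates FullName is not shown in this documentation, so treat this as an illustration only.

```go
package main

import "fmt"

// FormField is a trimmed local mirror of the documented struct so that the
// sketch compiles without the package.
type FormField struct {
	Name     string
	FullName string
	Kids     []*FormField
}

// setFullNames is a hypothetical helper, not part of the package. It walks
// the hierarchy depth-first and joins partial names with ".". A field
// without a partial name inherits its parent's full name.
func setFullNames(f *FormField, prefix string) {
	switch {
	case f.Name == "":
		f.FullName = prefix
	case prefix == "":
		f.FullName = f.Name
	default:
		f.FullName = prefix + "." + f.Name
	}
	for _, kid := range f.Kids {
		setFullNames(kid, f.FullName)
	}
}

func main() {
	root := &FormField{Name: "address", Kids: []*FormField{
		{Name: "street"},
		{Name: "city"},
	}}
	setFullNames(root, "")
	for _, kid := range root.Kids {
		fmt.Println(kid.FullName)
	}
}
```

Under these assumptions the two children end up with the names address.street and address.city, which is the form of name that Document.FormField expects.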
func (*FormField) IsReadOnly ¶
IsReadOnly reports whether the field has the ReadOnly flag set (bit position 1 of /Ff, the lowest-order bit).
func (*FormField) IsRequired ¶
IsRequired reports whether the field has the Required flag set (bit position 2 of /Ff).
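The PDF specification numbers flag bits starting at 1, so bit position n corresponds to the mask 1 << (n - 1). The following sketch shows how such a check could be written against a raw /Ff value. The constant and function names are hypothetical and are not part of the package.

```go
package main

import "fmt"

// Hypothetical mask names for the first two field flag bits.
const (
	flagReadOnly = 1 << 0 // bit position 1
	flagRequired = 1 << 1 // bit position 2
)

// hasFlag reports whether every bit in mask is set in flags.
func hasFlag(flags, mask int) bool {
	return flags&mask == mask
}

func main() {
	// A /Ff value of 3 has both bits set; a value of 2 has only Required.
	fmt.Println(hasFlag(3, flagReadOnly), hasFlag(3, flagRequired)) // true true
	fmt.Println(hasFlag(2, flagReadOnly), hasFlag(2, flagRequired)) // false true
}
```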
type IndirectObject ¶
IndirectObject represents a PDF indirect object definition (e.g., "10 0 obj ... endobj").
func (IndirectObject) String ¶
func (o IndirectObject) String() string
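For readers unfamiliar with the "10 0 obj ... endobj" syntax, the two leading integers are the object number and the generation number. The sketch below extracts them with a regular expression. The helper parseObjHeader is hypothetical and is not how the package's parser is known to work.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// objHeader matches the "<number> <generation> obj" prefix of an indirect
// object definition.
var objHeader = regexp.MustCompile(`^\s*(\d+)\s+(\d+)\s+obj\b`)

// parseObjHeader is a hypothetical helper that returns the object number and
// generation number found at the start of s.
func parseObjHeader(s string) (num, gen int, err error) {
	m := objHeader.FindStringSubmatch(s)
	if m == nil {
		return 0, 0, fmt.Errorf("not an indirect object header: %q", s)
	}
	if num, err = strconv.Atoi(m[1]); err != nil {
		return 0, 0, err
	}
	if gen, err = strconv.Atoi(m[2]); err != nil {
		return 0, 0, err
	}
	return num, gen, nil
}

func main() {
	num, gen, err := parseObjHeader("10 0 obj\n<< /Type /Catalog >>\nendobj")
	fmt.Println(num, gen, err) // 10 0 <nil>
}
```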
type Object ¶
type Object interface {
	String() string
	// contains filtered or unexported methods
}
Object is the interface satisfied by all PDF object types. The unexported method prevents external types from implementing it.
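This "sealed interface" pattern can be sketched in isolation. The unexported method name isObject and the simplified Integer and Name types below are assumptions made for illustration; the package's real definitions are not shown in this documentation. Because everything here lives in one package, the sketch cannot demonstrate the cross-package restriction itself, only the shape of the pattern and the type switch it enables.

```go
package main

import "fmt"

// Object is a local stand-in for the documented interface. The unexported
// method name isObject is an assumption.
type Object interface {
	String() string
	isObject()
}

// Integer and Name are simplified stand-ins for two PDF object types.
type Integer int

func (i Integer) String() string { return fmt.Sprintf("%d", int(i)) }
func (Integer) isObject()        {}

type Name string

func (n Name) String() string { return "/" + string(n) }
func (Name) isObject()        {}

// describe switches over the implementations. With a sealed interface the
// set of cases is fixed by the defining package.
func describe(o Object) string {
	switch v := o.(type) {
	case Integer:
		return "integer " + v.String()
	case Name:
		return "name " + v.String()
	default:
		return "unknown object"
	}
}

func main() {
	fmt.Println(describe(Integer(42))) // integer 42
	fmt.Println(describe(Name("Type"))) // name /Type
}
```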
type Page ¶
type Page struct {
	Number    int
	MediaBox  Rectangle
	CropBox   *Rectangle
	Resources Dict
	Contents  []Stream
	Rotate    int
	// contains filtered or unexported fields
}
Page represents a single page in a PDF document.
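As an illustration of how MediaBox, CropBox, and Rotate interact, the sketch below computes the displayed size of a page. It assumes that a nil CropBox means the page defines none, in which case the PDF specification says the crop box defaults to the media box. The local types and the helper visibleSize are hypothetical stand-ins, not part of the package.

```go
package main

import "fmt"

// Rectangle and Page are trimmed local mirrors of the documented structs.
type Rectangle struct{ LLX, LLY, URX, URY float64 }

type Page struct {
	MediaBox Rectangle
	CropBox  *Rectangle
	Rotate   int
}

// visibleSize is a hypothetical helper. It returns the displayed width and
// height in points, falling back to MediaBox when CropBox is nil and
// swapping the two dimensions for quarter-turn rotations.
func visibleSize(p Page) (w, h float64) {
	box := p.MediaBox
	if p.CropBox != nil {
		box = *p.CropBox
	}
	w, h = box.URX-box.LLX, box.URY-box.LLY
	// Normalize the angle so that negative values such as -90 also work.
	if r := ((p.Rotate % 360) + 360) % 360; r == 90 || r == 270 {
		w, h = h, w
	}
	return w, h
}

func main() {
	p := Page{MediaBox: Rectangle{0, 0, 612, 792}, Rotate: 90}
	w, h := visibleSize(p)
	fmt.Println(w, h) // 792 612
}
```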
func (*Page) ContentStream ¶
ContentStream returns the decompressed content stream data for this page. If the page has multiple content streams, they are concatenated.
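The decompress-and-concatenate step can be sketched with the standard library alone, assuming every stream uses zlib compression (the FlateDecode filter). The helpers deflate and inflateAll are hypothetical. Which filters the package supports, and whether it inserts a separator between streams, is not stated here; the newline below is an assumption that keeps tokens from adjacent streams apart.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// deflate compresses data with zlib, standing in for a FlateDecode stream
// body. Writes to a bytes.Buffer do not fail, so errors are ignored here.
func deflate(data []byte) []byte {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

// inflateAll is a hypothetical helper. It decompresses each stream in order
// and appends a newline after each one.
func inflateAll(streams [][]byte) ([]byte, error) {
	var out bytes.Buffer
	for i, s := range streams {
		r, err := zlib.NewReader(bytes.NewReader(s))
		if err != nil {
			return nil, fmt.Errorf("stream %d: %w", i, err)
		}
		_, err = io.Copy(&out, r)
		r.Close()
		if err != nil {
			return nil, fmt.Errorf("stream %d: %w", i, err)
		}
		out.WriteByte('\n')
	}
	return out.Bytes(), nil
}

func main() {
	streams := [][]byte{
		deflate([]byte("BT (Hello) Tj ET")),
		deflate([]byte("q 1 0 0 1 0 0 cm Q")),
	}
	data, err := inflateAll(streams)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", data)
}
```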
func (*Page) ExtractText ¶
ExtractText extracts the text content from this page. It parses the content stream and extracts text from BT/ET blocks using the Tj, TJ, ', and " operators.
Note: This is a basic extraction that handles common cases. Complex text with custom encodings, CIDFonts, or ToUnicode CMaps may not be fully supported.
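To make the approach concrete, here is a deliberately simplified sketch of operator-based extraction. It collects literal strings inside BT/ET blocks and emits them when it meets one of the four text-showing operators. The function names are hypothetical, and the sketch ignores hex strings, octal escapes, font encodings, and text positioning, so it should not be read as the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// readLiteral reads a PDF literal string starting at s[start] == '(' and
// returns its content and the index just past the closing parenthesis.
func readLiteral(s string, start int) (string, int) {
	var b strings.Builder
	depth := 0
	for i := start; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '\\' && i+1 < len(s):
			i++ // keep the escaped character, mapping a few common escapes
			switch s[i] {
			case 'n':
				b.WriteByte('\n')
			case 't':
				b.WriteByte('\t')
			default:
				b.WriteByte(s[i])
			}
		case c == '(':
			if depth > 0 {
				b.WriteByte(c)
			}
			depth++
		case c == ')':
			depth--
			if depth == 0 {
				return b.String(), i + 1
			}
			b.WriteByte(c)
		default:
			b.WriteByte(c)
		}
	}
	return b.String(), len(s)
}

// isOperator treats tokens starting with a letter, plus ' and ", as operators.
func isOperator(tok string) bool {
	c := tok[0]
	return tok == "'" || tok == `"` || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// extractText is a hypothetical, simplified extractor for Tj, TJ, ', and ".
func extractText(content string) string {
	var out strings.Builder
	var pending []string // string operands seen since the last operator
	inText := false
	for i := 0; i < len(content); {
		c := content[i]
		if c == '(' {
			var s string
			s, i = readLiteral(content, i)
			pending = append(pending, s)
			continue
		}
		if strings.IndexByte(" \n\r\t)[]", c) >= 0 {
			i++
			continue
		}
		j := i
		for j < len(content) && strings.IndexByte(" \n\r\t()[]", content[j]) < 0 {
			j++
		}
		tok := content[i:j]
		i = j
		if !isOperator(tok) {
			continue // numbers and names are operands; keep pending strings
		}
		switch tok {
		case "BT":
			inText = true
		case "ET":
			inText = false
			out.WriteByte('\n')
		case "Tj", "TJ", "'", `"`:
			if inText {
				if tok == "'" || tok == `"` {
					out.WriteByte('\n') // these operators move to the next line
				}
				out.WriteString(strings.Join(pending, ""))
			}
		}
		pending = nil
	}
	return strings.TrimSpace(out.String())
}

func main() {
	fmt.Println(extractText("BT /F1 12 Tf (Hello) Tj [(, wor) -20 (ld)] TJ ET"))
}
```

The sketch returns raw string bytes, which is only meaningful for simple single-byte encodings; this is the same limitation the note above describes for custom encodings, CIDFonts, and ToUnicode CMaps.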
type Rectangle ¶
type Rectangle struct {
	LLX, LLY, URX, URY float64
}
Rectangle represents a PDF rectangle, typically written as [llx lly urx ury]: the coordinates of the lower-left and upper-right corners.
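Since the struct stores corner coordinates rather than dimensions, width and height have to be derived. The sketch below does this with two hypothetical helpers on a local copy of the type. Values are in PDF user space units, which default to 1/72 inch.

```go
package main

import (
	"fmt"
	"math"
)

// Rectangle mirrors the documented struct so the sketch compiles on its own.
type Rectangle struct {
	LLX, LLY, URX, URY float64
}

// width and height are hypothetical helpers, not part of the package.
// math.Abs guards against rectangles whose corners are given in reverse order.
func width(r Rectangle) float64  { return math.Abs(r.URX - r.LLX) }
func height(r Rectangle) float64 { return math.Abs(r.URY - r.LLY) }

func main() {
	a4 := Rectangle{LLX: 0, LLY: 0, URX: 595.28, URY: 841.89}
	fmt.Printf("%.2f x %.2f pt\n", width(a4), height(a4)) // 595.28 x 841.89 pt
}
```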