pdfcpu

v0.1.11
Published: May 1, 2018 License: MIT Imports: 28 Imported by: 0

README

pdfcpu: a golang pdf processor

Package pdfcpu is a simple PDF processing library written in Go that supports encryption. It provides both an API and a CLI. All PDF versions up to 1.7 (ISO-32000) are supported.

Motivation

The goal is to reduce the size of large PDF files for mass mailings by optimizing them to the bare minimum. This can be achieved by analyzing a PDF's cross reference table, removing redundant embedded resources like font files or images, and always writing the file back with maximum PDF compression. I also wanted my own Swiss Army knife for PDFs, written entirely in Go, that allows me to trim, split and merge PDF content.

Features

  • Validate (validates PDF files up to version 1.7)
  • Read (builds xref table from PDF file)
  • Write (writes xref table to PDF file)
  • Optimize (gets rid of redundancies like duplicate fonts, images)
  • Split (split a multi page PDF file into single page PDF files)
  • Merge (a set of PDF files into one consolidated PDF file)
  • Extract Images (extract all embedded images of a PDF file into a given dir)
  • Extract Fonts (extract all embedded fonts of a PDF file into a given dir)
  • Extract Pages (extract specific pages into a given dir)
  • Extract Content (extract the PDF-Source into given dir)
  • Trim (generate a custom version of a PDF file)
  • Manage (add, remove, list, extract) embedded file attachments
  • Encrypt (sets password protection)
  • Decrypt (removes password protection)
  • Change user/owner password
  • Manage (add, list) user access permissions

Installation

Required build version: go1.8 and up

go get github.com/hhrutter/pdfcpu/cmd/...

Usage

pdfcpu validate [-verbose] [-mode strict|relaxed] [-upw userpw] [-opw ownerpw] inFile
pdfcpu optimize [-verbose] [-stats csvFile] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu split [-verbose] [-upw userpw] [-opw ownerpw] inFile outDir
pdfcpu merge [-verbose] outFile inFile...
pdfcpu extract [-verbose] -mode image|font|content|page [-pages pageSelection] [-upw userpw] [-opw ownerpw] inFile outDir
pdfcpu trim [-verbose] -pages pageSelection [-upw userpw] [-opw ownerpw] inFile outFile

pdfcpu attach list [-verbose] [-upw userpw] [-opw ownerpw] inFile
pdfcpu attach add [-verbose] [-upw userpw] [-opw ownerpw] inFile file...
pdfcpu attach remove [-verbose] [-upw userpw] [-opw ownerpw] inFile [file...]
pdfcpu attach extract [-verbose] [-upw userpw] [-opw ownerpw] inFile outDir [file...]

pdfcpu encrypt [-verbose] [-mode rc4|aes] [-key 40|128] [-perm none|all] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu decrypt [-verbose] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu changeupw [-verbose] [-opw ownerpw] inFile upwOld upwNew
pdfcpu changeopw [-verbose] [-upw userpw] inFile opwOld opwNew

pdfcpu perm list [-verbose] [-upw userpw] [-opw ownerpw] inFile
pdfcpu perm add [-verbose] [-perm none|all] [-upw userpw] -opw ownerpw inFile

pdfcpu version

Please read the documentation

Status

Version: 0.1.11

  • Reorganized package structure

  • Filter implementation for LZWDecode & extended filter_test.go.

Contributing

  • Please open an issue if you find a bug or want to propose a change.
  • Pull requests, bug fixes and issues are always welcome.

Disclaimer

Usage of pdfcpu assumes you know about and respect all copyrights of any PDF content you may be processing. This applies to the PDF files as such, their content and in particular all embedded resources like font files or images. Credit goes to Renee French for creating our beloved Gopher.

License

MIT

Documentation

Overview

Package pdfcpu is a simple PDF processing library written in Go that supports encryption. It provides an API and a command line interface. All PDF versions up to 1.7 (ISO-32000) are supported.

The available commands are:

validate	validate PDF against PDF 32000-1:2008 (PDF 1.7)
optimize	optimize PDF by getting rid of redundant page resources
split		split multi-page PDF into several single-page PDFs
merge		concatenate 2 or more PDFs
extract		extract images, fonts, content or pages
trim		create trimmed version
attach		list, add, remove, extract embedded file attachments
perm		list, add user access permissions
encrypt		set password protection
decrypt		remove password protection
changeupw	change user password
changeopw	change owner password
version		print version

Constants

const (

	// ValidationStrict ensures 100% compliance with the spec (PDF 32000-1:2008).
	ValidationStrict = 0

	// ValidationRelaxed ensures PDF compliance based on frequently encountered validation errors.
	ValidationRelaxed = 1

	// StatsFileNameDefault is the standard stats filename.
	StatsFileNameDefault = "stats.csv"

	// PermissionsAll enables all user access permission bits.
	PermissionsAll int16 = -1 // 0xFFFF

	// PermissionsNone disables all user access permissions bits.
	PermissionsNone int16 = -3901 // 0xF0C3

)
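
The two permission constants are signed 16-bit values whose comments give the intended bit patterns. The following minimal sketch only checks that arithmetic; the helper name bits is an assumption made for illustration and is not part of pdfcpu.

```go
package main

import "fmt"

// Values copied from the constant block above.
const (
	PermissionsAll  int16 = -1
	PermissionsNone int16 = -3901
)

// bits reinterprets a signed 16-bit permission value as its unsigned bit pattern.
// This helper is hypothetical and exists only for this illustration.
func bits(p int16) uint16 {
	return uint16(p)
}

func main() {
	fmt.Printf("0x%04X\n", bits(PermissionsAll))  // prints 0xFFFF
	fmt.Printf("0x%04X\n", bits(PermissionsNone)) // prints 0xF0C3
}
```

Printing the values this way makes it easier to compare them with a permission bit table, since individual flags are defined by bit position.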
const (
	RootVersion = iota
	RootExtensions
	RootPageLabels
	RootNames
	RootDests
	RootViewerPrefs
	RootPageLayout
	RootPageMode
	RootOutlines
	RootThreads
	RootOpenAction
	RootAA
	RootURI
	RootAcroForm
	RootMetadata
	RootStructTreeRoot
	RootMarkInfo
	RootLang
	RootSpiderInfo
	RootOutputIntents
	RootPieceInfo
	RootOCProperties
	RootPerms
	RootLegal
	RootRequirements
	RootCollection
	RootNeedsRendering
)

The PDF root object fields.

const (
	PageLastModified = iota
	PageResources
	PageMediaBox
	PageCropBox
	PageBleedBox
	PageTrimBox
	PageArtBox
	PageBoxColorInfo
	PageContents
	PageRotate
	PageGroup
	PageThumb
	PageB
	PageDur
	PageTrans
	PageAnnots
	PageAA
	PageMetadata
	PagePieceInfo
	PageStructParents
	PageID
	PagePZ
	PageSeparationInfo
	PageTabs
	PageTemplateInstantiated
	PagePresSteps
	PageUserUnit
	PageVP
)

The PDF page object fields.

const (
	EolLF   = "\x0A"
	EolCR   = "\x0D"
	EolCRLF = "\x0D\x0A"
)

Supported line delimiters
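
As a rough illustration of how a writer might apply one of these delimiters, the sketch below terminates every line with a chosen end-of-line sequence. The function writeLines is hypothetical and is not part of the pdfcpu API.

```go
package main

import (
	"fmt"
	"strings"
)

// Values copied from the constant block above.
const (
	EolLF   = "\x0A"
	EolCR   = "\x0D"
	EolCRLF = "\x0D\x0A"
)

// writeLines joins lines, terminating each one with eol.
// Hypothetical helper for illustration only.
func writeLines(lines []string, eol string) string {
	var sb strings.Builder
	for _, l := range lines {
		sb.WriteString(l)
		sb.WriteString(eol)
	}
	return sb.String()
}

func main() {
	fmt.Printf("%q\n", writeLines([]string{"startxref", "%%EOF"}, EolCRLF))
}
```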

const (

	// REQUIRED is used for required dict entries.
	REQUIRED = true

	// OPTIONAL is used for optional dict entries.
	OPTIONAL = false
)
const (

	// ExcludePatternCS ...
	ExcludePatternCS = true

	// IncludePatternCS ...
	IncludePatternCS = false
)
const (
	// PDFCPUVersion returns the current pdfcpu version.
	PDFCPUVersion = "0.1.11"

	// PDFCPULongVersion returns pdfcpu's signature.
	PDFCPULongVersion = "golang pdfcpu v" + PDFCPUVersion
)
const FreeHeadGeneration = 65535

FreeHeadGeneration is the predefined generation number for the head of the free list.
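
In a classic cross-reference table, each entry consists of a 10-digit number, a 5-digit generation number and the keyword n (in use) or f (free). The sketch below formats the head of the free list in that layout. The function xrefEntry is an assumption made for illustration and does not reflect how pdfcpu writes entries internally.

```go
package main

import "fmt"

// Value copied from the constant above.
const FreeHeadGeneration = 65535

// xrefEntry formats one cross-reference table entry without its line ending:
// a 10-digit offset (or next free object number), a 5-digit generation
// number, and "n" for in-use or "f" for free objects.
// Hypothetical helper for illustration only.
func xrefEntry(offset int64, generation int, inUse bool) string {
	kind := "f"
	if inUse {
		kind = "n"
	}
	return fmt.Sprintf("%010d %05d %s", offset, generation, kind)
}

func main() {
	// Object 0 heads the free list.
	fmt.Println(xrefEntry(0, FreeHeadGeneration, false)) // prints 0000000000 65535 f
}
```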

const (

	// ObjectStreamMaxObjects limits the number of objects within an object stream written.
	ObjectStreamMaxObjects = 100
)

Variables

This section is empty.

Functions

func AddAttachments added in v0.1.3

func AddAttachments(fileIn string, files []string, config *Configuration) error

AddAttachments embeds files into a PDF.

func AddPermissions added in v0.1.6

func AddPermissions(fileIn string, config *Configuration) error

AddPermissions sets the user access permissions.

func ChangeOwnerPassword added in v0.1.1

func ChangeOwnerPassword(fileIn, fileOut string, config *Configuration, pwOld, pwNew *string) error

ChangeOwnerPassword of fileIn and write result to fileOut.

func ChangeUserPassword added in v0.1.1

func ChangeUserPassword(fileIn, fileOut string, config *Configuration, pwOld, pwNew *string) error

ChangeUserPassword of fileIn and write result to fileOut.

func Date added in v0.1.11

func Date(s string) bool

Date validates an ISO/IEC 8824 compliant date string.

func DecodeUTF16String added in v0.1.11

func DecodeUTF16String(s string) (string, error)

DecodeUTF16String decodes a UTF16BE string from a hex string.

func Decrypt added in v0.1.1

func Decrypt(fileIn, fileOut string, config *Configuration) error

Decrypt fileIn and write result to fileOut.

func Encrypt added in v0.1.1

func Encrypt(fileIn, fileOut string, config *Configuration) error

Encrypt fileIn and write result to fileOut.

func Escape added in v0.1.11

func Escape(s string) (*string, error)

Escape applies all defined escape sequences to s.
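
To give an idea of what escaping a PDF string literal involves, the following sketch handles the single-character escape sequences of the PDF specification (line feed, carriage return, tab, backspace, form feed, parentheses and backslash). It is a simplified stand-in, not the pdfcpu implementation: the name escapeLiteral and its signature are assumptions, and octal escapes are not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeLiteral escapes the characters that have a single-character escape
// sequence inside a PDF string literal. Hypothetical helper for illustration.
func escapeLiteral(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case '\n':
			sb.WriteString(`\n`)
		case '\r':
			sb.WriteString(`\r`)
		case '\t':
			sb.WriteString(`\t`)
		case '\b':
			sb.WriteString(`\b`)
		case '\f':
			sb.WriteString(`\f`)
		case '(', ')', '\\':
			// Delimiters and the escape character itself get a leading backslash.
			sb.WriteByte('\\')
			sb.WriteByte(c)
		default:
			sb.WriteByte(c)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(escapeLiteral("price (net)\n")) // prints price \(net\)\n
}
```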

func ExtractAttachments added in v0.1.3

func ExtractAttachments(fileIn, dirOut string, files []string, config *Configuration) error

ExtractAttachments extracts embedded files from a PDF.

func ExtractContent

func ExtractContent(fileIn, dirOut string, pageSelection []string, config *Configuration) error

ExtractContent dumps "PDF source" files from fileIn into dirOut for selected pages.

func ExtractFonts

func ExtractFonts(fileIn, dirOut string, pageSelection []string, config *Configuration) error

ExtractFonts dumps embedded fontfiles from fileIn into dirOut for selected pages.

func ExtractImages

func ExtractImages(fileIn, dirOut string, pageSelection []string, config *Configuration) error

ExtractImages dumps embedded image resources from fileIn into dirOut for selected pages.

func ExtractPages

func ExtractPages(fileIn, dirOut string, pageSelection []string, config *Configuration) error

ExtractPages generates single page PDF files from fileIn in dirOut for selected pages.

func HexLiteralToString added in v0.1.11

func HexLiteralToString(hexString string) (string, error)

HexLiteralToString returns a possibly UTF16 encoded string for a hex string.

func IsStringUTF16BE added in v0.1.11

func IsStringUTF16BE(s string) bool

IsStringUTF16BE checks a string for Big Endian byte order BOM.

func IsUTF16BE added in v0.1.11

func IsUTF16BE(b []byte) (ok bool, err error)

IsUTF16BE checks for Big Endian byte order mark.
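
The big-endian byte order mark is the byte pair 0xFE 0xFF. The sketch below shows one way to detect it and to decode a hex string holding BOM-prefixed UTF-16BE data using only the standard library. The names hasUTF16BEBOM and decodeUTF16BEHex are assumptions made for illustration; the actual pdfcpu functions may validate and report errors differently.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"unicode/utf16"
)

// hasUTF16BEBOM reports whether b starts with the byte order mark 0xFE 0xFF.
func hasUTF16BEBOM(b []byte) bool {
	return len(b) >= 2 && b[0] == 0xFE && b[1] == 0xFF
}

// decodeUTF16BEHex decodes a hex string that holds BOM-prefixed UTF-16BE data.
// Hypothetical helper for illustration only.
func decodeUTF16BEHex(s string) (string, error) {
	b, err := hex.DecodeString(s)
	if err != nil {
		return "", err
	}
	if !hasUTF16BEBOM(b) {
		return "", errors.New("missing UTF-16BE byte order mark")
	}
	b = b[2:] // skip the byte order mark
	if len(b)%2 != 0 {
		return "", errors.New("odd number of bytes after byte order mark")
	}
	// Combine each big-endian byte pair into one UTF-16 code unit.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i < len(b); i += 2 {
		units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	s, err := decodeUTF16BEHex("FEFF00480069")
	fmt.Println(s, err) // prints Hi <nil>
}
```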

func ListAttachments added in v0.1.3

func ListAttachments(fileIn string, config *Configuration) ([]string, error)

ListAttachments returns a list of embedded file attachments.

func ListPermissions added in v0.1.6

func ListPermissions(fileIn string, config *Configuration) ([]string, error)

ListPermissions returns a list of user access permissions.

func Merge

func Merge(filesIn []string, fileOut string, config *Configuration) error

Merge merges some PDF files and writes the result to fileOut. This corresponds to concatenating these files in the order specified by filesIn. The first entry of filesIn serves as the destination xRefTable into which all the remaining files get merged.

func Optimize

func Optimize(fileIn, fileOut string, config *Configuration) error

Optimize reads in fileIn, does validation, optimization and writes the result to fileOut.

func ParsePageSelection

func ParsePageSelection(s string) ([]string, error)

ParsePageSelection ensures a correct page selection expression.

func Process

func Process(cmd *Command) (out []string, err error)

Process executes a pdfcpu command.

Example (AddAttachments)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(AddAttachmentsCommand("in.pdf", []string{"a.csv", "b.jpg", "c.pdf"}, config))
if err != nil {
	return
}
Output:

Example (AddPermissions)
config := NewDefaultConfiguration()
config.UserPW = "upw"
config.OwnerPW = "opw"

config.UserAccessPermissions = PermissionsAll

_, err := Process(AddPermissionsCommand("in.pdf", config))
if err != nil {
	return
}
Output:

Example (ChangeOwnerPW)
config := NewDefaultConfiguration()

// supply existing user pw like so
config.UserPW = "upw"

// old and new owner pw
pwOld := "pwOld"
pwNew := "pwNew"

_, err := Process(ChangeOwnerPWCommand("in.pdf", "out.pdf", config, &pwOld, &pwNew))
if err != nil {
	return
}
Output:

Example (ChangeUserPW)
config := NewDefaultConfiguration()

// supply existing owner pw like so
config.OwnerPW = "opw"

pwOld := "pwOld"
pwNew := "pwNew"

_, err := Process(ChangeUserPWCommand("in.pdf", "out.pdf", config, &pwOld, &pwNew))
if err != nil {
	return
}
Output:

Example (Decrypt)
config := NewDefaultConfiguration()

config.UserPW = "upw"
config.OwnerPW = "opw"

_, err := Process(DecryptCommand("in.pdf", "out.pdf", config))
if err != nil {
	return
}
Output:

Example (Encrypt)
config := NewDefaultConfiguration()

config.UserPW = "upw"
config.OwnerPW = "opw"

_, err := Process(EncryptCommand("in.pdf", "out.pdf", config))
if err != nil {
	return
}
Output:

Example (ExtractAttachments)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Extract all attachments.
_, err := Process(ExtractAttachmentsCommand("in.pdf", "dirOut", nil, config))
if err != nil {
	return
}

// Extract specific attachments.
_, err = Process(ExtractAttachmentsCommand("in.pdf", "dirOut", []string{"a.csv", "b.pdf"}, config))
if err != nil {
	return
}
Output:

Example (ExtractImages)
// Extract all embedded images for pages up to 5 and from page 5 on, but not for page 4.
selectedPages := []string{"-5", "5-", "!4"}

config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(ExtractImagesCommand("in.pdf", "dirOut", selectedPages, config))
if err != nil {
	return
}
Output:

Example (ExtractPages)
// Extract single-page PDFs for pages 3, 4 and 5.
selectedPages := []string{"3..5"}

config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(ExtractPagesCommand("in.pdf", "dirOut", selectedPages, config))
if err != nil {
	return
}
Output:

Example (ListAttachments)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

list, err := Process(ListAttachmentsCommand("in.pdf", config))
if err != nil {
	return
}

// Print attachment list.
for _, l := range list {
	fmt.Println(l)
}
Output:

Example (ListPermissions)
config := NewDefaultConfiguration()
config.UserPW = "upw"
config.OwnerPW = "opw"

list, err := Process(ListPermissionsCommand("in.pdf", config))
if err != nil {
	return
}

// Print permissions list.
for _, l := range list {
	fmt.Println(l)
}
Output:

Example (Merge)
// Concatenate this sequence of PDF files:
filenamesIn := []string{"in1.pdf", "in2.pdf", "in3.pdf"}

_, err := Process(MergeCommand(filenamesIn, "out.pdf", NewDefaultConfiguration()))
if err != nil {
	return
}
Output:

Example (Optimize)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Generate optional stats.
config.StatsFileName = "stats.csv"

// Configure end of line sequence for writing.
config.Eol = EolLF

_, err := Process(OptimizeCommand("in.pdf", "out.pdf", config))
if err != nil {
	return
}
Output:

Example (RemoveAttachments)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Not to be confused with the ExtractAttachmentsCommand!

// Remove all attachments.
_, err := Process(RemoveAttachmentsCommand("in.pdf", nil, config))
if err != nil {
	return
}

// Remove specific attachments.
_, err = Process(RemoveAttachmentsCommand("in.pdf", []string{"a.csv", "b.jpg"}, config))
if err != nil {
	return
}
Output:

Example (Split)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Split into single-page PDFs.

_, err := Process(SplitCommand("in.pdf", "outDir", config))
if err != nil {
	return
}
Output:

Example (Trim)
// Trim to first three pages.
selectedPages := []string{"-3"}

config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(TrimCommand("in.pdf", "out.pdf", selectedPages, config))
if err != nil {
	return
}
Output:

Example (Validate)
config := NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Set relaxed validation mode.
config.ValidationMode = ValidationRelaxed

_, err := Process(ValidateCommand("in.pdf", config))
if err != nil {
	return
}
Output:

func RemoveAttachments added in v0.1.3

func RemoveAttachments(fileIn string, files []string, config *Configuration) error

RemoveAttachments deletes embedded files from a PDF.

func Split

func Split(fileIn, dirOut string, config *Configuration) error

Split generates a sequence of single page PDF files in dirOut, creating one file for every page of fileIn.

func StringLiteralToString added in v0.1.11

func StringLiteralToString(s string) (string, error)

StringLiteralToString returns the best possible string rep for a string literal.

func Trim

func Trim(fileIn, fileOut string, pageSelection []string, config *Configuration) error

Trim generates a trimmed version of fileIn containing all pages selected.

func Unescape added in v0.1.11

func Unescape(s string) ([]byte, error)

Unescape resolves all escape sequences of s.

func Validate

func Validate(fileIn string, config *Configuration) error

Validate validates a PDF file against ISO-32000-1:2008.

func VersionString added in v0.1.11

func VersionString(version PDFVersion) string

VersionString returns a string representation for a given PDFVersion.

func Write

func Write(ctx *PDFContext) error

Write generates a PDF file for a given PDFContext.

Types

type ByteSize added in v0.1.11

type ByteSize float64

ByteSize represents the various terms for storage space.

const (
	KB ByteSize = 1 << (10 * iota)
	MB
	GB
)

Storage space terms.
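
Read literally, the const block above would give KB the value 1, because iota is 0 on the first line of a const block. The rendered documentation presumably omits a leading line that skips this value. The sketch below assumes such a line (a blank identifier assignment, as in the well-known Effective Go idiom) and adds a String method whose output format is likewise an assumption.

```go
package main

import "fmt"

// ByteSize represents the various terms for storage space.
type ByteSize float64

const (
	_           = iota // assumed: skip iota == 0 so that KB becomes 1 << 10
	KB ByteSize = 1 << (10 * iota)
	MB
	GB
)

// String formats b using the largest fitting unit.
// The exact output format is an assumption made for this sketch.
func (b ByteSize) String() string {
	switch {
	case b >= GB:
		return fmt.Sprintf("%.2f GB", float64(b/GB))
	case b >= MB:
		return fmt.Sprintf("%.2f MB", float64(b/MB))
	case b >= KB:
		return fmt.Sprintf("%.2f KB", float64(b/KB))
	}
	return fmt.Sprintf("%.0f B", float64(b))
}

func main() {
	fmt.Println(KB == 1024, MB == 1024*1024) // prints true true
	fmt.Println(2 * MB)                      // prints 2.00 MB
}
```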

func (ByteSize) String added in v0.1.11

func (b ByteSize) String() string

type Command

type Command struct {
	Mode          CommandMode    // VALIDATE  OPTIMIZE  SPLIT  MERGE  EXTRACT  TRIM  LISTATT ADDATT REMATT EXTATT  ENCRYPT  DECRYPT  CHANGEUPW  CHANGEOPW LISTP ADDP
	InFile        *string        //    *         *        *      -       *      *      *       *       *      *       *        *         *          *       *     *
	InFiles       []string       //    -         -        -      *       -      -      -       *       *      *       -        -         -          -       -     -
	InDir         *string        //    -         -        -      -       -      -      -       -       -      -       -        -         -          -       -     -
	OutFile       *string        //    -         *        -      *       -      *      -       -       -      -       *        *         *          *       -     -
	OutDir        *string        //    -         -        *      -       *      -      -       -       -      *       -        -         -          -       -     -
	PageSelection []string       //    -         -        -      -       *      *      -       -       -      -       -        -         -          -       -     -
	Config        *Configuration //    *         *        *      *       *      *      *       *       *      *       *        *         *          *       *     *
	PWOld         *string        //    -         -        -      -       -      -      -       -       -      -       -        -         *          *       -     -
	PWNew         *string        //    -         -        -      -       -      -      -       -       -      -       -        -         *          *       -     -
}

Command represents an execution context.

func AddAttachmentsCommand added in v0.1.3

func AddAttachmentsCommand(pdfFileNameIn string, fileNamesIn []string, config *Configuration) *Command

AddAttachmentsCommand creates a new AddAttachmentsCommand.

func AddPermissionsCommand added in v0.1.6

func AddPermissionsCommand(pdfFileNameIn string, config *Configuration) *Command

AddPermissionsCommand creates a new AddPermissionsCommand.

func ChangeOwnerPWCommand added in v0.1.1

func ChangeOwnerPWCommand(pdfFileNameIn, pdfFileNameOut string, config *Configuration, pwOld, pwNew *string) *Command

ChangeOwnerPWCommand creates a new ChangeOwnerPWCommand.

func ChangeUserPWCommand added in v0.1.1

func ChangeUserPWCommand(pdfFileNameIn, pdfFileNameOut string, config *Configuration, pwOld, pwNew *string) *Command

ChangeUserPWCommand creates a new ChangeUserPWCommand.

func DecryptCommand added in v0.1.1

func DecryptCommand(pdfFileNameIn, pdfFileNameOut string, config *Configuration) *Command

DecryptCommand creates a new DecryptCommand.

func EncryptCommand added in v0.1.1

func EncryptCommand(pdfFileNameIn, pdfFileNameOut string, config *Configuration) *Command

EncryptCommand creates a new EncryptCommand.

func ExtractAttachmentsCommand added in v0.1.3

func ExtractAttachmentsCommand(pdfFileNameIn, dirNameOut string, fileNamesIn []string, config *Configuration) *Command

ExtractAttachmentsCommand creates a new ExtractAttachmentsCommand.

func ExtractContentCommand

func ExtractContentCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *Configuration) *Command

ExtractContentCommand creates a new ExtractContentCommand.

func ExtractFontsCommand

func ExtractFontsCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *Configuration) *Command

ExtractFontsCommand creates a new ExtractFontsCommand. (experimental)

func ExtractImagesCommand

func ExtractImagesCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *Configuration) *Command

ExtractImagesCommand creates a new ExtractImagesCommand. (experimental)

func ExtractPagesCommand

func ExtractPagesCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *Configuration) *Command

ExtractPagesCommand creates a new ExtractPagesCommand.

func ListAttachmentsCommand added in v0.1.3

func ListAttachmentsCommand(pdfFileNameIn string, config *Configuration) *Command

ListAttachmentsCommand creates a new ListAttachmentsCommand.

func ListPermissionsCommand added in v0.1.6

func ListPermissionsCommand(pdfFileNameIn string, config *Configuration) *Command

ListPermissionsCommand creates a new ListPermissionsCommand.

func MergeCommand

func MergeCommand(pdfFileNamesIn []string, pdfFileNameOut string, config *Configuration) *Command

MergeCommand creates a new MergeCommand.

func OptimizeCommand

func OptimizeCommand(pdfFileNameIn, pdfFileNameOut string, config *Configuration) *Command

OptimizeCommand creates a new OptimizeCommand.

func RemoveAttachmentsCommand added in v0.1.3

func RemoveAttachmentsCommand(pdfFileNameIn string, fileNamesIn []string, config *Configuration) *Command

RemoveAttachmentsCommand creates a new RemoveAttachmentsCommand.

func SplitCommand

func SplitCommand(pdfFileNameIn, dirNameOut string, config *Configuration) *Command

SplitCommand creates a new SplitCommand.

func TrimCommand

func TrimCommand(pdfFileNameIn, pdfFileNameOut string, pageSelection []string, config *Configuration) *Command

TrimCommand creates a new TrimCommand.

func ValidateCommand

func ValidateCommand(pdfFileName string, config *Configuration) *Command

ValidateCommand creates a new ValidateCommand.

type CommandMode added in v0.1.11

type CommandMode int

CommandMode specifies the operation being executed.

const (
	VALIDATE CommandMode = iota
	OPTIMIZE
	SPLIT
	MERGE
	EXTRACTIMAGES
	EXTRACTFONTS
	EXTRACTPAGES
	EXTRACTCONTENT
	TRIM
	ADDATTACHMENTS
	REMOVEATTACHMENTS
	EXTRACTATTACHMENTS
	LISTATTACHMENTS
	ADDPERMISSIONS
	LISTPERMISSIONS
	ENCRYPT
	DECRYPT
	CHANGEUPW
	CHANGEOPW
)

The available commands.

type Configuration added in v0.1.11

type Configuration struct {

	// Enables PDF V1.5 compatible processing of object streams, xref streams, hybrid PDF files.
	Reader15 bool

	// Enables decoding of all streams (fontfiles, images..) for logging purposes.
	DecodeAllStreams bool

	// Validate against ISO-32000: strict or relaxed
	ValidationMode int

	// End of line char sequence for writing.
	Eol string

	// Turns on object stream generation.
	// A signal for compressing any new non-stream-object into an object stream.
	// true enforces WriteXRefStream to true.
	// false does not prevent xRefStream generation.
	WriteObjectStream bool

	// Switches between xRefSection (<=V1.4) and objectStream/xRefStream (>=V1.5) writing.
	WriteXRefStream bool

	// Turns on stats collection.
	CollectStats bool

	// A CSV-filename holding the statistics.
	StatsFileName string

	// Supplied user password
	UserPW    string
	UserPWNew *string

	// Supplied owner password
	OwnerPW    string
	OwnerPWNew *string

	// EncryptUsingAES ensures AES encryption.
	// true: AES encryption
	// false: RC4 encryption.
	EncryptUsingAES bool

	// EncryptUsing128BitKey ensures 128 bit key length.
	// true: use 128 bit key
	// false: use 40 bit key
	EncryptUsing128BitKey bool

	// Supplied user access permissions, see Table 22
	UserAccessPermissions int16

	// Command being executed.
	Mode CommandMode
}

Configuration of a PDFContext.
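
The field comments state that WriteObjectStream set to true enforces WriteXRefStream, while false leaves WriteXRefStream untouched. This dependency can be sketched as follows. The type writeOptions and the function normalize are hypothetical and only mirror the two relevant fields; they are not part of pdfcpu.

```go
package main

import "fmt"

// writeOptions mirrors the two write-related Configuration fields.
// Hypothetical type for illustration only.
type writeOptions struct {
	WriteObjectStream bool
	WriteXRefStream   bool
}

// normalize applies the documented rule: object streams imply xref streams.
// Objects stored in an object stream can only be located through a
// cross-reference stream, not through a classic xref section.
func normalize(o writeOptions) writeOptions {
	if o.WriteObjectStream {
		o.WriteXRefStream = true
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", normalize(writeOptions{WriteObjectStream: true}))
	fmt.Printf("%+v\n", normalize(writeOptions{WriteXRefStream: true}))
}
```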

func NewDefaultConfiguration added in v0.1.11

func NewDefaultConfiguration() *Configuration

NewDefaultConfiguration returns the default pdfcpu configuration.

func (*Configuration) ValidationModeString added in v0.1.11

func (c *Configuration) ValidationModeString() string

ValidationModeString returns a string rep for the validation mode in effect.

type Enc added in v0.1.11

type Enc struct {
	O, U       []byte
	L, P, R, V int
	Emd        bool // encrypt meta data
	ID         []byte
}

Enc wraps around all defined encryption attributes.

type FontObject added in v0.1.11

type FontObject struct {
	ResourceNames []string
	Prefix        string
	FontName      string
	FontDict      *PDFDict
	Data          []byte
	Extension     string
}

FontObject represents a font used in a PDF file.

func (*FontObject) AddResourceName added in v0.1.11

func (fo *FontObject) AddResourceName(resourceName string)

AddResourceName adds a resourceName referring to this font.

func (FontObject) Embedded added in v0.1.11

func (fo FontObject) Embedded() (embedded bool)

Embedded returns true if the font is embedded into this PDF file.

func (FontObject) Encoding added in v0.1.11

func (fo FontObject) Encoding() string

Encoding returns the Encoding of this font.

func (FontObject) ResourceNamesString added in v0.1.11

func (fo FontObject) ResourceNamesString() string

ResourceNamesString returns a string representation of all the resource names of this font.

func (FontObject) String added in v0.1.11

func (fo FontObject) String() string

func (FontObject) SubType added in v0.1.11

func (fo FontObject) SubType() string

SubType returns the SubType of this font.

type ImageObject added in v0.1.11

type ImageObject struct {
	ResourceNames []string
	ImageDict     *PDFStreamDict
	Extension     string
}

ImageObject represents an image used in a PDF file.

func (*ImageObject) AddResourceName added in v0.1.11

func (io *ImageObject) AddResourceName(resourceName string)

AddResourceName adds a resourceName to this ImageObject's ResourceNames list.

func (ImageObject) Data added in v0.1.11

func (io ImageObject) Data() []byte

Data returns the raw data belonging to this image object.

func (ImageObject) ResourceNamesString added in v0.1.11

func (io ImageObject) ResourceNamesString() string

ResourceNamesString returns a string representation of the ResourceNames for this image.

type IntSet added in v0.1.11

type IntSet map[int]bool

IntSet is a set of integers.
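
Because IntSet is a plain map, membership tests and iteration use ordinary map operations. A minimal sketch follows; the helper sortedMembers is an assumption added for illustration and is not part of pdfcpu.

```go
package main

import (
	"fmt"
	"sort"
)

// IntSet is a set of integers.
type IntSet map[int]bool

// sortedMembers returns all members marked true in ascending order.
// Hypothetical helper for illustration only.
func sortedMembers(s IntSet) []int {
	out := make([]int, 0, len(s))
	for k, ok := range s {
		if ok {
			out = append(out, k)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	s := IntSet{}
	s[7] = true
	s[3] = true
	fmt.Println(s[3], s[4])       // a missing key reads as false
	fmt.Println(sortedMembers(s)) // prints [3 7]
}
```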

type Node added in v0.1.11

type Node struct {
	Kids       []*Node         // Mirror of the name tree's Kids array.
	Names      []entry         // Mirror of the name tree's Names array.
	Kmin, Kmax string          // Mirror of the name tree's Limits array [Kmin, Kmax].
	IndRef     *PDFIndirectRef // Pointer to the PDF object representing this name tree node.
}

Node is an opinionated implementation of the PDF name tree. pdfcpu caches all name trees found in the PDF catalog with this data structure. The PDF spec does not impose any rules regarding a strategy for the creation of nodes. A binary tree was chosen where each leaf node has a limited number of entries (maxEntries). Once maxEntries has been reached, a leaf node turns into an intermediary node with two kids. Both kids are leaf nodes, each holding half of the sorted entries of the original leaf node.
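
The splitting strategy described above can be sketched with a simplified, self-contained tree. Everything in this sketch is an assumption made for illustration: the lowercase type node, string values instead of PDFObject, and maxEntries set to 4. It is not the pdfcpu implementation.

```go
package main

import (
	"fmt"
	"sort"
)

const maxEntries = 4 // hypothetical limit, chosen small for the demonstration

type entry struct{ k, v string }

// node is a simplified stand-in for Node, holding string values.
type node struct {
	kids       []*node
	names      []entry
	kmin, kmax string
}

func (n *node) leaf() bool { return n.kids == nil }

// addToLeaf inserts (k, v) in sorted position, replacing an existing key.
func (n *node) addToLeaf(k, v string) {
	i := sort.Search(len(n.names), func(i int) bool { return n.names[i].k >= k })
	if i < len(n.names) && n.names[i].k == k {
		n.names[i].v = v
		return
	}
	n.names = append(n.names, entry{})
	copy(n.names[i+1:], n.names[i:])
	n.names[i] = entry{k, v}
	n.kmin, n.kmax = n.names[0].k, n.names[len(n.names)-1].k
}

// split turns a full leaf into an intermediary node with two leaf kids,
// each holding half of the sorted entries.
func (n *node) split() {
	mid := len(n.names) / 2
	left := &node{names: append([]entry(nil), n.names[:mid]...)}
	right := &node{names: append([]entry(nil), n.names[mid:]...)}
	left.kmin, left.kmax = left.names[0].k, left.names[len(left.names)-1].k
	right.kmin, right.kmax = right.names[0].k, right.names[len(right.names)-1].k
	n.kids, n.names = []*node{left, right}, nil
}

// add inserts an entry, splitting a leaf once it holds maxEntries entries.
func (n *node) add(k, v string) {
	if n.leaf() {
		n.addToLeaf(k, v)
		if len(n.names) == maxEntries {
			n.split()
		}
		return
	}
	// Widen this node's limits, then descend into the matching kid.
	if k < n.kmin {
		n.kmin = k
	}
	if k > n.kmax {
		n.kmax = k
	}
	if k < n.kids[1].kmin {
		n.kids[0].add(k, v)
	} else {
		n.kids[1].add(k, v)
	}
}

// keys returns all keys in sorted order.
func (n *node) keys() []string {
	var out []string
	if n.leaf() {
		for _, e := range n.names {
			out = append(out, e.k)
		}
		return out
	}
	for _, kid := range n.kids {
		out = append(out, kid.keys()...)
	}
	return out
}

func main() {
	root := &node{}
	for _, k := range []string{"d", "a", "c", "b", "e"} {
		root.add(k, "value of "+k)
	}
	fmt.Println(root.leaf(), root.keys()) // prints false [a b c d e]
}
```

Keeping the limits (kmin, kmax) on every node lets a lookup skip subtrees whose key range cannot contain the requested key.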

func (*Node) Add added in v0.1.11

func (n *Node) Add(xRefTable *XRefTable, k string, v PDFObject) error

Add adds an entry to a name tree.

func (*Node) AddToLeaf added in v0.1.11

func (n *Node) AddToLeaf(k string, v PDFObject)

AddToLeaf adds an entry to a leaf.

func (Node) KeyList added in v0.1.11

func (n Node) KeyList() ([]string, error)

KeyList returns a sorted list of all keys.

func (Node) Process added in v0.1.11

func (n Node) Process(xRefTable *XRefTable, handler func(*XRefTable, string, PDFObject) error) error

Process traverses the nametree applying a handler to each entry (key-value pair).

func (*Node) Remove added in v0.1.11

func (n *Node) Remove(xRefTable *XRefTable, k string) (empty, ok bool, err error)

Remove removes an entry from a name tree. empty returns true if this node is an empty leaf node after removal. ok returns true if removal was successful.

func (Node) String added in v0.1.11

func (n Node) String() string

func (Node) Value added in v0.1.11

func (n Node) Value(k string) (PDFObject, bool)

Value returns the value for a given key.

type OptimizationContext added in v0.1.11

type OptimizationContext struct {

	// Font section
	PageFonts         []IntSet
	FontObjects       map[int]*FontObject
	Fonts             map[string][]int
	DuplicateFontObjs IntSet
	DuplicateFonts    map[int]*PDFDict

	// Image section
	PageImages         []IntSet
	ImageObjects       map[int]*ImageObject
	DuplicateImageObjs IntSet
	DuplicateImages    map[int]*PDFStreamDict

	DuplicateInfoObjects IntSet // Possible result of manual info dict modification.

	NonReferencedObjs []int // Objects that are not referenced.
}

OptimizationContext represents the context for the optimization of a PDF file.
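
One plausible way to fill sets such as DuplicateFontObjs or DuplicateImageObjs is to compare the raw data of candidate objects. The sketch below marks every object whose data matches that of a lower-numbered object, using a SHA-256 digest as the comparison key. This is an illustrative technique under stated assumptions, not a description of how pdfcpu detects duplicates; findDuplicates and its input map are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// IntSet is a set of integers.
type IntSet map[int]bool

// findDuplicates returns the object numbers whose data is identical to the
// data of a lower-numbered object. Hypothetical helper for illustration only.
func findDuplicates(objs map[int][]byte) IntSet {
	// Visit objects in ascending order so the lowest number is kept.
	nrs := make([]int, 0, len(objs))
	for nr := range objs {
		nrs = append(nrs, nr)
	}
	sort.Ints(nrs)

	seen := map[[sha256.Size]byte]int{}
	dups := IntSet{}
	for _, nr := range nrs {
		sum := sha256.Sum256(objs[nr])
		if _, ok := seen[sum]; ok {
			dups[nr] = true
			continue
		}
		seen[sum] = nr
	}
	return dups
}

func main() {
	objs := map[int][]byte{
		1: []byte("font-a"),
		2: []byte("font-b"),
		3: []byte("font-a"),
	}
	fmt.Println(findDuplicates(objs)) // prints map[3:true]
}
```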

func (*OptimizationContext) DuplicateFontObjectsString added in v0.1.11

func (oc *OptimizationContext) DuplicateFontObjectsString() (int, string)

DuplicateFontObjectsString returns a formatted string and the number of objs.

func (*OptimizationContext) DuplicateImageObjectsString added in v0.1.11

func (oc *OptimizationContext) DuplicateImageObjectsString() (int, string)

DuplicateImageObjectsString returns a formatted string and the number of objs.

func (*OptimizationContext) DuplicateInfoObjectsString added in v0.1.11

func (oc *OptimizationContext) DuplicateInfoObjectsString() (int, string)

DuplicateInfoObjectsString returns a formatted string and the number of objs.

func (*OptimizationContext) IsDuplicateFontObject added in v0.1.11

func (oc *OptimizationContext) IsDuplicateFontObject(i int) bool

IsDuplicateFontObject returns true if object #i is a duplicate font object.

func (*OptimizationContext) IsDuplicateImageObject added in v0.1.11

func (oc *OptimizationContext) IsDuplicateImageObject(i int) bool

IsDuplicateImageObject returns true if object #i is a duplicate image object.

func (*OptimizationContext) IsDuplicateInfoObject added in v0.1.11

func (oc *OptimizationContext) IsDuplicateInfoObject(i int) bool

IsDuplicateInfoObject returns true if object #i is a duplicate info object.

func (*OptimizationContext) NonReferencedObjsString added in v0.1.11

func (oc *OptimizationContext) NonReferencedObjsString() (int, string)

NonReferencedObjsString returns a formatted string and the number of objs.

type PDFArray added in v0.1.11

type PDFArray []PDFObject

PDFArray represents a PDF array object.

func NewIntegerArray added in v0.1.11

func NewIntegerArray(fVars ...int) PDFArray

NewIntegerArray returns a PDFArray with PDFInteger entries.

func NewNameArray added in v0.1.11

func NewNameArray(sVars ...string) PDFArray

NewNameArray returns a PDFArray with PDFName entries.

func NewNumberArray added in v0.1.11

func NewNumberArray(fVars ...float64) PDFArray

NewNumberArray returns a PDFArray with PDFFloat entries.

func NewRectangle added in v0.1.11

func NewRectangle(llx, lly, urx, ury float64) PDFArray

NewRectangle creates a rectangle array.

func NewStringArray added in v0.1.11

func NewStringArray(sVars ...string) PDFArray

NewStringArray returns a PDFArray with PDFStringLiteral entries.

func (PDFArray) PDFString added in v0.1.11

func (array PDFArray) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFArray) String added in v0.1.11

func (array PDFArray) String() string
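
As an illustration of how the array constructors and PDFString fit together, here is a minimal sketch. It does not import the package: PDFObject, PDFFloat and PDFArray are local stand-ins that mirror the documented declarations, and newRectangle is a hypothetical equivalent of NewRectangle. The output format follows plain PDF array syntax (square brackets, space-separated elements); the package's actual number formatting may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PDFObject mirrors the interface documented on this page.
type PDFObject interface {
	fmt.Stringer
	PDFString() string
}

// PDFFloat is a local stand-in for the package type of the same name.
type PDFFloat float64

func (f PDFFloat) String() string    { return strconv.FormatFloat(float64(f), 'f', -1, 64) }
func (f PDFFloat) PDFString() string { return f.String() }

// PDFArray is a local stand-in: a slice of PDFObject values.
type PDFArray []PDFObject

// PDFString renders the array in PDF syntax, e.g. "[0 0 10 20]".
func (a PDFArray) PDFString() string {
	parts := make([]string, len(a))
	for i, o := range a {
		parts[i] = o.PDFString()
	}
	return "[" + strings.Join(parts, " ") + "]"
}

func (a PDFArray) String() string { return a.PDFString() }

// newRectangle is a hypothetical equivalent of NewRectangle (assumption):
// it stores the four corner coordinates as PDFFloat entries.
func newRectangle(llx, lly, urx, ury float64) PDFArray {
	return PDFArray{PDFFloat(llx), PDFFloat(lly), PDFFloat(urx), PDFFloat(ury)}
}

func main() {
	// An A4 media box in points.
	fmt.Println(newRectangle(0, 0, 595.28, 841.89).PDFString())
}
```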

type PDFBoolean added in v0.1.11

type PDFBoolean bool

PDFBoolean represents a PDF boolean object.

func (PDFBoolean) PDFString added in v0.1.11

func (boolean PDFBoolean) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFBoolean) String added in v0.1.11

func (boolean PDFBoolean) String() string

func (PDFBoolean) Value added in v0.1.11

func (boolean PDFBoolean) Value() bool

Value returns a bool value for this PDF object.

type PDFContext added in v0.1.11

type PDFContext struct {
	*Configuration
	*XRefTable
	Read     *ReadContext
	Optimize *OptimizationContext
	Write    *WriteContext
}

PDFContext represents the context for processing PDF files.

func NewPDFContext added in v0.1.11

func NewPDFContext(fileName string, file *os.File, config *Configuration) (*PDFContext, error)

NewPDFContext initializes a new PDFContext.

func Read

func Read(fileIn string, config *Configuration) (*PDFContext, error)

Read reads in a PDF file and builds an internal structure holding its cross reference table aka the PDFContext.

func (*PDFContext) ResetWriteContext added in v0.1.11

func (ctx *PDFContext) ResetWriteContext()

ResetWriteContext prepares an existing WriteContext for a new file to be written.

func (*PDFContext) String added in v0.1.11

func (ctx *PDFContext) String() string

type PDFDict added in v0.1.11

type PDFDict struct {
	Dict map[string]PDFObject
}

PDFDict represents a PDF dict object.

func NewPDFDict added in v0.1.11

func NewPDFDict() PDFDict

NewPDFDict returns a new PDFDict object.

func (PDFDict) BooleanEntry added in v0.1.11

func (d PDFDict) BooleanEntry(key string) *bool

BooleanEntry expects and returns a PDFBoolean entry for given key.

func (*PDFDict) Delete added in v0.1.11

func (d *PDFDict) Delete(key string) (value PDFObject)

Delete deletes the PDFObject for given key.

func (*PDFDict) Entry added in v0.1.11

func (d *PDFDict) Entry(dictName, key string, required bool) (PDFObject, error)

Entry returns the value for given key.

func (PDFDict) Find added in v0.1.11

func (d PDFDict) Find(key string) (value PDFObject, found bool)

Find returns the PDFObject for given key and PDFDict.

func (PDFDict) First added in v0.1.11

func (d PDFDict) First() *int

First returns a *int for key "First".

func (PDFDict) Index added in v0.1.11

func (d PDFDict) Index() *PDFArray

Index returns a *PDFArray for key "Index".

func (PDFDict) IndirectRefEntry added in v0.1.11

func (d PDFDict) IndirectRefEntry(key string) *PDFIndirectRef

IndirectRefEntry returns an indirectRefEntry for given key for this dictionary.

func (*PDFDict) Insert added in v0.1.11

func (d *PDFDict) Insert(key string, value PDFObject) (ok bool)

Insert adds a new entry to this PDFDict.

func (*PDFDict) InsertFloat added in v0.1.11

func (d *PDFDict) InsertFloat(key string, value float32)

InsertFloat adds a new float entry to this PDFDict.

func (*PDFDict) InsertInt added in v0.1.11

func (d *PDFDict) InsertInt(key string, value int)

InsertInt adds a new int entry to this PDFDict.

func (*PDFDict) InsertName added in v0.1.11

func (d *PDFDict) InsertName(key, value string)

InsertName adds a new name entry to this PDFDict.

func (*PDFDict) InsertString added in v0.1.11

func (d *PDFDict) InsertString(key, value string)

InsertString adds a new string entry to this PDFDict.

func (PDFDict) Int64Entry added in v0.1.11

func (d PDFDict) Int64Entry(key string) *int64

Int64Entry expects and returns a PDFInteger entry representing an int64 value for given key.

func (PDFDict) IntEntry added in v0.1.11

func (d PDFDict) IntEntry(key string) *int

IntEntry expects and returns a PDFInteger entry for given key.
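
The typed accessors on PDFDict return pointers, so a caller has to check for nil before using the result. The sketch below shows that calling pattern with local stand-in types. intEntry is a hypothetical re-implementation, and the rule "nil when the key is missing or the value is not a PDFInteger" is an assumption, not something this page states. The stand-in PDFObject interface is also reduced to one method.

```go
package main

import "fmt"

// Local stand-ins for the package types; only what the sketch needs.
type PDFObject interface{ PDFString() string }

type PDFInteger int

func (i PDFInteger) PDFString() string { return fmt.Sprint(int(i)) }

type PDFName string

func (n PDFName) PDFString() string { return "/" + string(n) }

type PDFDict struct {
	Dict map[string]PDFObject
}

// intEntry is a hypothetical stand-in for IntEntry. Assumed behavior:
// nil if the key is absent or the value is not a PDFInteger.
func (d PDFDict) intEntry(key string) *int {
	v, found := d.Dict[key]
	if !found {
		return nil
	}
	pi, ok := v.(PDFInteger)
	if !ok {
		return nil
	}
	i := int(pi)
	return &i
}

func main() {
	d := PDFDict{Dict: map[string]PDFObject{
		"Size": PDFInteger(42),
		"Type": PDFName("XRef"),
	}}
	if n := d.intEntry("Size"); n != nil {
		fmt.Println("Size =", *n)
	}
	// Wrong type and missing key both yield nil in this sketch.
	fmt.Println(d.intEntry("Type") == nil, d.intEntry("Missing") == nil)
}
```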

func (PDFDict) IsLinearizationParmDict added in v0.1.11

func (d PDFDict) IsLinearizationParmDict() bool

IsLinearizationParmDict returns true if this dict has an int entry for key "Linearized".

func (PDFDict) IsObjStm added in v0.1.11

func (d PDFDict) IsObjStm() bool

IsObjStm returns true if given PDFDict is an object stream.

func (*PDFDict) Len added in v0.1.11

func (d *PDFDict) Len() int

Len returns the length of this PDFDict.

func (PDFDict) Length added in v0.1.11

func (d PDFDict) Length() (*int64, *int)

Length returns a *int64 for entry with key "Length". Stream length may be referring to an indirect object.

func (PDFDict) N added in v0.1.11

func (d PDFDict) N() *int

N returns a *int for key "N".

func (PDFDict) NameEntry added in v0.1.11

func (d PDFDict) NameEntry(key string) *string

NameEntry expects and returns a PDFName entry for given key.

func (PDFDict) PDFArrayEntry added in v0.1.11

func (d PDFDict) PDFArrayEntry(key string) *PDFArray

PDFArrayEntry expects and returns a PDFArray entry for given key.

func (PDFDict) PDFDictEntry added in v0.1.11

func (d PDFDict) PDFDictEntry(key string) *PDFDict

PDFDictEntry expects and returns a PDFDict entry for given key.

func (PDFDict) PDFHexLiteralEntry added in v0.1.11

func (d PDFDict) PDFHexLiteralEntry(key string) *PDFHexLiteral

PDFHexLiteralEntry returns a PDFHexLiteral object for given key.

func (PDFDict) PDFNameEntry added in v0.1.11

func (d PDFDict) PDFNameEntry(key string) *PDFName

PDFNameEntry returns a PDFName object for given key.

func (PDFDict) PDFStreamDictEntry added in v0.1.11

func (d PDFDict) PDFStreamDictEntry(key string) *PDFStreamDict

PDFStreamDictEntry expects and returns a PDFStreamDict entry for given key. Unused.

func (PDFDict) PDFString added in v0.1.11

func (d PDFDict) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFDict) PDFStringLiteralEntry added in v0.1.11

func (d PDFDict) PDFStringLiteralEntry(key string) *PDFStringLiteral

PDFStringLiteralEntry returns a PDFStringLiteral object for given key.

func (PDFDict) Prev added in v0.1.11

func (d PDFDict) Prev() *int64

Prev returns the previous offset.

func (PDFDict) Size added in v0.1.11

func (d PDFDict) Size() *int

Size returns the value of the int entry for key "Size".

func (PDFDict) String added in v0.1.11

func (d PDFDict) String() string

func (PDFDict) StringEntry added in v0.1.11

func (d PDFDict) StringEntry(key string) *string

StringEntry expects and returns a PDFStringLiteral entry for given key. Unused.

func (PDFDict) StringEntryBytes added in v0.1.11

func (d PDFDict) StringEntryBytes(key string) ([]byte, error)

StringEntryBytes returns the byte slice representing the string value for key.

func (PDFDict) Subtype added in v0.1.11

func (d PDFDict) Subtype() *string

Subtype returns the value of the name entry for key "Subtype".

func (PDFDict) Type added in v0.1.11

func (d PDFDict) Type() *string

Type returns the value of the name entry for key "Type".

func (*PDFDict) Update added in v0.1.11

func (d *PDFDict) Update(key string, value PDFObject)

Update modifies an existing entry of this PDFDict.
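
The difference between Insert and Update can be sketched with a map-backed stand-in. Both methods below are hypothetical re-implementations: the sketch assumes that the ok result of Insert is false when the key already exists, and its update writes unconditionally, which the page does not confirm.

```go
package main

import "fmt"

// Local stand-ins for the package types.
type PDFObject interface{ PDFString() string }

type PDFInteger int

func (i PDFInteger) PDFString() string { return fmt.Sprint(int(i)) }

type PDFDict struct {
	Dict map[string]PDFObject
}

// insert adds a new entry and reports false if the key already exists
// (assumed meaning of the documented "ok" result).
func (d *PDFDict) insert(key string, value PDFObject) bool {
	if _, found := d.Dict[key]; found {
		return false
	}
	d.Dict[key] = value
	return true
}

// update overwrites the value stored for key (assumption: no existence check).
func (d *PDFDict) update(key string, value PDFObject) {
	d.Dict[key] = value
}

func main() {
	d := PDFDict{Dict: map[string]PDFObject{}}
	fmt.Println(d.insert("Count", PDFInteger(1))) // first insert succeeds
	fmt.Println(d.insert("Count", PDFInteger(2))) // key exists, value kept
	d.update("Count", PDFInteger(3))
	fmt.Println(d.Dict["Count"].PDFString())
}
```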

func (PDFDict) W added in v0.1.11

func (d PDFDict) W() *PDFArray

W returns a *PDFArray for key "W".

type PDFFilter added in v0.1.11

type PDFFilter struct {
	Name        string
	DecodeParms *PDFDict
}

PDFFilter represents a PDF stream filter object.

type PDFFloat added in v0.1.11

type PDFFloat float64

PDFFloat represents a PDF float object.

func (PDFFloat) PDFString added in v0.1.11

func (f PDFFloat) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFFloat) String added in v0.1.11

func (f PDFFloat) String() string

func (PDFFloat) Value added in v0.1.11

func (f PDFFloat) Value() float64

Value returns a float64 value for this PDF object.

type PDFHexLiteral added in v0.1.11

type PDFHexLiteral string

PDFHexLiteral represents a PDF hex literal object.

func (PDFHexLiteral) Bytes added in v0.1.11

func (hexliteral PDFHexLiteral) Bytes() ([]byte, error)

Bytes returns the byte representation.
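
A hex literal carries pairs of hexadecimal digits, so obtaining its bytes amounts to hex decoding. The following sketch uses a local PDFHexLiteral stand-in and the standard library's encoding/hex. It assumes the literal holds only the digits, without the surrounding angle brackets; how the package handles whitespace or an odd digit count is not covered here.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// PDFHexLiteral is a local stand-in for the package type.
type PDFHexLiteral string

// bytes is a hypothetical stand-in for Bytes: it decodes the hex digits.
// An odd number of digits or a non-hex character yields an error.
func (h PDFHexLiteral) bytes() ([]byte, error) {
	return hex.DecodeString(string(h))
}

func main() {
	b, err := PDFHexLiteral("48656C6C6F").bytes()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%s\n", b)
}
```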

func (PDFHexLiteral) PDFString added in v0.1.11

func (hexliteral PDFHexLiteral) PDFString() string

PDFString returns the string representation as found in and written to a PDF file.

func (PDFHexLiteral) String added in v0.1.11

func (hexliteral PDFHexLiteral) String() string

func (PDFHexLiteral) Value added in v0.1.11

func (hexliteral PDFHexLiteral) Value() string

Value returns a string value for this PDF object.

type PDFIndirectRef added in v0.1.11

type PDFIndirectRef struct {
	ObjectNumber     PDFInteger
	GenerationNumber PDFInteger
}

PDFIndirectRef represents a reference to a PDF indirect object.

func NewPDFIndirectRef added in v0.1.11

func NewPDFIndirectRef(objectNumber, generationNumber int) *PDFIndirectRef

NewPDFIndirectRef returns a new PDFIndirectRef object.

func (PDFIndirectRef) Equals added in v0.1.11

func (ir PDFIndirectRef) Equals(indRef PDFIndirectRef) bool

Equals returns true if two indirect References refer to the same object.
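
A minimal sketch of what reference equality means, using local stand-in types. It assumes two references are equal when both the object number and the generation number match; the methods equals and pdfString are hypothetical. The "n g R" form is the standard PDF syntax for an indirect reference.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented struct.
type PDFInteger int

type PDFIndirectRef struct {
	ObjectNumber     PDFInteger
	GenerationNumber PDFInteger
}

// equals compares object and generation numbers (assumed semantics of Equals).
func (ir PDFIndirectRef) equals(other PDFIndirectRef) bool {
	return ir.ObjectNumber == other.ObjectNumber &&
		ir.GenerationNumber == other.GenerationNumber
}

// pdfString renders the reference in PDF syntax: "objNr genNr R".
func (ir PDFIndirectRef) pdfString() string {
	return fmt.Sprintf("%d %d R", ir.ObjectNumber, ir.GenerationNumber)
}

func main() {
	a := PDFIndirectRef{ObjectNumber: 12, GenerationNumber: 0}
	b := PDFIndirectRef{ObjectNumber: 12, GenerationNumber: 1}
	fmt.Println(a.pdfString(), a.equals(a), a.equals(b))
}
```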

func (PDFIndirectRef) PDFString added in v0.1.11

func (ir PDFIndirectRef) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFIndirectRef) String added in v0.1.11

func (ir PDFIndirectRef) String() string

type PDFInteger added in v0.1.11

type PDFInteger int

PDFInteger represents a PDF integer object.

func (PDFInteger) PDFString added in v0.1.11

func (i PDFInteger) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFInteger) String added in v0.1.11

func (i PDFInteger) String() string

func (PDFInteger) Value added in v0.1.11

func (i PDFInteger) Value() int

Value returns an int value for this PDF object.

type PDFName added in v0.1.11

type PDFName string

PDFName represents a PDF name object.

func (PDFName) PDFString added in v0.1.11

func (nameObject PDFName) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFName) String added in v0.1.11

func (nameObject PDFName) String() string

func (PDFName) Value added in v0.1.11

func (nameObject PDFName) Value() string

Value returns a string value for this PDF object.

type PDFObject added in v0.1.11

type PDFObject interface {
	fmt.Stringer
	PDFString() string
}

PDFObject defines an interface for all PDFObjects.

type PDFObjectStreamDict added in v0.1.11

type PDFObjectStreamDict struct {
	PDFStreamDict
	Prolog         []byte
	ObjCount       int
	FirstObjOffset int
	ObjArray       PDFArray
}

PDFObjectStreamDict represents an object stream dictionary.

func NewPDFObjectStreamDict added in v0.1.11

func NewPDFObjectStreamDict() *PDFObjectStreamDict

NewPDFObjectStreamDict creates a new PDFObjectStreamDict object.

func (*PDFObjectStreamDict) AddObject added in v0.1.11

func (oStreamDict *PDFObjectStreamDict) AddObject(objNumber int, entry *XRefTableEntry) error

AddObject adds another object to this object stream. Relies on decoded content!

func (*PDFObjectStreamDict) Finalize added in v0.1.11

func (oStreamDict *PDFObjectStreamDict) Finalize()

Finalize prepares the final content of the object stream.

func (*PDFObjectStreamDict) IndexedObject added in v0.1.11

func (oStreamDict *PDFObjectStreamDict) IndexedObject(index int) (PDFObject, error)

IndexedObject returns the object at given index from a PDFObjectStreamDict.

type PDFStats added in v0.1.11

type PDFStats struct {
	// contains filtered or unexported fields
}

PDFStats is a container for stats.

func NewPDFStats added in v0.1.11

func NewPDFStats() PDFStats

NewPDFStats returns a new PDFStats object.

func (PDFStats) AddPageAttr added in v0.1.11

func (stats PDFStats) AddPageAttr(name int)

AddPageAttr adds the occurrence of a field with given name to the pageAttrs set.

func (PDFStats) AddRootAttr added in v0.1.11

func (stats PDFStats) AddRootAttr(name int)

AddRootAttr adds the occurrence of a field with given name to the rootAttrs set.

func (PDFStats) UsesPageAttr added in v0.1.11

func (stats PDFStats) UsesPageAttr(name int) bool

UsesPageAttr returns true if a field with given name is contained in the pageAttrs set.

func (PDFStats) UsesRootAttr added in v0.1.11

func (stats PDFStats) UsesRootAttr(name int) bool

UsesRootAttr returns true if a field with given name is contained in the rootAttrs set.
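
The PDFStats methods use value receivers yet still record state. One way this can work is sketched below, assuming the unexported fields are map-based sets (maps share their backing storage when the struct is copied). The type and field names are hypothetical stand-ins. The sketch also shows why a constructor matters: writing to the nil map of a zero value would panic.

```go
package main

import "fmt"

// IntSet is assumed to be a map-based set of ints.
type IntSet map[int]bool

// stats is a hypothetical stand-in for PDFStats.
type stats struct {
	rootAttrs IntSet
	pageAttrs IntSet
}

// newStats allocates both sets so that later writes do not hit a nil map.
func newStats() stats {
	return stats{rootAttrs: IntSet{}, pageAttrs: IntSet{}}
}

// Value receivers are sufficient: the copied struct still points at the same maps.
func (s stats) addPageAttr(name int)       { s.pageAttrs[name] = true }
func (s stats) addRootAttr(name int)       { s.rootAttrs[name] = true }
func (s stats) usesPageAttr(name int) bool { return s.pageAttrs[name] }
func (s stats) usesRootAttr(name int) bool { return s.rootAttrs[name] }

func main() {
	s := newStats()
	s.addPageAttr(3)
	fmt.Println(s.usesPageAttr(3), s.usesPageAttr(4), s.usesRootAttr(3))
}
```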

type PDFStreamDict added in v0.1.11

type PDFStreamDict struct {
	PDFDict
	StreamOffset      int64
	StreamLength      *int64
	StreamLengthObjNr *int
	FilterPipeline    []PDFFilter
	Raw               []byte // Encoded
	Content           []byte // Decoded
	IsPageContent     bool
}

PDFStreamDict represents a PDF stream dict object.

func NewPDFStreamDict added in v0.1.11

func NewPDFStreamDict(pdfDict PDFDict, streamOffset int64, streamLength *int64, streamLengthObjNr *int,
	filterPipeline []PDFFilter) PDFStreamDict

NewPDFStreamDict creates a new PDFStreamDict for given PDFDict, stream offset and length.

func (PDFStreamDict) HasSoleFilterNamed added in v0.1.11

func (streamDict PDFStreamDict) HasSoleFilterNamed(filterName string) bool

HasSoleFilterNamed returns true if there is exactly one filter defined for a stream dict and it is named filterName.
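
The check can be sketched against the FilterPipeline field. The types below are reduced local stand-ins (DecodeParms is left out) and hasSoleFilterNamed is a hypothetical re-implementation, assuming the method compares the single filter's name with the argument.

```go
package main

import "fmt"

// Reduced stand-ins for the documented structs.
type PDFFilter struct {
	Name string
}

type PDFStreamDict struct {
	FilterPipeline []PDFFilter
}

// hasSoleFilterNamed reports whether the pipeline holds exactly one filter
// and that filter carries the given name (assumed semantics).
func (sd PDFStreamDict) hasSoleFilterNamed(name string) bool {
	return len(sd.FilterPipeline) == 1 && sd.FilterPipeline[0].Name == name
}

func main() {
	flate := PDFStreamDict{FilterPipeline: []PDFFilter{{Name: "FlateDecode"}}}
	chained := PDFStreamDict{FilterPipeline: []PDFFilter{
		{Name: "ASCII85Decode"}, {Name: "FlateDecode"},
	}}
	fmt.Println(flate.hasSoleFilterNamed("FlateDecode"))
	fmt.Println(chained.hasSoleFilterNamed("FlateDecode"))
}
```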

type PDFStringLiteral added in v0.1.11

type PDFStringLiteral string

PDFStringLiteral represents a PDF string literal object.

func DateStringLiteral added in v0.1.11

func DateStringLiteral(t time.Time) PDFStringLiteral

DateStringLiteral returns a PDFStringLiteral for time.

func (PDFStringLiteral) PDFString added in v0.1.11

func (stringliteral PDFStringLiteral) PDFString() string

PDFString returns a string representation as found in and written to a PDF file.

func (PDFStringLiteral) String added in v0.1.11

func (stringliteral PDFStringLiteral) String() string

func (PDFStringLiteral) Value added in v0.1.11

func (stringliteral PDFStringLiteral) Value() string

Value returns a string value for this PDF object.

type PDFVersion added in v0.1.11

type PDFVersion int

PDFVersion is a type for the internal representation of PDF versions.

const (
	V10 PDFVersion = iota
	V11
	V12
	V13
	V14
	V15
	V16
	V17
)

Constants for all PDF versions up to v1.7.

func Version added in v0.1.11

func Version(versionStr string) (PDFVersion, error)

Version returns the PDFVersion for a version string.
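
Because the constants are declared with iota, V10 through V17 map to 0 through 7, and a version string can be converted by looking at its minor digit. The sketch below redeclares the constants locally; parseVersion is a hypothetical stand-in for Version and assumes input of the form "1.0" to "1.7".

```go
package main

import "fmt"

// Local copy of the documented type and constants.
type PDFVersion int

const (
	V10 PDFVersion = iota
	V11
	V12
	V13
	V14
	V15
	V16
	V17
)

// parseVersion is a hypothetical stand-in for Version. It accepts only
// "1.0" through "1.7" and uses the minor digit as the constant's value.
func parseVersion(s string) (PDFVersion, error) {
	if len(s) == 3 && s[:2] == "1." && s[2] >= '0' && s[2] <= '7' {
		return PDFVersion(s[2] - '0'), nil
	}
	return -1, fmt.Errorf("unsupported PDF version %q", s)
}

func main() {
	v, err := parseVersion("1.4")
	fmt.Println(v == V14, err)

	_, err = parseVersion("2.0")
	fmt.Println(err)
}
```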

type PDFXRefStreamDict added in v0.1.11

type PDFXRefStreamDict struct {
	PDFStreamDict
	Size           int
	Objects        []int
	W              [3]int
	PreviousOffset *int64
}

PDFXRefStreamDict represents a cross reference stream dictionary.

func NewPDFXRefStreamDict added in v0.1.11

func NewPDFXRefStreamDict(ctx *PDFContext) *PDFXRefStreamDict

NewPDFXRefStreamDict creates a new PDFXRefStreamDict object.

type ReadContext added in v0.1.11

type ReadContext struct {

	// The PDF-File which gets processed.
	FileName string
	File     *os.File
	FileSize int64

	BinaryTotalSize     int64 // total stream data
	BinaryImageSize     int64 // total image stream data
	BinaryFontSize      int64 // total font stream data (fontfiles)
	BinaryImageDuplSize int64 // total obsolete image stream data after optimization
	BinaryFontDuplSize  int64 // total obsolete font stream data after optimization

	Linearized bool // File is linearized.
	Hybrid     bool // File is a hybrid PDF file.

	UsingObjectStreams bool   // File is using object streams.
	ObjectStreams      IntSet // All object numbers of any object streams found which need to be decoded.

	UsingXRefStreams bool   // File is using xref streams.
	XRefStreams      IntSet // All object numbers of any xref streams found.
}

ReadContext represents the context for reading a PDF file.

func (*ReadContext) IsObjectStreamObject added in v0.1.11

func (rc *ReadContext) IsObjectStreamObject(i int) bool

IsObjectStreamObject returns true if object #i is an object stream. All compressed objects are object streams.

func (*ReadContext) IsXRefStreamObject added in v0.1.11

func (rc *ReadContext) IsXRefStreamObject(i int) bool

IsXRefStreamObject returns true if object #i is an xref stream.

func (*ReadContext) LogStats added in v0.1.11

func (rc *ReadContext) LogStats(optimized bool)

LogStats logs stats for read file.

func (*ReadContext) ObjectStreamsString added in v0.1.11

func (rc *ReadContext) ObjectStreamsString() (int, string)

ObjectStreamsString returns the number of object stream objects and a formatted string.

func (*ReadContext) XRefStreamsString added in v0.1.11

func (rc *ReadContext) XRefStreamsString() (int, string)

XRefStreamsString returns the number of xref stream objects and a formatted string.

type StringSet added in v0.1.11

type StringSet map[string]bool

StringSet is a set of strings.
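
Since StringSet is a plain map[string]bool, the usual Go set idioms apply: assign true to add, index to test membership, delete to remove. A short sketch with the type redeclared locally; sortedMembers is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// StringSet is redeclared here exactly as documented.
type StringSet map[string]bool

// sortedMembers is a hypothetical helper returning the members in sorted
// order, since map iteration order is not deterministic.
func (s StringSet) sortedMembers() []string {
	members := make([]string, 0, len(s))
	for m, in := range s {
		if in {
			members = append(members, m)
		}
	}
	sort.Strings(members)
	return members
}

func main() {
	fonts := StringSet{}
	for _, name := range []string{"Helvetica", "Courier", "Helvetica"} {
		fonts[name] = true // adding twice has no further effect
	}
	fmt.Println(fonts["Courier"], fonts["Times"]) // missing keys read as false
	delete(fonts, "Courier")
	fmt.Println(fonts.sortedMembers())
}
```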

type WriteContext added in v0.1.11

type WriteContext struct {

	// The PDF-File which gets generated.
	DirName  string
	FileName string
	FileSize int64
	*bufio.Writer

	Command       string // command in effect.
	ExtractPageNr int    // page to be generated for rendering a single-page/PDF.
	ExtractPages  IntSet // pages to be generated for a trimmed PDF.

	BinaryTotalSize int64 // total stream data, counts 100% all stream data written.
	BinaryImageSize int64 // total image stream data written = Read.BinaryImageSize.
	BinaryFontSize  int64 // total font stream data (fontfiles) = copy of Read.BinaryFontSize.

	Table  map[int]int64 // object write offsets
	Offset int64         // current write offset

	WriteToObjectStream bool // if true start to embed objects into object streams and obey ObjectStreamMaxObjects.
	CurrentObjStream    *int // if not nil, any new non-stream-object gets added to the object stream with this object number.

	Eol string // end of line char sequence
}

WriteContext represents the context for writing a PDF file.

func NewWriteContext added in v0.1.11

func NewWriteContext(eol string) *WriteContext

NewWriteContext returns a new WriteContext.

func (*WriteContext) ExtractPage added in v0.1.11

func (wc *WriteContext) ExtractPage(i int) bool

ExtractPage returns true if page i needs to be generated.

func (*WriteContext) HasWriteOffset added in v0.1.11

func (wc *WriteContext) HasWriteOffset(objNumber int) bool

HasWriteOffset returns true if an object has already been written to PDFDestination.

func (*WriteContext) LogStats added in v0.1.11

func (wc *WriteContext) LogStats()

LogStats logs stats for written file.

func (*WriteContext) ReducedFeatureSet added in v0.1.11

func (wc *WriteContext) ReducedFeatureSet() bool

ReducedFeatureSet returns true for Split, Trim, Merge and ExtractPages. Don't confuse these with pdfcpu commands; they are internal triggers.

func (*WriteContext) SetWriteOffset added in v0.1.11

func (wc *WriteContext) SetWriteOffset(objNumber int)

SetWriteOffset saves the current write offset to the PDFDestination.

func (*WriteContext) WriteEol added in v0.1.11

func (wc *WriteContext) WriteEol() error

WriteEol writes an end of line sequence.

type XRefTable added in v0.1.11

type XRefTable struct {
	Table               map[int]*XRefTableEntry
	Size                *int             // Object count from PDF trailer dict.
	PageCount           int              // Number of pages, set during validation.
	Root                *PDFIndirectRef  // Pointer to catalog (reference to root object).
	RootDict            *PDFDict         // Catalog
	Names               map[string]*Node // Cache for name trees as found in catalog.
	Encrypt             *PDFIndirectRef  // Encrypt dict.
	E                   *Enc
	EncKey              []byte // Encrypt key.
	AES4Strings         bool
	AES4Streams         bool
	AES4EmbeddedStreams bool

	// PDF Version
	HeaderVersion *PDFVersion // The PDF version the source is claiming to us as per its header.
	RootVersion   *PDFVersion // Optional PDF version taking precedence over the header version.

	// Document information section
	Info     *PDFIndirectRef // Infodict (reference to info dict object)
	ID       *PDFArray       // from info dict (or trailer?)
	Author   string
	Creator  string
	Producer string

	// Linearization section (not yet supported)
	OffsetPrimaryHintTable  *int64
	OffsetOverflowHintTable *int64
	LinearizationObjs       IntSet

	// Offspec section
	AdditionalStreams *PDFArray // array of PDFIndirectRef - trailer :e.g., Oasis "Open Doc"

	// Statistics
	Stats PDFStats

	Tagged bool // File is using tags. This is important for ???

	// Validation
	Valid          bool // true means successfully validated against ISO 32000.
	ValidationMode int  // see Configuration

	Optimized bool
}

XRefTable represents a PDF cross reference table plus stats for a PDF file.

func (*XRefTable) BindNameTrees added in v0.1.11

func (xRefTable *XRefTable) BindNameTrees() error

BindNameTrees syncs up the internal name tree cache with the xreftable.

func (*XRefTable) Catalog added in v0.1.11

func (xRefTable *XRefTable) Catalog() (*PDFDict, error)

Catalog returns a pointer to the root object / catalog.

func (*XRefTable) CatalogHasPieceInfo added in v0.1.11

func (xRefTable *XRefTable) CatalogHasPieceInfo() (bool, error)

CatalogHasPieceInfo returns true if the root has an entry for "PieceInfo".

func (*XRefTable) DeleteObject added in v0.1.11

func (xRefTable *XRefTable) DeleteObject(objectNumber int) error

DeleteObject marks an object as free and inserts it into the free list right after the head.
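
In a PDF cross reference table the free entries form a linked list: object 0 is the head, and each free entry stores the number of the next free object where an in-use entry stores its byte offset. The sketch below models "insert right after the head" with simplified stand-in types (plain values instead of the pointer fields of XRefTableEntry). deleteObject and freeList are hypothetical, and bumping the generation number follows the PDF specification rather than anything stated on this page.

```go
package main

import "fmt"

// entry is a simplified stand-in for XRefTableEntry.
type entry struct {
	Free       bool
	Offset     int64 // in-use: byte offset; free: number of the next free object
	Generation int
}

type table map[int]*entry

// deleteObject marks objNr as free and links it in right after the head (object 0).
func (t table) deleteObject(objNr int) error {
	e, ok := t[objNr]
	if !ok {
		return fmt.Errorf("no entry for object %d", objNr)
	}
	if e.Free {
		return nil // already part of the free list
	}
	head := t[0]
	e.Free = true
	e.Offset = head.Offset // point to the head's former successor
	e.Generation++         // per the PDF spec, a freed entry's generation is incremented
	head.Offset = int64(objNr)
	return nil
}

// freeList walks the list from the head until it links back to object 0.
func (t table) freeList() []int {
	var list []int
	for n := int(t[0].Offset); n != 0; n = int(t[n].Offset) {
		list = append(list, n)
	}
	return list
}

func main() {
	t := table{
		0: {Free: true, Offset: 0, Generation: 65535},
		1: {Offset: 15},
		2: {Offset: 120},
		3: {Offset: 300},
	}
	for _, n := range []int{2, 3} {
		if err := t.deleteObject(n); err != nil {
			fmt.Println(err)
		}
	}
	// The most recently deleted object comes first.
	fmt.Println(t.freeList())
}
```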

func (*XRefTable) DeleteObjectGraph added in v0.1.11

func (xRefTable *XRefTable) DeleteObjectGraph(obj PDFObject) error

DeleteObjectGraph deletes all objects reachable from obj.

func (*XRefTable) Dereference added in v0.1.11

func (xRefTable *XRefTable) Dereference(obj PDFObject) (PDFObject, error)

Dereference resolves an indirect object and returns the resulting PDF object.
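
Dereferencing can be pictured as a table lookup keyed by object number. The sketch below uses local stand-in types and a map in place of the XRefTable. dereference is hypothetical: following chains of references and guarding against cycles are choices made for this sketch, not documented behavior.

```go
package main

import "fmt"

// Reduced stand-ins for the package types.
type PDFObject interface{ PDFString() string }

type PDFInteger int

func (i PDFInteger) PDFString() string { return fmt.Sprint(int(i)) }

type PDFIndirectRef struct{ ObjectNumber, GenerationNumber int }

func (ir PDFIndirectRef) PDFString() string {
	return fmt.Sprintf("%d %d R", ir.ObjectNumber, ir.GenerationNumber)
}

// xRefTable maps object numbers to objects (a stand-in for XRefTable.Table).
type xRefTable map[int]PDFObject

// dereference returns direct objects unchanged and follows indirect
// references until a direct object is reached.
func (t xRefTable) dereference(obj PDFObject) (PDFObject, error) {
	for hops := 0; hops <= len(t); hops++ {
		ir, ok := obj.(PDFIndirectRef)
		if !ok {
			return obj, nil
		}
		next, found := t[ir.ObjectNumber]
		if !found {
			return nil, fmt.Errorf("object %d not found", ir.ObjectNumber)
		}
		obj = next
	}
	return nil, fmt.Errorf("reference cycle detected")
}

func main() {
	t := xRefTable{
		1: PDFInteger(7),
		2: PDFIndirectRef{ObjectNumber: 1},
	}
	o, err := t.dereference(PDFIndirectRef{ObjectNumber: 2})
	fmt.Println(o, err)
}
```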

func (*XRefTable) DereferenceArray added in v0.1.11

func (xRefTable *XRefTable) DereferenceArray(obj PDFObject) (*PDFArray, error)

DereferenceArray resolves and validates an array object, which may be an indirect reference.

func (*XRefTable) DereferenceDict added in v0.1.11

func (xRefTable *XRefTable) DereferenceDict(obj PDFObject) (*PDFDict, error)

DereferenceDict resolves and validates a dictionary object, which may be an indirect reference.

func (*XRefTable) DereferenceInteger added in v0.1.11

func (xRefTable *XRefTable) DereferenceInteger(obj PDFObject) (*PDFInteger, error)

DereferenceInteger resolves and validates an integer object, which may be an indirect reference.

func (*XRefTable) DereferenceName added in v0.1.11

func (xRefTable *XRefTable) DereferenceName(obj PDFObject, sinceVersion PDFVersion, validate func(string) bool) (n PDFName, err error)

DereferenceName resolves and validates a name object, which may be an indirect reference.

func (*XRefTable) DereferenceStreamDict added in v0.1.11

func (xRefTable *XRefTable) DereferenceStreamDict(obj PDFObject) (*PDFStreamDict, error)

DereferenceStreamDict resolves and validates a stream dictionary object, which may be an indirect reference.

func (*XRefTable) DereferenceStringLiteral added in v0.1.11

func (xRefTable *XRefTable) DereferenceStringLiteral(obj PDFObject, sinceVersion PDFVersion, validate func(string) bool) (s PDFStringLiteral, err error)

DereferenceStringLiteral resolves and validates a string literal object, which may be an indirect reference.

func (*XRefTable) DereferenceStringOrHexLiteral added in v0.1.11

func (xRefTable *XRefTable) DereferenceStringOrHexLiteral(obj PDFObject, sinceVersion PDFVersion, validate func(string) bool) (o PDFObject, err error)

DereferenceStringOrHexLiteral resolves and validates a string or hex literal object, which may be an indirect reference.

func (*XRefTable) EncryptDict added in v0.1.11

func (xRefTable *XRefTable) EncryptDict() (*PDFDict, error)

EncryptDict returns a pointer to the encryption dictionary.

func (*XRefTable) EnsureCollection added in v0.1.11

func (xRefTable *XRefTable) EnsureCollection() error

EnsureCollection makes sure there is a Collection entry in the catalog. Needed for portfolio / portable collections, e.g. for file attachments.

func (*XRefTable) EnsureValidFreeList added in v0.1.11

func (xRefTable *XRefTable) EnsureValidFreeList() error

EnsureValidFreeList ensures the integrity of the free list associated with the recorded free objects. See 7.5.4 Cross-Reference Table.

func (*XRefTable) Exists added in v0.1.11

func (xRefTable *XRefTable) Exists(objNumber int) bool

Exists returns true if xRefTable contains an entry for objNumber.

func (*XRefTable) Find added in v0.1.11

func (xRefTable *XRefTable) Find(objNumber int) (*XRefTableEntry, bool)

Find returns the XRefTable entry for given object number.

func (*XRefTable) FindObject added in v0.1.11

func (xRefTable *XRefTable) FindObject(objNumber int) (PDFObject, error)

FindObject returns the object of the XRefTableEntry for a specific object number.

func (*XRefTable) FindTableEntry added in v0.1.11

func (xRefTable *XRefTable) FindTableEntry(objNumber int, generationNumber int) (*XRefTableEntry, bool)

FindTableEntry returns the XRefTable entry for given object and generation numbers.

func (*XRefTable) FindTableEntryForIndRef added in v0.1.11

func (xRefTable *XRefTable) FindTableEntryForIndRef(indRef *PDFIndirectRef) (*XRefTableEntry, bool)

FindTableEntryForIndRef returns the XRefTable entry for given indirect reference.

func (*XRefTable) FindTableEntryLight added in v0.1.11

func (xRefTable *XRefTable) FindTableEntryLight(objNumber int) (*XRefTableEntry, bool)

FindTableEntryLight returns the XRefTable entry for given object number.

func (*XRefTable) Free added in v0.1.11

func (xRefTable *XRefTable) Free(objNumber int) (*XRefTableEntry, error)

Free returns the cross ref table entry for given number of a free object.

func (*XRefTable) IDFirstElement added in v0.1.11

func (xRefTable *XRefTable) IDFirstElement() (id []byte, err error)

IDFirstElement returns the first element of ID.

func (*XRefTable) IndRefForNewObject added in v0.1.11

func (xRefTable *XRefTable) IndRefForNewObject(obj PDFObject) (*PDFIndirectRef, error)

IndRefForNewObject inserts an object into the xRefTable and returns an indirect reference to it.

func (*XRefTable) InsertAndUseRecycled added in v0.1.11

func (xRefTable *XRefTable) InsertAndUseRecycled(xRefTableEntry XRefTableEntry) (objNumber int, err error)

InsertAndUseRecycled adds given xRefTableEntry into the cross reference table utilizing the freelist.

func (*XRefTable) InsertNew added in v0.1.11

func (xRefTable *XRefTable) InsertNew(xRefTableEntry XRefTableEntry) (objNumber int)

InsertNew adds given xRefTableEntry at next new objNumber into the cross reference table. Only to be called once an xRefTable has been generated completely and all trailer dicts have been processed. xRefTable.Size is the size entry of the first trailer dict processed. Called on creation of new object streams. Called by InsertAndUseRecycled.

func (*XRefTable) InsertObject added in v0.1.11

func (xRefTable *XRefTable) InsertObject(obj PDFObject) (objNumber int, err error)

InsertObject inserts an object into the xRefTable.

func (*XRefTable) IsLinearizationObject added in v0.1.11

func (xRefTable *XRefTable) IsLinearizationObject(i int) bool

IsLinearizationObject returns true if object #i is a linearization object.

func (*XRefTable) LinearizationObjsString added in v0.1.11

func (xRefTable *XRefTable) LinearizationObjsString() (int, string)

LinearizationObjsString returns the number of linearization objects and a formatted string.

func (*XRefTable) LocateNameTree added in v0.1.11

func (xRefTable *XRefTable) LocateNameTree(nameTreeName string, ensure bool) error

LocateNameTree locates/ensures a specific name tree.

func (*XRefTable) MissingObjects added in v0.1.11

func (xRefTable *XRefTable) MissingObjects() (int, *string)

MissingObjects returns the number of objects that were not written plus the corresponding comma separated string representation.

func (*XRefTable) NamesDict added in v0.1.11

func (xRefTable *XRefTable) NamesDict() (*PDFDict, error)

NamesDict returns the dict that contains all name trees.

func (*XRefTable) NewEmbeddedFileStreamDict added in v0.1.11

func (xRefTable *XRefTable) NewEmbeddedFileStreamDict(filename string) (*PDFStreamDict, error)

NewEmbeddedFileStreamDict creates and returns an embeddedFileStreamDict containing the file "filename".

func (*XRefTable) NewFileSpecDict added in v0.1.11

func (xRefTable *XRefTable) NewFileSpecDict(filename string, indRefStreamDict PDFIndirectRef) (*PDFDict, error)

NewFileSpecDict creates and returns a new fileSpec dictionary.

func (*XRefTable) NewPDFStreamDict added in v0.1.11

func (xRefTable *XRefTable) NewPDFStreamDict(filename string) (*PDFStreamDict, error)

NewPDFStreamDict creates a streamDict for the content of the file filename.

func (*XRefTable) NewSoundStreamDict added in v0.1.11

func (xRefTable *XRefTable) NewSoundStreamDict(filename string, samplingRate int, fileSpecDict *PDFDict) (*PDFStreamDict, error)

NewSoundStreamDict returns a new sound stream dict.

func (*XRefTable) NextForFree added in v0.1.11

func (xRefTable *XRefTable) NextForFree(objNumber int) (int, error)

NextForFree returns the number of the object the free object with objNumber links to. This is the successor of this free object in the free list.

func (*XRefTable) PageDict added in v0.1.11

func (xRefTable *XRefTable) PageDict(page int) (*PDFDict, error)

PageDict returns a specific page dict.

func (*XRefTable) Pages added in v0.1.11

func (xRefTable *XRefTable) Pages() (*PDFIndirectRef, error)

Pages returns the Pages reference contained in the catalog.

func (*XRefTable) ParseRootVersion added in v0.1.11

func (xRefTable *XRefTable) ParseRootVersion() (v *string, err error)

ParseRootVersion returns a string representation for an optional Version entry in the root object.

func (*XRefTable) RemoveCollection added in v0.1.11

func (xRefTable *XRefTable) RemoveCollection() error

RemoveCollection removes an existing Collection entry from the catalog.

func (*XRefTable) RemoveEmbeddedFilesNameTree added in v0.1.11

func (xRefTable *XRefTable) RemoveEmbeddedFilesNameTree() error

RemoveEmbeddedFilesNameTree removes both the embedded files name tree and the Collection dict.

func (*XRefTable) RemoveNameTree added in v0.1.11

func (xRefTable *XRefTable) RemoveNameTree(nameTreeName string) error

RemoveNameTree removes a specific name tree. Also removes a resulting empty names dict.

func (*XRefTable) UndeleteObject added in v0.1.11

func (xRefTable *XRefTable) UndeleteObject(objectNumber int) error

UndeleteObject ensures an object is not recorded in the free list. This is sometimes needed because of indirect references to free objects in the original PDF file.

func (*XRefTable) ValidateVersion added in v0.1.11

func (xRefTable *XRefTable) ValidateVersion(element string, sinceVersion PDFVersion) error

ValidateVersion validates against the xRefTable's version.

func (*XRefTable) Version added in v0.1.11

func (xRefTable *XRefTable) Version() PDFVersion

Version returns the PDF version of the PDF writer that created this file. Before V1.4 this is the header version. Since V1.4 the catalog may contain a Version entry which takes precedence over the header version.
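
The precedence rule described above can be sketched with the two version fields of XRefTable. The struct below is a reduced local stand-in and version is a hypothetical re-implementation; it assumes HeaderVersion is always set once a file has been parsed.

```go
package main

import "fmt"

// Local copy of the documented type and constants.
type PDFVersion int

const (
	V10 PDFVersion = iota
	V11
	V12
	V13
	V14
	V15
	V16
	V17
)

// xRefTable is a reduced stand-in holding only the version fields.
type xRefTable struct {
	HeaderVersion *PDFVersion // from the file header, assumed non-nil
	RootVersion   *PDFVersion // optional Version entry of the catalog
}

// version returns the catalog version if present, else the header version.
func (x xRefTable) version() PDFVersion {
	if x.RootVersion != nil {
		return *x.RootVersion
	}
	return *x.HeaderVersion
}

func main() {
	header, root := V13, V16
	fmt.Println(xRefTable{HeaderVersion: &header}.version() == V13)
	fmt.Println(xRefTable{HeaderVersion: &header, RootVersion: &root}.version() == V16)
}
```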

func (*XRefTable) VersionString added in v0.1.11

func (xRefTable *XRefTable) VersionString() string

VersionString returns a string representation of this PDF file's PDF version.

type XRefTableEntry added in v0.1.11

type XRefTableEntry struct {
	Free            bool
	Offset          *int64
	Generation      *int
	Object          PDFObject
	Compressed      bool
	ObjectStream    *int
	ObjectStreamInd *int
}

XRefTableEntry represents an entry in the PDF cross reference table.

This may wrap a free object, a compressed object or any in use PDF object:

PDFDict, PDFStreamDict, PDFObjectStreamDict, PDFXRefStreamDict, PDFArray, PDFInteger, PDFFloat, PDFName, PDFStringLiteral, PDFHexLiteral, PDFBoolean

func NewFreeHeadXRefTableEntry added in v0.1.11

func NewFreeHeadXRefTableEntry() *XRefTableEntry

NewFreeHeadXRefTableEntry returns the xref table entry for object 0, which is by definition the head of the free list (list of free objects).

func NewXRefTableEntryGen0 added in v0.1.11

func NewXRefTableEntryGen0(obj PDFObject) *XRefTableEntry

NewXRefTableEntryGen0 returns a cross reference table entry for an object with generation 0.

Directories

Path Synopsis
cmd
Package filter contains PDF filter implementations.
Package log provides a logging abstraction.
