pdfcpu

v0.1.9
Published: Mar 31, 2018 License: MIT Imports: 19 Imported by: 0

README

pdfcpu: a golang pdf processor

Package pdfcpu is a simple PDF processing library written in Go with support for encryption. It provides both an API and a CLI. All PDF versions up to PDF 1.7 (ISO 32000) are supported.

Motivation

The goal is to reduce the size of large PDF files for mass mailings by optimizing them down to the bare minimum. This is achieved by analyzing a PDF's cross reference table, removing redundant embedded resources such as font files or images, and always writing the file back with maximum PDF compression. I also wanted my own Swiss Army knife for PDFs, written entirely in Go, that lets me trim, split and merge PDF content.

Features

  • Validate (validates PDF files up to version 1.7)
  • Read (builds xref table from PDF file)
  • Write (writes xref table to PDF file)
  • Optimize (gets rid of redundancies like duplicate fonts, images)
  • Split (splits a multi-page PDF file into single-page PDF files)
  • Merge (merges a set of PDF files into one consolidated PDF file)
  • Extract Images (extracts all embedded images of a PDF file into a given dir)
  • Extract Fonts (extracts all embedded fonts of a PDF file into a given dir)
  • Extract Pages (extracts specific pages into a given dir)
  • Extract Content (extracts the PDF source into a given dir)
  • Trim (generates a custom version of a PDF file)
  • Manage (add, remove, list, extract) embedded file attachments
  • Encrypt (sets password protection)
  • Decrypt (removes password protection)
  • Change user/owner password
  • Manage (add, list) user access permissions

Installation

Required build version: Go 1.8 or later

go get github.com/hhrutter/pdfcpu/cmd/...

Usage

pdfcpu validate [-verbose] [-mode strict|relaxed] [-upw userpw] [-opw ownerpw] inFile
pdfcpu optimize [-verbose] [-stats csvFile] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu split [-verbose] [-upw userpw] [-opw ownerpw] inFile outDir
pdfcpu merge [-verbose] outFile inFile...
pdfcpu extract [-verbose] -mode image|font|content|page [-pages pageSelection] [-upw userpw] [-opw ownerpw] inFile outDir
pdfcpu trim [-verbose] -pages pageSelection [-upw userpw] [-opw ownerpw] inFile outFile

pdfcpu attach list [-verbose] [-upw userpw] [-opw ownerpw] inFile
pdfcpu attach add [-verbose] [-upw userpw] [-opw ownerpw] inFile file...
pdfcpu attach remove [-verbose] [-upw userpw] [-opw ownerpw] inFile [file...]
pdfcpu attach extract [-verbose] [-upw userpw] [-opw ownerpw] inFile outDir [file...]

pdfcpu encrypt [-verbose] [-mode rc4|aes] [-key 40|128] [-perm none|all] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu decrypt [-verbose] [-upw userpw] [-opw ownerpw] inFile [outFile]
pdfcpu changeupw [-verbose] [-opw ownerpw] inFile upwOld upwNew
pdfcpu changeopw [-verbose] [-upw userpw] inFile opwOld opwNew

pdfcpu perm list [-verbose] [-upw userpw] [-opw ownerpw] inFile
pdfcpu perm add [-verbose] [-perm none|all] [-upw userpw] -opw ownerpw inFile

pdfcpu version

Please read the documentation

Status

Version: 0.1.9

  • Redesigned extraction API with focus on returning the extracted data rather than writing it somewhere.
  • It is up to the API consumer how to process the extracted data.
func ImageData(ctx *types.PDFContext, objNr int) (*types.ImageObject, error)
func FontData(ctx *types.PDFContext, objNr int) (*types.FontObject, error)
func ContentData(ctx *types.PDFContext, objNr int) (data []byte, err error)

Contributing

  • Please open an issue if you find a bug or want to propose a change.
  • Pull requests, bug fixes and issues are always welcome.

Disclaimer

Usage of pdfcpu assumes you know about and respect all copyrights of any PDF content you may be processing. This applies to the PDF files as such, their content and in particular all embedded resources like font files or images. Credit goes to Renee French for creating our beloved Gopher.

License

MIT

Documentation

Overview

Package pdfcpu is a simple PDF processing library written in Go with support for encryption. It provides an API and a command line interface. All PDF versions up to PDF 1.7 (ISO 32000) are supported.

The available commands are:

validate	validate PDF against PDF 32000-1:2008 (PDF 1.7)
optimize	optimize PDF by getting rid of redundant page resources
split		split multi-page PDF into several single-page PDFs
merge		concatenate 2 or more PDFs
extract		extract images, fonts, content or pages
trim		create trimmed version
attach		list, add, remove, extract embedded file attachments
perm		list, add user access permissions
encrypt		set password protection
decrypt		remove password protection
changeupw	change user password
changeopw	change owner password
version		print version

Functions

func AddAttachments added in v0.1.3

func AddAttachments(fileIn string, files []string, config *types.Configuration) error

AddAttachments embeds files into a PDF.

func AddPermissions added in v0.1.6

func AddPermissions(fileIn string, config *types.Configuration) error

AddPermissions sets the user access permissions.

func ChangeOwnerPassword added in v0.1.1

func ChangeOwnerPassword(fileIn, fileOut string, config *types.Configuration, pwOld, pwNew *string) error

ChangeOwnerPassword of fileIn and write result to fileOut.

func ChangeUserPassword added in v0.1.1

func ChangeUserPassword(fileIn, fileOut string, config *types.Configuration, pwOld, pwNew *string) error

ChangeUserPassword of fileIn and write result to fileOut.

func Decrypt added in v0.1.1

func Decrypt(fileIn, fileOut string, config *types.Configuration) error

Decrypt fileIn and write result to fileOut.

func Encrypt added in v0.1.1

func Encrypt(fileIn, fileOut string, config *types.Configuration) error

Encrypt fileIn and write result to fileOut.

func ExtractAttachments added in v0.1.3

func ExtractAttachments(fileIn, dirOut string, files []string, config *types.Configuration) error

ExtractAttachments extracts embedded files from a PDF.

func ExtractContent

func ExtractContent(fileIn, dirOut string, pageSelection []string, config *types.Configuration) error

ExtractContent dumps "PDF source" files from fileIn into dirOut for selected pages.

func ExtractFonts

func ExtractFonts(fileIn, dirOut string, pageSelection []string, config *types.Configuration) error

ExtractFonts dumps embedded font files from fileIn into dirOut for selected pages.

func ExtractImages

func ExtractImages(fileIn, dirOut string, pageSelection []string, config *types.Configuration) error

ExtractImages dumps embedded image resources from fileIn into dirOut for selected pages.

func ExtractPages

func ExtractPages(fileIn, dirOut string, pageSelection []string, config *types.Configuration) error

ExtractPages generates single page PDF files from fileIn in dirOut for selected pages.

func ListAttachments added in v0.1.3

func ListAttachments(fileIn string, config *types.Configuration) ([]string, error)

ListAttachments returns a list of embedded file attachments.

func ListPermissions added in v0.1.6

func ListPermissions(fileIn string, config *types.Configuration) ([]string, error)

ListPermissions returns a list of user access permissions.

func Merge

func Merge(filesIn []string, fileOut string, config *types.Configuration) error

Merge merges PDF files and writes the result to fileOut. This corresponds to concatenating the files in the order specified by filesIn. The first entry of filesIn serves as the destination xRefTable into which all remaining files get merged.

func Optimize

func Optimize(fileIn, fileOut string, config *types.Configuration) error

Optimize reads in fileIn, does validation, optimization and writes the result to fileOut.

func ParsePageSelection

func ParsePageSelection(s string) ([]string, error)

ParsePageSelection ensures a correct page selection expression.

func Process

func Process(cmd *Command) (out []string, err error)

Process executes a pdfcpu command.

Example (AddAttachments)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(AddAttachmentsCommand("in.pdf", []string{"a.csv", "b.jpg", "c.pdf"}, config))
if err != nil {
	return
}
Output:

Example (AddPermissions)
config := types.NewDefaultConfiguration()
config.UserPW = "upw"
config.OwnerPW = "opw"

config.UserAccessPermissions = types.PermissionsAll

_, err := Process(AddPermissionsCommand("in.pdf", config))
if err != nil {
	return
}
Output:

Example (ChangeOwnerPW)
config := types.NewDefaultConfiguration()

// supply existing user pw like so
config.UserPW = "upw"

// old and new owner pw
pwOld := "pwOld"
pwNew := "pwNew"

_, err := Process(ChangeOwnerPWCommand("in.pdf", "out.pdf", config, &pwOld, &pwNew))
if err != nil {
	return
}
Output:

Example (ChangeUserPW)
config := types.NewDefaultConfiguration()

// supply existing owner pw like so
config.OwnerPW = "opw"

pwOld := "pwOld"
pwNew := "pwNew"

_, err := Process(ChangeUserPWCommand("in.pdf", "out.pdf", config, &pwOld, &pwNew))
if err != nil {
	return
}
Output:

Example (Decrypt)
config := types.NewDefaultConfiguration()

config.UserPW = "upw"
config.OwnerPW = "opw"

_, err := Process(DecryptCommand("in.pdf", "out.pdf", config))
if err != nil {
	return
}
Output:

Example (Encrypt)
config := types.NewDefaultConfiguration()

config.UserPW = "upw"
config.OwnerPW = "opw"

_, err := Process(EncryptCommand("in.pdf", "out.pdf", config))
if err != nil {
	return
}
Output:

Example (ExtractAttachments)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Extract all attachments.
_, err := Process(ExtractAttachmentsCommand("in.pdf", "dirOut", nil, config))
if err != nil {
	return
}

// Extract specific attachments.
_, err = Process(ExtractAttachmentsCommand("in.pdf", "dirOut", []string{"a.csv", "b.pdf"}, config))
if err != nil {
	return
}
Output:

Example (ExtractImages)
// Extract all embedded images for first 5 and last 5 pages but not for page 4.
selectedPages := []string{"-5", "5-", "!4"}

config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(ExtractImagesCommand("in.pdf", "dirOut", selectedPages, config))
if err != nil {
	return
}
Output:

Example (ExtractPages)
// Extract single-page PDFs for pages 3, 4 and 5.
selectedPages := []string{"3..5"}

config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(ExtractPagesCommand("in.pdf", "dirOut", selectedPages, config))
if err != nil {
	return
}
Output:

Example (ListAttachments)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

list, err := Process(ListAttachmentsCommand("in.pdf", config))
if err != nil {
	return
}

// Print attachment list.
for _, l := range list {
	fmt.Println(l)
}
Output:

Example (ListPermissions)
config := types.NewDefaultConfiguration()
config.UserPW = "upw"
config.OwnerPW = "opw"

list, err := Process(ListPermissionsCommand("in.pdf", config))
if err != nil {
	return
}

// Print permissions list.
for _, l := range list {
	fmt.Println(l)
}
Output:

Example (Merge)
// Concatenate this sequence of PDF files:
filenamesIn := []string{"in1.pdf", "in2.pdf", "in3.pdf"}

_, err := Process(MergeCommand(filenamesIn, "out.pdf", types.NewDefaultConfiguration()))
if err != nil {
	return
}
Output:

Example (Optimize)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Generate optional stats.
config.StatsFileName = "stats.csv"

// Configure end of line sequence for writing.
config.Eol = types.EolLF

_, err := Process(OptimizeCommand("in.pdf", "out.pdf", config))
if err != nil {
	return
}
Output:

Example (RemoveAttachments)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Not to be confused with the ExtractAttachmentsCommand!

// Remove all attachments.
_, err := Process(RemoveAttachmentsCommand("in.pdf", nil, config))
if err != nil {
	return
}

// Remove specific attachments.
_, err = Process(RemoveAttachmentsCommand("in.pdf", []string{"a.csv", "b.jpg"}, config))
if err != nil {
	return
}
Output:

Example (Split)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Split into single-page PDFs.

_, err := Process(SplitCommand("in.pdf", "outDir", config))
if err != nil {
	return
}
Output:

Example (Trim)
// Trim to first three pages.
selectedPages := []string{"-3"}

config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

_, err := Process(TrimCommand("in.pdf", "out.pdf", selectedPages, config))
if err != nil {
	return
}
Output:

Example (Validate)
config := types.NewDefaultConfiguration()

// Set optional password(s).
//config.UserPW = "upw"
//config.OwnerPW = "opw"

// Set relaxed validation mode.
config.SetValidationRelaxed()

_, err := Process(ValidateCommand("in.pdf", config))
if err != nil {
	return
}
Output:

func Read

func Read(fileIn string, config *types.Configuration) (*types.PDFContext, error)

Read reads in a PDF file and builds an internal structure holding its cross reference table aka the PDFContext.

func RemoveAttachments added in v0.1.3

func RemoveAttachments(fileIn string, files []string, config *types.Configuration) error

RemoveAttachments deletes embedded files from a PDF.

func Split

func Split(fileIn, dirOut string, config *types.Configuration) error

Split generates a sequence of single-page PDF files in dirOut, creating one file for every page of fileIn.

func Trim

func Trim(fileIn, fileOut string, pageSelection []string, config *types.Configuration) error

Trim generates a trimmed version of fileIn containing all pages selected.

func Validate

func Validate(fileIn string, config *types.Configuration) error

Validate validates a PDF file against ISO-32000-1:2008.

func Write

func Write(ctx *types.PDFContext) error

Write generates a PDF file for a given PDFContext.

Types

type Command

type Command struct {
	Mode          types.CommandMode    // VALIDATE  OPTIMIZE  SPLIT  MERGE  EXTRACT  TRIM  LISTATT ADDATT REMATT EXTATT  ENCRYPT  DECRYPT  CHANGEUPW  CHANGEOPW LISTP ADDP
	InFile        *string              //    *         *        *      -       *      *      *       *       *      *       *        *         *          *       *     *
	InFiles       []string             //    -         -        -      *       -      -      -       *       *      *       -        -         -          -       -     -
	InDir         *string              //    -         -        -      -       -      -      -       -       -      -       -        -         -          -       -     -
	OutFile       *string              //    -         *        -      *       -      *      -       -       -      -       *        *         *          *       -     -
	OutDir        *string              //    -         -        *      -       *      -      -       -       -      *       -        -         -          -       -     -
	PageSelection []string             //    -         -        -      -       *      *      -       -       -      -       -        -         -          -       -     -
	Config        *types.Configuration //    *         *        *      *       *      *      *       *       *      *       *        *         *          *       *     *
	PWOld         *string              //    -         -        -      -       -      -      -       -       -      -       -        -         *          *       -     -
	PWNew         *string              //    -         -        -      -       -      -      -       -       -      -       -        -         *          *       -     -
}

Command represents an execution context.

func AddAttachmentsCommand added in v0.1.3

func AddAttachmentsCommand(pdfFileNameIn string, fileNamesIn []string, config *types.Configuration) *Command

AddAttachmentsCommand creates a new AddAttachmentsCommand.

func AddPermissionsCommand added in v0.1.6

func AddPermissionsCommand(pdfFileNameIn string, config *types.Configuration) *Command

AddPermissionsCommand creates a new AddPermissionsCommand.

func ChangeOwnerPWCommand added in v0.1.1

func ChangeOwnerPWCommand(pdfFileNameIn, pdfFileNameOut string, config *types.Configuration, pwOld, pwNew *string) *Command

ChangeOwnerPWCommand creates a new ChangeOwnerPWCommand.

func ChangeUserPWCommand added in v0.1.1

func ChangeUserPWCommand(pdfFileNameIn, pdfFileNameOut string, config *types.Configuration, pwOld, pwNew *string) *Command

ChangeUserPWCommand creates a new ChangeUserPWCommand.

func DecryptCommand added in v0.1.1

func DecryptCommand(pdfFileNameIn, pdfFileNameOut string, config *types.Configuration) *Command

DecryptCommand creates a new DecryptCommand.

func EncryptCommand added in v0.1.1

func EncryptCommand(pdfFileNameIn, pdfFileNameOut string, config *types.Configuration) *Command

EncryptCommand creates a new EncryptCommand.

func ExtractAttachmentsCommand added in v0.1.3

func ExtractAttachmentsCommand(pdfFileNameIn, dirNameOut string, fileNamesIn []string, config *types.Configuration) *Command

ExtractAttachmentsCommand creates a new ExtractAttachmentsCommand.

func ExtractContentCommand

func ExtractContentCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) *Command

ExtractContentCommand creates a new ExtractContentCommand.

func ExtractFontsCommand

func ExtractFontsCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) *Command

ExtractFontsCommand creates a new ExtractFontsCommand. (experimental)

func ExtractImagesCommand

func ExtractImagesCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) *Command

ExtractImagesCommand creates a new ExtractImagesCommand. (experimental)

func ExtractPagesCommand

func ExtractPagesCommand(pdfFileNameIn, dirNameOut string, pageSelection []string, config *types.Configuration) *Command

ExtractPagesCommand creates a new ExtractPagesCommand.

func ListAttachmentsCommand added in v0.1.3

func ListAttachmentsCommand(pdfFileNameIn string, config *types.Configuration) *Command

ListAttachmentsCommand creates a new ListAttachmentsCommand.

func ListPermissionsCommand added in v0.1.6

func ListPermissionsCommand(pdfFileNameIn string, config *types.Configuration) *Command

ListPermissionsCommand creates a new ListPermissionsCommand.

func MergeCommand

func MergeCommand(pdfFileNamesIn []string, pdfFileNameOut string, config *types.Configuration) *Command

MergeCommand creates a new MergeCommand.

func OptimizeCommand

func OptimizeCommand(pdfFileNameIn, pdfFileNameOut string, config *types.Configuration) *Command

OptimizeCommand creates a new OptimizeCommand.

func RemoveAttachmentsCommand added in v0.1.3

func RemoveAttachmentsCommand(pdfFileNameIn string, fileNamesIn []string, config *types.Configuration) *Command

RemoveAttachmentsCommand creates a new RemoveAttachmentsCommand.

func SplitCommand

func SplitCommand(pdfFileNameIn, dirNameOut string, config *types.Configuration) *Command

SplitCommand creates a new SplitCommand.

func TrimCommand

func TrimCommand(pdfFileNameIn, pdfFileNameOut string, pageSelection []string, config *types.Configuration) *Command

TrimCommand creates a new TrimCommand.

func ValidateCommand

func ValidateCommand(pdfFileName string, config *types.Configuration) *Command

ValidateCommand creates a new ValidateCommand.

Directories

Path		Synopsis
attach		Package attach provides management code for file attachments / embedded files.
cmd
create		Package create contains primitives for generating a PDF file.
crypto		Package crypto contains PDF encryption code.
extract		Package extract provides functions for extracting fonts, images, pages and page content.
filter		Package filter contains PDF filter implementations.
log		Package log provides a logging abstraction.
merge		Package merge provides for the merging of two PDFContexts.
optimize	Package optimize contains code for optimizing the resources of a PDF file.
read		Package read provides for parsing a PDF file into memory.
types		Package types provides the PDFContext, representing an ecosystem for PDF processing.
validate	Package validate contains validation code for ISO 32000-1:2008.
write		Package write renders a PDF cross reference table to a PDF file.
