pdfium-cli
🚀 Easy to use PDF CLI tool powered by PDFium and go-pdfium 🚀
Features
- Get information of a PDF
- Merge multiple PDFs into a single PDF
- Exploding PDFs into one PDF file per page
- Rendering PDFs in JPG and PNG
- Extracting text from PDFs
- Extracting images from PDFs
- Extracting attachments from PDFs
- Extracting thumbnails from PDFs
- Extracting JavaScripts from PDFs
- Piping input through stdin when the input is one file (use filename
-
)
- Piping output through stdout when the output is one file (use filename
-
)
PDFium & Wazero
This project uses the PDFium C++ library by Google (https://pdfium.googlesource.com/pdfium/) to process the PDF
documents.
We use a Webassembly version of PDFium that is compiled with Emscripten and runs in the Wazero Go runtime.
Getting started
From binary
Download the binary from the latest release for your platform and save it as pdfium
.
You can also use the install
tool for this:
sudo install pdfium-webassembly-linux-x64 /usr/local/bin/pdfium
Release types
The following release types are available:
- Linux
- WebAssembly (amd64 + arm64)
- Native (amd64)
- Native + MUSL (amd64)
- MacOS
- WebAssembly (amd64 + arm64)
- Native (amd64 + arm64)
- Windows
- WebAssembly (amd64)
- Native (amd64)
WebAssembly: this is a single binary that includes everything that you need to run pdfium-cli, but is a lot slower
than native due to the WebAssembly runtime. Most useful if speed is not a concern and easy distribution is more
important.
Native: A native build that requires pdfium and
libjpeg-turbo to be available on your system.
Native + MUSL: Same as native but built with MUSL so that it does not require a system libc which allows it to be
used in Alpine Docker containers.
From source
Make sure you have a working Go development environment.
Clone the repository:
git clone https://github.com/klippa-app/pdfium-cli.git
Move into the directory:
cd pdfium-cli
Run the command:
go run main.go
Or to compile and run pdfium-cli:
go build -o pdfium main.go
./pdfium -h
Output:
pdfium-cli is a CLI tool that allows you to use pdfium from the CLI
Usage:
pdfium [command]
Available Commands:
attachments Extract the attachments of a PDF
completion Generate the autocompletion script for the specified shell
explode Explode a PDF into multiple PDFs
help Help about any command
images Extract the images of a PDF
info Get the information of a PDF
javascripts Extract the javascripts of a PDF
merge Merge multiple PDFs into a single PDF
render Render a PDF into images
text Get the text of a PDF
thumbnails Extract the attachments of a PDF
Flags:
-h, --help help for pdfium
Use "pdfium [command] --help" for more information about a command.
The following build tags are available to control different build types:
- pdfium_cli_use_cgo: whether to compile the native CGO version (faster, but requires pdfium to be installed).
- pdfium_experimental: whether to enable experimental features of pdfium in the build.
- pdfium_use_turbojpeg: whether to enable libjpeg-turbo support, which speeds up jpeg compression a lot compared to the default jpeg encoding in Go.
About Klippa
Founded in 2015, Klippa's goal is to digitize & automate administrative processes with
modern technologies. We help clients enhance the effectiveness of their organization by using machine learning and OCR.
Since 2015, more than a thousand happy clients have used Klippa's software solutions. Klippa currently has an
international team of 50 people, with offices in Groningen, Amsterdam and Brasov.
License
The MIT License (MIT)
Wazero and PDFium come with the Apache License 2.0
license