Documentation ¶
Overview ¶
Package pdftotext is a wrapper for the Xpdf command-line tool `pdftotext`.
What is `pdftotext`?
Pdftotext converts Portable Document Format (PDF) files to plain text.
Index ¶
- func WithByteOrderMarker() option
- func WithCharFixedWidth(width uint64) option
- func WithCustomConfig(path string) option
- func WithCustomPath(path string) option
- func WithEncoding(name string) option
- func WithEndOfLine(kind string) option
- func WithLineFixedSpacing(spacing uint64) option
- func WithMargin(t, r, b, l uint64) option
- func WithMarginBottom(margin uint64) option
- func WithMarginLeft(margin uint64) option
- func WithMarginRight(margin uint64) option
- func WithMarginTop(margin uint64) option
- func WithModeLayout() option
- func WithModeLinePrinter() option
- func WithModeRaw() option
- func WithModeSimple() option
- func WithModeSimple2() option
- func WithModeTable() option
- func WithNoPageBreak() option
- func WithNoTextDiagonal() option
- func WithOwnerPassword(password string) option
- func WithPageFrom(page uint64) option
- func WithPageRange(from, to uint64) option
- func WithPageTo(page uint64) option
- func WithTextClipping() option
- func WithUserPassword(password string) option
- type Command
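Every function in the index returns an `option`, which suggests the functional options pattern: each option contributes command-line flags, and the options are applied in order before the tool is run. The sketch below illustrates that idea only. The `config` and `option` types, the lower-case helpers and `buildArgs` are hypothetical stand-ins, since the package's own internals are not shown here; `-layout`, `-enc` and the `-` output target are flags of the Xpdf `pdftotext` tool.

```go
package main

import "fmt"

// config and option are hypothetical stand-ins; the package's own
// unexported option type is not shown in this documentation.
type config struct {
	args []string
}

type option func(*config)

// withModeLayout mirrors what WithModeLayout presumably does: add the
// Xpdf flag -layout to the argument list.
func withModeLayout() option {
	return func(c *config) { c.args = append(c.args, "-layout") }
}

// withEncoding mirrors WithEncoding through the Xpdf flag -enc.
func withEncoding(name string) option {
	return func(c *config) { c.args = append(c.args, "-enc", name) }
}

// buildArgs applies each option in order, then appends the input path and
// "-" so that pdftotext writes the text to standard output.
func buildArgs(input string, opts ...option) []string {
	c := &config{}
	for _, opt := range opts {
		opt(c)
	}
	return append(c.args, input, "-")
}

func main() {
	fmt.Println(buildArgs("report.pdf", withModeLayout(), withEncoding("UTF-8")))
}
```

The resulting slice could be handed to `exec.Command` from `os/exec` together with the path of the `pdftotext` executable.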
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func WithByteOrderMarker ¶
func WithByteOrderMarker() option
Insert a Unicode byte order marker (BOM) at the start of the text output.
func WithCharFixedWidth ¶
func WithCharFixedWidth(width uint64) option
Specify the character pitch (width), in points.
Works only with `WithModeLayout`, `WithModeTable` and `WithModeLinePrinter`.
func WithCustomConfig ¶
func WithCustomConfig(path string) option
Read the config file at the given path in place of ~/.xpdfrc or the system-wide config file.
func WithCustomPath ¶
func WithCustomPath(path string) option
Set a custom location for the `pdftotext` executable.
func WithEncoding ¶
func WithEncoding(name string) option
Sets the encoding to use for text output.
The name must be defined with the unicodeMap command (see xpdfrc(5)). The encoding name is case-sensitive. This defaults to "Latin1".
Available options: `pdftotext -listencodings`.
func WithEndOfLine ¶
func WithEndOfLine(kind string) option
Sets the end-of-line convention to use for text output.
Available options: "unix", "dos", "mac".
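Because `WithEndOfLine` takes a free-form string, a caller may want to check the value before running the tool. The following is a minimal sketch of such a check, assuming the value is passed through to the Xpdf `-eol` flag; `eolArgs` is a hypothetical helper, and whether the package validates the value itself is not stated here.

```go
package main

import (
	"errors"
	"fmt"
)

// eolArgs is a hypothetical helper: it validates kind against the three
// documented conventions and returns the matching Xpdf flag pair.
func eolArgs(kind string) ([]string, error) {
	switch kind {
	case "unix", "dos", "mac":
		return []string{"-eol", kind}, nil
	default:
		return nil, errors.New("unsupported end-of-line kind: " + kind)
	}
}

func main() {
	for _, kind := range []string{"unix", "dos", "windows"} {
		args, err := eolArgs(kind)
		fmt.Println(kind, args, err)
	}
}
```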
func WithLineFixedSpacing ¶
func WithLineFixedSpacing(spacing uint64) option
Specify the line spacing, in points.
Works only with `WithModeLinePrinter`.
func WithMarginBottom ¶
func WithMarginBottom(margin uint64) option
Specifies the bottom margin, in points.
Text in the bottom margin (i.e., within that many points of the bottom edge of the page) is discarded.
func WithMarginLeft ¶
func WithMarginLeft(margin uint64) option
Specifies the left margin, in points.
Text in the left margin (i.e., within that many points of the left edge of the page) is discarded.
func WithMarginRight ¶
func WithMarginRight(margin uint64) option
Specifies the right margin, in points.
Text in the right margin (i.e., within that many points of the right edge of the page) is discarded.
func WithMarginTop ¶
func WithMarginTop(margin uint64) option
Specifies the top margin, in points.
Text in the top margin (i.e., within that many points of the top edge of the page) is discarded.
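The four margin options share one rule: anything within the given distance of the corresponding page edge is dropped. The sketch below illustrates that rule for a single point, as a simplification of what happens to whole pieces of text. The `margins` type and `inMargin` are illustrative only; the origin is assumed to be the bottom-left corner of the page, and how the tool treats text lying exactly on a margin boundary is not specified here.

```go
package main

import "fmt"

// margins holds the four values in points. The type and the helper below
// are illustrative only and are not part of the package.
type margins struct {
	top, right, bottom, left float64
}

// inMargin reports whether the point (x, y) lies in any margin of a page
// measuring width by height points. It assumes the origin is the
// bottom-left corner of the page.
func inMargin(x, y, width, height float64, m margins) bool {
	return x < m.left || x > width-m.right || y < m.bottom || y > height-m.top
}

func main() {
	// US Letter is 612 x 792 points; 72 points make one inch.
	m := margins{top: 72, bottom: 72}
	fmt.Println(inMargin(300, 760, 612, 792, m)) // inside the top inch
	fmt.Println(inMargin(300, 400, 612, 792, m)) // middle of the page
}
```

One-inch top and bottom margins like these are a common way to drop running headers and footers.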
func WithModeLayout ¶
func WithModeLayout() option
Maintain (as best as possible) the original physical layout of the text.
func WithModeLinePrinter ¶
func WithModeLinePrinter() option
Line printer mode uses a strict fixed-character-pitch and -height layout. The page is broken into a grid, and characters are placed into that grid.
If the grid spacing is too small for the actual characters, the result is extra whitespace. If the grid spacing is too large, the result is missing whitespace.
Use `WithCharFixedWidth` and `WithLineFixedSpacing` to specify the grid spacing. If one or both are not given, `pdftotext` will attempt to compute appropriate values.
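As a rough illustration of how the line printer options might combine, the sketch below builds the corresponding Xpdf flags `-lineprinter`, `-fixed` and `-linespacing`. `linePrinterArgs` is a hypothetical helper, and treating zero as "not given" is an assumption of this sketch rather than documented behavior of the package.

```go
package main

import (
	"fmt"
	"strconv"
)

// linePrinterArgs is a hypothetical helper. A zero width or spacing means
// "not given", so the matching flag is omitted and pdftotext picks a value.
func linePrinterArgs(width, spacing uint64) []string {
	args := []string{"-lineprinter"}
	if width > 0 {
		args = append(args, "-fixed", strconv.FormatUint(width, 10))
	}
	if spacing > 0 {
		args = append(args, "-linespacing", strconv.FormatUint(spacing, 10))
	}
	return args
}

func main() {
	fmt.Println(linePrinterArgs(6, 12)) // both grid dimensions fixed
	fmt.Println(linePrinterArgs(0, 0))  // both left for pdftotext to compute
}
```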
func WithModeRaw ¶
func WithModeRaw() option
Keep the text in content stream order.
Depending on how the PDF file was generated, this may or may not be useful.
func WithModeSimple ¶
func WithModeSimple() option
Similar to `WithModeLayout`, but optimized for simple one-column pages.
This mode will do a better job of maintaining horizontal spacing, but it will only work properly with a single column of text.
func WithModeSimple2 ¶
func WithModeSimple2() option
Similar to `WithModeSimple` but handles slightly rotated text better.
Only works for pages with a single column of text.
func WithModeTable ¶
func WithModeTable() option
Table mode is similar to physical layout mode, but optimized for tabular data, with the goal of keeping rows and columns aligned (at the expense of inserting extra whitespace).
If the `WithCharFixedWidth` option is given, character spacing within each line will be determined by the specified character pitch.
func WithNoPageBreak ¶
func WithNoPageBreak() option
Don't insert page breaks (form feed characters) at the end of each page.
func WithNoTextDiagonal ¶
func WithNoTextDiagonal() option
Diagonal text, i.e., text that is not close to one of the 0, 90, 180, or 270 degree axes, is discarded.
This is useful to skip watermarks drawn on top of body text, etc.
func WithOwnerPassword ¶
func WithOwnerPassword(password string) option
Specify the owner password for the PDF file.
Providing this will bypass all security restrictions.
func WithPageRange ¶
func WithPageRange(from, to uint64) option
Specifies the range of pages to convert.
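A page range presumably maps to the Xpdf flags `-f` (first page) and `-l` (last page), which are 1-based and inclusive. The sketch below shows one way to validate and translate a range under that assumption; `pageRangeArgs` is a hypothetical helper, and the package may handle invalid ranges differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// pageRangeArgs is a hypothetical helper that turns a 1-based, inclusive
// page range into the Xpdf flags -f (first page) and -l (last page).
func pageRangeArgs(from, to uint64) ([]string, error) {
	if from == 0 || to < from {
		return nil, errors.New("invalid page range")
	}
	return []string{
		"-f", strconv.FormatUint(from, 10),
		"-l", strconv.FormatUint(to, 10),
	}, nil
}

func main() {
	args, err := pageRangeArgs(2, 5)
	fmt.Println(args, err)
}
```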
func WithTextClipping ¶
func WithTextClipping() option
Text which is hidden because of clipping is removed before doing layout, and then added back in.
This can be helpful for tables where clipped (invisible) text would overlap the next column.
func WithUserPassword ¶
func WithUserPassword(password string) option
Specify the user password for the PDF file.