gobig2

package module

v0.0.0-...-51e3905 Published: May 13, 2026 License: Apache-2.0 Imports: 14 Imported by: 0

Repository

github.com/dkrisman/gobig2

README ¶

gobig2

Pure-Go decoder for ITU-T T.88 / ISO/IEC 14492 JBIG2 streams.

gobig2 was built first for PDF readers - the /JBIG2Decode filter is the dominant use of JBIG2 in the wild - and then for general-purpose standalone .jb2 / .jbig2 decoding. No cgo, no third-party runtime dependencies; every length, count, and dimension that derives from input bytes is gated against a configurable Limits cap before allocation, so the codec is safe to feed adversarial bytes from a PDF crawler or similar untrusted source.

[!WARNING] Pre-1.0. The public API is settling but not yet frozen; the module version is "0.0.0-dev" until the first tagged release. Conformance status against the ITU-T T.88 Annex A corpus is documented in docs/design/ITU-SPEC-PROBLEMS.md - the short version is that TT1, TT9, TT10 decode and TT2-TT8 fail for reasons shared with every other open-source JBIG2 decoder (the corpus encoder ships spec-deviating shapes no production encoder emits).

Performance

Cross-decoder wall-clock benchmark on ubuntu-24.04 (GitHub-hosted runner), best-of-7 with one warm-up, decoded straight to PBM so the cell tracks decode work rather than encoder overhead. Numbers in milliseconds; the per-push run lives at .github/workflows/perf-linux.yml.

fixture	gobig2	jbig2dec	mutool	pdfimages
`bitmap`	2.71	1.78	4.19	10.08
`bitmap-mmr`	2.26	1.28	3.58	8.53
`bitmap-halftone`	2.53	1.65	3.73	8.81
`bitmap-symbol`	1.94	1.29	3.67	8.65
`bitmap-symbol-symhuff-texthuff`	2.05	2.12	4.43	8.86
`perf-text-generic` (33 Mpx)	231.81	175.12	252.60	313.08
`perf-text-symbol` (33 Mpx)	25.94	18.31	95.38	57.33

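Pure Go, no asm, no cgo - within roughly 1.3-1.8x of jbig2dec (C, hand-tuned reference) on the six fixtures where jbig2dec is faster; the widest gap is bitmap-mmr (2.26 ms vs 1.28 ms).
Beats jbig2dec on bitmap-symbol-symhuff-texthuff and beats both PDF-toolchain decoders (mutool, pdfimages) on every fixture.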

Reproduce locally (Linux / macOS, needs jbig2enc + imagemagick for fixture synthesis):

task bench:corpus            # generate the perf-text-* fixtures under ./tmp/perf-corpus/
task bench:cross CORPUS_DIR=./tmp/perf-corpus

Install

go get github.com/dkrisman/gobig2

Go 1.25 toolchain or newer (see go.mod).

Quickstart - PDF-embedded stream

The canonical flow a PDF reader uses: pull the JBIG2Decode-filtered image XObject stream and the optional /JBIG2Globals parameter object out of the PDF, hand both to NewDecoderEmbedded, and decode the page bitmap.

package main

import (
    "bytes"
    "fmt"
    "image"
    "os"

    "github.com/dkrisman/gobig2"
)

func main() {
    imageStream, err := os.ReadFile("page.jb2")
    if err != nil {
        panic(err)
    }
    var globalsBytes []byte // pulled from /JBIG2Globals if present, else nil

    dec, err := gobig2.NewDecoderEmbedded(bytes.NewReader(imageStream), globalsBytes)
    if err != nil {
        // Adversarial / non-JBIG2 input is rejected up front,
        // before any allocation derived from declared dimensions.
        fmt.Println("decode error:", err)
        return
    }

    img, err := dec.Decode()
    if err != nil {
        fmt.Println("decode error:", err)
        return
    }

    // *image.Gray with ink as 0 (black), paper as 255 (white).
    g := img.(*image.Gray)
    fmt.Printf("decoded %dx%d\n", g.Bounds().Dx(), g.Bounds().Dy())
}

See example_pdf_test.go for the same flow as a runnable Example.

Public API

Constructors

JBIG2 has two on-the-wire forms - pick the constructor that matches your input:

Input shape	Constructor
Standalone `.jb2` / `.jbig2` with T.88 Annex E file header	`NewDecoder`
PDF-embedded segment stream (header stripped by `/JBIG2Decode`)	`NewDecoderEmbedded`
Either shape, with optional external globals	`NewDecoderWithGlobals`

The standalone form is also registered with the stdlib image package, so image.Decode recognizes it under the format name "jbig2".

Decode methods

Method	Returns	Use when
`Decoder.Decode`	`image.Image` (`*image.Gray`)	You want a stdlib image you can `png.Encode`
`Decoder.DecodePacked`	`PackedPage`	You consume bilevel data directly - PBM writers, 1-bpp PNG, bit-blit. Saves ~12 ms wall + ~35 MB alloc on a 600 dpi A4 page over the `image.Gray` conversion.
`Decoder.DecodeContext` / `Decoder.DecodePackedContext`	same, plus `context.Context`	You need cancellation / a wall-clock budget

PackedPage.Data aliases the decoder's internal buffer until the next call on the same Decoder; copy it if you need it to outlive that boundary.

Resource budgets

JBIG2 is a denial-of-service vector - a 100-byte segment header can declare a 30 GiB region. Every attacker-controlled allocation is gated by a cap on the Limits struct (image pixels, symbols per dict, halftone grid cells, IAID code length, refinement aggregates, per-symbol pixels, etc.). Always start from DefaultLimits and override the fields you want; a bare struct literal silently disables every other cap because zero means "no cap".

limits := gobig2.DefaultLimits()
limits.MaxImagePixels = 100 * 1024 * 1024
limits.Apply()

Apply is process-wide and not safe to call concurrently with active decodes; configure once at startup, then spawn workers. Pair with a wall-clock budget via Decoder.DecodeContext - the segment-parser loop checks ctx.Err() between segments.

Error classification

Every decode failure wraps one of three sentinels:

Sentinel	Meaning	Caller action
`ErrMalformed`	Input bytes are not legal JBIG2	Skip the image
`ErrResourceBudget`	A configured `Limits` cap fired	Raise the cap or accept the rejection
`ErrUnsupported`	Legal but uses an unimplemented feature	Fall back to another decoder

Decoder.Decode returns io.EOF after the final page on multi-page input; cancellation paths wrap context.Canceled / context.DeadlineExceeded. See errors.go for the recommended switch idiom.

CLI tools

Binaries under cmd/:

cmd/gobig2 - decode a standalone or PDF-embedded JBIG2 stream into a PNG, PBM, or raw bitmap. Flags and exit codes documented in the binary's package doc.
cmd/extract-jbig2 - walk a PDF and dump every /JBIG2Decode image XObject (and any /JBIG2Globals stream) as separate .jb2 files, suitable as gobig2 test fixtures.
cmd/perf-cross - dev / CI tool that drives the cross-decoder benchmark table at the top of this README. Wrapped by task bench:cross.

Build with:

task build              # ./... compile-check
task build:release      # stripped, PGO-optimized binaries in ./bin/

Development

The canonical command runner is Taskfile.yml. Common targets:

task test               # all tests
task test:race          # race detector (requires CGO)
task test:conformance   # SerenityOS corpus + ITU-T T.88 Annex A if JBIG2_CONFORMANCE_DIR set
task lint               # golangci-lint v2
task fuzz               # 3s smoke fuzz across every Fuzz* target
task fuzz:long          # 10m sustained fuzz
task bench              # in-process micro-benchmarks
task bench:cross        # cross-decoder wall-clock bench
task ci                 # full gate: fmt:check + check + test:race

The internal/gobig2test package owns the public-API contract tests, conformance corpus harness, fuzz targets, and pathological input regressions.

Repository layout

jbig2.go, errors.go, limits.go - public API surface.
internal/ - decoder packages, one per JBIG2 spec area (see QUICKSTART.md for the code map).
cmd/ - CLI binaries.
docs/ - project docs and design notes.
testdata/ - SerenityOS conformance fixtures, PDF-embedded samples, perf corpora, fuzz seeds.

License

Apache 2.0. See NOTICE for attribution.

Documentation ¶

Overview ¶

Package gobig2 decodes ITU-T T.88 / ISO/IEC 14492 JBIG2 streams.

Built first for PDF readers (JBIG2Decode filter is dominant in the wild), then general-purpose JBIG2 decode.

Three entry points, one per input shape ¶

JBIG2 has two on-the-wire forms. Standalone .jb2 / .jbig2 start with 8-byte magic + flags byte (and optional 4-byte page-count); embedded streams - PDF /JBIG2Decode shape (PDF §7.4.7) - start directly at first segment header. PDF /JBIG2Decode also strips end-of-page (segment type 49) and end-of-file (type 51) markers a standalone file carries.

Pick constructor matching input:

NewDecoder - standalone file. Probes magic, locks onto right organization mode. Auto-registered with image.Decode under format name "jbig2".
NewDecoderEmbedded - PDF-embedded segment stream. Skips header probing. Pass nil globals when stream is self-contained; pass decoded JBIG2Globals bytes when image dict references external context.
NewDecoderWithGlobals - auto-detect with globals fallback. With file header: behaves like NewDecoder, globals supplement. Header missing + globals non-empty: falls back to embedded mode. Use when input could be either shape (e.g. PDF reader hitting stream still wrapped in standalone form).
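The difference between the two shapes comes down to the first few bytes. The sketch below shows one way a caller could sniff the T.88 Annex E file header before choosing a constructor. `sniffHeader` and `fileHeader` are hypothetical helpers written for this example and are not part of gobig2; the magic bytes and flag bits follow the Annex E layout described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// jbig2Magic is the 8-byte ID string that opens a standalone file.
var jbig2Magic = []byte{0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A}

// fileHeader holds the fields readable before the first segment header.
type fileHeader struct {
	Sequential bool   // flags bit 0: sequential (1) vs random-access (0)
	PagesKnown bool   // flags bit 1 clear: a 4-byte page count follows
	Pages      uint32 // valid only when PagesKnown
	Size       int    // header length in bytes: 9 or 13
}

// sniffHeader reports whether b starts with a standalone file header.
// ok == false means no usable header: the input may be an embedded stream.
func sniffHeader(b []byte) (h fileHeader, ok bool) {
	if len(b) < 9 || !bytes.HasPrefix(b, jbig2Magic) {
		return h, false
	}
	flags := b[8]
	h.Sequential = flags&0x01 != 0
	h.PagesKnown = flags&0x02 == 0
	h.Size = 9
	if h.PagesKnown {
		if len(b) < 13 {
			return fileHeader{}, false // truncated page count
		}
		h.Pages = binary.BigEndian.Uint32(b[9:13])
		h.Size = 13
	}
	return h, true
}

func main() {
	standalone := append(append([]byte{}, jbig2Magic...), 0x01, 0, 0, 0, 2)
	h, ok := sniffHeader(standalone)
	fmt.Println(ok, h.Sequential, h.Pages, h.Size)

	// Arbitrary bytes without the magic, as an embedded stream would look.
	_, ok = sniffHeader([]byte{0, 0, 0, 0, 0x30, 0, 1, 0, 0, 0, 0x13})
	fmt.Println(ok)
}
```

A caller that already knows the source (for example a PDF /JBIG2Decode stream) does not need this probe and can go straight to NewDecoderEmbedded.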

PDF integration ¶

A PDF reader walking image XObjects gets two byte streams per /JBIG2Decode-filtered image:

Image stream itself - segment-stream form.
Optional /JBIG2Globals via /DecodeParms - separate JBIG2 segment stream holding symbol-dict contexts shared across document.

Both bytes in hand:

dec, err := gobig2.NewDecoderEmbedded(bytes.NewReader(imageStream), globalsBytes)
if err != nil {
    // adversarial / non-JBIG2 input is rejected up front
    return nil, err
}
img, err := dec.Decode()

Returned image.Image is *image.Gray; ink = 0 (black), paper = 255 (white). Dimensions match page-information segment inside stream - will not necessarily equal PDF image dict's /Width and /Height (PDF readers should trust JBIG2 stream's own dimensions, let PDF /Width and /Height drive CTM scaling around them).

Resource budgets ¶

JBIG2 is a DoS vector: attacker declares 30 GB region in 100-byte segment header, naive decoder OOMs. Every length / count / dimension derived from input bytes gated against configurable cap before allocation. Ten cap fields on Limits (MaxIaidCodeLen is kept for API symmetry and ignored by Limits.Apply, leaving nine that Apply installs); always start from DefaultLimits and tweak fields you want:

limits := gobig2.DefaultLimits()
limits.MaxImagePixels = 100 * 1024 * 1024 // tighten one cap
limits.Apply()

Bare literal `gobig2.Limits{MaxImagePixels: 1<<20}.Apply()` silently disables every other cap (zero = "no cap") - see Limits.Apply footgun. Apply is process-wide, not safe concurrent with active decodes - configure once at startup, then spawn workers. Concurrent Decoder instances calling Decode / DecodeContext on independent inputs safe (each owns own Document; Limits read-only post-Apply). See Limits doc for per-field reference.

Pair with wall-clock budget on call site. Decoder honors cancellation via Decoder.DecodeContext - internal segment-parser loop checks ctx.Err() between segments, aborts as failure on cancel. Cancellation latency bounded by cost of one segment, itself gated by per-region Limits above.

Multi-page streams ¶

Standalone .jb2 can declare multiple pages; PDF streams always one page (page = image XObject). Decoder.Decode returns one page at a time, io.EOF when no more pages. Decoder.DecodeAll is the convenience wrapper.

DecodeConfig reports the first page's dimensions of a standalone .jb2 stream. It is the function image.DecodeConfig dispatches to and requires the T.88 Annex E file header. PDF-embedded streams (/JBIG2Decode filter bytes) omit header and fail DecodeConfig with ErrMalformed; PDF readers wanting page-info from /JBIG2Decode must build Decoder via NewDecoderEmbedded or NewDecoderEmbeddedWithGlobals and walk to first page-info segment themselves (typically via Decoder.GetDocument and [Document.PageInfoList] after first Decoder.DecodeContext call).

Error handling ¶

All public entry points return error on malformed/truncated input; codec does not panic, even on adversarial bytes. Errors carry context (segment number, parser stage) for triage with --inspect.

Errors wrap one of three sentinels for errors.Is: ErrMalformed (input not legal JBIG2), ErrResourceBudget (configured Limits cap fired), ErrUnsupported (legal but uses unimplemented feature). Cancellation paths additionally wrap context.Canceled / context.DeadlineExceeded. See errors.go for switch idiom.

Index ¶

Constants
Variables
func Decode(r io.Reader) (image.Image, error)
func DecodeAll(r io.Reader) ([]image.Image, error)
func DecodeAllContext(ctx context.Context, r io.Reader) ([]image.Image, error)
func DecodeConfig(r io.Reader) (image.Config, error)
func DecodeConfigContext(ctx context.Context, r io.Reader) (image.Config, error)
func DecodeContext(ctx context.Context, r io.Reader) (image.Image, error)
type Decoder
type Document
type Limits
- func DefaultLimits() Limits
- func (l Limits) Apply()
type PackedPage
type ParsedGlobals
- func ParseGlobals(globals []byte) (*ParsedGlobals, error)
type Result

Examples ¶

NewDecoderEmbedded (Pdf)

Constants ¶

const (
	// ResultSuccess: a segment parsed successfully; decode loop
	// should call DecodeSequential again to advance.
	ResultSuccess = segment.ResultSuccess
	// ResultFailure: a parse or resource-budget failure; the
	// concrete error is on [Document.Err].
	ResultFailure = segment.ResultFailure
	// ResultEndReached: input is exhausted with no more segments
	// to parse. After this, the next Decode call returns io.EOF.
	ResultEndReached = segment.ResultEndReached
	// ResultPageCompleted: a page-info segment closed the
	// current page; the page bitmap is ready on
	// [Document.Page].
	ResultPageCompleted = segment.ResultPageCompleted
)

Result codes from [Document.DecodeSequential]. Callers walking a Document directly switch on these; the Decoder wrappers handle them internally and surface the appropriate image.Image / error.

const MaxInputBytes = input.MaxBytes

MaxInputBytes is constructor-level hard cap on physical bytes a single JBIG2 input can occupy. Larger inputs rejected up front by public constructors with error wrapping ErrResourceBudget, before any bitmap allocation. 256 MiB far above legit JBIG2 (600-DPI A4 fax page typically 100 KB-10 MB on wire); real per-region budgets via Limits. CLI tools or framing readers (PDF, fax) that pre-slurp input should apply same cap.

const Version = "0.0.0-dev"

Version is the gobig2 module version. Bumped at release time alongside the matching git tag. Runtime read is the supported way to feature-detect across versions; pre-1.0 value is "0.0.0-dev" (repo ships no tagged releases yet).

Variables ¶

var (
	// ErrMalformed wraps every parser-side failure where the
	// input bytes do not conform to JBIG2 (truncation, bad
	// segment header, out-of-bounds segment reference, etc.).
	ErrMalformed = errs.ErrMalformed

	// ErrResourceBudget wraps every failure caused by a
	// configured [Limits] cap firing. The wrapped error names
	// the specific cap.
	ErrResourceBudget = errs.ErrResourceBudget

	// ErrUnsupported wraps failures where the input is legal
	// JBIG2 but uses a feature gobig2 does not implement.
	ErrUnsupported = errs.ErrUnsupported
)

Sentinel errors for `errors.Is` classification. Every decode/parse failure from public Decoder API wraps one of these (or `context.Canceled` / `context.DeadlineExceeded` for cancellation). Categories partition by caller action: ErrMalformed = input not legal JBIG2 (skip image); ErrResourceBudget = configured Limits cap fired (raise cap or accept rejection); ErrUnsupported = legal input, gobig2 path unimplemented (fall back to another decoder if able).

Scope. Sentinel-wrap covers errors decoder produces during segment parsing and bitmap allocation. Errors before parser sees input - chiefly `io.Reader` failures constructors surface from source (dropped network, EIO file, etc.) - return as-is with source type. Treat app I/O failures separately from decode classification; "unwrapped error" branch below does not imply gobig2 bug if source io.Reader is fallible.

Typical PDF-reader pattern:

img, err := dec.DecodeContext(ctx)
switch {
case errors.Is(err, io.EOF):
    // no more pages - multi-page Decode reached the end
case errors.Is(err, context.DeadlineExceeded):
    // budget exhausted - caller policy
case errors.Is(err, gobig2.ErrResourceBudget):
    // input declared a region past Limits - skip / raise cap
case errors.Is(err, gobig2.ErrMalformed):
    // bad JBIG2 - skip image
case errors.Is(err, gobig2.ErrUnsupported):
    // valid but unimplemented variant - fall back if possible
case err != nil:
    // unwrapped error - application-side I/O failure
    // (or, less likely, a gobig2 bug). Inspect with the
    // application's own io.Reader / source-specific
    // matchers first.
}

Decoder.Decode / Decoder.DecodeContext return io.EOF after final page on multi-page input; idiomatic end-of-stream signal, not wrapped by any gobig2 sentinel.

var NewDocument = segment.NewDocument

NewDocument creates a Document. Used internally by every constructor; exported for rare callers building a Document directly (most go through NewDecoder / NewDecoderEmbedded).

Stability: package-level var bound to segment.NewDocument, not function declaration. Indirection is implementation detail of how gobig2 re-exports internal/segment types; do not reassign - runtime swap to different constructor is not supported extension point, breaks surprisingly (decode-loop callers dispatch through it). Future major release will likely replace with real function wrapper; treat as call-only.

Functions ¶

func Decode ¶

func Decode(r io.Reader) (image.Image, error)

Decode decodes the first page in the JBIG2 data.

Equivalent to DecodeContext with context.Background.

func DecodeAll ¶

func DecodeAll(r io.Reader) ([]image.Image, error)

DecodeAll decodes every remaining page in JBIG2 data, returns in order. Partial results kept on failure.

Equivalent to DecodeAllContext with context.Background.

func DecodeAllContext ¶

func DecodeAllContext(ctx context.Context, r io.Reader) ([]image.Image, error)

DecodeAllContext decodes every remaining page, honoring ctx for cancellation. nil ctx treated as context.Background. Partial results kept on cancel; error wraps `ctx.Err()` on cancel.

Convenience wrapper: `NewDecoder(r)` then `Decoder.DecodeAllContext(ctx)`.

func DecodeConfig ¶

func DecodeConfig(r io.Reader) (image.Config, error)

DecodeConfig returns the JBIG2 image configuration.

Convenience wrapper around DecodeConfigContext with context.Background. stdlib's `image.RegisterFormat` hook calls this signature; server callers wanting request-scoped cancellation should use DecodeConfigContext directly.

func DecodeConfigContext ¶

func DecodeConfigContext(ctx context.Context, r io.Reader) (image.Config, error)

DecodeConfigContext returns the JBIG2 image configuration, honoring ctx for cancellation between segments.

Standalone streams only. Calls NewDecoder internally, requiring T.88 Annex E file-header magic; PDF-embedded /JBIG2Decode streams omit header and fail with ErrMalformed. PDF readers wanting page-info from embedded stream should build Decoder via NewDecoderEmbedded / NewDecoderEmbeddedWithGlobals and read [Document.PageInfoList] after first Decoder.DecodeContext.

First page only. Returned config = dimensions of first page-info segment seen. Standalone files may declare more pages with different dimensions; consumers needing per-page config iterate decoder and inspect each page's bounds after decoding.

Loop safeguards mirror Decoder.DecodeContext: a [Document.Progress] stall guard so adversarial input looping DecodeSequential without forward motion can't hang probe. Resource-budget rejections during probe preserve ErrResourceBudget classification, not collapse to generic ErrMalformed wrap.

Scope of ctx. Like package-level DecodeContext, supplied ctx bounds segment parsing after NewDecoder has read input from r - bounded io.LimitedReader inside constructor doesn't observe cancellation. For network or slow-reader sources, apply deadlines at io.Reader / request layer too.

func DecodeContext ¶

func DecodeContext(ctx context.Context, r io.Reader) (image.Image, error)

DecodeContext decodes the first page in JBIG2 data, honoring ctx for cancellation. nil ctx treated as context.Background. On cancel, error wraps `ctx.Err()`.

Convenience wrapper: `NewDecoder(r)` then `Decoder.DecodeContext(ctx)`. Use explicit Decoder form when reading Decoder.GetDocument or calling Decoder.DecodeAllContext for multi-page input.

Scope of ctx. Supplied context bounds segment parsing after NewDecoder has read input from r - constructor slurps bytes through bounded io.LimitedReader first, so cancellation NOT observed during initial read. For network or slow-reader sources, apply deadlines at io.Reader / request layer (e.g. http.Request.Context) too, not just at this call.

Types ¶

type Decoder ¶

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder is a JBIG2 decoder bound to one input stream. A single Decoder yields one or more pages via Decoder.Decode / Decoder.DecodeContext / Decoder.DecodePacked / Decoder.DecodePackedContext; Decoder.Reset rebinds the decoder to a fresh stream (only on Decoders built with NewDecoderEmbeddedWithGlobals, to reuse parsed globals).

Decoder is NOT safe for concurrent decode calls on the same instance; spawn one Decoder per worker. Resource budgets are process-wide and come from Limits.Apply - configure once at startup, before any Decoder runs.

func NewDecoder ¶

func NewDecoder(r io.Reader) (*Decoder, error)

NewDecoder creates a decoder.

SWF / Flash CWS container scanning intentionally out of scope: JBIG2 codec should not drag compress/zlib into import graph for payload shape no fixture exercises. Callers needing SWF-wrapped JBIG2 should strip SWF container in own layer, feed inner stream to NewDecoder / NewDecoderEmbedded directly.

func NewDecoderEmbedded ¶

func NewDecoderEmbedded(r io.Reader, globals []byte) (*Decoder, error)

NewDecoderEmbedded creates a decoder for JBIG2 stream with no file header - "embedded" stream starting directly at first segment header. PDF /JBIG2Decode delivers this shape (PDF §7.4.7: "the file header, end-of-page segment, and end-of-file segment shall not be present"). Pass empty globals for self-contained streams, or decoded /JBIG2Globals bytes when referencing external symbol-dict context.

Auto-detect path in NewDecoder / NewDecoderWithGlobals needs 8-byte JBIG2 magic; PDF strips it, probing fails. NewDecoderEmbedded skips probing, sets embedded-mode params directly: sequential organization, no random access, big-endian byte order with small little-endian heuristic on first 4 bytes (matches NewDecoderWithGlobals fallback when probing fails AND globals non-empty).

Cheap plausibility sniff on first segment header before document loop - random ASCII or non-JBIG2 input otherwise drives segment parser into long stalls (spec requires no specific byte at offset 0, so parser has nothing to short-circuit on). Up-front reject = clean error path instead of hang.

Example (Pdf) ¶

ExampleNewDecoderEmbedded_pdf shows the canonical PDF-reader flow: pull JBIG2Decode-filtered image stream plus optional /JBIG2Globals from PDF, hand both to NewDecoderEmbedded, Decode page bitmap.

Fixture testdata/pdf-embedded/sample.jb2 extracted from real PDF. 94 bytes decode to 3562x851 fully-black bitmap.

package main

import (
	"bytes"
	"fmt"
	"image"
	"os"

	gobig2 "github.com/dkrisman/gobig2"
)

func main() {
	// Real PDF reader: imageStream = bytes between `stream\n`
	// and `\nendstream` of Image XObject with /Filter
	// /JBIG2Decode. globalsBytes = /JBIG2Globals from
	// /DecodeParms; nil when image dict has no reference.
	imageStream, err := os.ReadFile("testdata/pdf-embedded/sample.jb2")
	if err != nil {
		fmt.Println(err)
		return
	}
	var globalsBytes []byte // pulled from /JBIG2Globals if present

	dec, err := gobig2.NewDecoderEmbedded(bytes.NewReader(imageStream), globalsBytes)
	if err != nil {
		// Adversarial / non-JBIG2 rejected up front, before any
		// allocation from declared dimensions. Surface to PDF
		// reader's per-image error path; do not panic.
		fmt.Println("decode error:", err)
		return
	}

	img, err := dec.Decode()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}

	// img is *image.Gray; ink = 0 (black), paper = 255 (white).
	// PDF /Width and /Height should match JBIG2 page-info
	// dimensions; if not, trust JBIG2 stream and scale via CTM.
	g := img.(*image.Gray)
	fmt.Printf("decoded %dx%d\n", g.Bounds().Dx(), g.Bounds().Dy())
}

Output:
decoded 3562x851

func NewDecoderEmbeddedWithGlobals ¶

func NewDecoderEmbeddedWithGlobals(r io.Reader, globals *ParsedGlobals) (*Decoder, error)

NewDecoderEmbeddedWithGlobals creates a Decoder for a PDF-embedded JBIG2 stream sharing pre-parsed globals. Equivalent to NewDecoderEmbedded but skips per-decode globals re-parse - useful when same /JBIG2Globals referenced from many image XObjects.

Pass nil globals (or no-op ParseGlobals(nil)) for self-contained streams. Either form produces a Decoder supporting Decoder.Reset - constructor stamps no-op handle so resettable property holds regardless of whether caller had globals to bind.

func NewDecoderWithGlobals ¶

func NewDecoderWithGlobals(r io.Reader, globals []byte) (*Decoder, error)

NewDecoderWithGlobals creates a decoder using an external globals stream.

For PDF-shaped flow where input may carry JBIG2 file header (probe.Configs locks on) or be raw embedded segment stream (fall back to embedded-mode params, rely on globals for referenced symbol dicts). SWF / Flash container scanning out of scope; see NewDecoder rationale.

func (*Decoder) Decode ¶

func (d *Decoder) Decode() (image.Image, error)

Decode decodes the next page.

Equivalent to Decoder.DecodeContext with context.Background. Use DecodeContext when canceling a long-running decode (e.g. via context.WithTimeout).

Internal loop tracks progress via [Document.Progress] (stream-cursor advances OR grouped-mode index advances) so adversarial input driving [Document.DecodeSequential] into non-terminal state without forward motion can't hang decoder. Second consecutive call not advancing progress token aborts as malformed - same defensive bound global-segments loop applies in [drainGlobals].

func (*Decoder) DecodeAll ¶

func (d *Decoder) DecodeAll() ([]image.Image, error)

DecodeAll decodes all remaining pages.

Equivalent to Decoder.DecodeAllContext with context.Background.

func (*Decoder) DecodeAllContext ¶

func (d *Decoder) DecodeAllContext(ctx context.Context) ([]image.Image, error)

DecodeAllContext decodes all remaining pages, honoring ctx for cancellation. nil ctx treated as context.Background. Partial results kept on cancel; returned error is first failure encountered, wrapping `ctx.Err()` on cancel. Check via `errors.Is(err, context.Canceled)` or `errors.Is(err, context.DeadlineExceeded)`.

Context consulted between every segment of every page; see Decoder.DecodeContext for per-page cancellation contract.

Memory note: each decoded page appended to returned slice, held in caller memory until slice out of scope. Each entry 8-bpp `*image.Gray` (one byte/pixel), so 100-page document of 8-megapixel pages keeps ~800 MB alive simultaneously. Internal packed page bitmap released after each page (see Decoder.DecodeContext), so the gray slice is the only per-page retention, but its total size is not bounded by Limits. Prefer Decoder.DecodeContext in loop, processing/writing each page before requesting next, when document size is large or attacker-controlled.

func (*Decoder) DecodeContext ¶

func (d *Decoder) DecodeContext(ctx context.Context) (image.Image, error)

DecodeContext decodes the next page, honoring ctx for cancellation. nil ctx treated as context.Background. On cancel, error wraps `ctx.Err()`; check via `errors.Is(err, context.Canceled)` or `errors.Is(err, context.DeadlineExceeded)`.

Cancellation checked between segments inside segment-parser loop, so latency bounded by cost of one segment (itself capped by per-region Limits).

Peak-memory note: on success, packed 1-bpp page bitmap converted to dense `*image.Gray` (one byte/pixel) before packed released. Short window both representations live; gray copy ~8x packed footprint. 256-megapixel page (default `MaxImagePixels`) = ~32 MiB packed + ~256 MiB gray = peak ~288 MiB during conversion, driven almost entirely by gray output. Plan `runtime/debug.SetMemoryLimit` or per-call wall-clock budgets around peak, not steady-state packed.
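The arithmetic behind the peak can be checked with a short sketch. `pageFootprint` is a hypothetical helper, and it assumes the packed stride is exactly ceil(width/8) bytes with no extra row alignment; the real internal bitmap may pad rows differently.

```go
package main

import "fmt"

// pageFootprint estimates bytes held for one page in both
// representations. Assumption: packed rows are ceil(width/8) bytes.
func pageFootprint(width, height int64) (packed, gray int64) {
	stride := (width + 7) / 8
	return stride * height, width * height
}

func main() {
	const mib = 1 << 20
	pages := []struct {
		name string
		w, h int64
	}{
		{"600 dpi A4", 4960, 7016},
		{"256 Mpx cap", 16384, 16384},
	}
	for _, p := range pages {
		packed, gray := pageFootprint(p.w, p.h)
		fmt.Printf("%-12s packed %6.1f MiB  gray %6.1f MiB  peak %6.1f MiB\n",
			p.name, float64(packed)/mib, float64(gray)/mib, float64(packed+gray)/mib)
	}
}
```

For a page at the 256-megapixel default this gives 32 MiB packed plus 256 MiB gray, matching the ~288 MiB peak quoted above.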

func (*Decoder) DecodePacked ¶

func (d *Decoder) DecodePacked() (PackedPage, error)

DecodePacked is Decoder.Decode for bilevel-aware consumers. Returns the packed internal bitmap directly via PackedPage, skipping the *image.Gray conversion that Decoder.Decode performs. On a 600 dpi A4 page that saves ~12 ms wall + ~35 MB allocation; downstream PBM / 1-bpp-PNG writers want the packed form anyway.

Equivalent to Decoder.DecodePackedContext with context.Background.

func (*Decoder) DecodePackedContext ¶

func (d *Decoder) DecodePackedContext(ctx context.Context) (PackedPage, error)

DecodePackedContext is Decoder.DecodeContext for bilevel-aware consumers - same cancellation contract, returns a PackedPage instead of an *image.Gray. See PackedPage for byte layout and aliasing lifetime.

func (*Decoder) GetDocument ¶

func (d *Decoder) GetDocument() *Document

GetDocument returns the underlying document.

Stability: advanced / unstable. Returned *Document is internal parse orchestrator Decoder owns. Exposed so bundled `--inspect` tool and low-level callers can walk segment metadata, but method surface is internal/segment and not part of gobig2's public API contract. In particular:

Calling DecodeSequential, SetContext, ReleasePageSegments, or any state-mutating method directly is unsupported while parent Decoder still in use. Mixing those with Decode / DecodeContext yields undefined behavior.
Exposed type, fields, and methods may change between versions without deprecation cycle.

Treat result as read-only inspection handle. Use for segment-table dumps and similar tooling; route every decode through Decoder.

func (*Decoder) Reset ¶

func (d *Decoder) Reset(r io.Reader) error

Reset reinitializes Decoder for a new PDF-embedded JBIG2 stream while keeping previously bound ParsedGlobals attached. Hot path for PDF reader iterating over image XObjects sharing a /JBIG2Globals: parse once, build one Decoder with NewDecoderEmbeddedWithGlobals, then Reset between images instead of re-allocating fresh Decoder + re-parsing globals.

Reset returns ErrUnsupported when called on a Decoder not built via NewDecoderEmbeddedWithGlobals; for a Decoder from any other constructor, call that constructor again instead.

Errors wrap ErrMalformed when r yields non-JBIG2 input.

type Document ¶

type Document = segment.Document

Document is the document-parsing context returned by Decoder.GetDocument.

type Limits ¶

type Limits struct {
	// MaxImagePixels caps the total pixel count of any single
	// bitmap NewImage allocates. Default 256 megapixels.
	MaxImagePixels int64
	// MaxSymbolsPerDict caps SDNUMNEWSYMS / SDNUMEXSYMS at parse
	// and also the aggregate input-symbol pool a text region or
	// symbol dict assembles across all referenced symbol-dict
	// segments. Default 1 M.
	MaxSymbolsPerDict uint32
	// MaxPatternsPerDict caps the halftone HDPATS array length.
	// Default 1 M.
	MaxPatternsPerDict uint32
	// MaxHalftoneGridCells caps halftone HGW x HGH grid product.
	// Each cell expands to HPW x HPH pixels, so cell count is
	// order of magnitude below rendered pixel count: 1200-DPI A4
	// (~140 megapixels) with 8x8 patterns = ~2 megacells; even
	// 2x2 stays under ~35 megacells. Default 64 megacells past
	// legit use, bounds worst-case per-cell rendering at ~2 s
	// CPU. Without cap, adversarial HGW/HGH each under per-side
	// state.MaxImageSize can still declare multi-gigacell grid
	// against tiny output region.
	MaxHalftoneGridCells uint64
	// MaxIaidCodeLen caps SBSYMCODELEN before IAID context array
	// allocated. Max practical 30. Cap is `const` in
	// internal/arith; field here for API symmetry but
	// [Limits.Apply] ignores it.
	MaxIaidCodeLen uint8
	// MaxRefaggninst caps REFAGGNINST per aggregate symbol. Real
	// glyphs rarely exceed few dozen per aggregate; default 1024
	// well above legit use.
	MaxRefaggninst uint32
	// MaxSymbolPixels caps SYMWIDTH x HCHEIGHT per symbol bitmap.
	// Real glyphs are tens of pixels/side; default 4 megapixels
	// = two orders beyond legit, well below page-level cap.
	// Adversarial input drives single glyph to multi-megapixel
	// then iterates generic-region template loop per pixel (~10
	// s CPU per 16 megapixel adversarial symbol on dev VM).
	MaxSymbolPixels uint64
	// MaxPixelsPerByte caps ratio of declared page-info
	// `width x height` to total input-byte budget. Default 1 M
	// pixels/byte (~30x headroom over highest-ratio fixture)
	// rejects adversarial 30-byte pages declaring 152 megapixels
	// while leaving room for tight encodings (bundled
	// testdata/pdf-embedded/sample.jb2 ratio ~32 K).
	MaxPixelsPerByte uint64
	// MaxSymbolDictPixels caps aggregate SYMWIDTH x HCHEIGHT sum
	// across all symbols in one SDD call. Complements
	// MaxSymbolPixels: adversarial dict can declare hundreds of
	// small symbols each passing per-symbol cap but accumulating
	// to hundreds of megapixels of template-loop work. Real
	// text-heavy fixtures top out at few megapixels per dict;
	// default 16 megapixels.
	MaxSymbolDictPixels uint64
	// MaxBytesPerSegment caps DataLength declared in each segment
	// header. Real segments rarely exceed few MB; default 16 MB
	// rejects 4 GB adversarial declarations at parse before any
	// per-segment work. 0xFFFFFFFF "unknown length" streaming
	// sentinel is exempt.
	MaxBytesPerSegment uint64
}

Limits bundles resource caps the codec consults when allocating bitmaps and dictionaries. Each bounds a different attacker-controlled allocation or work site:

MaxImagePixels caps any single bitmap NewImage allocates. 1200-DPI A4 = ~140 megapixels; default 256 megapixels leaves ~1.8x headroom, blocks pathological dimensions.
MaxSymbolsPerDict caps SDNUMNEWSYMS / SDNUMEXSYMS at parse and aggregate input-symbol pool a text region or symbol dict assembles across referenced symbol-dict segments. Real dicts rarely exceed few thousand (corpus max: 308); 1 M default well above legit, bounds pre-decode pointer-slice allocation.
MaxPatternsPerDict caps the halftone HDPATS array.
MaxHalftoneGridCells caps halftone HGW x HGH grid product. Per-side state.MaxImageSize rejects oversized HGW/HGH; this cap covers when both fit per-side but product drives per-cell rendering into multi-second decode regardless of output region size.
MaxIaidCodeLen caps SBSYMCODELEN before IAID context array allocated. Array sizes 1<<SBSYMCODELEN; cap 30 holds worst case below 16 GiB.
MaxRefaggninst caps REFAGGNINST per aggregate symbol; blocks adversarial inner-refinement decode hangs.
MaxSymbolPixels caps SYMWIDTH x HCHEIGHT per symbol; blocks multi-megapixel glyph driving generic-region template loop into multi-second decode.
MaxSymbolDictPixels caps aggregate SYMWIDTH x HCHEIGHT sum across all symbols in one SDD call; blocks "many small symbols" passing per-symbol cap but accumulating to hundreds of megapixels.
MaxPixelsPerByte caps ratio of declared page-info width x height to total input-byte budget; blocks 30-byte -> 152-megapixel page-info shapes at parse.
MaxBytesPerSegment caps per-segment DataLength; blocks adversarial 4 GB segment-length at parse before any per-segment work.

Zero on any field = "no cap". Callers wanting no limits can use Limits{} but should pair with own wall-clock budget.
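
A minimal sketch of how a ratio cap such as MaxPixelsPerByte could be evaluated, including the "zero means no cap" convention. This is not the package's internal code: `exceedsPixelsPerByte` is a hypothetical helper, and the 1,000,000 pixels/byte value is an illustrative reading of "1 M".

```go
package main

import (
	"fmt"
	"math/bits"
)

// exceedsPixelsPerByte reports whether a declared width x height is larger
// than maxPerByte pixels for each of inputBytes bytes. A zero cap means
// "no cap", matching the convention documented for Limits fields.
func exceedsPixelsPerByte(width, height uint32, inputBytes, maxPerByte uint64) bool {
	if maxPerByte == 0 {
		return false // cap disabled
	}
	// A 32-bit x 32-bit product always fits in 64 bits, so this cannot wrap.
	pixels := uint64(width) * uint64(height)
	// The budget is computed in 128 bits so a huge input size cannot wrap it.
	hi, lo := bits.Mul64(inputBytes, maxPerByte)
	if hi != 0 {
		return false // budget is above 2^64 pixels; no declaration exceeds it
	}
	return pixels > lo
}

func main() {
	const perByte = 1_000_000 // illustrative cap value (assumption)

	// 152.4 megapixels declared by a 30-byte input: rejected.
	fmt.Println(exceedsPixelsPerByte(12000, 12700, 30, perByte))
	// 32 megapixels declared by a 1000-byte input: accepted.
	fmt.Println(exceedsPixelsPerByte(8000, 4000, 1000, perByte))
	// Same hostile page with the cap disabled: accepted.
	fmt.Println(exceedsPixelsPerByte(12000, 12700, 30, 0))
}
```

The last call shows why a zeroed profile should be paired with a wall-clock budget: nothing else stops the 30-byte page from reaching the allocator.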

Concurrency. Limits.Apply mutates process-wide package vars; not safe across goroutines or concurrent with active decodes. Configure once at startup, then spawn goroutines. Concurrent Decoder instances calling Decode / DecodeContext on independent inputs are safe - each owns its own Document; package Limits read-only after Apply returns.

Tests. Tests mutating caps (via Limits.Apply or direct var writes) must not call `t.Parallel` and must save/restore via deferred reset. Prefer DefaultLimits().Apply snapshots when swapping complete profiles; direct-var fine for one-knob tests touching a single cap.
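
The save/restore discipline for one-knob tests might look like the sketch below. `maxImagePixels` is a local stand-in for a package-level cap, its value is illustrative, and `withCap` / `readUnderCap` are hypothetical helpers rather than part of the package.

```go
package main

import "fmt"

// maxImagePixels stands in for one process-wide cap variable.
// The value is illustrative only.
var maxImagePixels int64 = 256_000_000

// withCap sets *p to v and returns a function that restores the previous
// value. Intended use in a non-parallel test: defer withCap(&pkgVar, v)().
func withCap(p *int64, v int64) (restore func()) {
	old := *p
	*p = v
	return func() { *p = old }
}

// readUnderCap plays the role of a test body: it runs with a temporary cap
// and the deferred restore puts the old value back on return.
func readUnderCap(v int64) int64 {
	defer withCap(&maxImagePixels, v)()
	return maxImagePixels
}

func main() {
	fmt.Println(readUnderCap(1024)) // value seen while the override is active
	fmt.Println(maxImagePixels)     // original value after the restore ran
}
```

The override is process-wide while it is active, which is the reason such tests must not call `t.Parallel`.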

func DefaultLimits ¶

func DefaultLimits() Limits

DefaultLimits returns the codec's stock caps - values package vars carry before any Limits.Apply call. Starting point for customized Limits.

Stability: every field sourced from compile-time constant in owning internal package, so DefaultLimits() returns same values regardless of prior Limits.Apply call. The per-package var that Apply mutates is initialized from the same constant.

func (Limits) Apply ¶

func (l Limits) Apply()

Apply writes l into package-level caps internal decoders consult. Process-wide; not safe concurrent with itself or active Decode (package vars read by every decoder, mid-decode mutation could race). Concurrent Decode on independent Decoder instances safe - share read-only Limits, each owns own Document state. Field = 0 disables that cap entirely.

FOOTGUN: callers tweaking one or two caps must start from DefaultLimits not a bare struct literal:

// WRONG - silently disables every cap you didn't list:
gobig2.Limits{MaxImagePixels: 100_000_000}.Apply()

// RIGHT - preserves the documented defaults you didn't change:
limits := gobig2.DefaultLimits()
limits.MaxImagePixels = 100_000_000
limits.Apply()

Bare-literal form only when intentionally disabling everything except listed fields (e.g. fuzz harnesses needing permissive profile).

MaxIaidCodeLen is `const` in internal/arith and cannot be reassigned at runtime - field on Limits for API symmetry but Apply ignores. To loosen IAID cap, rebuild codec with constant changed; bound is hard ceiling on pre-allocation context array, not configurable knob.

type PackedPage ¶

type PackedPage struct {
	// Data is the packed bitmap. Aliases the Decoder's internal
	// page buffer; see PackedPage doc for lifetime.
	Data []byte
	// Width is the page width in pixels.
	Width int
	// Height is the page height in pixels.
	Height int
	// Stride is the byte offset between consecutive rows
	// (== ceil(Width/8)).
	Stride int
}

PackedPage is a 1-bit-per-pixel page bitmap in MSB-first packed bytes. Returned by Decoder.DecodePacked for callers that consume the bilevel data directly - PBM writers, 1-bpp PNG encoders, bit-blit pipelines - without paying for the 8-bpp *image.Gray conversion Decoder.Decode performs (~12 ms + ~35 MB alloc on a 600 dpi A4 page).

Pixel layout: row r starts at Data[r*Stride]. Within a byte, bit 7 (MSB) is the leftmost pixel; ink = 1, paper = 0. Same polarity and packing as PBM (P4) and as the bytes a /JBIG2Decode filter delivers.

Data aliases the Decoder's internal page buffer. It stays valid until the next Decode / DecodeContext / DecodePacked / DecodePackedContext / Reset call on the same Decoder; copy it before that boundary if you need it to outlive the Decoder's next operation.
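
A small sketch of the copy-before-reuse rule. The `internal` slice only simulates a decoder-owned buffer, and the in-place overwrite is an assumption about what "next operation" might do; `snapshot` is a hypothetical helper.

```go
package main

import "fmt"

// snapshot returns an independent copy of a packed bitmap so it survives
// the decoder reusing its internal page buffer.
func snapshot(data []byte) []byte {
	return append([]byte(nil), data...)
}

func main() {
	internal := []byte{0xF0, 0x0F} // stands in for the decoder's page buffer
	alias := internal              // what PackedPage.Data would hold
	kept := snapshot(alias)        // taken before the next decoder call

	internal[0] = 0x00 // simulates a later decode overwriting the buffer

	fmt.Println(alias[0] == 0x00) // the alias observes the overwrite
	fmt.Println(kept[0] == 0xF0)  // the snapshot does not
}
```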

type ParsedGlobals ¶

type ParsedGlobals struct {
	// contains filtered or unexported fields
}

ParsedGlobals holds a pre-parsed JBIG2 globals stream shareable across many NewDecoderEmbeddedWithGlobals calls. Use when single PDF /JBIG2Globals referenced from multiple image XObjects: parse once with ParseGlobals, bind into each image's Decoder.

Read-only after construction. Not safe for concurrent decode - bind from single goroutine processing images sequentially. Concurrent PDF readers should ParseGlobals once per worker, not once per process.

func ParseGlobals ¶

func ParseGlobals(globals []byte) (*ParsedGlobals, error)

ParseGlobals parses a JBIG2 globals stream and returns a reusable handle. Empty / nil slice creates a no-globals handle (equivalent to passing nil to a non-Parsed constructor).

Bytes are typically the decoded /JBIG2Globals stream object referenced from a PDF image XObject's /DecodeParms.

The returned handle retains a reference to the input `globals` slice for the lifetime of the handle. Do not mutate the slice after this call; callers that need to free the source bytes should pass a copy.

Errors wrap ErrResourceBudget when the slice exceeds MaxInputBytes; otherwise parse failures wrap ErrMalformed. Match either sentinel via errors.Is to keep the budget-vs-malformed distinction downstream callers rely on.

type Result ¶

type Result = segment.Result

Result is the document parser's per-step result code, returned by Document.DecodeSequential. The public Decoder loop in Decoder.Decode / Decoder.DecodeContext switches on it.

Directories ¶

Path	Synopsis
cmd
extract-jbig2 command	Command extract-jbig2 walks a PDF and dumps every /Filter /JBIG2Decode image XObject's stream bytes (plus any referenced /JBIG2Globals) as separate .jb2 files.
gobig2 command	Command gobig2 decodes a standalone or PDF-embedded JBIG2 stream into a bitmap.
perf-cross command	Command perf-cross times each installed JBIG2 decoder (gobig2, jbig2dec, mutool, pdfimages, PDFBox) over a fixture set and emits a Markdown comparison table plus the raw JSON measurement matrix.
internal
arith	Package arith implements JBIG2 MQ arithmetic coder and the integer / IAID adapters above it.
bio	Package bio implements bit-level I/O over a byte buffer.
errs	Package errs holds sentinel error values re-exported by gobig2 for `errors.Is` classification.
generic	Package generic implements JBIG2 generic-region decoding (T.88 §6.2).
halftone	Package halftone implements halftone region decoding (T.88 §6.6, type-22/23 segments) and the pattern dictionary it indexes (T.88 §6.7, type-16).
huffman	Package huffman implements JBIG2 Huffman tables (T.88 Annex B, B.1-B.15) plus a generic decoder.
input
intmath	Package intmath holds small integer helpers shared across the decoder.
mmr	Package mmr decodes JBIG2's CCITT Group 4 / T.6 (MMR) bitmap streams.
page	Package page holds the bi-level Image type that region decoders write into and the document orchestrator stitches to form the final page bitmap.
probe
refinement	Package refinement implements JBIG2 generic refinement region decoding (T.88 §6.3).
segment	Package segment owns the JBIG2 segment table and the document orchestrator (Document) that walks it.
state	Package state holds cross-cutting enums and constants shared by JBIG2 decoders and the document orchestrator.
symbol