Documentation ¶
Overview ¶
Package gobig2 decodes ITU-T T.88 / ISO/IEC 14492 JBIG2 streams.
Built first for PDF readers (the JBIG2Decode filter is the dominant use of JBIG2 in the wild), then extended to general-purpose JBIG2 decoding.
Three entry points, one per input shape ¶
JBIG2 has two on-the-wire forms. Standalone .jb2 / .jbig2 files start with an 8-byte magic and a flags byte (plus an optional 4-byte page count); embedded streams, the shape PDF /JBIG2Decode delivers (PDF §7.4.7), start directly at the first segment header. The PDF form also omits the end-of-page (segment type 49) and end-of-file (type 51) markers that a standalone file carries.
Pick the constructor that matches the input:
- NewDecoder - standalone file. Probes the magic and locks onto the right organization mode. Auto-registered with image.Decode under the format name "jbig2".
- NewDecoderEmbedded - PDF-embedded segment stream. Skips header probing. Pass nil globals when the stream is self-contained; pass the decoded JBIG2Globals bytes when the image dict references an external context.
- NewDecoderWithGlobals - auto-detect with a globals fallback. When a file header is present, it behaves like NewDecoder and the globals supplement the stream. When the header is missing and the globals are non-empty, it falls back to embedded mode. Use it when the input could be either shape (e.g. a PDF reader hitting a stream still wrapped in standalone form).
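The difference between the two input shapes can be sketched with a prefix check on the standalone file's 8-byte ID string. This is a minimal illustration, not gobig2's actual probe: `looksStandalone` is a hypothetical helper, and the real probe presumably also reads the flags byte and the page count.

```go
package main

import (
	"bytes"
	"fmt"
)

// jbig2Magic is the 8-byte ID string that opens a standalone JBIG2 file.
var jbig2Magic = []byte{0x97, 'J', 'B', '2', 0x0D, 0x0A, 0x1A, 0x0A}

// looksStandalone reports whether b starts with the standalone file magic.
// Embedded (PDF) streams start at a segment header and will not match.
// Hypothetical helper for illustration only.
func looksStandalone(b []byte) bool {
	return bytes.HasPrefix(b, jbig2Magic)
}

func main() {
	// Magic, then a flags byte and a 4-byte page count.
	standalone := append(append([]byte{}, jbig2Magic...), 0x01, 0, 0, 0, 1)
	// Segment number 0 followed by a flags byte: no file header at all.
	embedded := []byte{0x00, 0x00, 0x00, 0x00, 0x30, 0x00, 0x01}

	fmt.Println(looksStandalone(standalone), looksStandalone(embedded))
}
```

A caller that cannot tell which shape it holds does not need this check; NewDecoderWithGlobals already covers that case.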
PDF integration ¶
A PDF reader walking image XObjects gets two byte streams per /JBIG2Decode-filtered image:
- Image stream itself - segment-stream form.
- Optional /JBIG2Globals via /DecodeParms - separate JBIG2 segment stream holding symbol-dict contexts shared across document.
With both in hand (imageStream as an io.Reader, globalsBytes as a []byte):
dec, err := gobig2.NewDecoderEmbedded(imageStream, globalsBytes)
if err != nil {
	// adversarial / non-JBIG2 input is rejected up front
	return nil, err
}
img, err := dec.Decode()
The returned image.Image is an *image.Gray with ink = 0 (black) and paper = 255 (white). Its dimensions come from the page-information segment inside the stream and will not necessarily equal the PDF image dict's /Width and /Height. PDF readers should trust the JBIG2 stream's own dimensions and let /Width and /Height drive CTM scaling around them.
Resource budgets ¶
JBIG2 is a DoS vector: an attacker can declare a 30 GB region in a 100-byte segment header, and a naive decoder runs out of memory. Every length, count, and dimension derived from input bytes is gated against a configurable cap before allocation. Limits carries nine configurable caps (plus MaxIaidCodeLen, which Apply ignores); always start from DefaultLimits and tweak the fields you want:
limits := gobig2.DefaultLimits()
limits.MaxImagePixels = 100 * 1024 * 1024 // tighten one cap
limits.Apply()
A bare literal such as `gobig2.Limits{MaxImagePixels: 1<<20}.Apply()` silently disables every other cap (zero = "no cap"); see the Limits.Apply footgun note. Apply is process-wide and not safe to call concurrently with active decodes: configure once at startup, then spawn workers. Concurrent Decoder instances calling Decode / DecodeContext on independent inputs are safe (each owns its own Document, and Limits are read-only after Apply). See the Limits doc for a per-field reference.
Pair the caps with a wall-clock budget at the call site. The decoder honors cancellation via Decoder.DecodeContext: the internal segment-parser loop checks ctx.Err() between segments and aborts with a failure on cancel. Cancellation latency is bounded by the cost of one segment, which is itself gated by the per-region Limits above.
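A wall-clock budget can be sketched with context.WithTimeout wrapped around any function shaped like Decoder.DecodeContext. The names `decodeFunc`, `decodeWithBudget`, `slowDecode`, and `runDemo` are hypothetical; `slowDecode` is a stand-in that only waits for cancellation, so the sketch runs without the library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"image"
	"time"
)

// decodeFunc has the same shape as (*Decoder).DecodeContext.
type decodeFunc func(ctx context.Context) (image.Image, error)

// decodeWithBudget runs decode under a deadline derived from parent.
func decodeWithBudget(parent context.Context, budget time.Duration, decode decodeFunc) (image.Image, error) {
	ctx, cancel := context.WithTimeout(parent, budget)
	defer cancel()
	return decode(ctx)
}

// timedOut reports whether err was caused by the budget expiring.
func timedOut(err error) bool { return errors.Is(err, context.DeadlineExceeded) }

// slowDecode is a stand-in for dec.DecodeContext: it blocks until ctx is
// done and wraps ctx.Err(), as the documentation says the decoder does.
func slowDecode(ctx context.Context) (image.Image, error) {
	<-ctx.Done()
	return nil, fmt.Errorf("decode aborted: %w", ctx.Err())
}

// runDemo decodes with a 10 ms budget and returns the resulting error.
func runDemo() error {
	_, err := decodeWithBudget(context.Background(), 10*time.Millisecond, slowDecode)
	return err
}

func main() {
	fmt.Println(timedOut(runDemo()))
}
```

With a real Decoder, the method value `dec.DecodeContext` can be passed as the `decode` argument.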
Multi-page streams ¶
A standalone .jb2 can declare multiple pages; a PDF stream always holds one page (page = image XObject). Decoder.Decode returns one page at a time and io.EOF when no pages remain. Decoder.DecodeAll is the convenience wrapper.
DecodeConfig reports the first page's dimensions of a standalone .jb2 stream. It is the function image.DecodeConfig dispatches to, and it requires the T.88 Annex E file header. PDF-embedded streams (/JBIG2Decode filter bytes) omit the header and fail DecodeConfig with ErrMalformed; PDF readers wanting page info from /JBIG2Decode must build a Decoder via NewDecoderEmbedded or NewDecoderEmbeddedWithGlobals and walk to the first page-info segment themselves (typically by reading [Document.PageInfoList] from Decoder.GetDocument after the first Decoder.DecodeContext call).
Error handling ¶
All public entry points return an error on malformed or truncated input; the codec does not panic, even on adversarial bytes. Errors carry context (segment number, parser stage) for triage with --inspect.
Errors wrap one of three sentinels for errors.Is: ErrMalformed (input is not legal JBIG2), ErrResourceBudget (a configured Limits cap fired), or ErrUnsupported (legal input that uses an unimplemented feature). Cancellation paths additionally wrap context.Canceled / context.DeadlineExceeded. See errors.go for the switch idiom.
Index ¶
- Constants
- Variables
- func Decode(r io.Reader) (image.Image, error)
- func DecodeAll(r io.Reader) ([]image.Image, error)
- func DecodeAllContext(ctx context.Context, r io.Reader) ([]image.Image, error)
- func DecodeConfig(r io.Reader) (image.Config, error)
- func DecodeConfigContext(ctx context.Context, r io.Reader) (image.Config, error)
- func DecodeContext(ctx context.Context, r io.Reader) (image.Image, error)
- type Decoder
- func (d *Decoder) Decode() (image.Image, error)
- func (d *Decoder) DecodeAll() ([]image.Image, error)
- func (d *Decoder) DecodeAllContext(ctx context.Context) ([]image.Image, error)
- func (d *Decoder) DecodeContext(ctx context.Context) (image.Image, error)
- func (d *Decoder) DecodePacked() (PackedPage, error)
- func (d *Decoder) DecodePackedContext(ctx context.Context) (PackedPage, error)
- func (d *Decoder) GetDocument() *Document
- func (d *Decoder) Reset(r io.Reader) error
- type Document
- type Limits
- type PackedPage
- type ParsedGlobals
- type Result
Examples ¶
Constants ¶
const (
	// ResultSuccess: a segment parsed successfully; decode loop
	// should call DecodeSequential again to advance.
	ResultSuccess = segment.ResultSuccess
	// ResultFailure: a parse or resource-budget failure; the
	// concrete error is on [Document.Err].
	ResultFailure = segment.ResultFailure
	// ResultEndReached: input is exhausted with no more segments
	// to parse. After this, the next Decode call returns io.EOF.
	ResultEndReached = segment.ResultEndReached
	// ResultPageCompleted: a page-info segment closed the
	// current page; the page bitmap is ready on
	// [Document.Page].
	ResultPageCompleted = segment.ResultPageCompleted
)
Result codes from [Document.DecodeSequential]. Callers walking a Document directly switch on these; the Decoder wrappers handle them internally and surface the appropriate image.Image / error.
const MaxInputBytes = input.MaxBytes
MaxInputBytes is the constructor-level hard cap on the physical bytes a single JBIG2 input can occupy. Public constructors reject larger inputs up front with an error wrapping ErrResourceBudget, before any bitmap allocation. The 256 MiB cap is far above legitimate JBIG2 sizes (a 600-DPI A4 fax page is typically 100 KB-10 MB on the wire); the real per-region budgets come from Limits. CLI tools or framing readers (PDF, fax) that pre-slurp input should apply the same cap.
const Version = "0.0.0-dev"
Version is the gobig2 module version. Bumped at release time alongside the matching git tag. Runtime read is the supported way to feature-detect across versions; pre-1.0 value is "0.0.0-dev" (repo ships no tagged releases yet).
Variables ¶
var (
	// ErrMalformed wraps every parser-side failure where the
	// input bytes do not conform to JBIG2 (truncation, bad
	// segment header, out-of-bounds segment reference, etc.).
	ErrMalformed = errs.ErrMalformed
	// ErrResourceBudget wraps every failure caused by a
	// configured [Limits] cap firing. The wrapped error names
	// the specific cap.
	ErrResourceBudget = errs.ErrResourceBudget
	// ErrUnsupported wraps failures where the input is legal
	// JBIG2 but uses a feature gobig2 does not implement.
	ErrUnsupported = errs.ErrUnsupported
)
Sentinel errors for `errors.Is` classification. Every decode/parse failure from public Decoder API wraps one of these (or `context.Canceled` / `context.DeadlineExceeded` for cancellation). Categories partition by caller action: ErrMalformed = input not legal JBIG2 (skip image); ErrResourceBudget = configured Limits cap fired (raise cap or accept rejection); ErrUnsupported = legal input, gobig2 path unimplemented (fall back to another decoder if able).
Scope. Sentinel wrapping covers errors the decoder produces during segment parsing and bitmap allocation. Errors that occur before the parser sees input, chiefly `io.Reader` failures that constructors surface from the source (dropped network connection, EIO on a file, etc.), are returned as-is with the source's type. Treat application I/O failures separately from decode classification; hitting the "unwrapped error" branch below does not imply a gobig2 bug if the source io.Reader is fallible.
Typical PDF-reader pattern:
img, err := dec.DecodeContext(ctx)
switch {
case errors.Is(err, io.EOF):
	// no more pages - multi-page Decode reached the end
case errors.Is(err, context.DeadlineExceeded):
	// budget exhausted - caller policy
case errors.Is(err, gobig2.ErrResourceBudget):
	// input declared a region past Limits - skip / raise cap
case errors.Is(err, gobig2.ErrMalformed):
	// bad JBIG2 - skip image
case errors.Is(err, gobig2.ErrUnsupported):
	// valid but unimplemented variant - fall back if possible
case err != nil:
	// unwrapped error - application-side I/O failure
	// (or, less likely, a gobig2 bug). Inspect with the
	// application's own io.Reader / source-specific
	// matchers first.
}
Decoder.Decode / Decoder.DecodeContext return io.EOF after the final page of multi-page input; this is the idiomatic end-of-stream signal and is not wrapped by any gobig2 sentinel.
var NewDocument = segment.NewDocument
NewDocument creates a Document. Used internally by every constructor; exported for rare callers building a Document directly (most go through NewDecoder / NewDecoderEmbedded).
Stability: NewDocument is a package-level var bound to segment.NewDocument, not a function declaration. The indirection is an implementation detail of how gobig2 re-exports internal/segment types. Do not reassign it: swapping in a different constructor at runtime is not a supported extension point and breaks in surprising ways (decode-loop callers dispatch through it). A future major release will likely replace it with a real function wrapper; treat it as call-only.
Functions ¶
func Decode ¶
Decode decodes the first page in the JBIG2 data.
Equivalent to DecodeContext with context.Background.
func DecodeAll ¶
DecodeAll decodes every remaining page in JBIG2 data, returns in order. Partial results kept on failure.
Equivalent to DecodeAllContext with context.Background.
func DecodeAllContext ¶
DecodeAllContext decodes every remaining page, honoring ctx for cancellation. nil ctx treated as context.Background. Partial results kept on cancel; error wraps `ctx.Err()` on cancel.
Convenience wrapper: `NewDecoder(r)` then `Decoder.DecodeAllContext(ctx)`.
func DecodeConfig ¶
DecodeConfig returns the JBIG2 image configuration.
Convenience wrapper around DecodeConfigContext with context.Background. stdlib's `image.RegisterFormat` hook calls this signature; server callers wanting request-scoped cancellation should use DecodeConfigContext directly.
func DecodeConfigContext ¶
DecodeConfigContext returns the JBIG2 image configuration, honoring ctx for cancellation between segments.
Standalone streams only. Calls NewDecoder internally, requiring T.88 Annex E file-header magic; PDF-embedded /JBIG2Decode streams omit header and fail with ErrMalformed. PDF readers wanting page-info from embedded stream should build Decoder via NewDecoderEmbedded / NewDecoderEmbeddedWithGlobals and read [Document.PageInfoList] after first Decoder.DecodeContext.
First page only. Returned config = dimensions of first page-info segment seen. Standalone files may declare more pages with different dimensions; consumers needing per-page config iterate decoder and inspect each page's bounds after decoding.
Loop safeguards mirror Decoder.DecodeContext: a [Document.Progress] stall guard ensures that adversarial input looping DecodeSequential without forward motion cannot hang the probe. Resource-budget rejections during the probe keep their ErrResourceBudget classification rather than collapsing into a generic ErrMalformed wrap.
Scope of ctx. As with the package-level DecodeContext, the supplied ctx bounds segment parsing only after NewDecoder has read the input from r; the bounded io.LimitedReader inside the constructor does not observe cancellation. For network or slow-reader sources, apply deadlines at the io.Reader / request layer too.
func DecodeContext ¶
DecodeContext decodes the first page in JBIG2 data, honoring ctx for cancellation. nil ctx treated as context.Background. On cancel, error wraps `ctx.Err()`.
Convenience wrapper: `NewDecoder(r)` then `Decoder.DecodeContext(ctx)`. Use explicit Decoder form when reading Decoder.GetDocument or calling Decoder.DecodeAllContext for multi-page input.
Scope of ctx. The supplied context bounds segment parsing only after NewDecoder has read the input from r. The constructor first slurps the bytes through a bounded io.LimitedReader, so cancellation is NOT observed during that initial read. For network or slow-reader sources, also apply deadlines at the io.Reader / request layer (e.g. http.Request.Context), not just at this call.
Types ¶
type Decoder ¶
type Decoder struct {
// contains filtered or unexported fields
}
Decoder is a JBIG2 decoder bound to one input stream. A single Decoder yields one or more pages via Decoder.Decode / Decoder.DecodeContext / Decoder.DecodePacked / Decoder.DecodePackedContext; Decoder.Reset rebinds the decoder to a fresh stream (only on Decoders built with NewDecoderEmbeddedWithGlobals, to reuse parsed globals).
Decoder is NOT safe for concurrent decode calls on the same instance; spawn one Decoder per worker. Resource budgets are process-wide and come from Limits.Apply - configure once at startup, before any Decoder runs.
func NewDecoder ¶
NewDecoder creates a decoder.
SWF / Flash CWS container scanning is intentionally out of scope: a JBIG2 codec should not drag compress/zlib into its import graph for a payload shape no fixture exercises. Callers needing SWF-wrapped JBIG2 should strip the SWF container in their own layer and feed the inner stream to NewDecoder / NewDecoderEmbedded directly.
func NewDecoderEmbedded ¶
NewDecoderEmbedded creates a decoder for JBIG2 stream with no file header - "embedded" stream starting directly at first segment header. PDF /JBIG2Decode delivers this shape (PDF §7.4.7: "the file header, end-of-page segment, and end-of-file segment shall not be present"). Pass empty globals for self-contained streams, or decoded /JBIG2Globals bytes when referencing external symbol-dict context.
The auto-detect path in NewDecoder / NewDecoderWithGlobals needs the 8-byte JBIG2 magic; PDF strips it, so probing fails. NewDecoderEmbedded skips probing and sets the embedded-mode parameters directly: sequential organization, no random access, and big-endian byte order with a small little-endian heuristic on the first 4 bytes (matching the NewDecoderWithGlobals fallback taken when probing fails AND globals are non-empty).
It runs a cheap plausibility sniff on the first segment header before the document loop. Random ASCII or other non-JBIG2 input would otherwise drive the segment parser into long stalls, because the spec requires no specific byte at offset 0 and the parser has nothing to short-circuit on. Rejecting up front gives a clean error path instead of a hang.
Example (Pdf) ¶
ExampleNewDecoderEmbedded_pdf shows the canonical PDF-reader flow: pull the JBIG2Decode-filtered image stream plus the optional /JBIG2Globals from the PDF, hand both to NewDecoderEmbedded, and Decode the page bitmap.
The fixture testdata/pdf-embedded/sample.jb2 was extracted from a real PDF. Its 94 bytes decode to a 3562x851 fully-black bitmap.
package main

import (
	"bytes"
	"fmt"
	"image"
	"os"

	gobig2 "github.com/dkrisman/gobig2"
)

func main() {
	// Real PDF reader: imageStream = bytes between `stream\n`
	// and `\nendstream` of Image XObject with /Filter
	// /JBIG2Decode. globalsBytes = /JBIG2Globals from
	// /DecodeParms; nil when image dict has no reference.
	imageStream, err := os.ReadFile("testdata/pdf-embedded/sample.jb2")
	if err != nil {
		fmt.Println(err)
		return
	}
	var globalsBytes []byte // pulled from /JBIG2Globals if present
	dec, err := gobig2.NewDecoderEmbedded(bytes.NewReader(imageStream), globalsBytes)
	if err != nil {
		// Adversarial / non-JBIG2 rejected up front, before any
		// allocation from declared dimensions. Surface to PDF
		// reader's per-image error path; do not panic.
		fmt.Println("decode error:", err)
		return
	}
	img, err := dec.Decode()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// img is *image.Gray; ink = 0 (black), paper = 255 (white).
	// PDF /Width and /Height should match JBIG2 page-info
	// dimensions; if not, trust JBIG2 stream and scale via CTM.
	g := img.(*image.Gray)
	fmt.Printf("decoded %dx%d\n", g.Bounds().Dx(), g.Bounds().Dy())
}
Output: decoded 3562x851
func NewDecoderEmbeddedWithGlobals ¶
func NewDecoderEmbeddedWithGlobals(r io.Reader, globals *ParsedGlobals) (*Decoder, error)
NewDecoderEmbeddedWithGlobals creates a Decoder for a PDF-embedded JBIG2 stream sharing pre-parsed globals. Equivalent to NewDecoderEmbedded but skips per-decode globals re-parse - useful when same /JBIG2Globals referenced from many image XObjects.
Pass nil globals (or no-op ParseGlobals(nil)) for self-contained streams. Either form produces a Decoder supporting Decoder.Reset - constructor stamps no-op handle so resettable property holds regardless of whether caller had globals to bind.
func NewDecoderWithGlobals ¶
NewDecoderWithGlobals creates a decoder using an external globals stream.
Intended for PDF-shaped flows where the input may either carry a JBIG2 file header (probe.Configs locks on) or be a raw embedded segment stream (the decoder falls back to embedded-mode parameters and relies on globals for referenced symbol dicts). SWF / Flash container scanning is out of scope; see the NewDecoder rationale.
func (*Decoder) Decode ¶
Decode decodes the next page.
Equivalent to Decoder.DecodeContext with context.Background. Use DecodeContext when canceling a long-running decode (e.g. via context.WithTimeout).
The internal loop tracks progress via [Document.Progress] (either the stream cursor advances or the grouped-mode index advances), so adversarial input that drives [Document.DecodeSequential] into a non-terminal state without forward motion cannot hang the decoder. The second consecutive call that fails to advance the progress token aborts the decode as malformed. This is the same defensive bound the global-segments loop applies in [drainGlobals].
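The stall guard described above can be sketched as a loop that compares a progress token between steps. This is an illustration only, not gobig2's loop: `driveWithStallGuard`, `errStalled`, and the `step` callback are hypothetical names, and the token here is a plain integer offset.

```go
package main

import (
	"errors"
	"fmt"
)

// errStalled is a local stand-in for the malformed-input error.
var errStalled = errors.New("decoder made no forward progress")

// driveWithStallGuard calls step until it reports done. step returns a
// progress token (for example a stream offset). Two consecutive calls
// that leave the token unchanged abort the loop.
func driveWithStallGuard(start uint64, step func() (token uint64, done bool)) error {
	last, stalls := start, 0
	for {
		token, done := step()
		if done {
			return nil
		}
		if token == last {
			stalls++
			if stalls >= 2 {
				return errStalled
			}
			continue
		}
		last, stalls = token, 0
	}
}

func main() {
	// A well-behaved parser advances on every call and then finishes.
	pos := uint64(0)
	fmt.Println(driveWithStallGuard(0, func() (uint64, bool) {
		pos += 4
		return pos, pos >= 12
	}))

	// A stuck parser keeps returning the same offset.
	fmt.Println(driveWithStallGuard(0, func() (uint64, bool) { return 0, false }))
}
```

Tolerating one non-advancing call keeps legitimate bookkeeping steps from being flagged while still bounding the loop.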
func (*Decoder) DecodeAll ¶
DecodeAll decodes all remaining pages.
Equivalent to Decoder.DecodeAllContext with context.Background.
func (*Decoder) DecodeAllContext ¶
DecodeAllContext decodes all remaining pages, honoring ctx for cancellation. nil ctx treated as context.Background. Partial results kept on cancel; returned error is first failure encountered, wrapping `ctx.Err()` on cancel. Check via `errors.Is(err, context.Canceled)` or `errors.Is(err, context.DeadlineExceeded)`.
Context consulted between every segment of every page; see Decoder.DecodeContext for per-page cancellation contract.
Memory note: each decoded page is appended to the returned slice and held in caller memory until the slice goes out of scope. Each entry is an 8-bpp `*image.Gray` (one byte/pixel), so a 100-page document of 8-megapixel pages keeps ~800 MiB alive simultaneously. The internal packed page bitmap is released after each page (see Decoder.DecodeContext), so the gray slice is the only per-page retention, but its total size is not bounded by Limits. Prefer calling Decoder.DecodeContext in a loop, processing or writing each page before requesting the next, when the document is large or attacker-controlled.
func (*Decoder) DecodeContext ¶
DecodeContext decodes the next page, honoring ctx for cancellation. nil ctx treated as context.Background. On cancel, error wraps `ctx.Err()`; check via `errors.Is(err, context.Canceled)` or `errors.Is(err, context.DeadlineExceeded)`.
Cancellation checked between segments inside segment-parser loop, so latency bounded by cost of one segment (itself capped by per-region Limits).
Peak-memory note: on success, the packed 1-bpp page bitmap is converted to a dense `*image.Gray` (one byte/pixel) before the packed buffer is released. For a short window both representations are live, and the gray copy is ~8x the packed footprint. A 256-megapixel page (the default `MaxImagePixels`) needs ~32 MiB packed + ~256 MiB gray, for a peak of ~288 MiB during conversion, driven almost entirely by the gray output. Plan `runtime/debug.SetMemoryLimit` or per-call wall-clock budgets around the peak, not the steady-state packed size.
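The peak-memory arithmetic can be reproduced with two small helpers. This is a worked estimate only: `packedBytes` and `grayBytes` are hypothetical names, and the 16384 x 16384 page assumes the default cap is counted in binary megapixels (256 * 2^20 pixels).

```go
package main

import "fmt"

// packedBytes is the size of a 1-bpp bitmap whose rows are padded to
// whole bytes.
func packedBytes(w, h uint64) uint64 { return (w + 7) / 8 * h }

// grayBytes is the size of the 8-bpp gray copy (one byte per pixel).
func grayBytes(w, h uint64) uint64 { return w * h }

func main() {
	const mib = 1 << 20
	// 16384 x 16384 = 256 * 2^20 pixels (assumed reading of the default cap).
	w, h := uint64(16384), uint64(16384)
	p, g := packedBytes(w, h), grayBytes(w, h)
	fmt.Printf("packed %d MiB, gray %d MiB, peak %d MiB\n", p/mib, g/mib, (p+g)/mib)
}
```

Running it prints `packed 32 MiB, gray 256 MiB, peak 288 MiB`.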
func (*Decoder) DecodePacked ¶
func (d *Decoder) DecodePacked() (PackedPage, error)
DecodePacked is Decoder.Decode for bilevel-aware consumers. Returns the packed internal bitmap directly via PackedPage, skipping the *image.Gray conversion that Decoder.Decode performs. On a 600 dpi A4 page that saves ~12 ms wall + ~35 MB allocation; downstream PBM / 1-bpp-PNG writers want the packed form anyway.
Equivalent to Decoder.DecodePackedContext with context.Background.
func (*Decoder) DecodePackedContext ¶
func (d *Decoder) DecodePackedContext(ctx context.Context) (PackedPage, error)
DecodePackedContext is Decoder.DecodeContext for bilevel-aware consumers - same cancellation contract, returns a PackedPage instead of an *image.Gray. See PackedPage for byte layout and aliasing lifetime.
func (*Decoder) GetDocument ¶
GetDocument returns the underlying document.
Stability: advanced / unstable. The returned *Document is the internal parse orchestrator that the Decoder owns. It is exposed so the bundled `--inspect` tool and low-level callers can walk segment metadata, but its method surface lives in internal/segment and is not part of gobig2's public API contract. In particular:
- Calling DecodeSequential, SetContext, ReleasePageSegments, or any other state-mutating method directly is unsupported while the parent Decoder is still in use. Mixing those calls with Decode / DecodeContext yields undefined behavior.
- Exposed type, fields, and methods may change between versions without deprecation cycle.
Treat result as read-only inspection handle. Use for segment-table dumps and similar tooling; route every decode through Decoder.
func (*Decoder) Reset ¶
Reset reinitializes Decoder for a new PDF-embedded JBIG2 stream while keeping previously bound ParsedGlobals attached. Hot path for PDF reader iterating over image XObjects sharing a /JBIG2Globals: parse once, build one Decoder with NewDecoderEmbeddedWithGlobals, then Reset between images instead of re-allocating fresh Decoder + re-parsing globals.
Reset returns ErrUnsupported when called on a Decoder that was not built via NewDecoderEmbeddedWithGlobals; for such decoders, call the original constructor again instead.
Errors wrap ErrMalformed when r yields non-JBIG2 input.
type Document ¶
Document is the document-parsing context returned by Decoder.GetDocument.
type Limits ¶
type Limits struct {
// MaxImagePixels caps the total pixel count of any single
// bitmap NewImage allocates. Default 256 megapixels.
MaxImagePixels int64
// MaxSymbolsPerDict caps SDNUMNEWSYMS / SDNUMEXSYMS at parse
// and also the aggregate input-symbol pool a text region or
// symbol dict assembles across all referenced symbol-dict
// segments. Default 1 M.
MaxSymbolsPerDict uint32
// MaxPatternsPerDict caps the halftone HDPATS array length.
// Default 1 M.
MaxPatternsPerDict uint32
// MaxHalftoneGridCells caps halftone HGW x HGH grid product.
// Each cell expands to HPW x HPH pixels, so cell count is
// order of magnitude below rendered pixel count: 1200-DPI A4
// (~140 megapixels) with 8x8 patterns = ~2 megacells; even
// 2x2 stays under ~35 megacells. Default 64 megacells past
// legit use, bounds worst-case per-cell rendering at ~2 s
// CPU. Without cap, adversarial HGW/HGH each under per-side
// state.MaxImageSize can still declare multi-gigacell grid
// against tiny output region.
MaxHalftoneGridCells uint64
// MaxIaidCodeLen caps SBSYMCODELEN before IAID context array
// allocated. Max practical 30. Cap is `const` in
// internal/arith; field here for API symmetry but
// [Limits.Apply] ignores it.
MaxIaidCodeLen uint8
// MaxRefaggninst caps REFAGGNINST per aggregate symbol. Real
// glyphs rarely exceed few dozen per aggregate; default 1024
// well above legit use.
MaxRefaggninst uint32
// MaxSymbolPixels caps SYMWIDTH x HCHEIGHT per symbol bitmap.
// Real glyphs are tens of pixels/side; default 4 megapixels
// = two orders beyond legit, well below page-level cap.
// Adversarial input drives single glyph to multi-megapixel
// then iterates generic-region template loop per pixel (~10
// s CPU per 16 megapixel adversarial symbol on dev VM).
MaxSymbolPixels uint64
// MaxPixelsPerByte caps ratio of declared page-info
// `width x height` to total input-byte budget. Default 1 M
// pixels/byte (~30x headroom over highest-ratio fixture)
// rejects 152-megapixel from 30-byte adversarial pages
// while leaving room for tight encodings (bundled
// testdata/pdf-embedded/sample.jb2 ratio ~32 K).
MaxPixelsPerByte uint64
// MaxSymbolDictPixels caps aggregate SYMWIDTH x HCHEIGHT sum
// across all symbols in one SDD call. Complements
// MaxSymbolPixels: adversarial dict can declare hundreds of
// small symbols each passing per-symbol cap but accumulating
// to hundreds of megapixels of template-loop work. Real
// text-heavy fixtures top out at few megapixels per dict;
// default 16 megapixels.
MaxSymbolDictPixels uint64
// MaxBytesPerSegment caps DataLength declared in each segment
// header. Real segments rarely exceed few MB; default 16 MB
// rejects 4 GB adversarial declarations at parse before any
// per-segment work. 0xFFFFFFFF "unknown length" streaming
// sentinel is exempt.
MaxBytesPerSegment uint64
}
Limits bundles resource caps the codec consults when allocating bitmaps and dictionaries. Each bounds a different attacker-controlled allocation or work site:
- MaxImagePixels caps any single bitmap NewImage allocates. 1200-DPI A4 = ~140 megapixels; the default of 256 megapixels leaves ~1.8x headroom over that while blocking pathological dimensions.
- MaxSymbolsPerDict caps SDNUMNEWSYMS / SDNUMEXSYMS at parse and the aggregate input-symbol pool a text region or symbol dict assembles across referenced symbol-dict segments. Real dicts rarely exceed a few thousand symbols (corpus max: 308); the 1 M default is well above legitimate use and bounds the pre-decode pointer-slice allocation.
- MaxPatternsPerDict caps the halftone HDPATS array.
- MaxHalftoneGridCells caps halftone HGW x HGH grid product. Per-side state.MaxImageSize rejects oversized HGW/HGH; this cap covers when both fit per-side but product drives per-cell rendering into multi-second decode regardless of output region size.
- MaxIaidCodeLen caps SBSYMCODELEN before IAID context array allocated. Array sizes 1<<SBSYMCODELEN; cap 30 holds worst case below 16 GiB.
- MaxRefaggninst caps REFAGGNINST per aggregate symbol; blocks adversarial inner-refinement decode hangs.
- MaxSymbolPixels caps SYMWIDTH x HCHEIGHT per symbol; blocks multi-megapixel glyph driving generic-region template loop into multi-second decode.
- MaxSymbolDictPixels caps aggregate SYMWIDTH x HCHEIGHT sum across all symbols in one SDD call; blocks "many small symbols" passing per-symbol cap but accumulating to hundreds of megapixels.
- MaxPixelsPerByte caps ratio of declared page-info width x height to total input-byte budget; blocks 30-byte -> 152-megapixel page-info shapes at parse.
- MaxBytesPerSegment caps per-segment DataLength; blocks adversarial 4 GB segment-length at parse before any per-segment work.
Zero on any field = "no cap". Callers wanting no limits can use Limits{} but should pair with own wall-clock budget.
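The "zero means no cap" convention and a pre-allocation pixel check can be sketched as follows. This is not gobig2's internal code: `overCap` and `pixelsOverCap` are hypothetical helpers, and the 256 << 20 constant is an assumed value used only for the demo.

```go
package main

import "fmt"

// overCap reports whether v exceeds limit; a zero limit means "no cap".
func overCap(v, limit uint64) bool { return limit != 0 && v > limit }

// pixelsOverCap checks a declared width x height against limit before any
// allocation. Widening to uint64 first means the product of two 32-bit
// header fields cannot overflow.
func pixelsOverCap(width, height uint32, limit uint64) bool {
	return overCap(uint64(width)*uint64(height), limit)
}

func main() {
	const maxImagePixels = 256 << 20 // assumed cap, for illustration only

	fmt.Println(pixelsOverCap(4960, 7016, maxImagePixels))             // 600-DPI A4
	fmt.Println(pixelsOverCap(0xFFFFFFFF, 0xFFFFFFFF, maxImagePixels)) // hostile header
	fmt.Println(pixelsOverCap(0xFFFFFFFF, 0xFFFFFFFF, 0))              // zero disables the cap
}
```

The last call shows why a bare Limits literal is risky: with a zero cap even the hostile dimensions pass the check.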
Concurrency. Limits.Apply mutates process-wide package vars; not safe across goroutines or concurrent with active decodes. Configure once at startup, then spawn goroutines. Concurrent Decoder instances calling Decode / DecodeContext on independent inputs are safe - each owns its own Document; package Limits read-only after Apply returns.
Tests. Tests mutating caps (via Limits.Apply or direct var writes) must not call `t.Parallel` and must save/restore via deferred reset. Prefer DefaultLimits().Apply snapshots when swapping complete profiles; direct-var fine for one-knob tests touching a single cap.
func DefaultLimits ¶
func DefaultLimits() Limits
DefaultLimits returns the codec's stock caps - values package vars carry before any Limits.Apply call. Starting point for customized Limits.
Stability: every field sourced from compile-time constant in owning internal package, so DefaultLimits() returns same values regardless of prior Limits.Apply call. Per-package var Apply mutates is initialized from same constant.
func (Limits) Apply ¶
func (l Limits) Apply()
Apply writes l into the package-level caps that the internal decoders consult. It is process-wide and not safe to call concurrently with itself or with an active Decode (the package vars are read by every decoder, so a mid-decode mutation could race). Concurrent Decode calls on independent Decoder instances are safe: they share the read-only Limits and each owns its own Document state. A field set to 0 disables that cap entirely.
FOOTGUN: callers tweaking one or two caps must start from DefaultLimits not a bare struct literal:
// WRONG - silently disables every cap you didn't list:
gobig2.Limits{MaxImagePixels: 100_000_000}.Apply()
// RIGHT - preserves the documented defaults you didn't change:
limits := gobig2.DefaultLimits()
limits.MaxImagePixels = 100_000_000
limits.Apply()
Use the bare-literal form only when intentionally disabling everything except the listed fields (e.g. fuzz harnesses needing a permissive profile).
MaxIaidCodeLen is a `const` in internal/arith and cannot be reassigned at runtime; the field exists on Limits for API symmetry, but Apply ignores it. To loosen the IAID cap, rebuild the codec with the constant changed; the bound is a hard ceiling on the pre-allocated context array, not a configurable knob.
type PackedPage ¶
type PackedPage struct {
// Data is the packed bitmap. Aliases the Decoder's internal
// page buffer; see PackedPage doc for lifetime.
Data []byte
// Width is the page width in pixels.
Width int
// Height is the page height in pixels.
Height int
// Stride is the byte offset between consecutive rows
// (== ceil(Width/8)).
Stride int
}
PackedPage is a 1-bit-per-pixel page bitmap in MSB-first packed bytes. Returned by Decoder.DecodePacked for callers that consume the bilevel data directly - PBM writers, 1-bpp PNG encoders, bit-blit pipelines - without paying for the 8-bpp *image.Gray conversion Decoder.Decode performs (~12 ms + ~35 MB alloc on a 600 dpi A4 page).
Pixel layout: row r starts at Data[r*Stride]. Within a byte, bit 7 (MSB) is the leftmost pixel; ink = 1, paper = 0. Same polarity and packing as PBM (P4) and as the bytes a /JBIG2Decode filter delivers.
Data aliases the Decoder's internal page buffer. It stays valid until the next Decode / DecodeContext / DecodePacked / DecodePackedContext / Reset call on the same Decoder; copy it before that boundary if you need it to outlive the Decoder's next operation.
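The documented layout can be exercised with a local stand-in type. The `packed` struct below mirrors the PackedPage fields so the sketch runs without the library; `inkAt` and `clone` are hypothetical helpers, not part of gobig2.

```go
package main

import "fmt"

// packed mirrors the documented PackedPage fields (local stand-in type).
type packed struct {
	Data                  []byte
	Width, Height, Stride int
}

// inkAt reports whether pixel (x, y) is ink. Row y starts at
// Data[y*Stride]; bit 7 of each byte is the leftmost pixel; 1 = ink.
func inkAt(p packed, x, y int) bool {
	b := p.Data[y*p.Stride+x/8]
	return b>>(7-uint(x%8))&1 == 1
}

// clone copies the pixel data so the page survives the decoder's next
// call, which may reuse the aliased buffer.
func clone(p packed) packed {
	p.Data = append([]byte(nil), p.Data...)
	return p
}

func main() {
	// 10x2 page, Stride = ceil(10/8) = 2. In row 0, pixels 0 and 9 are ink.
	p := packed{Data: []byte{0x80, 0x40, 0x00, 0x00}, Width: 10, Height: 2, Stride: 2}
	fmt.Println(inkAt(p, 0, 0), inkAt(p, 9, 0), inkAt(p, 1, 0), inkAt(p, 0, 1))
}
```

The bits between Width and Stride*8 in each row are padding; callers should not read meaning into them.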
type ParsedGlobals ¶
type ParsedGlobals struct {
// contains filtered or unexported fields
}
ParsedGlobals holds a pre-parsed JBIG2 globals stream shareable across many NewDecoderEmbeddedWithGlobals calls. Use when single PDF /JBIG2Globals referenced from multiple image XObjects: parse once with ParseGlobals, bind into each image's Decoder.
Read-only after construction. Not safe for concurrent decode - bind from single goroutine processing images sequentially. Concurrent PDF readers should ParseGlobals once per worker, not once per process.
func ParseGlobals ¶
func ParseGlobals(globals []byte) (*ParsedGlobals, error)
ParseGlobals parses a JBIG2 globals stream and returns a reusable handle. Empty / nil slice creates a no-globals handle (equivalent to passing nil to a non-Parsed constructor).
Bytes are typically the decoded /JBIG2Globals stream object referenced from a PDF image XObject's /DecodeParms.
The returned handle retains a reference to the input `globals` slice for the lifetime of the handle. Do not mutate the slice after this call; callers that need to free the source bytes should pass a copy.
Errors wrap ErrResourceBudget when the slice exceeds MaxInputBytes; otherwise parse failures wrap ErrMalformed. Match either sentinel via errors.Is to keep the budget-vs-malformed distinction downstream callers rely on.
type Result ¶
Result is the document parser's per-step result code, returned by [Document.DecodeSequential]. The public Decoder loop in Decoder.Decode / Decoder.DecodeContext switches on it.
Directories ¶
| Path | Synopsis |
|---|---|
| cmd | |
| cmd/extract-jbig2 (command) | Command extract-jbig2 walks a PDF and dumps every /Filter /JBIG2Decode image XObject's stream bytes (plus any referenced /JBIG2Globals) as separate .jb2 files. |
| cmd/gobig2 (command) | Command gobig2 decodes a standalone or PDF-embedded JBIG2 stream into a bitmap. |
| cmd/perf-cross (command) | Command perf-cross times each installed JBIG2 decoder (gobig2, jbig2dec, mutool, pdfimages, PDFBox) over a fixture set and emits a Markdown comparison table plus the raw JSON measurement matrix. |
| internal | |
| internal/arith | Package arith implements JBIG2 MQ arithmetic coder and the integer / IAID adapters above it. |
| internal/bio | Package bio implements bit-level I/O over a byte buffer. |
| internal/errs | Package errs holds sentinel error values re-exported by gobig2 for `errors.Is` classification. |
| internal/generic | Package generic implements JBIG2 generic-region decoding (T.88 §6.2). |
| internal/halftone | Package halftone implements halftone region decoding (T.88 §6.6, type-22/23 segments) and the pattern dictionary it indexes (T.88 §6.7, type-16). |
| internal/huffman | Package huffman implements JBIG2 Huffman tables (T.88 Annex B, B.1-B.15) plus a generic decoder. |
| internal/intmath | Package intmath holds small integer helpers shared across the decoder. |
| internal/mmr | Package mmr decodes JBIG2's CCITT Group 4 / T.6 (MMR) bitmap streams. |
| internal/page | Package page holds the bi-level Image type that region decoders write into and the document orchestrator stitches to form the final page bitmap. |
| internal/refinement | Package refinement implements JBIG2 generic refinement region decoding (T.88 §6.3). |
| internal/segment | Package segment owns the JBIG2 segment table and the document orchestrator (Document) that walks it. |
| internal/state | Package state holds cross-cutting enums and constants shared by JBIG2 decoders and the document orchestrator. |