Overview ¶
Package layout owns the lower-level geometry primitives that drive table-finding: edges, edge-derivation from Lines/Rects/Curves, and edge merging (snap + join).
The split between this internal package and the public pdftable package mirrors the pdfplumber split between pdfplumber/utils/geometry.py (edge maths) and pdfplumber/table.py (the TableFinder). Keeping the edge maths here lets us evolve the representation freely while the public surface in pdftable's table.go / finder.go stays stable.
Coordinate system: PDF user space — origin at bottom-left, Y growing UP. An edge is a single-axis line segment:
- Horizontal edge: Y0 == Y1, X0 <= X1. Orientation "h".
- Vertical edge: X0 == X1, Y0 <= Y1. Orientation "v".
(pdfplumber operates in image space, with Y growing down: its "top" is our larger Y and its "bottom" is our smaller Y. The algorithms here are the same; only the direction of the Y axis is flipped. See the comments at each step for the explicit mapping.)
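The mapping between the two coordinate systems can be sketched as follows. This is an illustration only: flipY, userSpaceSpan and the pageHeight parameter are hypothetical names, and the sketch assumes the page's lower-left corner sits at the user-space origin.

```go
package main

// flipY converts an image-space Y (measured down from the top of the
// page, as pdfplumber does) into PDF user space (measured up from the
// bottom). pageHeight is an assumed input, not part of this package.
func flipY(pageHeight, imageY float64) float64 {
	return pageHeight - imageY
}

// userSpaceSpan maps a pdfplumber-style (top, bottom) pair to (y0, y1)
// with y0 <= y1: "bottom" becomes the smaller Y, "top" the larger Y.
func userSpaceSpan(pageHeight, top, bottom float64) (y0, y1 float64) {
	return flipY(pageHeight, bottom), flipY(pageHeight, top)
}
```

For a 792-point-tall page, an object with top 100 and bottom 120 in image space spans Y 672 to 692 in user space.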
Index ¶
- type CurveSegment
- type Edge
- func FilterEdgesByLength(edges []Edge, minLength float64) []Edge
- func FilterEdgesByOrientation(edges []Edge, o Orientation) []Edge
- func FilterEdgesBySource(edges []Edge, allow ...Source) []Edge
- func FromCurve(c CurveSegment, tolerance float64) []Edge
- func FromLine(l LineSegment, tolerance float64) (Edge, bool)
- func FromRect(r RectSegment) []Edge
- func JoinEdges(edges []Edge, joinXTolerance, joinYTolerance float64) []Edge
- func MergeEdges(edges []Edge, snapXTol, snapYTol, joinXTol, joinYTol float64) []Edge
- func SnapEdges(edges []Edge, snapXTolerance, snapYTolerance float64) []Edge
- func SortEdges(edges []Edge) []Edge
- type LineSegment
- type Orientation
- type RectSegment
- type Source
Types ¶
type CurveSegment ¶
CurveSegment is the point list of a curve path. We turn each horizontal or vertical pair of consecutive points into an edge — curves that are entirely diagonal contribute zero edges.
type Edge ¶
type Edge struct {
X0, Y0, X1, Y1 float64
Orientation Orientation
Source Source
// Width is the stroke width of the originating primitive (used by
// some callers to filter hair-thin construction lines). Zero is
// "unknown".
Width float64
}
Edge is one axis-aligned line segment carrying the data the table-finder needs for snap + join + intersection.
Invariants (constructor-enforced via newEdge / normalise):
- Orientation == "h" → Y0 == Y1, X0 <= X1.
- Orientation == "v" → X0 == X1, Y0 <= Y1.
Length() returns the along-direction extent; Pos() returns the perpendicular position (the "snap axis" for merging).
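A minimal sketch of how these invariants and accessors could fit together, using local stand-in types rather than the package's own. The names normalise, length and pos mirror the ones mentioned above, but their bodies here are assumptions, not the package's code.

```go
package main

// Local stand-ins for the package's Orientation and Edge types.
type orientation string

const (
	horizontal orientation = "h"
	vertical   orientation = "v"
)

type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    orientation
}

// normalise swaps the endpoints when needed so that X0 <= X1 for a
// horizontal edge or Y0 <= Y1 for a vertical one. It does not check
// that the perpendicular coordinates are equal.
func normalise(e edge) edge {
	switch e.Orientation {
	case horizontal:
		if e.X0 > e.X1 {
			e.X0, e.X1 = e.X1, e.X0
		}
	case vertical:
		if e.Y0 > e.Y1 {
			e.Y0, e.Y1 = e.Y1, e.Y0
		}
	}
	return e
}

// length is the extent along the edge's own axis.
func (e edge) length() float64 {
	if e.Orientation == horizontal {
		return e.X1 - e.X0
	}
	return e.Y1 - e.Y0
}

// pos is the coordinate on the perpendicular axis, the value that
// snapping compares between edges.
func (e edge) pos() float64 {
	if e.Orientation == horizontal {
		return e.Y0
	}
	return e.X0
}
```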
func FilterEdgesByLength ¶
func FilterEdgesByLength(edges []Edge, minLength float64) []Edge
FilterEdgesByLength drops edges whose along-direction length is below minLength. pdfplumber applies the equivalent filter twice in the TableFinder pipeline: once with edge_min_length_prefilter on the raw edges (default 1, to drop hairline construction lines) and once with edge_min_length on the merged set (default 3, to drop short stubs left after snap+join).
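The two-pass use described above might look like the sketch below. The edge type, filterByLength and twoPassFilter are illustrative stand-ins; the merge callback is a hypothetical placeholder for the snap+join stage, and the thresholds 1 and 3 are the pdfplumber defaults quoted in the text.

```go
package main

// edge is a local stand-in for the package's Edge type.
type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    string // "h" or "v"
}

// length is the extent along the edge's own axis.
func length(e edge) float64 {
	if e.Orientation == "h" {
		return e.X1 - e.X0
	}
	return e.Y1 - e.Y0
}

// filterByLength keeps edges whose length is at least minLength.
func filterByLength(edges []edge, minLength float64) []edge {
	out := make([]edge, 0, len(edges))
	for _, e := range edges {
		if length(e) >= minLength {
			out = append(out, e)
		}
	}
	return out
}

// twoPassFilter prefilters the raw edges, hands them to a
// caller-supplied merge step, then filters the merged result.
func twoPassFilter(raw []edge, merge func([]edge) []edge) []edge {
	pre := filterByLength(raw, 1) // edge_min_length_prefilter default
	return filterByLength(merge(pre), 3) // edge_min_length default
}
```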
func FilterEdgesByOrientation ¶
func FilterEdgesByOrientation(edges []Edge, o Orientation) []Edge
FilterEdgesByOrientation keeps only edges with the given orientation. Convenience over a one-line loop, used by the TableFinder when separating the v / h edge lists post-merge for intersection detection.
func FilterEdgesBySource ¶
func FilterEdgesBySource(edges []Edge, allow ...Source) []Edge
FilterEdgesBySource keeps only edges produced by an allowed Source. Used by lines_strict mode to drop rect and curve edges.
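One way to implement an allow-list filter with a variadic parameter is sketched here. The source and edge types are local stand-ins that carry only what the filter needs, and filterBySource is an illustrative name rather than the package's function.

```go
package main

// source and edge are local stand-ins for the package's types.
type source uint8

const (
	sourceLine source = iota
	sourceRect
	sourceCurve
)

type edge struct {
	Source source
}

// filterBySource keeps edges whose Source appears in allow.
// With an empty allow list, nothing is kept.
func filterBySource(edges []edge, allow ...source) []edge {
	allowed := make(map[source]bool, len(allow))
	for _, s := range allow {
		allowed[s] = true
	}
	out := make([]edge, 0, len(edges))
	for _, e := range edges {
		if allowed[e.Source] {
			out = append(out, e)
		}
	}
	return out
}
```

A lines_strict-style caller would pass only sourceLine, which drops the rect and curve edges.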
func FromCurve ¶
func FromCurve(c CurveSegment, tolerance float64) []Edge
FromCurve returns one edge per consecutive pair of points that lies on the same axis. Diagonal pairs are dropped. pdfplumber does the same in curve_to_edges.
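The pairwise walk can be sketched as below. The point and edge types and the edgesFromPoints name are hypothetical. The sketch also assumes an inclusive tolerance comparison, checks the horizontal case first, and snaps the shared coordinate to the first point of each pair; none of these choices is confirmed by the text above.

```go
package main

import "math"

// point and edge are local stand-ins for the package's types.
type point struct{ X, Y float64 }

type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    string // "h" or "v"
}

// edgesFromPoints emits one edge per consecutive pair of points that
// shares an axis (within tolerance) and skips diagonal pairs.
func edgesFromPoints(pts []point, tolerance float64) []edge {
	var out []edge
	for i := 0; i+1 < len(pts); i++ {
		a, b := pts[i], pts[i+1]
		switch {
		case math.Abs(a.Y-b.Y) <= tolerance:
			// Same Y: horizontal edge, X ordered ascending.
			out = append(out, edge{math.Min(a.X, b.X), a.Y, math.Max(a.X, b.X), a.Y, "h"})
		case math.Abs(a.X-b.X) <= tolerance:
			// Same X: vertical edge, Y ordered ascending.
			out = append(out, edge{a.X, math.Min(a.Y, b.Y), a.X, math.Max(a.Y, b.Y), "v"})
		}
	}
	return out
}
```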
func FromLine ¶
func FromLine(l LineSegment, tolerance float64) (Edge, bool)
FromLine returns one Edge for a horizontal or vertical line segment, or (zero, false) for a diagonal line: diagonals aren't axis-aligned and so can't be table rules.
pdfplumber's line_to_edge treats a line as horizontal only when its two Y coordinates are exactly equal. The tolerance parameter relaxes that test: we treat the two coordinates as equal when their difference is below the supplied tolerance, which absorbs floating-point drift from the content-stream interpreter.
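A sketch of the tolerance test, under stated assumptions: edgeFromLine and the local edge type are illustrative, the comparison is inclusive so that a tolerance of 0 still accepts exactly axis-aligned lines, and the snapped coordinate is taken from the first endpoint. The real function may differ on each of these points.

```go
package main

import "math"

// edge is a local stand-in for the package's Edge type.
type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    string // "h" or "v"
}

// edgeFromLine classifies a segment as horizontal or vertical within
// tolerance and reports false for anything else.
func edgeFromLine(x0, y0, x1, y1, tolerance float64) (edge, bool) {
	switch {
	case math.Abs(y0-y1) <= tolerance:
		// Near-equal Y: snap both ends to y0 so that Y0 == Y1 holds.
		return edge{math.Min(x0, x1), y0, math.Max(x0, x1), y0, "h"}, true
	case math.Abs(x0-x1) <= tolerance:
		// Near-equal X: snap both ends to x0 so that X0 == X1 holds.
		return edge{x0, math.Min(y0, y1), x0, math.Max(y0, y1), "v"}, true
	}
	return edge{}, false
}
```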
func FromRect ¶
func FromRect(r RectSegment) []Edge
FromRect returns the four edges of a rectangle's outline. We mirror pdfplumber's rect_to_edges: top, bottom, left, right — all tagged SourceRect. Filled-only (non-stroked) rectangles still produce edges because pdfplumber's edges property aggregates BOTH stroked and filled rects (a filled cell-background still defines a row boundary).
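The four-edge expansion can be sketched like this. The rect type and rectEdges name are hypothetical, and the sketch assumes the bbox is already ordered (X0 <= X1, Y0 <= Y1). Because Y grows up in user space, the top edge sits at the larger Y.

```go
package main

// rect and edge are local stand-ins for RectSegment and Edge.
type rect struct{ X0, Y0, X1, Y1 float64 }

type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    string // "h" or "v"
}

// rectEdges returns the outline in the order top, bottom, left, right.
func rectEdges(r rect) []edge {
	return []edge{
		{r.X0, r.Y1, r.X1, r.Y1, "h"}, // top: larger Y in user space
		{r.X0, r.Y0, r.X1, r.Y0, "h"}, // bottom: smaller Y
		{r.X0, r.Y0, r.X0, r.Y1, "v"}, // left
		{r.X1, r.Y0, r.X1, r.Y1, "v"}, // right
	}
}
```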
func JoinEdges ¶
func JoinEdges(edges []Edge, joinXTolerance, joinYTolerance float64) []Edge
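JoinEdges merges collinear edges whose along-direction extents overlap or come within the join tolerance of each other (joinXTolerance for gaps along X between horizontal edges, joinYTolerance for gaps along Y between vertical edges). Two horizontal edges with the same Y and overlapping or near-touching X ranges become one edge spanning their union; similarly for vertical edges.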
Edges that don't overlap and aren't within the join tolerance pass through unchanged.
This is the Go port of pdfplumber's table.join_edge_group, called once per (orientation, perpendicular-position) group.
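Within one group the join reduces to merging intervals along a single axis. The sketch below follows the sort-then-sweep shape of pdfplumber's join_edge_group; the span type and joinSpans name are illustrative, and each span stands for one edge's (start, end) along its own direction.

```go
package main

import "sort"

// span is the along-direction extent of one edge in a group.
type span struct{ Start, End float64 }

// joinSpans merges spans that overlap or whose gap is at most
// tolerance, and returns the result sorted by Start.
func joinSpans(spans []span, tolerance float64) []span {
	if len(spans) == 0 {
		return nil
	}
	// Work on a copy so the caller's slice keeps its order.
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })

	joined := []span{sorted[0]}
	for _, s := range sorted[1:] {
		last := &joined[len(joined)-1]
		if s.Start <= last.End+tolerance {
			// Close enough: extend the current span if s reaches further.
			if s.End > last.End {
				last.End = s.End
			}
		} else {
			joined = append(joined, s)
		}
	}
	return joined
}
```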
func MergeEdges ¶
func MergeEdges(edges []Edge, snapXTol, snapYTol, joinXTol, joinYTol float64) []Edge
MergeEdges is the snap-then-join pipeline. It's the entry point that table.go calls; it mirrors pdfplumber's table.merge_edges.
Order:
- Snap (collapse near-collinear edges onto their mean position).
- Group by (orientation, position) and join within each group.
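The grouping in the second step can be sketched as below. The groupKey type and groupCollinear function are hypothetical. The sketch relies on snapping having already given collinear edges bit-identical positions, so exact float equality is safe as a map key, and it orders groups by key only to make the output deterministic.

```go
package main

import "sort"

// edge is a local stand-in for the package's Edge type.
type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    string // "h" or "v"
}

// groupKey identifies one line that edges can lie on.
type groupKey struct {
	Orientation string
	Pos         float64
}

func keyOf(e edge) groupKey {
	if e.Orientation == "h" {
		return groupKey{"h", e.Y0}
	}
	return groupKey{"v", e.X0}
}

// groupCollinear buckets already-snapped edges by (orientation,
// position) and returns the buckets in sorted key order. Each bucket
// is what a join step would then process on its own.
func groupCollinear(edges []edge) [][]edge {
	buckets := make(map[groupKey][]edge)
	for _, e := range edges {
		k := keyOf(e)
		buckets[k] = append(buckets[k], e)
	}
	keys := make([]groupKey, 0, len(buckets))
	for k := range buckets {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Orientation != keys[j].Orientation {
			return keys[i].Orientation < keys[j].Orientation
		}
		return keys[i].Pos < keys[j].Pos
	})
	groups := make([][]edge, 0, len(keys))
	for _, k := range keys {
		groups = append(groups, buckets[k])
	}
	return groups
}
```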
func SnapEdges ¶
func SnapEdges(edges []Edge, snapXTolerance, snapYTolerance float64) []Edge
SnapEdges replaces near-collinear edges with edges sharing the average perpendicular position. Horizontal edges within snapYTolerance of each other on the Y axis get unified onto their mean Y; vertical edges within snapXTolerance of each other on the X axis get unified onto their mean X.
This is the Go port of pdfplumber's table.snap_edges, which dispatches into utils.snap_objects per orientation.
A tolerance of 0 leaves the edges unchanged for that orientation.
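For a single orientation, snapping reduces to clustering the perpendicular positions and replacing each with its cluster mean. The sketch below assumes chained clustering (a value joins the current cluster when it is within tolerance of the previous sorted value), which is how pdfplumber's cluster_list behaves; snapPositions is an illustrative name that works on bare positions rather than edges.

```go
package main

import "sort"

// snapPositions returns a copy of pos in which every value is replaced
// by the mean of its cluster. A tolerance of 0 or less returns the
// values unchanged.
func snapPositions(pos []float64, tolerance float64) []float64 {
	out := append([]float64(nil), pos...)
	if tolerance <= 0 || len(pos) == 0 {
		return out
	}
	// Sort indices rather than values so results map back to inputs.
	idx := make([]int, len(pos))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return pos[idx[a]] < pos[idx[b]] })

	start := 0
	for i := 1; i <= len(idx); i++ {
		// Keep extending the cluster while neighbours stay close.
		if i < len(idx) && pos[idx[i]]-pos[idx[i-1]] <= tolerance {
			continue
		}
		sum := 0.0
		for _, j := range idx[start:i] {
			sum += pos[j]
		}
		mean := sum / float64(i-start)
		for _, j := range idx[start:i] {
			out[j] = mean
		}
		start = i
	}
	return out
}
```

Applied to the Y values of horizontal edges with snapYTolerance, or to the X values of vertical edges with snapXTolerance, this yields the unified positions described above.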
func SortEdges ¶
func SortEdges(edges []Edge) []Edge
SortEdges returns a stable-sorted copy of edges keyed by (orientation, perpendicular position, along-direction start). This is the deterministic order downstream stages rely on for intersection enumeration.
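A stable sort on that three-part key could be written as in this sketch. The edge type and sortedCopy are stand-ins, and placing "h" before "v" is an assumption, since the text does not say which orientation comes first.

```go
package main

import "sort"

// edge is a local stand-in for the package's Edge type.
type edge struct {
	X0, Y0, X1, Y1 float64
	Orientation    string // "h" or "v"
}

// pos is the perpendicular position; start is the along-direction start.
func pos(e edge) float64 {
	if e.Orientation == "h" {
		return e.Y0
	}
	return e.X0
}

func start(e edge) float64 {
	if e.Orientation == "h" {
		return e.X0
	}
	return e.Y0
}

// sortedCopy returns a new slice ordered by (orientation, pos, start)
// and leaves the input untouched. Ties keep their input order.
func sortedCopy(edges []edge) []edge {
	out := append([]edge(nil), edges...)
	sort.SliceStable(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.Orientation != b.Orientation {
			return a.Orientation < b.Orientation
		}
		if pa, pb := pos(a), pos(b); pa != pb {
			return pa < pb
		}
		return start(a) < start(b)
	})
	return out
}
```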
func (Edge) Length ¶
Length returns the edge's extent along its orientation axis.
- Horizontal edge → X1 - X0.
- Vertical edge → Y1 - Y0.
type LineSegment ¶
LineSegment is a minimal struct describing a drawn straight-line segment. It exists so this package doesn't have to import the public pdftable types, which keeps the dependency direction one-way (pdftable depends on layout, not the other way round).
type Orientation ¶
type Orientation string
Orientation is the axis an edge lies along. "h" = horizontal, "v" = vertical. Diagonal lines never become edges; they're dropped at derivation time.
const (
	Horizontal Orientation = "h"
	Vertical   Orientation = "v"
)
type RectSegment ¶
RectSegment is a minimal Rect descriptor that exists for the same reason as LineSegment: it keeps this package from importing the public pdftable types. Only the bbox matters for edge derivation; we drop the Stroke/Fill flags at the layer above by filtering out non-stroked rects before calling FromRect.
type Source ¶
type Source uint8
Source tags say which kind of drawn primitive produced an edge. pdfplumber distinguishes between "line", "rect_edge", and "curve_edge" so that lines_strict mode can ignore everything that isn't a literal stroked line. We carry the same distinction.
const (
	// SourceLine: an edge derived from a stroked Line.
	SourceLine Source = iota
	// SourceRect: an edge derived from one side of a Rect.
	SourceRect
	// SourceCurve: an edge derived from a Curve's straight segment.
	// We accept curve edges in "lines" mode but not "lines_strict".
	SourceCurve
	// SourceExplicit: an edge constructed from an
	// ExplicitVerticalLines / ExplicitHorizontalLines setting.
	SourceExplicit
	// SourceText: an edge inferred from word alignment by the "text"
	// strategy. words_to_edges_v / words_to_edges_h in pdfplumber.
	SourceText
)