rules

package
v0.1.5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 13, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package rules provides built-in advisor rules for pgexplain.

Each rule implements the advisor.Rule interface and is constructed via a factory function (e.g. SeqScan, SortSpill). Rules accept optional functional options to override default thresholds.

Built-in rules

All rules are safe for concurrent use.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func HashJoinSpill

func HashJoinSpill() advisor.Rule

HashJoinSpill returns a Rule that warns when a Hash Join spills its hash table to disk because the inner relation exceeded work_mem.

What is a hash join spill?

A Hash Join builds a hash table from the inner (smaller) relation in memory, then probes it with each row from the outer relation. The memory budget for this hash table comes from work_mem (default: 4MB).

When the inner relation is too large to fit in work_mem, PostgreSQL falls back to a batched strategy: it splits both relations into N batches, writes them to temporary disk files, and processes one batch at a time. This is indicated by Hash Batches > 1 on the Hash node.

Disk-based batching is orders of magnitude slower than an in-memory hash join because each row may be written to and read from disk multiple times.

What does this rule check?

The rule fires on the Hash node (the inner child of a Hash Join) when:

  • Hash Batches > 1 (any spill, regardless of batch count)

There is no configurable threshold — a spill is always worth flagging. The finding includes Peak Memory Usage to give a concrete work_mem suggestion.

Future work

PostgreSQL also reports Original Hash Batches (the initially planned batch count) separately from the final Hash Batches. A plan that started with 1 batch but grew to 4 during execution indicates a particularly bad estimate. This distinction is not yet captured in the parser's Node struct.

Usage

rules.HashJoinSpill()
Example

ExampleHashJoinSpill demonstrates the HashJoinSpill rule on a plan where the hash table for a join exceeded work_mem and spilled to disk across 4 batches.

The Hash node (inner child of Hash Join) is where PostgreSQL builds the hash table. Hash Batches > 1 means it could not fit in work_mem and batched to disk.

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Hash Join",
			"Parallel Aware": false,
			"Startup Cost": 31.50,
			"Total Cost": 3025.10,
			"Plan Rows": 1509,
			"Plan Width": 104,
			"Actual Startup Time": 2.451,
			"Actual Total Time": 38.234,
			"Actual Rows": 1842,
			"Actual Loops": 1,
			"Hash Cond": "(o.customer_id = c.id)",
			"Plans": [
				{
					"Node Type": "Seq Scan",
					"Parent Relationship": "Outer",
					"Parallel Aware": false,
					"Relation Name": "orders",
					"Alias": "o",
					"Startup Cost": 0.00,
					"Total Cost": 2846.00,
					"Plan Rows": 100000,
					"Plan Width": 72,
					"Actual Startup Time": 0.012,
					"Actual Total Time": 22.100,
					"Actual Rows": 100000,
					"Actual Loops": 1
				},
				{
					"Node Type": "Hash",
					"Parent Relationship": "Inner",
					"Parallel Aware": false,
					"Startup Cost": 25.00,
					"Total Cost": 25.00,
					"Plan Rows": 520,
					"Plan Width": 36,
					"Actual Startup Time": 2.431,
					"Actual Total Time": 2.431,
					"Actual Rows": 523,
					"Actual Loops": 1,
					"Hash Buckets": 1024,
					"Hash Batches": 4,
					"Peak Memory Usage": 4096
				}
			]
		},
		"Planning Time": 0.892,
		"Execution Time": 46.203
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.HashJoinSpill(),
	)

	for _, f := range adv.Analyze(plan) {
		node, _ := plan.NodeByID(f.NodeID)
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  node ID:  %d (%s)\n", node.ID, node.NodeType)
		fmt.Printf("  batches:  %d\n", *node.HashBatches)
		fmt.Printf("  peak mem: %dkB\n", *node.PeakMemoryUsage)
	}

}
Output:
[WARN] hash join spilled to disk across 4 batches (peak memory: 4096kB per batch)
  node ID:  3 (Hash)
  batches:  4
  peak mem: 4096kB

func MissingIndexOnlyScan

func MissingIndexOnlyScan(opts ...IndexOnlyScanOption) advisor.Rule

MissingIndexOnlyScan returns a Rule that warns when an Index Only Scan is degraded — meaning PostgreSQL is falling back to the heap for a significant fraction of rows instead of serving them from the index alone.

What is an Index Only Scan?

An Index Only Scan is PostgreSQL's most efficient read path: it answers a query entirely from the index, never touching the main table (heap). This works only when:

  1. The index covers all columns in the SELECT list and WHERE clause.
  2. The visibility map marks the page as all-visible, so PostgreSQL can skip the heap check for MVCC visibility.

When condition 2 fails — because VACUUM hasn't run recently enough to mark pages as all-visible — PostgreSQL must visit the heap for each row to confirm it is visible to the current transaction. These extra heap visits are counted in Heap Fetches.

What does this rule check?

ratio = HeapFetches / (ActualRows × ActualLoops)
if ratio >= minHeapFetchRatio (default 0.1)
  → emit Warn finding

A ratio of 0.0 means the Index Only Scan was fully index-covered — ideal. A ratio of 1.0 means every single row required a heap visit — equivalent to a plain Index Scan with extra overhead.

When does this rule fire?

All conditions must hold:

  1. Node type is Index Only Scan.
  2. HeapFetches is non-nil (present in EXPLAIN output).
  3. ANALYZE was run (ActualRows and ActualLoops are present).
  4. ratio >= minHeapFetchRatio.

Usage

rules.MissingIndexOnlyScan()
rules.MissingIndexOnlyScan(rules.WithMinHeapFetchRatio(0.5))
Example

ExampleMissingIndexOnlyScan demonstrates the MissingIndexOnlyScan rule on a plan where an Index Only Scan fetches 40% of rows from the heap.

An Index Only Scan should serve all rows from the index without visiting the main table (heap). When VACUUM has not updated the visibility map, PostgreSQL must visit the heap to verify row visibility — defeating the purpose of the scan. Heap Fetches counts these visits.

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Index Only Scan",
			"Parallel Aware": false,
			"Relation Name": "users",
			"Alias": "u",
			"Index Name": "users_email_idx",
			"Startup Cost": 0.42,
			"Total Cost": 312.44,
			"Plan Rows": 500,
			"Plan Width": 36,
			"Actual Startup Time": 0.031,
			"Actual Total Time": 4.812,
			"Actual Rows": 500,
			"Actual Loops": 1,
			"Index Cond": "(email = 'x@example.com')",
			"Heap Fetches": 200
		},
		"Planning Time": 0.089,
		"Execution Time": 4.921
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.MissingIndexOnlyScan(), // default: warn when >= 10% of rows hit the heap
	)

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  detail:     %s\n", f.Detail)
	}

}
Output:
[WARN] Index Only Scan on "users" (index: users_email_idx) fetched 40% of rows from the heap
  detail:     An Index Only Scan should serve rows entirely from the index without touching the heap. This node returned 500 rows but fetched 200 from the heap (40%). Heap fetches happen when the visibility map does not mark the page as all-visible, forcing PostgreSQL to verify row visibility in "users". This typically means VACUUM has not run recently enough on the table.

func NestedLoopLarge

func NestedLoopLarge(opts ...NestedLoopOption) advisor.Rule

NestedLoopLarge returns a Rule that warns when a Nested Loop join executes its inner side an excessive number of times.

What is a Nested Loop join?

A Nested Loop iterates over every outer row and, for each one, executes the inner side to find matches. The total work is:

outer_rows × cost_per_inner_probe

When the inner side has an index on the join key, each probe is O(log n) — many probes can still be acceptable. When the inner side is a Seq Scan, each probe reads the entire inner table — the complexity is O(outer × inner), truly quadratic.

The number of inner executions is visible as Actual Loops on the inner child node in EXPLAIN output.

Severity

The severity depends on the inner node type:

  • advisor.Error — inner side is a Seq Scan: every outer row triggers a full table scan. Almost always a missing index on the join key.
  • advisor.Warn — inner side uses an index or another strategy: many probes but each probe is efficient. May indicate a bad row estimate caused the planner to prefer Nested Loop over Hash Join.

When does this rule fire?

All conditions must hold:

  1. Node type is Nested Loop.
  2. ANALYZE was run (Actual Loops is present on the inner child).
  3. An inner child exists (Parent Relationship == "Inner").
  4. inner.ActualLoops >= minInnerLoops (default 1000).

Usage

rules.NestedLoopLarge()
rules.NestedLoopLarge(rules.WithMinInnerLoops(5000))
Example (IndexScan)

ExampleNestedLoopLarge_indexScan shows a Nested Loop where the inner side is an Index Scan executed 1000 times — once per outer row. Severity is Warn because each probe is efficient, but 1000 probes may indicate a bad estimate.

package main

import (
	"fmt"
	"os"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	// nested_loop.json: orders (outer, 1000 rows) × order_items index scan (inner, 1000 loops)
	explainJSON, err := os.ReadFile("../testdata/nested_loop.json")
	if err != nil {
		panic(err)
	}

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(rules.NestedLoopLarge())

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
	}

}
Output:
[WARN] nested loop probed Index Scan on "order_items" 1000 times
Example (SeqScan)

ExampleNestedLoopLarge_seqScan shows the Error case: the inner side is a Seq Scan, meaning PostgreSQL reads the entire inner table for every outer row — O(outer × inner).

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Nested Loop",
			"Parallel Aware": false,
			"Startup Cost": 0.00,
			"Total Cost": 25000.00,
			"Plan Rows": 1500,
			"Plan Width": 120,
			"Actual Startup Time": 0.021,
			"Actual Total Time": 892.341,
			"Actual Rows": 1500,
			"Actual Loops": 1,
			"Plans": [
				{
					"Node Type": "Seq Scan",
					"Parent Relationship": "Outer",
					"Parallel Aware": false,
					"Relation Name": "orders",
					"Alias": "o",
					"Startup Cost": 0.00,
					"Total Cost": 250.00,
					"Plan Rows": 1000,
					"Plan Width": 72,
					"Actual Startup Time": 0.009,
					"Actual Total Time": 2.341,
					"Actual Rows": 1000,
					"Actual Loops": 1
				},
				{
					"Node Type": "Seq Scan",
					"Parent Relationship": "Inner",
					"Parallel Aware": false,
					"Relation Name": "order_items",
					"Alias": "i",
					"Startup Cost": 0.00,
					"Total Cost": 48.00,
					"Plan Rows": 3,
					"Plan Width": 48,
					"Actual Startup Time": 0.012,
					"Actual Total Time": 1.782,
					"Actual Rows": 3,
					"Actual Loops": 1000
				}
			]
		},
		"Planning Time": 0.198,
		"Execution Time": 894.123
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(rules.NestedLoopLarge())

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
	}

}
Output:
[ERROR] nested loop performed full table scan on "order_items" 1000 times

func ParallelNotLaunched

func ParallelNotLaunched() advisor.Rule

ParallelNotLaunched returns a Rule that warns when a Gather or Gather Merge node launched fewer parallel workers than the planner intended.

What is parallel query execution?

PostgreSQL can split a query across multiple worker processes to use more CPU cores simultaneously. The plan shows this as a Gather or Gather Merge node — a synchronization point where the leader process collects results from workers.

Two fields on the Gather node indicate whether parallelism was achieved:

  • Workers Planned: how many workers the planner expected to use
  • Workers Launched: how many actually started at runtime

When Workers Launched < Workers Planned, the query ran with less parallelism than the planner assumed when building the plan. Costs and row estimates in the plan are based on Workers Planned — if fewer workers ran, those estimates no longer reflect reality and the query likely took longer than the planner predicted.

What does this rule check?

IF node is Gather or Gather Merge
AND Workers Planned > 0
AND Workers Launched < Workers Planned
→ emit Warn finding

There is no configurable threshold — any gap between planned and launched is worth flagging. The most common causes are hitting max_parallel_workers (the global cap across all queries) or max_parallel_workers_per_gather (the per-Gather cap).

When does this rule fire?

All conditions must hold:

  1. Node type is Gather or Gather Merge.
  2. Workers Planned is non-nil and greater than zero.
  3. Workers Launched is non-nil and less than Workers Planned.

Usage

rules.ParallelNotLaunched()
Example

ExampleParallelNotLaunched demonstrates the ParallelNotLaunched rule using the parallel_gather.json fixture: Workers Planned=4 but only 2 launched.

The planner built the plan assuming 4 parallel workers. At runtime only 2 started — likely because max_parallel_workers or max_parallel_workers_per_gather was too low, or other concurrent queries consumed the worker budget.

package main

import (
	"fmt"
	"os"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON, err := os.ReadFile("../testdata/parallel_gather.json")
	if err != nil {
		panic(err)
	}

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.ParallelNotLaunched(),
	)

	for _, f := range adv.Analyze(plan) {
		node, _ := plan.NodeByID(f.NodeID)
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  node:     %s (ID %d)\n", node.NodeType, node.ID)
		fmt.Printf("  planned:  %d\n", *node.WorkersPlanned)
		fmt.Printf("  launched: %d\n", *node.WorkersLaunched)
	}

}
Output:
[WARN] Gather launched 2 of 4 planned workers
  node:     Gather (ID 1)
  planned:  4
  launched: 2

func RowEstimateMismatch

func RowEstimateMismatch(opts ...RowEstimateMismatchOption) advisor.Rule

RowEstimateMismatch returns a Rule that warns when the planner's row estimate diverges significantly from the actual row count produced at runtime.

What is a row estimate mismatch?

PostgreSQL's planner estimates how many rows each node will produce, using statistics collected by ANALYZE. These estimates drive every planning decision: join strategy, join order, work_mem allocation, and index vs seq scan choice. When the estimate is wildly wrong, those decisions may be wrong too.

The rule fires on any node type — not just leaf scans — because mismatch at intermediate nodes (Hash Join, Sort, Aggregate) is equally harmful.

A critical detail: EXPLAIN reports Actual Rows per loop execution. Inside a Nested Loop, a node may execute thousands of times. The true total actual rows is Actual Rows × Actual Loops, and that is what this rule compares against Plan Rows.

When does this rule fire?

All three conditions must hold:

  1. EXPLAIN was run with ANALYZE (Actual Rows and Actual Loops are present).
  2. max(planRows, trueActualRows) >= minRows (default 100) — avoids noise on tiny nodes.
  3. max(planRows, trueActualRows) / min(planRows, trueActualRows) >= minEstimateFactor (default 10).

Usage

rules.RowEstimateMismatch()
rules.RowEstimateMismatch(rules.WithMinEstimateFactor(50))
rules.RowEstimateMismatch(rules.WithMinRows(500))
Example

ExampleRowEstimateMismatch demonstrates the RowEstimateMismatch rule on a Seq Scan where the planner expected 15 rows but got 12,000 — an 800x underestimate.

This kind of error typically means statistics are stale (ANALYZE hasn't run recently) or the column has a non-uniform distribution the planner can't model.

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Seq Scan",
			"Parallel Aware": false,
			"Relation Name": "orders",
			"Alias": "orders",
			"Startup Cost": 0.00,
			"Total Cost": 1849.00,
			"Plan Rows": 15,
			"Plan Width": 72,
			"Actual Startup Time": 0.042,
			"Actual Total Time": 18.721,
			"Actual Rows": 12000,
			"Actual Loops": 1,
			"Filter": "(status = 'pending')"
		},
		"Planning Time": 0.123,
		"Execution Time": 18.854
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.RowEstimateMismatch(),
	)

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  detail:     %s\n", f.Detail)
		fmt.Printf("  suggestion: %s\n", f.Suggestion)
	}

}
Output:
[WARN] row estimate for Seq Scan was off by 800x (underestimate: planned 15, got 12000)
  detail:     The planner estimated 15 rows but this node produced 12000 (loops=1). A 800x underestimate can cause the planner to choose the wrong join strategy, misallocate work_mem, or produce a suboptimal join order.
  suggestion: Run ANALYZE on the tables involved in this node to refresh planner statistics. If the mismatch persists, consider raising the statistics target for the relevant columns: ALTER TABLE <table> ALTER COLUMN <column> SET STATISTICS 500;
Example (WithLoops)

ExampleRowEstimateMismatch_withLoops shows that the rule correctly accounts for Actual Loops. Here an Index Scan inside a Nested Loop executes 100 times, each returning 50 rows — a true total of 5000 vs the planner's estimate of 10.

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Nested Loop",
			"Parallel Aware": false,
			"Startup Cost": 0.42,
			"Total Cost": 9876.00,
			"Plan Rows": 1000,
			"Plan Width": 80,
			"Actual Startup Time": 0.031,
			"Actual Total Time": 120.4,
			"Actual Rows": 5000,
			"Actual Loops": 1,
			"Plans": [
				{
					"Node Type": "Seq Scan",
					"Parent Relationship": "Outer",
					"Parallel Aware": false,
					"Relation Name": "customers",
					"Alias": "c",
					"Startup Cost": 0.00,
					"Total Cost": 25.00,
					"Plan Rows": 100,
					"Plan Width": 36,
					"Actual Startup Time": 0.009,
					"Actual Total Time": 0.321,
					"Actual Rows": 100,
					"Actual Loops": 1
				},
				{
					"Node Type": "Index Scan",
					"Parent Relationship": "Inner",
					"Parallel Aware": false,
					"Relation Name": "orders",
					"Alias": "o",
					"Index Name": "orders_customer_id_idx",
					"Startup Cost": 0.42,
					"Total Cost": 12.47,
					"Plan Rows": 10,
					"Plan Width": 72,
					"Actual Startup Time": 0.031,
					"Actual Total Time": 0.038,
					"Actual Rows": 50,
					"Actual Loops": 100
				}
			]
		},
		"Planning Time": 0.201,
		"Execution Time": 122.5
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.RowEstimateMismatch(),
	)

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
	}

}
Output:
[WARN] row estimate for Index Scan was off by 500x (underestimate: planned 10, got 5000)

func SeqScan

func SeqScan(opts ...SeqScanOption) advisor.Rule

SeqScan returns a Rule that warns when a sequential scan filters out many more rows than it returns — a strong signal that an index on the filtered column(s) would significantly reduce I/O.

What is a Sequential Scan?

A Seq Scan reads every row in the table in physical order and evaluates the WHERE clause on each one. It is the simplest access method and is correct for small tables or queries that need most of the table. But when a filter rejects the vast majority of rows, the scan does far more work than necessary — work that a B-tree index would skip entirely.

When does this rule fire?

The rule fires when all three conditions hold:

  1. The node is a Seq Scan with a Filter condition (no filter = intentional full read).
  2. rows_removed_by_filter / actual_rows >= minFilterRatio (default 10).
  3. Special case: if actual_rows is 0 (nothing matched), fires when rows_removed_by_filter >= 1000, since division is undefined.

Usage

rules.SeqScan()                            // default: ratio >= 10
rules.SeqScan(rules.WithMinFilterRatio(5)) // stricter: ratio >= 5
Example

ExampleSeqScan demonstrates end-to-end usage of the SeqScan rule.

The plan below represents a query like:

SELECT * FROM orders WHERE customer_id = 42

PostgreSQL chose a Seq Scan and read all 100,000 rows in the table, but only 12 matched the filter — a ratio of 8332:1. The rule flags this as a candidate for an index on orders(customer_id).

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Seq Scan",
			"Parallel Aware": false,
			"Relation Name": "orders",
			"Alias": "orders",
			"Startup Cost": 0.00,
			"Total Cost": 1849.00,
			"Plan Rows": 15,
			"Plan Width": 72,
			"Actual Startup Time": 0.042,
			"Actual Total Time": 18.721,
			"Actual Rows": 12,
			"Actual Loops": 1,
			"Filter": "(customer_id = 42)",
			"Rows Removed by Filter": 99988
		},
		"Planning Time": 0.123,
		"Execution Time": 18.854
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.SeqScan(), // default: warn when ratio >= 10
	)

	for _, f := range adv.Analyze(plan) {
		node, _ := plan.NodeByID(f.NodeID)
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  node:       %s on %q\n", node.NodeType, *node.RelationName)
		fmt.Printf("  detail:     %s\n", f.Detail)
		fmt.Printf("  suggestion: %s\n", f.Suggestion)
	}

}
Output:
[WARN] sequential scan on "orders" discards 8332x more rows than it returns
  node:       Seq Scan on "orders"
  detail:     PostgreSQL read 100000 rows from "orders" but only 12 matched (customer_id = 42) (8332 rows discarded per row returned). An index on the filtered column(s) would allow PostgreSQL to skip non-matching rows.
  suggestion: Add an index on "orders" to support the filter (customer_id = 42). Run EXPLAIN (ANALYZE, BUFFERS) after adding the index to confirm it is used.
Example (WithCustomThreshold)

ExampleSeqScan_withCustomThreshold shows how to lower the filter ratio threshold to catch less severe (but still wasteful) sequential scans.

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Seq Scan",
			"Parallel Aware": false,
			"Relation Name": "events",
			"Alias": "events",
			"Startup Cost": 0.00,
			"Total Cost": 540.00,
			"Plan Rows": 100,
			"Plan Width": 48,
			"Actual Startup Time": 0.011,
			"Actual Total Time": 4.201,
			"Actual Rows": 100,
			"Actual Loops": 1,
			"Filter": "(type = 'click')",
			"Rows Removed by Filter": 700
		},
		"Planning Time": 0.055,
		"Execution Time": 4.250
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	// Default threshold (10x) would stay silent here — ratio is only 7x.
	// Lowering to 5x catches it.
	adv := advisor.New(
		rules.SeqScan(rules.WithMinFilterRatio(5)),
	)

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
	}

}
Output:
[WARN] sequential scan on "events" discards 7x more rows than it returns

func SortSpill

func SortSpill() advisor.Rule

SortSpill returns a Rule that warns when a Sort node spills to disk.

What is a Sort spill?

When PostgreSQL sorts rows (for ORDER BY, GROUP BY, or window functions), it allocates memory from work_mem. If the input fits, the sort runs entirely in memory using quicksort or top-N heapsort. If the input exceeds work_mem, PostgreSQL falls back to an external merge sort: it sorts chunks in memory, writes them to temporary disk files, then reads and merges the chunks.

This is indicated by Sort Method "external merge" and Sort Space Type "Disk" in EXPLAIN output. Any disk spill means the sort is doing avoidable I/O.

What does this rule check?

IF node is a Sort node
AND Sort Space Type == "Disk"   (equivalently: Sort Method == "external merge")
→ emit Warn finding

There is no configurable threshold — any disk spill is worth flagging. The finding includes Sort Space Used (kB) so the caller has a concrete work_mem target.

When does this rule fire?

All conditions must hold:

  1. Node type is Sort.
  2. SortSpaceType is non-nil and equals "Disk".

Usage

rules.SortSpill()
Example

ExampleSortSpill demonstrates the SortSpill rule on a plan where a Sort node exceeded work_mem and wrote temporary data to disk.

PostgreSQL reports "Sort Method: external merge" and "Sort Space Type: Disk" when this happens. The rule fires on the Sort node and suggests a work_mem target calculated from the reported disk usage.

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Sort",
			"Parallel Aware": false,
			"Startup Cost": 15420.44,
			"Total Cost": 17920.44,
			"Plan Rows": 100000,
			"Plan Width": 72,
			"Actual Startup Time": 892.341,
			"Actual Total Time": 1203.892,
			"Actual Rows": 100000,
			"Actual Loops": 1,
			"Sort Key": ["created_at DESC"],
			"Sort Method": "external merge",
			"Sort Space Used": 18432,
			"Sort Space Type": "Disk",
			"Plans": [{
				"Node Type": "Seq Scan",
				"Parent Relationship": "Outer",
				"Parallel Aware": false,
				"Relation Name": "events",
				"Alias": "events",
				"Startup Cost": 0.00,
				"Total Cost": 8681.00,
				"Plan Rows": 100000,
				"Plan Width": 72,
				"Actual Startup Time": 0.012,
				"Actual Total Time": 98.234,
				"Actual Rows": 100000,
				"Actual Loops": 1
			}]
		},
		"Planning Time": 0.145,
		"Execution Time": 1204.123
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.SortSpill(),
	)

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  detail:     %s\n", f.Detail)
	}

}
Output:
[WARN] sort spilled to disk using external merge (18432 kB)
  detail:     The sort could not fit in work_mem and wrote temporary data to disk. PostgreSQL sorted chunks in memory, wrote them to temp files, then merged the files — reading and writing the sorted data at least twice. Disk usage for this sort: 18432 kB. An in-memory sort (quicksort) is significantly faster because it avoids all disk I/O.

func TopNHeapsort

func TopNHeapsort(opts ...TopNHeapsortOption) advisor.Rule

TopNHeapsort returns a Rule that flags Sort nodes that use top-N heapsort over a full Seq Scan when an index on the sort key could eliminate the scan.

What is top-N heapsort?

When a query has ORDER BY ... LIMIT N, PostgreSQL can avoid sorting all rows by maintaining a fixed-size heap of the N best rows seen so far. This is the "top-N heapsort" strategy: it is always in-memory and faster than a full external merge sort.

However, top-N heapsort still reads every row of the input to determine which N rows win. If the child node is a Seq Scan on a large table, the query reads the entire table just to return a handful of rows.

A B-tree index on the sort key column(s) would allow PostgreSQL to use an Index Scan in the correct order and stop after N rows — O(N) instead of O(table size).

What does this rule check?

IF node is Sort
AND Sort Method == "top-N heapsort"
AND child node is a Seq Scan
AND child rows scanned (ActualRows × ActualLoops) >= minInputRows   (default: 1000)
→ emit Info finding

Severity is Info (not Warn) because:

  • top-N heapsort is in-memory and fast — it is not a confirmed problem
  • we cannot know from the plan whether an index exists; it may exist but the planner chose not to use it (e.g. stale statistics)
  • the finding is a hint to investigate, not a definitive diagnosis

When does this rule fire?

All conditions must hold:

  1. Node type is Sort.
  2. Sort Method is "top-N heapsort".
  3. The child node is a Seq Scan (no index was used at all).
  4. ANALYZE was run (ActualRows and ActualLoops present on the child).
  5. child.ActualRows × child.ActualLoops >= minInputRows (default 1000).

Usage

rules.TopNHeapsort()
rules.TopNHeapsort(rules.WithMinInputRows(10000))
Example

ExampleTopNHeapsort demonstrates the TopNHeapsort rule on a query that uses ORDER BY ... LIMIT 10 over a table of 100,000 rows.

PostgreSQL chose top-N heapsort: it read all 100,000 rows and kept only the top 10 in a heap. An index on (created_at DESC) would allow an Index Scan to stop after 10 rows, making the query O(LIMIT) instead of O(table size).

package main

import (
	"fmt"

	"github.com/bright98/pgexplain/advisor"
	"github.com/bright98/pgexplain/parser"
	"github.com/bright98/pgexplain/rules"
)

func main() {
	explainJSON := []byte(`[{
		"Plan": {
			"Node Type": "Sort",
			"Parallel Aware": false,
			"Startup Cost": 0.00,
			"Total Cost": 2891.34,
			"Plan Rows": 10,
			"Plan Width": 72,
			"Actual Startup Time": 312.451,
			"Actual Total Time": 312.453,
			"Actual Rows": 10,
			"Actual Loops": 1,
			"Sort Key": ["created_at DESC"],
			"Sort Method": "top-N heapsort",
			"Sort Space Used": 25,
			"Sort Space Type": "Memory",
			"Plans": [{
				"Node Type": "Seq Scan",
				"Parent Relationship": "Outer",
				"Parallel Aware": false,
				"Relation Name": "orders",
				"Alias": "orders",
				"Startup Cost": 0.00,
				"Total Cost": 2846.00,
				"Plan Rows": 100000,
				"Plan Width": 72,
				"Actual Startup Time": 0.012,
				"Actual Total Time": 298.123,
				"Actual Rows": 100000,
				"Actual Loops": 1
			}]
		},
		"Planning Time": 0.145,
		"Execution Time": 312.521
	}]`)

	plan, err := parser.Parse(explainJSON)
	if err != nil {
		panic(err)
	}

	adv := advisor.New(
		rules.TopNHeapsort(), // default: flag when child Seq Scan reads >= 1000 rows
	)

	for _, f := range adv.Analyze(plan) {
		fmt.Printf("[%s] %s\n", f.Severity, f.Message)
		fmt.Printf("  detail:     %s\n", f.Detail)
	}

}
Output:
[INFO] top-N heapsort on "orders" scanned 100000 rows to return 10
  detail:     The query used top-N heapsort to return 10 rows from "orders". This strategy reads every row of the input (100000 rows scanned) and keeps only the top N in a fixed-size heap. It is in-memory and fast, but still performs a full table scan. If a B-tree index exists on (created_at DESC), PostgreSQL could use an Index Scan to read rows in sorted order and stop after the LIMIT — reducing the scan from 100000 rows to just the rows returned.

Types

type IndexOnlyScanOption

type IndexOnlyScanOption func(*indexOnlyScanRule)

IndexOnlyScanOption configures the MissingIndexOnlyScan rule.

func WithMinHeapFetchRatio

func WithMinHeapFetchRatio(ratio float64) IndexOnlyScanOption

WithMinHeapFetchRatio sets the minimum fraction of rows that must hit the heap before a finding is emitted. The ratio is:

HeapFetches / (ActualRows × ActualLoops)

Default: 0.1 (10% of rows hitting the heap triggers a Warn).

type NestedLoopOption

type NestedLoopOption func(*nestedLoopRule)

NestedLoopOption configures the NestedLoopLarge rule.

func WithMinInnerLoops

func WithMinInnerLoops(n int) NestedLoopOption

WithMinInnerLoops sets the minimum number of inner-side executions that triggers a finding. Each execution corresponds to one outer row probing the inner side.

Default: 1000.

type RowEstimateMismatchOption

type RowEstimateMismatchOption func(*rowEstimateMismatchRule)

RowEstimateMismatchOption configures the RowEstimateMismatch rule.

func WithMinEstimateFactor

func WithMinEstimateFactor(factor float64) RowEstimateMismatchOption

WithMinEstimateFactor sets the minimum error factor that triggers a warning. The factor is computed as max(planRows, actualRows) / min(planRows, actualRows), so it is always >= 1 and symmetric: a 10x underestimate and a 10x overestimate are treated the same way.

Default: 10.

func WithMinRows

func WithMinRows(rows float64) RowEstimateMismatchOption

WithMinRows sets the minimum row count floor. The rule is skipped when the larger of planRows and trueActualRows is below this value. This prevents noise from tiny nodes where a 10x error on 3 rows causes no real harm.

Default: 100.

type SeqScanOption

type SeqScanOption func(*seqScanRule)

SeqScanOption configures the SeqScan rule.

func WithMinFilterRatio

func WithMinFilterRatio(ratio float64) SeqScanOption

WithMinFilterRatio sets the minimum ratio of rows discarded to rows returned that triggers a warning. A ratio of 10 means the scan throws away at least 10 rows for every 1 row it returns.

Lower values catch more marginal cases; higher values reduce noise. Default: 10.

type TopNHeapsortOption

type TopNHeapsortOption func(*topNHeapsortRule)

TopNHeapsortOption configures the TopNHeapsort rule.

func WithMinInputRows

func WithMinInputRows(n int) TopNHeapsortOption

WithMinInputRows sets the minimum number of rows the child Seq Scan must read before a finding is emitted. This avoids noise on small tables where the full scan is cheap regardless.

Default: 1000.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL