test-html-selector

command
v0.0.8
Published: Jun 20, 2025 License: MIT Imports: 22 Imported by: 0

README

HTML Selector Testing Tool

A command-line tool for testing CSS and XPath selectors against HTML documents. It provides match counts and contextual examples to verify selector accuracy.

Features

  • Support for both CSS and XPath selectors
  • Process multiple files and URLs in a single run
  • Configurable sample count and context size
  • YAML configuration for selectors
  • DOM path visualization for matched elements
  • Parent context for each match
  • Extract and print all matches for each selector
  • HTML simplification options for cleaner output
  • Template-based output formatting

Installation

go install ./cmd/tools/test-html-selector

Usage

  1. Create a YAML configuration file:
description: |
  Description of what these selectors are trying to match
selectors:
  - name: product_titles
    selector: .product-card h2
    type: css
    description: Extracts product titles from cards
  - name: prices
    selector: //div[@class='price']
    type: xpath
    description: Extracts price elements
config:
  sample_count: 5
  context_chars: 100
  template: |  # Optional Go template for formatting output
    {{- range . }}
    # Results from {{ .Source }}
    {{- range $selector, $matches := .Data }}
    ## {{ $selector }}
    {{- range $matches }}
    - {{ . }}
    {{- end }}
    {{- end }}
    {{- end }}
  2. Run the tool:
# Basic usage with config file and multiple sources
test-html-selector --config config.yaml --files file1.html file2.html

# Process multiple URLs
test-html-selector --urls https://example.com https://example.org \
  --select-css ".product-card h2" \
  --select-xpath "//div[@class='price']"

# Extract all matches with template formatting
test-html-selector --config config.yaml \
  --files file1.html file2.html \
  --urls https://example.com \
  --extract --extract-template template.tmpl

# Show context and customize output
test-html-selector --config config.yaml \
  --files input1.html input2.html \
  --show-context --sample-count 10 --context-chars 200

Configuration Options

Command Line Flags
Basic Options
  • --config: Path to YAML config file
  • --files: HTML files to process (can be specified multiple times)
  • --urls: URLs to fetch and process (can be specified multiple times)
  • --select-css: CSS selectors to test (can be specified multiple times)
  • --select-xpath: XPath selectors to test (can be specified multiple times)
  • --extract: Extract all matches into a YAML map of selector name to matches (ignores sample-count limit)
  • --extract-data: Extract raw data without applying templates
  • --extract-template: Go template file to render with extracted data
  • --show-context: Show context around matched elements (default: false)
  • --show-path: Show path to matched elements (default: true)
  • --sample-count: Maximum number of examples to show in normal mode (default: 3)
  • --context-chars: Number of characters of context to include (default: 100)
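
As a rough illustration of how --sample-count and --context-chars might interact, the sketch below caps the number of examples shown and truncates each one to a fixed number of characters. The helper names limitSamples and truncateContext are hypothetical, and the trailing "..." marker is an assumption; the tool's actual truncation behavior may differ.

```go
package main

import "fmt"

// limitSamples returns at most sampleCount matches (hypothetical helper).
// A negative sampleCount is treated as "no limit".
func limitSamples(matches []string, sampleCount int) []string {
	if sampleCount < 0 || len(matches) <= sampleCount {
		return matches
	}
	return matches[:sampleCount]
}

// truncateContext keeps at most contextChars characters of s and appends
// "..." when the text was cut (hypothetical helper). It counts runes so
// multi-byte characters are not split.
func truncateContext(s string, contextChars int) string {
	r := []rune(s)
	if contextChars < 0 || len(r) <= contextChars {
		return s
	}
	return string(r[:contextChars]) + "..."
}

func main() {
	// Illustrative matches only; not real output from the tool.
	matches := []string{
		"<h2>Widget with a very long product title</h2>",
		"<h2>Gadget</h2>",
		"<h2>Gizmo</h2>",
		"<h2>Doohickey</h2>",
	}
	// Defaults from the flag list: sample count 3; a small context size
	// is used here so the truncation is visible.
	for _, m := range limitSamples(matches, 3) {
		fmt.Println(truncateContext(m, 20))
	}
}
```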
HTML Simplification Options
  • --strip-scripts: Remove
