HTML Selector Testing Tool
A command-line tool for testing CSS and XPath selectors against HTML documents. It provides match counts and contextual examples to verify selector accuracy.
Features
- Support for both CSS and XPath selectors
- Process multiple files and URLs in a single run
- Configurable sample count and context size
- YAML configuration for selectors
- DOM path visualization for matched elements
- Parent context for each match
- Extract and print all matches for each selector
- HTML simplification options for cleaner output
- Template-based output formatting
Installation
go install ./cmd/tools/test-html-selector
Usage
- Create a YAML configuration file:
description: |
  Description of what these selectors are trying to match
selectors:
  - name: product_titles
    selector: .product-card h2
    type: css
    description: Extracts product titles from cards
  - name: prices
    selector: //div[@class='price']
    type: xpath
    description: Extracts price elements
config:
  sample_count: 5
  context_chars: 100
  template: |  # Optional Go template for formatting output
    {{- range . }}
    # Results from {{ .Source }}
    {{- range $selector, $matches := .Data }}
    ## {{ $selector }}
    {{- range $matches }}
    - {{ . }}
    {{- end }}
    {{- end }}
    {{- end }}
- Run the tool:
# Basic usage with config file and multiple sources
test-html-selector --config config.yaml --files file1.html file2.html
# Process multiple URLs
test-html-selector --urls https://example.com https://example.org \
  --select-css ".product-card h2" \
  --select-xpath "//div[@class='price']"
# Extract all matches with template formatting
test-html-selector --config config.yaml \
  --files file1.html file2.html \
  --urls https://example.com \
  --extract --extract-template template.tmpl
# Show context and customize output
test-html-selector --config config.yaml \
  --files input1.html input2.html \
  --show-context --sample-count 10 --context-chars 200
Configuration Options
Command Line Flags
Basic Options
- --config: Path to YAML config file
- --files: HTML files to process (can be specified multiple times)
- --urls: URLs to fetch and process (can be specified multiple times)
- --select-css: CSS selectors to test (can be specified multiple times)
- --select-xpath: XPath selectors to test (can be specified multiple times)
- --extract: Extract all matches into a YAML map of selector name to matches (ignores sample-count limit)
- --extract-data: Extract raw data without applying templates
- --extract-template: Go template file to render with extracted data
- --show-context: Show context around matched elements (default: false)
- --show-path: Show path to matched elements (default: true)
- --sample-count: Maximum number of examples to show in normal mode (default: 3)
- --context-chars: Number of characters of context to include (default: 100)
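The interaction between --sample-count and --extract described above can be sketched in Go as follows. The selectSamples helper is hypothetical and only illustrates the documented behavior (normal mode caps the number of examples, extract mode ignores the cap); it is not the tool's actual implementation.

```go
package main

import "fmt"

// selectSamples is a hypothetical helper, not part of the tool.
// In normal mode it returns at most sampleCount matches; in extract
// mode it returns every match, ignoring the sample-count limit.
func selectSamples(matches []string, sampleCount int, extract bool) []string {
	if extract {
		return matches
	}
	if sampleCount < 0 {
		sampleCount = 0 // defensive: treat a negative limit as zero
	}
	if len(matches) <= sampleCount {
		return matches
	}
	return matches[:sampleCount]
}

func main() {
	matches := []string{"A", "B", "C", "D", "E"}
	fmt.Println(len(selectSamples(matches, 3, false))) // normal mode: prints 3
	fmt.Println(len(selectSamples(matches, 3, true)))  // extract mode: prints 5
}
```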
HTML Simplification Options
- --strip-scripts: Remove