htmldistill

command module

v0.0.0-...-47096ee Published: Jun 10, 2025 License: MIT Imports: 10 Imported by: 0

Repository

github.com/tmc/misc

Documentation

Overview

htmldistill is a command-line tool that extracts and distills the main content from HTML documents. It processes input from URLs, files, or standard input, removing clutter such as navigation, ads, and other non-essential elements.

Usage:

htmldistill <url1> [url2] [url3] ...

htmldistill accepts one or more URLs as arguments. For each URL, it fetches the content, processes it using the go-domdistiller library, and outputs the extracted main content as HTML to stdout.
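The fetch, process, and output loop described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: processURLs, httpFetch, and passthrough are hypothetical names, the 30-second timeout is an assumption, and the distill step is a placeholder because the go-domdistiller API is not shown on this page.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// fetchFunc returns the raw HTML for a URL.
type fetchFunc func(rawURL string) (string, error)

// distillFunc stands in for the go-domdistiller call (hypothetical signature).
type distillFunc func(html, pageURL string) (string, error)

// processURLs fetches each URL in order, distills it, and hands the
// result to emit. It stops at the first error.
func processURLs(urls []string, fetch fetchFunc, distill distillFunc, emit func(string) error) error {
	for _, u := range urls {
		html, err := fetch(u)
		if err != nil {
			return fmt.Errorf("fetch %s: %w", u, err)
		}
		out, err := distill(html, u)
		if err != nil {
			return fmt.Errorf("distill %s: %w", u, err)
		}
		if err := emit(out); err != nil {
			return fmt.Errorf("write %s: %w", u, err)
		}
	}
	return nil
}

// httpFetch downloads a page over HTTP. The timeout value is an assumption.
func httpFetch(rawURL string) (string, error) {
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// passthrough is a placeholder distiller that returns its input unchanged.
// A real implementation would call the content-extraction library here.
func passthrough(html, _ string) (string, error) {
	return html, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: htmldistill <url1> [url2] [url3] ...")
		os.Exit(2)
	}
	emit := func(s string) error {
		_, err := io.WriteString(os.Stdout, s)
		return err
	}
	if err := processURLs(os.Args[1:], httpFetch, passthrough, emit); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Passing the fetch and distill steps as functions keeps the loop easy to exercise without network access; swapping passthrough for a real extraction call would not change the control flow.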

The tool can also process local files, or read from stdin when '-' is given as an argument. When reading from stdin, an optional base URL can be provided to resolve relative links.
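To illustrate why a base URL matters for stdin input, the sketch below resolves relative links against a base using Go's standard net/url package. The helper name resolveLink and the example URLs are assumptions made for illustration; how htmldistill itself accepts the base URL is not documented here.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// resolveLink resolves href against base following RFC 3986 rules.
// Hypothetical helper, not taken from the htmldistill source.
func resolveLink(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", fmt.Errorf("parse link: %w", err)
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	// Example base URL (an assumption for this sketch).
	base := "https://example.com/articles/2025/post.html"
	links := []string{"../images/fig1.png", "/about", "notes.html#top"}
	for _, href := range links {
		abs, err := resolveLink(base, href)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s -> %s\n", href, abs)
	}
}
```

With this base, "../images/fig1.png" resolves to https://example.com/articles/images/fig1.png and "/about" resolves to https://example.com/about. Without a base URL, links like these in piped HTML have no origin to resolve against and would remain relative.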

htmldistill is useful for cleaning up web content for further processing, improving readability, or preparing data for natural language processing tasks.

Source Files

main.go
