extract

package

Version: v0.2.1 (latest) | Published: Jun 17, 2026 | License: MIT | Imports: 6 | Imported by: 0

Repository

github.com/tamnd/yomi

Documentation ¶

Overview ¶

Package extract turns a full HTML page into its article: the main-content node with the chrome removed, plus the page metadata (title, byline, site name, excerpt, language, publish date) and the outbound links.

It runs go-readability for the content node and harvests metadata from the document's own tags first, falling back to what readability recovers. The content node is sanitised with kage's CleanTree so no script or handler survives into the Markdown.
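
The tags-first, readability-second precedence described above can be pictured as a first-non-empty selection. The sketch below is illustrative only: firstNonEmpty and the sample values are hypothetical, are not part of this package's API, and the real implementation may order or normalise its candidates differently.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is non-empty after
// trimming surrounding whitespace, or "" when every candidate is blank.
// Hypothetical helper, shown only to illustrate the fallback order.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if s := strings.TrimSpace(c); s != "" {
			return s
		}
	}
	return ""
}

func main() {
	// Assumed inputs: a title read from the document's own tags (missing
	// here) and the title that readability recovered.
	fromTags := ""
	fromReadability := "Example Title"

	fmt.Println(firstNonEmpty(fromTags, fromReadability))
}
```

With the document's own value listed first, the readability value is used only when the tags yield nothing.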

Index ¶

type Article
- func FromHTML(body []byte, pageURL string) (*Article, error)
type Link

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Article ¶

type Article struct {
	Title     string
	Byline    string
	SiteName  string
	Excerpt   string
	Lang      string
	Published string
	// Node is the main-content subtree, sanitised and ready for conversion. It is
	// nil when readability found no article.
	Node *html.Node
	// Links holds every outbound hyperlink in the whole document.
	Links []Link
	// LowConfidence is true when readability could not isolate a clear article and
	// yomi fell back to a coarse selection.
	LowConfidence bool
}

Article is the extracted form of one HTML page.

func FromHTML ¶

func FromHTML(body []byte, pageURL string) (*Article, error)

FromHTML parses an HTML body and extracts its Article. pageURL is the absolute URL of the page, used to resolve relative links and to guide readability.
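
Because pageURL is expected to be absolute, a caller may want to check that before handing it to FromHTML. The following is a minimal sketch using only the standard library's net/url. checkPageURL is a hypothetical helper that this package does not export, and the documentation does not state how FromHTML itself reacts to a relative URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkPageURL returns an error unless raw parses as an absolute URL
// (one with a scheme) that also names a host.
// Hypothetical helper, not part of the extract package.
func checkPageURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse page URL: %w", err)
	}
	if !u.IsAbs() || u.Host == "" {
		return fmt.Errorf("page URL %q is not absolute", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{"https://example.com/posts/1", "/posts/1"} {
		if err := checkPageURL(raw); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", raw)
	}
}
```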

type Link ¶

type Link struct {
	Text string
	URL  string
}

Link is one outbound hyperlink discovered on a page, resolved to an absolute URL.
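
Resolving an href against the page URL follows standard URL reference resolution, which the Go standard library exposes as (*url.URL).ResolveReference. The sketch below shows that behaviour in isolation. resolveLink is a hypothetical helper and the URLs are made up; the package may filter or normalise links further before storing them in Link.URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveLink resolves href against pageURL and returns the absolute form.
// Hypothetical helper illustrating reference resolution with net/url.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page URL: %w", err)
	}
	ref, err := url.Parse(strings.TrimSpace(href))
	if err != nil {
		return "", fmt.Errorf("parse href: %w", err)
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://example.com/blog/post.html"
	hrefs := []string{"../about", "img/a.png", "https://other.org/x"}

	for _, h := range hrefs {
		abs, err := resolveLink(page, h)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%-22s -> %s\n", h, abs)
	}
}
```

An href that is already absolute passes through unchanged, so the same call handles both relative and absolute links.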
