package extractor
v0.4.0
Warning

This package is not in the latest version of its module.

Published: Jan 16, 2026
License: MIT
Imports: 14
Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FormatArticle

func FormatArticle(article *Article, includeImages bool) (string, error)

FormatArticle formats the extracted article as Markdown with metadata.

func StripImages

func StripImages(markdown string) string

StripImages removes all Markdown image references from the given content.
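How StripImages matches images is not documented. A minimal regex-based sketch, assuming only inline images of the form `![alt](src)` (optionally with a quoted title) need to be handled; `stripImages` is a hypothetical stand-in, not the package's implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageRef matches inline Markdown images: ![alt](src) or ![alt](src "title").
// Limitations of this sketch: reference-style images (![alt][id]) and
// sources containing ")" are not handled.
var imageRef = regexp.MustCompile(`!\[[^\]]*\]\([^)]*\)`)

// stripImages deletes every inline image and leaves other text, including
// ordinary links, untouched.
func stripImages(markdown string) string {
	return imageRef.ReplaceAllString(markdown, "")
}

func main() {
	in := "Intro ![diagram](https://example.com/d.png) text.\n[a link](https://example.com)"
	fmt.Println(stripImages(in))
}
```

Because only the image syntax is removed, surrounding whitespace is preserved as-is, and a linked image such as `[![alt](img)](href)` leaves an empty link behind; a production version may want to clean up both cases.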

func ToMarkdown

func ToMarkdown(html string) (string, error)

ToMarkdown converts HTML content to GitHub Flavored Markdown (GFM).

Types

type Article

type Article struct {
	Title   string
	Byline  string
	Content string // HTML content
	URL     string
}

Article contains the content extracted from an article page.

func ExtractArticle

func ExtractArticle(html string, pageURL string) (*Article, error)

ExtractArticle uses Readability to extract the main article content from an HTML page.
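ExtractArticle also takes `pageURL`. This documentation does not say how it is used; a common reason Readability-style extractors take the page address is to resolve relative links and image sources in the extracted content. Assuming that is the purpose, the resolution step can be sketched with the standard `net/url` package; `resolveLink` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink turns an href found in extracted HTML into an absolute URL,
// resolved against the page it came from (RFC 3986 reference resolution).
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, href := range []string{"img/a.png", "/about", "../b", "https://other.org/x"} {
		abs, err := resolveLink("https://example.com/blog/post.html", href)
		if err != nil {
			panic(err)
		}
		fmt.Println(abs)
	}
}
```

Already-absolute URLs pass through unchanged, so the helper can be applied to every `href` and `src` without first checking whether it is relative.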

type PageResult

type PageResult struct {
	HTML string
	URL  string // Final URL after redirects
}

PageResult contains the fetched page data.

func FetchPage

func FetchPage(url string, timeout time.Duration) (*PageResult, error)

FetchPage loads a URL in Chrome, using the user's browser profile for authentication.

func FetchPageHTTP

func FetchPageHTTP(url string, timeout time.Duration) (*PageResult, error)

FetchPageHTTP fetches a page over plain HTTP, without executing JavaScript.
