contents

package
v0.0.0-...-11da2c6 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 10, 2025 License: AGPL-3.0 Imports: 16 Imported by: 0

Documentation

Overview

Package contents provides extraction processors for content processing (readability) and plain text conversion.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertMathBlocks

func ConvertMathBlocks(m *extract.ProcessMessage, next extract.Processor) extract.Processor

ConvertMathBlocks converts MathJax (2.7 and 3+) and katex-html to MathML.
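
Several functions in this package share the shape func(m, next) Processor. The following is a minimal sketch of how such a chain could be driven, not the extract package's actual runner. The message and processor types, the two toy processors, and the convention that returning nil stops the chain are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical stand-in for extract.ProcessMessage.
type message struct {
	body string
	log  []string
}

// processor mirrors the shape of the signatures in this package: it receives
// the message and the next processor, and returns the processor to run next.
type processor func(m *message, next processor) processor

// stopIfEmpty is a toy processor that halts the chain (by returning nil)
// when the body holds only whitespace.
func stopIfEmpty(m *message, next processor) processor {
	m.log = append(m.log, "stopIfEmpty")
	if strings.TrimSpace(m.body) == "" {
		return nil
	}
	return next
}

// upper is a toy processor that transforms the body, then continues.
func upper(m *message, next processor) processor {
	m.body = strings.ToUpper(m.body)
	m.log = append(m.log, "upper")
	return next
}

// run calls each processor in order. This simplified runner only
// distinguishes nil (stop) from non-nil (continue with the following step).
func run(m *message, chain ...processor) {
	for i, p := range chain {
		var next processor
		if i+1 < len(chain) {
			next = chain[i+1]
		}
		if p(m, next) == nil {
			return
		}
	}
}

func main() {
	m := &message{body: "hello"}
	run(m, stopIfEmpty, upper)
	fmt.Println(m.body, m.log)
}
```

Passing next explicitly lets a processor decide whether the rest of the chain should run at all, which is useful when an early step finds nothing to process.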

func EnableReadability

func EnableReadability(ctx context.Context, v bool) context.Context

EnableReadability enables or disables readability in the extractor context.

func ExtractInlineSVGs

func ExtractInlineSVGs(m *extract.ProcessMessage, next extract.Processor) extract.Processor

ExtractInlineSVGs is a processor that converts inline SVG to cached resources. Each SVG node is saved in the resource cache with a known URL, then the node is replaced by an img tag linking to this resource.
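
As a rough illustration of the "save under a known URL, then link to it" idea, the sketch below stores SVG markup in a map and returns an img tag pointing at it. The resourceCache type, the cache://svg/ URL scheme, and the content-hash naming are hypothetical; the documentation does not say how the real processor builds its URLs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// resourceCache is a hypothetical in-memory stand-in for the resource cache.
type resourceCache map[string][]byte

// cacheSVG stores the SVG markup under a URL derived from its content
// (an assumed naming scheme) and returns an img tag linking to that URL.
func cacheSVG(cache resourceCache, svg string) string {
	sum := sha256.Sum256([]byte(svg))
	url := "cache://svg/" + hex.EncodeToString(sum[:8]) + ".svg"
	cache[url] = []byte(svg)
	return fmt.Sprintf(`<img src="%s" alt="">`, url)
}

func main() {
	cache := resourceCache{}
	tag := cacheSVG(cache, `<svg xmlns="http://www.w3.org/2000/svg"></svg>`)
	fmt.Println(tag, len(cache))
}
```

Deriving the URL from the content means identical SVG nodes map to a single cached resource in this sketch.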

func IsIcon

func IsIcon(node *html.Node, w, h, maxSize int) bool

IsIcon returns true when an image has a ratio >= 0.9, its biggest dimension is greater than or equal to "maxSize", and it doesn't have any text or inline sibling on either side.
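
The ratio condition alone can be sketched as follows, assuming "ratio" means the shorter side divided by the longer side. The nearlySquare helper is hypothetical and deliberately leaves out the size and sibling checks, which depend on the HTML node.

```go
package main

import "fmt"

// nearlySquare reports whether the shorter side is at least 90% of the
// longer side. Integer arithmetic avoids floating-point rounding at the
// 0.9 boundary. Non-positive dimensions are rejected.
func nearlySquare(w, h int) bool {
	if w <= 0 || h <= 0 {
		return false
	}
	short, long := w, h
	if short > long {
		short, long = long, short
	}
	return short*10 >= long*9
}

func main() {
	fmt.Println(nearlySquare(32, 32), nearlySquare(90, 100), nearlySquare(100, 16))
}
```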

func IsReadabilityEnabled

func IsReadabilityEnabled(ctx context.Context) (enabled bool, forced bool)

IsReadabilityEnabled returns true when readability is enabled in the extractor context.
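
A flag carried in a context.Context is commonly implemented with a private key type and context.WithValue. The sketch below shows that pattern with the standard library only. The enable and isEnabled names, the default of true when nothing is stored, and the meaning given to the second return value (whether a value was set explicitly) are assumptions; the documentation does not define how the package computes "forced".

```go
package main

import (
	"context"
	"fmt"
)

// readabilityKey is an unexported key type, so no other package can
// collide with this context value.
type readabilityKey struct{}

// base is the root context used by the examples below.
var base = context.Background()

// enable returns a copy of ctx carrying the flag v.
func enable(ctx context.Context, v bool) context.Context {
	return context.WithValue(ctx, readabilityKey{}, v)
}

// isEnabled reads the flag. explicit reports whether a value was stored
// in the context; when none was, the flag defaults to true (an assumption).
func isEnabled(ctx context.Context) (enabled bool, explicit bool) {
	v, ok := ctx.Value(readabilityKey{}).(bool)
	if !ok {
		return true, false
	}
	return v, true
}

func main() {
	fmt.Println(isEnabled(base))
	fmt.Println(isEnabled(enable(base, false)))
}
```

Returning a second value lets callers tell "enabled by default" apart from "enabled on request" without a separate lookup.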

func Readability

func Readability(options ...func(*readability.Parser)) extract.Processor

Readability is a processor that executes readability on the drop content.
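
The variadic options ...func(*readability.Parser) parameter follows Go's functional options pattern: each option is a function that mutates the parser before it runs. Here is a small self-contained sketch of that pattern. The parser struct, its fields, and the withMaxElems and withDebug options are invented for illustration and are not part of the readability library.

```go
package main

import "fmt"

// parser is a hypothetical stand-in for readability.Parser.
type parser struct {
	maxElems int
	debug    bool
}

// withMaxElems is a hypothetical option that caps the number of elements.
func withMaxElems(n int) func(*parser) {
	return func(p *parser) { p.maxElems = n }
}

// withDebug is a hypothetical option that turns on debug output.
func withDebug() func(*parser) {
	return func(p *parser) { p.debug = true }
}

// newParser builds a parser with defaults, then applies each option in order.
func newParser(options ...func(*parser)) *parser {
	p := &parser{maxElems: 1000}
	for _, opt := range options {
		opt(p)
	}
	return p
}

func main() {
	p := newParser(withMaxElems(50), withDebug())
	fmt.Println(p.maxElems, p.debug)
}
```

Because options are applied in order, a later option overrides an earlier one that touches the same field.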

func StripHeadingAnchors

func StripHeadingAnchors(m *extract.ProcessMessage, next extract.Processor) extract.Processor

StripHeadingAnchors removes self-linking heading hyperlinks that are auto-generated by some content publishing platforms. The contents of the links are unwrapped, except when they contain a single character or icon, in which case the link is completely removed.
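
The unwrap-or-remove rule described above can be sketched as a small decision function. This is only an illustration of the rule, not the package's implementation: the action type, the decide helper, and the iconOnly flag (standing in for whatever icon detection the real processor performs on the HTML node) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// action says what to do with a self-linking heading anchor.
type action int

const (
	unwrap action = iota // keep the link's children, drop the <a> element
	remove               // drop the link and its contents entirely
)

// decide applies the documented rule: links holding a single character
// (such as "#" or "¶") or only an icon are removed, others are unwrapped.
// linkText is the anchor's text content; iconOnly is supplied by the caller.
func decide(linkText string, iconOnly bool) action {
	text := strings.TrimSpace(linkText)
	if iconOnly || utf8.RuneCountInString(text) == 1 {
		return remove
	}
	return unwrap
}

func main() {
	fmt.Println(decide("#", false) == remove)
	fmt.Println(decide("Installation", false) == unwrap)
}
```

Counting runes rather than bytes matters here, since markers such as "¶" occupy more than one byte in UTF-8.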

func Text

Text is a processor that sets the pure text content of the final HTML.

Types

This section is empty.
