Documentation ¶
Overview ¶
Package contents provides extraction processors for content processing (readability) and plain-text conversion.
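Most functions in this package take a message and the next processor, and return a processor. The sketch below illustrates that chaining shape only. ProcessMessage, Processor, stepA, stepB and run are local stand-ins written for this example; the real extract types and the way the extractor drives its processors are not described on this page, so treat the driver loop as an assumption.

```go
package main

import "fmt"

// ProcessMessage is a local stand-in for extract.ProcessMessage.
// The real type's fields are not shown in this documentation.
type ProcessMessage struct {
	Log []string
}

// Processor mirrors the shape implied by the function signatures in the
// index: a step receives the message and the next step, and returns a step.
type Processor func(m *ProcessMessage, next Processor) Processor

// stepA and stepB are hypothetical processors that record their name
// and hand control to the next step.
func stepA(m *ProcessMessage, next Processor) Processor {
	m.Log = append(m.Log, "A")
	return next
}

func stepB(m *ProcessMessage, next Processor) Processor {
	m.Log = append(m.Log, "B")
	return next
}

// run is a hypothetical driver: it calls each step in order and stops
// as soon as a step returns nil.
func run(m *ProcessMessage, steps ...Processor) {
	for i, p := range steps {
		var next Processor
		if i+1 < len(steps) {
			next = steps[i+1]
		}
		if p(m, next) == nil {
			return
		}
	}
}

func main() {
	m := &ProcessMessage{}
	run(m, stepA, stepB)
	fmt.Println(m.Log) // prints "[A B]"
}
```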
Index ¶
- func ConvertMathBlocks(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func EnableReadability(ctx context.Context, v bool) context.Context
- func ExtractInlineSVGs(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func IsIcon(node *html.Node, w, h, maxSize int) bool
- func IsReadabilityEnabled(ctx context.Context) (enabled bool, forced bool)
- func Readability(options ...func(*readability.Parser)) extract.Processor
- func StripHeadingAnchors(m *extract.ProcessMessage, next extract.Processor) extract.Processor
- func Text(m *extract.ProcessMessage, next extract.Processor) extract.Processor
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ConvertMathBlocks ¶
ConvertMathBlocks converts MathJax (2.7 and 3+) and katex-html to MathML.
func EnableReadability ¶
EnableReadability enables or disables readability in the extractor context.
func ExtractInlineSVGs ¶
ExtractInlineSVGs is a processor that converts inline SVG to cached resources. Each SVG node is saved in the resource cache with a known URL, then the node is replaced by an img tag linking to this resource.
func IsIcon ¶
IsIcon returns true when an image has a ratio >= 0.9, its biggest dimension is greater than or equal to "maxSize", and it doesn't have any text or inline sibling on both sides.
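The ratio test can be illustrated in isolation. This is a minimal sketch, assuming the ratio is the smaller dimension divided by the larger one, so that a square image yields 1.0. The helper aspectRatio is hypothetical and not part of this package; the size and sibling checks are left out because their details are not given here.

```go
package main

import "fmt"

// aspectRatio is a hypothetical helper. It returns the smaller dimension
// divided by the larger one, or 0 when either dimension is not positive.
func aspectRatio(w, h int) float64 {
	if w <= 0 || h <= 0 {
		return 0
	}
	if w > h {
		w, h = h, w
	}
	return float64(w) / float64(h)
}

func main() {
	fmt.Println(aspectRatio(32, 32) >= 0.9) // prints "true": square image
	fmt.Println(aspectRatio(16, 64) >= 0.9) // prints "false": elongated image
}
```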
func IsReadabilityEnabled ¶
IsReadabilityEnabled returns true when readability is enabled in the extractor context.
func Readability ¶
Readability is a processor that executes readability on the drop content.
func StripHeadingAnchors ¶
StripHeadingAnchors removes self-linking heading hyperlinks that are auto-generated by some content publishing platforms. The contents of the links are unwrapped, except when they contain a single character or icon, in which case the link is removed entirely.
Types ¶
This section is empty.