htmltotext

Version: v0.7.0
Published: Feb 19, 2026 License: MIT Imports: 4 Imported by: 0

Documentation

Overview

Package htmltotext converts HTML content to plain text, preserving block-level structure and rendering hyperlinks as footnote-style references.

This package provides HTML-to-plain-text conversion that:

  • Decodes all HTML entities to their Unicode equivalents
  • Converts hyperlinks to footnote-style references (Lynx/Pandoc convention)
  • Strips all HTML tags while preserving meaningful whitespace
  • Preserves block-level structure (paragraphs, headings, lists)

Each unique URL gets a sequential reference number, and the numbered URLs are listed in a References section at the end of the output:

Input:  <a href="https://go.dev">Go</a> is great. See <a href="https://go.dev/doc">docs</a>.
Output: Go [1] is great. See docs [2].

        References:
        [1]: https://go.dev
        [2]: https://go.dev/doc

When the link text matches the URL (a bare link), no footnote is added:

Input:  Visit <a href="https://go.dev">https://go.dev</a>
Output: Visit https://go.dev

Duplicate URLs reuse the same reference number.
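
The numbering rules above can be sketched with a map from URL to reference number. This is an illustrative sketch only, not the package's implementation: the link and refTable types and the render and references methods are hypothetical names introduced here, and the whitespace trimming in the bare-link check is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// link is a hypothetical type for this sketch: visible text plus target URL.
type link struct {
	Text string
	URL  string
}

// refTable (hypothetical) assigns sequential numbers to unique URLs.
type refTable struct {
	index map[string]int // URL -> reference number
	urls  []string       // URLs in order of first appearance
}

func newRefTable() *refTable {
	return &refTable{index: make(map[string]int)}
}

// render returns the inline plain-text form of a link.
// Bare links (text equals URL) get no footnote; duplicate URLs
// reuse the number assigned on first appearance.
func (r *refTable) render(l link) string {
	// Assumption: surrounding whitespace is ignored when comparing.
	if strings.TrimSpace(l.Text) == l.URL {
		return l.URL
	}
	n, ok := r.index[l.URL]
	if !ok {
		r.urls = append(r.urls, l.URL)
		n = len(r.urls)
		r.index[l.URL] = n
	}
	return fmt.Sprintf("%s [%d]", l.Text, n)
}

// references formats the trailing references section,
// or returns "" when no footnotes were assigned.
func (r *refTable) references() string {
	if len(r.urls) == 0 {
		return ""
	}
	lines := []string{"References:"}
	for i, u := range r.urls {
		lines = append(lines, fmt.Sprintf("[%d]: %s", i+1, u))
	}
	return strings.Join(lines, "\n")
}

func main() {
	refs := newRefTable()
	fmt.Println(refs.render(link{Text: "Go", URL: "https://go.dev"}))             // Go [1]
	fmt.Println(refs.render(link{Text: "docs", URL: "https://go.dev/doc"}))       // docs [2]
	fmt.Println(refs.render(link{Text: "home", URL: "https://go.dev"}))           // home [1]
	fmt.Println(refs.render(link{Text: "https://go.dev", URL: "https://go.dev"})) // https://go.dev
	fmt.Println(refs.references())
}
```

Keeping the URLs in a slice alongside the map preserves first-appearance order, so the references section can be emitted without sorting.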

Usage

text := htmltotext.Convert("<p>Hello &amp; <a href=\"https://go.dev\">Go</a></p>")
// Returns: "Hello & Go [1]\n\nReferences:\n[1]: https://go.dev"

Functions

func Convert

func Convert(htmlContent string) string

Convert transforms HTML content into plain text with footnote-style link references. It decodes HTML entities, strips tags while preserving block structure, and appends a references section for any hyperlinks found.

Links where the visible text matches the URL are rendered inline without a footnote reference. Duplicate URLs share the same reference number.
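
For a sense of what the entity-decoding step produces, the standard library's html.UnescapeString performs the same kind of conversion. Whether Convert uses it internally is not stated in this documentation, so treat the following as a sketch under that assumption; decodeEntities is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"html"
)

// decodeEntities is a hypothetical helper for this sketch. It wraps the
// standard library's html.UnescapeString, which replaces named and numeric
// character references with the characters they denote.
func decodeEntities(s string) string {
	return html.UnescapeString(s)
}

func main() {
	fmt.Println(decodeEntities("Hello &amp; Go"))      // Hello & Go
	fmt.Println(decodeEntities("1 &lt; 2 &#169; 2026")) // 1 < 2 © 2026
}
```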
