striphtml

Published: Nov 5, 2023 | License: MIT | Imports: 9 | Imported by: 0

A fork of jaytaylor/html2text that adds a more extensive CLI and an HTTP server built around it.

Introduction

Ensure your emails are readable by all!

Turns HTML into plain text, which is useful for sending fancy HTML emails with an equivalently well-formatted plain-text version as a fallback.

striphtml is a simple Go CLI and HTTP server for rendering HTML as plain text.

It requires Go 1.21 or newer.

Download the package

go get github.com/JayJamieson/striphtml

Example usage

striphtml can be run as a standalone command or as an HTTP server.

Standalone:
$ cat index.html | striphtml

$ striphtml < index.html

HTTP Server:
$ striphtml serve -p 8080

Strip HTML from a provided URL (quote the URL so the shell does not interpret the ?):
$ curl -X GET 'http://localhost:8080/strip?url=https://www.google.com'

Send HTML directly:
$ curl -X POST -H 'Content-Type: text/html' http://localhost:8080/strip -d '<div>Hello world!</div>'
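
For callers that prefer Go over curl, the two requests above can be sketched with the standard library alone. This is a minimal sketch and not part of striphtml: stripURL and stripHTML are hypothetical helper names, and the code assumes the server listens on localhost:8080, decodes a percent-encoded url parameter, and answers with status 200 and the plain text in the response body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// stripURL builds the GET address for the /strip endpoint. The target page
// address is percent-encoded so it travels as a single "url" query parameter.
// (Hypothetical helper, not part of striphtml.)
func stripURL(base, target string) string {
	q := url.Values{}
	q.Set("url", target)
	return strings.TrimRight(base, "/") + "/strip?" + q.Encode()
}

// stripHTML posts raw HTML to the /strip endpoint with a text/html content
// type and returns the response body. (Hypothetical helper; assumes the
// server replies with 200 OK on success.)
func stripHTML(client *http.Client, base, html string) (string, error) {
	endpoint := strings.TrimRight(base, "/") + "/strip"
	resp, err := client.Post(endpoint, "text/html", strings.NewReader(html))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// Assumes `striphtml serve -p 8080` is already running locally.
	base := "http://localhost:8080"

	// Show the encoded form of the GET request from the curl example.
	fmt.Println(stripURL(base, "https://www.google.com"))

	text, err := stripHTML(http.DefaultClient, base, "<div>Hello world!</div>")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(text)
}
```

Building the query with url.Values avoids the quoting problems that a hand-assembled URL can run into when the target address itself contains characters such as ? or &.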

Library

package main

import (
 "fmt"

 "github.com/JayJamieson/striphtml"
)

func main() {
 inputHTML := `
<html>
  <head>
    <title>My Mega Service</title>
    <link rel="stylesheet" href="main.css">
    <style type="text/css">body { color: #fff; }</style>
  </head>

  <body>
    <div class="logo">
      <a href="http://example.com/"><img src="/logo-image.jpg" alt="Mega Service"/></a>
    </div>

    <h1>Welcome to your new account on my service!</h1>

    <p>
      Here is some more information:

      <ul>
        <li>Link 1: <a href="https://example.com">Example.com</a></li>
        <li>Link 2: <a href="https://example2.com">Example2.com</a></li>
        <li>Something else</li>
      </ul>
    </p>

    <table>
      <thead>
        <tr><th>Header 1</th><th>Header 2</th></tr>
      </thead>
      <tfoot>
        <tr><td>Footer 1</td><td>Footer 2</td></tr>
      </tfoot>
      <tbody>
        <tr><td>Row 1 Col 1</td><td>Row 1 Col 2</td></tr>
        <tr><td>Row 2 Col 1</td><td>Row 2 Col 2</td></tr>
      </tbody>
    </table>
  </body>
</html>`

 text, err := striphtml.FromString(inputHTML, striphtml.Options{PrettyTables: true})
 if err != nil {
  panic(err)
 }
 fmt.Println(text)
}

Output:

Mega Service ( http://example.com/ )

******************************************
Welcome to your new account on my service!
******************************************

Here is some more information:

* Link 1: Example.com ( https://example.com )
* Link 2: Example2.com ( https://example2.com )
* Something else

+-------------+-------------+
|  HEADER 1   |  HEADER 2   |
+-------------+-------------+
| Row 1 Col 1 | Row 1 Col 2 |
| Row 2 Col 1 | Row 2 Col 2 |
+-------------+-------------+
|  FOOTER 1   |  FOOTER 2   |
+-------------+-------------+

Command line

Read HTML from stdin and write plain text to stdout.

echo '<div>hi</div>' | striphtml
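
The same stdin-to-stdout contract can be driven from another Go program when importing the library is not an option. The following is a rough sketch under stated assumptions: pipeThrough is a hypothetical helper name, and the code assumes a striphtml binary is installed and on PATH.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// pipeThrough runs the named command, feeds input to its stdin, and returns
// whatever the command writes to stdout. (Hypothetical helper.)
func pipeThrough(name, input string) (string, error) {
	cmd := exec.Command(name)
	cmd.Stdin = strings.NewReader(input)

	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("%s failed: %w (stderr: %s)",
			name, err, strings.TrimSpace(stderr.String()))
	}
	return stdout.String(), nil
}

func main() {
	// Equivalent to: echo '<div>hi</div>' | striphtml
	// Assumes the striphtml binary is installed and on PATH.
	text, err := pipeThrough("striphtml", "<div>hi</div>")
	if err != nil {
		fmt.Println("could not run striphtml:", err)
		return
	}
	fmt.Print(text)
}
```

Within a Go program, calling the library functions directly is usually simpler. Shelling out mainly helps when the binary is the only thing available.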

Run as an HTTP server.

striphtml server

Unit-tests

Run the unit tests with the standard Go tooling:

go test

License

Permissive MIT license.

Documentation

Example

inputHTML := `
<html>
	<head>
		<title>My Mega Service</title>
		<link rel="stylesheet" href="main.css">
		<style type="text/css">body { color: #fff; }</style>
	</head>

	<body>
		<div class="logo">
			<a href="http://jaytaylor.com/"><img src="/logo-image.jpg" alt="Mega Service"/></a>
		</div>

		<h1>Welcome to your new account on my service!</h1>

		<p>
			Here is some more information:

			<ul>
				<li>Link 1: <a href="https://example.com">Example.com</a></li>
				<li>Link 2: <a href="https://example2.com">Example2.com</a></li>
				<li>Something else</li>
			</ul>
		</p>

		<table>
			<thead>
				<tr><th>Header 1</th><th>Header 2</th></tr>
			</thead>
			<tfoot>
				<tr><td>Footer 1</td><td>Footer 2</td></tr>
			</tfoot>
			<tbody>
				<tr><td>Row 1 Col 1</td><td>Row 1 Col 2</td></tr>
				<tr><td>Row 2 Col 1</td><td>Row 2 Col 2</td></tr>
			</tbody>
		</table>
	</body>
</html>`

text, err := FromString(inputHTML, Options{PrettyTables: true})
if err != nil {
	panic(err)
}
fmt.Println(text)

Output:

Mega Service ( http://jaytaylor.com/ )

******************************************
Welcome to your new account on my service!
******************************************

Here is some more information:

* Link 1: Example.com ( https://example.com )
* Link 2: Example2.com ( https://example2.com )
* Something else

+-------------+-------------+
|  HEADER 1   |  HEADER 2   |
+-------------+-------------+
| Row 1 Col 1 | Row 1 Col 2 |
| Row 2 Col 1 | Row 2 Col 2 |
+-------------+-------------+
|  FOOTER 1   |  FOOTER 2   |
+-------------+-------------+

Functions

func FromHTMLNode

func FromHTMLNode(doc *html.Node, o ...Options) (string, error)

FromHTMLNode renders text output from a pre-parsed HTML document.

func FromReader

func FromReader(reader io.Reader, options ...Options) (string, error)

FromReader renders text output after parsing HTML for the specified io.Reader.

func FromString

func FromString(input string, options ...Options) (string, error)

FromString parses HTML from the input string, then renders the text form.

func GetElementByID

func GetElementByID(n *html.Node, id string) *html.Node

Types

type Options

type Options struct {
	PrettyTables        bool                 // Turns on pretty ASCII rendering for table elements.
	PrettyTablesOptions *PrettyTablesOptions // Configures pretty ASCII rendering for table elements.
	OmitLinks           bool                 // Turns on omitting links
	TextOnly            bool                 // Returns only plain text
	StripByID           bool                 // Turns on getElementByID and returns stripped child elements of ID attribute
	ElementID           string
}

Options provide toggles and overrides to control specific rendering behaviors.

type PrettyTablesOptions

type PrettyTablesOptions struct {
	AutoFormatHeader     bool
	AutoWrapText         bool
	ReflowDuringAutoWrap bool
	ColWidth             int
	ColumnSeparator      string
	RowSeparator         string
	CenterSeparator      string
	HeaderAlignment      int
	FooterAlignment      int
	Alignment            int
	ColumnAlignment      []int
	NewLine              string
	HeaderLine           bool
	RowLine              bool
	AutoMergeCells       bool
	Borders              tablewriter.Border
}

PrettyTablesOptions overrides tablewriter behaviors.

func NewPrettyTablesOptions

func NewPrettyTablesOptions() *PrettyTablesOptions

NewPrettyTablesOptions creates PrettyTablesOptions with default settings.

Directories

Path Synopsis
cmd