util

Version: v0.0.0-...-25a16ef (Latest)
Warning

This package is not in the latest version of its module.

Published: Aug 8, 2022 | License: MIT | Imports: 16 | Imported by: 0

Documentation

Overview

Package util implements functions useful when parsing and traversing HTML generated by Medium's export tool.

Index

Constants

This section is empty.

Variables

var DL_QUEUE map[string]bool
var (
	ErrArchiveRootNotFound = errors.New("meh: archive root not found")
)
var SPACE_RE *regexp.Regexp = regexp.MustCompile(`\s+`)

Functions

func DownloadImage

func DownloadImage(img, dest string) error

DownloadImage downloads an image from the Medium CDN and saves it into the directory specified by dest.

func FindArchiveRoot

func FindArchiveRoot(dir string) (string, error)

FindArchiveRoot attempts to find where the actual archive starts by looking for a README.html file. It searches at most one level below dir.

func GenerateReceiptNumber

func GenerateReceiptNumber() string

GenerateReceiptNumber returns a pseudo-random string of letters and digits.
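One plausible way to produce such a string is sketched below. The length parameter and the alphabet (ASCII letters and digits) are assumptions, since the documentation states neither, and generateReceiptNumber is a hypothetical name. It uses math/rand, which is fine for labels but not for anything that must be unguessable.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Assumed alphabet: ASCII letters and digits.
const receiptAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// generateReceiptNumber returns n pseudo-random characters from receiptAlphabet.
// math/rand is not cryptographically secure; use crypto/rand for secrets.
func generateReceiptNumber(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = receiptAlphabet[rand.Intn(len(receiptAlphabet))]
	}
	return string(b)
}

func main() {
	fmt.Println(generateReceiptNumber(12))
}
```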

func GetQueuedImages

func GetQueuedImages() (q []string)

GetQueuedImages returns a slice of image IDs that need to be downloaded.

func ParseMediumId

func ParseMediumId(s string) string

ParseMediumId parses the post ID out of a Medium URL. Every Medium post URL ends with a unique value that represents the post's ID:

https://medium.com/p/my-slug-5940ded906e7 -> 5940ded906e7
https://medium.com/p/5940ded906e7         -> 5940ded906e7
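Both examples above can be handled by taking the last path segment and keeping whatever follows its final hyphen. The sketch below does that with net/url; parseMediumID is a hypothetical name, and ignoring query strings and trailing slashes is an assumption about inputs the documentation does not cover.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseMediumID returns the text after the last '-' in the final path
// segment, where Medium places the post ID. It returns "" if s is not a URL.
func parseMediumID(s string) string {
	u, err := url.Parse(s)
	if err != nil {
		return ""
	}
	p := strings.TrimRight(u.Path, "/")
	seg := p[strings.LastIndex(p, "/")+1:]
	// LastIndex returns -1 when there is no hyphen, so the whole segment is kept.
	return seg[strings.LastIndex(seg, "-")+1:]
}

func main() {
	fmt.Println(parseMediumID("https://medium.com/p/my-slug-5940ded906e7")) // 5940ded906e7
	fmt.Println(parseMediumID("https://medium.com/p/5940ded906e7"))         // 5940ded906e7
}
```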

func ParseMediumUsername

func ParseMediumUsername(s string) string

ParseMediumUsername parses the username out of a Medium URL. For now it only supports medium.com/@username and username.medium.com.

Caveat: the subdomain in username.medium.com is sometimes not a username at all, but we ignore this for now.
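A sketch covering the two supported URL shapes, using net/url. The name parseMediumUsername is hypothetical; returning "" for unsupported URLs and stripping a leading "www." are assumptions. Like the documented function, it treats any subdomain of medium.com as a username, which the caveat notes is not always correct.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseMediumUsername handles medium.com/@username and username.medium.com,
// returning "" for anything else.
func parseMediumUsername(s string) string {
	u, err := url.Parse(s)
	if err != nil {
		return ""
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	switch {
	case host == "medium.com":
		// medium.com/@username/...: use the first path segment if it starts with '@'.
		first := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 2)[0]
		if strings.HasPrefix(first, "@") {
			return first[1:]
		}
	case strings.HasSuffix(host, ".medium.com"):
		// username.medium.com: assume the subdomain is the username.
		return strings.TrimSuffix(host, ".medium.com")
	}
	return ""
}

func main() {
	fmt.Println(parseMediumUsername("https://medium.com/@jane/my-post-5940ded906e7")) // jane
	fmt.Println(parseMediumUsername("https://jane.medium.com/my-post-5940ded906e7"))  // jane
}
```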

func TestParser

func TestParser(d string, t *testing.T, fn func(io.Reader, io.Reader) bool)

func UnzipArchive

func UnzipArchive(src string, dest string) (err error)

UnzipArchive decompresses the Medium archive src into the directory dest.

func ZipArchive

func ZipArchive(src string, dest string) (err error)

Types

type Node

type Node struct {
	*html.Node
	FirstChild  *Node
	NextSibling *Node
	Attrs       map[string]string
}

Node is a version of html.Node that provides easier access to attributes via Attrs.

func NewNode

func NewNode(n *html.Node) *Node

NewNode wraps an html.Node in a util.Node.

func NewNodeFromHTML

func NewNodeFromHTML(dat io.Reader) (*Node, error)

NewNodeFromHTML returns the parse tree for the HTML from the given Reader.

Under the hood it uses html.Parse to parse the HTML and wraps the result in a util.Node.

func (*Node) ExtractImage

func (n *Node) ExtractImage() (img *schema.Image)

ExtractImage extracts image metadata from a given Node.

func (*Node) FirstChildElement

func (n *Node) FirstChildElement(name string) *Node

FirstChildElement returns the first child node of type html.ElementNode with the given tag name. It returns nil if no such element can be found.

func (*Node) HasClass

func (n *Node) HasClass(name string) bool

HasClass returns true if the Node has the given class.
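A class check usually has to treat the class attribute as a whitespace-separated list of tokens. The sketch below assumes the value is read from Attrs["class"] and compared token by token; whether HasClass matches whole tokens or substrings is not documented. hasClass is a hypothetical name and the class values are only sample strings.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass reports whether the whitespace-separated "class" attribute in attrs
// contains name as a whole token (so "graf" does not match "graf--p").
func hasClass(attrs map[string]string, name string) bool {
	for _, c := range strings.Fields(attrs["class"]) {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	attrs := map[string]string{"class": "graf graf--p graf-after--h3"}
	fmt.Println(hasClass(attrs, "graf--p"), hasClass(attrs, "graf--h3")) // true false
}
```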

func (*Node) IsElement

func (n *Node) IsElement(name string) bool

IsElement returns true if the Node is an html.ElementNode with the given tag name.

func (*Node) Markup

func (n *Node) Markup() (markup []schema.Markup)

Markup returns a stacked slice of schema.Markup for the given Node, relative to (and applicable to) the output of Text().

func (*Node) NextSiblingElement

func (n *Node) NextSiblingElement(name string) *Node

NextSiblingElement returns the next sibling node of type html.ElementNode with the given tag name. It returns nil if no such element can be found.
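Both FirstChildElement and NextSiblingElement amount to a linear scan along NextSibling links that skips non-element nodes. Because golang.org/x/net/html is not in the standard library, the sketch below uses a hypothetical stand-in type node with only the fields the scan needs; it illustrates the traversal, not the package's code.

```go
package main

import "fmt"

// nodeType loosely mirrors html.NodeType for this standalone sketch.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

// node is a hypothetical stand-in for util.Node with only the fields needed here.
type node struct {
	Type        nodeType
	Data        string // tag name for elements, text for text nodes
	FirstChild  *node
	NextSibling *node
}

func (n *node) isElement(name string) bool {
	return n.Type == elementNode && n.Data == name
}

// firstChildElement scans n's children in order and returns the first element named name.
func (n *node) firstChildElement(name string) *node {
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if c.isElement(name) {
			return c
		}
	}
	return nil
}

// nextSiblingElement scans the siblings after n and returns the first element named name.
func (n *node) nextSiblingElement(name string) *node {
	for s := n.NextSibling; s != nil; s = s.NextSibling {
		if s.isElement(name) {
			return s
		}
	}
	return nil
}

func main() {
	// <div>text<h3></h3><p></p></div>, built by hand.
	p := &node{Type: elementNode, Data: "p"}
	h3 := &node{Type: elementNode, Data: "h3", NextSibling: p}
	txt := &node{Type: textNode, Data: "text", NextSibling: h3}
	div := &node{Type: elementNode, Data: "div", FirstChild: txt}

	fmt.Println(div.firstChildElement("h3") == h3)   // true
	fmt.Println(h3.nextSiblingElement("p") == p)     // true
	fmt.Println(div.firstChildElement("img") == nil) // true
}
```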

func (*Node) ParseGrafs

func (n *Node) ParseGrafs() []schema.Graf

ParseGrafs parses a given Node and extracts all grafs, together with their markups.

func (*Node) Text

func (n *Node) Text() (s string)

Text returns the text content of a given Node, stripping away all HTML elements. It also collapses runs of whitespace into a single space character and trims whitespace at the beginning and end.

Examples:

	In:  <p>This is a message from the <strong>log</strong></p>
	Out: This is a message from the log

	In:  <p>
	       This is a message from the
	         <strong>
	           log
	         </strong>
	     </p>
	Out: This is a message from the log
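The whitespace normalization in the second example can be reproduced with the same pattern as the package's SPACE_RE (`\s+`). The sketch below assumes Text concatenates the text nodes in document order and then replaces each whitespace run with one space and trims the ends; collapseSpace is a hypothetical name, and the package may implement this step differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as the package's SPACE_RE.
var spaceRE = regexp.MustCompile(`\s+`)

// collapseSpace turns every run of whitespace into a single space and trims both ends.
func collapseSpace(s string) string {
	return strings.TrimSpace(spaceRE.ReplaceAllString(s, " "))
}

func main() {
	// Text nodes of the second example, concatenated in document order.
	raw := "\n        This is a message from the\n          " +
		"\n            log\n          " +
		"\n      "
	fmt.Printf("%q\n", collapseSpace(raw)) // "This is a message from the log"
}
```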

func (*Node) TextPreformatted

func (n *Node) TextPreformatted() string

TextPreformatted is like Text except that it preserves all whitespace.

func (*Node) WalkChildren

func (n *Node) WalkChildren(cb func(*Node))

WalkChildren does a depth-first walk through all children of a given Node.
