Documentation
Overview ¶
Package util implements functions useful when parsing and traversing HTML generated by Medium's export tool.
Index ¶
- Variables
- func DownloadImage(img, dest string) error
- func FindArchiveRoot(dir string) (string, error)
- func GenerateReceiptNumber() string
- func GetQueuedImages() (q []string)
- func ParseMediumId(s string) string
- func ParseMediumUsername(s string) string
- func TestParser(d string, t *testing.T, fn func(io.Reader, io.Reader) bool)
- func UnzipArchive(src string, dest string) (err error)
- func ZipArchive(src string, dest string) (err error)
- type Node
- func (n *Node) ExtractImage() (img *schema.Image)
- func (n *Node) FirstChildElement(name string) *Node
- func (n *Node) HasClass(name string) bool
- func (n *Node) IsElement(name string) bool
- func (n *Node) Markup() (markup []schema.Markup)
- func (n *Node) NextSiblingElement(name string) *Node
- func (n *Node) ParseGrafs() []schema.Graf
- func (n *Node) Text() (s string)
- func (n *Node) TextPreformatted() string
- func (n *Node) WalkChildren(cb func(*Node))
Constants ¶
This section is empty.
Variables ¶
var DL_QUEUE map[string]bool
var (
ErrArchiveRootNotFound = errors.New("meh: archive root not found")
)
var SPACE_RE *regexp.Regexp = regexp.MustCompile(`\s+`)
Functions ¶
func DownloadImage ¶
DownloadImage downloads an image from the Medium CDN and saves it into the directory specified by dest.
func FindArchiveRoot ¶
FindArchiveRoot attempts to find where the actual archive starts by looking for a README.html file. It looks at most one level deep.
func GenerateReceiptNumber ¶
func GenerateReceiptNumber() string
GenerateReceiptNumber returns a pseudo-random string of letters and digits.
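One way to produce such a string is sketched below. The alphabet, the length parameter, and the name receiptNumber are all assumptions made for illustration; the documentation does not state the length or character set that GenerateReceiptNumber actually uses.

```go
package main

import (
	"fmt"
	"math/rand"
)

// alphabet is an assumed character set: uppercase letters and digits.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// receiptNumber is a hypothetical helper that returns n pseudo-random
// characters drawn from alphabet. math/rand is adequate for non-secret
// identifiers; crypto/rand would be needed if unpredictability mattered.
func receiptNumber(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = alphabet[rand.Intn(len(alphabet))]
	}
	return string(b)
}

func main() {
	fmt.Println(receiptNumber(12))
}
```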
func GetQueuedImages ¶
func GetQueuedImages() (q []string)
GetQueuedImages returns a slice of image IDs that need to be downloaded.
func ParseMediumId ¶
ParseMediumId parses the post ID out of a Medium URL. Every link to a Medium post ends with a unique value that represents the post's ID:
https://medium.com/p/my-slug-5940ded906e7 -> 5940ded906e7
https://medium.com/p/5940ded906e7 -> 5940ded906e7
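The two examples above suggest a simple rule: take the last path segment and keep whatever follows its final hyphen. The sketch below implements that rule under the assumption that IDs never contain hyphens; parsePostID is a hypothetical name and this is not necessarily how ParseMediumId itself works.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// parsePostID is a hypothetical helper that returns the trailing ID of a
// Medium post URL, or "" if the URL has no usable path segment.
func parsePostID(s string) string {
	u, err := url.Parse(s)
	if err != nil {
		return ""
	}
	last := path.Base(u.Path) // final path segment, e.g. "my-slug-5940ded906e7"
	if last == "." || last == "/" {
		return "" // empty or root-only path
	}
	if i := strings.LastIndex(last, "-"); i >= 0 {
		return last[i+1:] // drop the slug, keep the ID
	}
	return last // no slug present, the segment is the ID
}

func main() {
	fmt.Println(parsePostID("https://medium.com/p/my-slug-5940ded906e7"))
	fmt.Println(parsePostID("https://medium.com/p/5940ded906e7"))
}
```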
func ParseMediumUsername ¶
ParseMediumUsername parses the username out of a Medium URL. For now it supports only the medium.com/@username and username.medium.com forms.
Caveat: the subdomain in username.medium.com is sometimes not a username at all, but this is ignored for now.
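A rough sketch of handling the two supported URL forms follows. parseUsername is a hypothetical name, and skipping the "www" subdomain is an assumption added for the example; the real function may treat such hosts differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseUsername is a hypothetical helper covering the two documented forms:
// medium.com/@username and username.medium.com. It returns "" otherwise.
func parseUsername(s string) string {
	u, err := url.Parse(s)
	if err != nil {
		return ""
	}
	// Form 1: the first path segment starts with "@".
	seg := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 2)[0]
	if strings.HasPrefix(seg, "@") {
		return strings.TrimPrefix(seg, "@")
	}
	// Form 2: the username is the subdomain of medium.com.
	host := u.Hostname()
	if sub := strings.TrimSuffix(host, ".medium.com"); sub != host && sub != "www" {
		return sub
	}
	return ""
}

func main() {
	fmt.Println(parseUsername("https://medium.com/@alice/my-post-5940ded906e7"))
	fmt.Println(parseUsername("https://alice.medium.com/my-post-5940ded906e7"))
}
```

As the caveat notes, the subdomain form can yield a value that is not really a username; this sketch makes no attempt to detect that case either.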
func UnzipArchive ¶
UnzipArchive decompresses a Medium archive into the directory dest.
func ZipArchive ¶
Types ¶
type Node ¶
Node is a version of html.Node with easier access to attributes via Attrs.
func NewNodeFromHTML ¶
NewNodeFromHTML returns the parse tree for the HTML from the given Reader.
Under the hood it uses html.Parse to parse HTML but wraps the result into the util.Node.
func (*Node) ExtractImage ¶
ExtractImage extracts image metadata from a given Node.
func (*Node) FirstChildElement ¶
FirstChildElement returns first child node with type html.ElementNode and a given tag name. Returns nil if no such element can be found.
func (*Node) IsElement ¶
IsElement returns true if the Node is an html.ElementNode with a given tag name.
func (*Node) Markup ¶
Markup returns a stacked slice of schema.Markup for the given Node, relative to (and applicable to) the output of Text().
func (*Node) NextSiblingElement ¶
NextSiblingElement returns next sibling node with type html.ElementNode and a given tag name. Returns nil if no such element can be found.
func (*Node) ParseGrafs ¶
ParseGrafs parses a given Node and extracts all grafs, together with their markups.
func (*Node) Text ¶
Text returns the text content of a given Node, stripping away all HTML elements. It also collapses each run of whitespace into a single space character and trims whitespace at the beginning and end.
Examples:
In:  <p>This is a message from the <strong>log</strong></p>
Out: This is a message from the log
In:  <p> This is a message from the <strong> log </strong> </p>
Out: This is a message from the log
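The whitespace handling can be illustrated in isolation. The sketch below assumes the text has already been pulled out of the HTML tree and shows only the collapse-and-trim step, using the same `\s+` pattern as SPACE_RE; collapseSpace and spaceRE are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// spaceRE mirrors the pattern of the package-level SPACE_RE variable.
var spaceRE = regexp.MustCompile(`\s+`)

// collapseSpace is a hypothetical helper that replaces every run of
// whitespace with one space and trims both ends of the result.
func collapseSpace(s string) string {
	return strings.TrimSpace(spaceRE.ReplaceAllString(s, " "))
}

func main() {
	fmt.Printf("%q\n", collapseSpace("  This is a message from the \n  log  "))
	// prints "This is a message from the log"
}
```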
func (*Node) TextPreformatted ¶
TextPreformatted is like Text except that it preserves all whitespace.
func (*Node) WalkChildren ¶
WalkChildren does a depth-first walk through all children of a given Node.
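A depth-first walk of this kind can be sketched with a simplified tree type. The node struct below is a hypothetical stand-in that borrows only the FirstChild and NextSibling links from html.Node, and the sketch assumes pre-order traversal (the callback runs on a child before its descendants), which the documentation does not specify.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for util.Node.
type node struct {
	Data                    string
	FirstChild, NextSibling *node
}

// walkChildren calls cb for every descendant of n, depth first,
// visiting each child before that child's own children.
func (n *node) walkChildren(cb func(*node)) {
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		cb(c)
		c.walkChildren(cb)
	}
}

func main() {
	// Tree: root -> (a -> (a1, a2), b)
	a2 := &node{Data: "a2"}
	a1 := &node{Data: "a1", NextSibling: a2}
	b := &node{Data: "b"}
	a := &node{Data: "a", FirstChild: a1, NextSibling: b}
	root := &node{Data: "root", FirstChild: a}

	root.walkChildren(func(c *node) { fmt.Println(c.Data) })
}
```

With this tree the callback sees a, a1, a2 and then b; the root itself is not visited.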