parse

package
v0.0.0-...-7c66d03 Latest
Warning

This package is not in the latest version of its module.

Published: Oct 1, 2015 | License: Apache-2.0 | Imports: 6 | Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractLinks(payload string, originalURL string, shouldFetch URLFetchChecker) (toFetch ExtractedLinks, toStore ExtractedLinks)

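ExtractLinks extracts the links found in payload, the content of the page at originalURL. It returns two ExtractedLinks values, toFetch and toStore, and uses shouldFetch to decide whether a link should be fetched.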
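The package source is not reproduced on this page, so the following is only a sketch of one way a function with this signature could behave, not the package's actual implementation. It assumes that every resolved link goes into toStore and that only links accepted by shouldFetch go into toFetch. The name sketchExtractLinks, the regular expression, and that partitioning rule are all assumptions made for illustration; the two types are copied from the declarations on this page so the example is self-contained.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// ExtractedLinks and URLFetchChecker are copied from the declarations on this page.
type ExtractedLinks struct {
	OriginalURL string
	URL         []string
}

type URLFetchChecker func(url string) bool

// hrefPattern is a deliberately naive matcher for double-quoted href
// attributes. A real implementation would more likely tokenize the HTML.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// sketchExtractLinks is a hypothetical stand-in for ExtractLinks.
// Assumption: all links are stored, and only links accepted by shouldFetch
// are queued for fetching.
func sketchExtractLinks(payload, originalURL string, shouldFetch URLFetchChecker) (toFetch, toStore ExtractedLinks) {
	toFetch.OriginalURL = originalURL
	toStore.OriginalURL = originalURL

	base, err := url.Parse(originalURL)
	if err != nil {
		return toFetch, toStore
	}
	for _, m := range hrefPattern.FindAllStringSubmatch(payload, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip links that cannot be parsed
		}
		// Resolve relative links against the URL of the page they came from.
		abs := base.ResolveReference(ref).String()
		toStore.URL = append(toStore.URL, abs)
		if shouldFetch(abs) {
			toFetch.URL = append(toFetch.URL, abs)
		}
	}
	return toFetch, toStore
}

func main() {
	page := `<a href="/docs">Docs</a> <a href="https://other.example.org/">Other</a>`
	onlyExample := func(u string) bool {
		parsed, err := url.Parse(u)
		return err == nil && parsed.Hostname() == "example.com"
	}
	toFetch, toStore := sketchExtractLinks(page, "https://example.com/index.html", onlyExample)
	fmt.Println(toFetch.URL) // [https://example.com/docs]
	fmt.Println(toStore.URL) // [https://example.com/docs https://other.example.org/]
}
```

Passing the policy in as a function keeps the extraction step independent of crawl rules such as host restrictions or visited-URL checks.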

Types

type ExtractedLinks struct {
	OriginalURL string
	URL         []string
}

ExtractedLinks holds the URL of the page that was parsed (OriginalURL) and the links extracted from it (URL).

type PageStructure

type PageStructure struct {
	Title string   `json:"title,omitempty"`
	H1    []string `json:"h1,omitempty"`
	H2    []string `json:"h2,omitempty"`
	H3    []string `json:"h3,omitempty"`
	H4    []string `json:"h4,omitempty"`
	Text  []string `json:"text,omitempty"`
}

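PageStructure holds the data extracted from a page: its title, the text of its H1 through H4 headings, and its text. Every field carries the omitempty JSON option, so empty fields are left out when the struct is marshaled.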
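To illustrate the effect of the omitempty tags, here is a minimal sketch that marshals a PageStructure with only some fields set. The struct is copied from the declaration above so the example stands alone; encodePage is a hypothetical helper and the sample values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PageStructure is copied from the declaration on this page.
type PageStructure struct {
	Title string   `json:"title,omitempty"`
	H1    []string `json:"h1,omitempty"`
	H2    []string `json:"h2,omitempty"`
	H3    []string `json:"h3,omitempty"`
	H4    []string `json:"h4,omitempty"`
	Text  []string `json:"text,omitempty"`
}

// encodePage is a hypothetical helper that returns the JSON form of a page.
func encodePage(p PageStructure) (string, error) {
	out, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	page := PageStructure{
		Title: "Example Domain",
		H1:    []string{"Example Domain"},
		Text:  []string{"This domain is for use in illustrative examples."},
	}
	encoded, err := encodePage(page)
	if err != nil {
		panic(err)
	}
	// H2, H3 and H4 are nil, so they do not appear in the output.
	fmt.Println(encoded)
}
```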

func ExtractText

func ExtractText(payload string) PageStructure

ExtractText extracts the text of the page in payload and returns it as a PageStructure.

type URLFetchChecker

type URLFetchChecker func(url string) bool

URLFetchChecker is a function that reports whether a link should be fetched.
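As an illustration, a checker might restrict fetching to a single host. The sketch below is an assumption about how a caller could build one; sameHostChecker is a hypothetical helper that is not part of the package, and the type is redeclared locally so the example compiles on its own.

```go
package main

import (
	"fmt"
	"net/url"
)

// URLFetchChecker mirrors the type documented on this page.
type URLFetchChecker func(url string) bool

// sameHostChecker is a hypothetical helper (not part of the package). It
// returns a checker that accepts only http(s) URLs on the given host.
func sameHostChecker(host string) URLFetchChecker {
	return func(raw string) bool {
		u, err := url.Parse(raw)
		if err != nil {
			return false
		}
		if u.Scheme != "http" && u.Scheme != "https" {
			return false
		}
		return u.Hostname() == host
	}
}

func main() {
	check := sameHostChecker("example.com")
	fmt.Println(check("https://example.com/about"))  // true
	fmt.Println(check("https://other.example.org/")) // false
	fmt.Println(check("mailto:someone@example.com")) // false
}
```

Because the checker is an ordinary function value, it can also close over state such as a set of already visited URLs.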
