sitescraper

Version: v1.0.0
Warning: This package is not in the latest version of its module.
Published: Nov 1, 2021 | License: BSD-3-Clause | Imports: 3 | Imported by: 0

README

sitescraper

Scraping Websites in Go!

Examples:

Get InnerHTML:
html := `<html><body><div id="hello">Hello <u>World!</u></div></body></html>`

dom := sitescraper.ParseHTML(html)

innerHTML := dom.Filter("body").GetInnerHTML()

fmt.Println(innerHTML)

//Output: <div id="hello">Hello <u>World!</u></div>

Get Text:
html := `<html><body><div id="hello">Hello World!</div></body></html>`

dom := sitescraper.ParseHTML(html)

text := dom.Filter("div", "id", "hello").GetText()

fmt.Println(text)

//Output: Hello World!

Get Text from Individual Tags:
html := "<html><body><div>Hello World!</div><div>My name is Sam!</div></body></html>"

dom := sitescraper.ParseHTML(html)

dom = dom.Filter("div")

fmt.Println(dom.Tag[0].GetText())  //Output: Hello World!

fmt.Println(dom.Tag[1].GetText())  //Output: My name is Sam!

This also works with GetInnerHTML().

Get Website Content:
html, err := sitescraper.Get("http://example.com/")

if err != nil {
    log.Fatal(err)
}

dom := sitescraper.ParseHTML(html)

dom = dom.Filter("div")

fmt.Println(dom.GetInnerHTML())

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Get

func Get(url string) (string, error)

Sends an HTTP GET request to the given URL and returns the page's HTML as a string, which can then be parsed with ParseHTML.

Types

type Dom

type Dom struct {
	Tag []Tag
	// contains filtered or unexported fields
}

func ParseHTML

func ParseHTML(html string) Dom

Parses an HTML string and returns a Dom whose tags can be filtered and queried.
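How ParseHTML builds its Dom internally is not documented here. The sketch below shows the general idea of turning an HTML string into a flat list of tags with their attributes, using the standard library's encoding/xml decoder in its lenient HTML mode. This is a swapped-in technique, not necessarily what sitescraper uses; tag and parseTags are hypothetical names. For messy real-world HTML, a dedicated parser such as golang.org/x/net/html is usually the better choice.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tag is a hypothetical, simplified record of one element: its name and attributes.
type tag struct {
	name  string
	attrs map[string]string
}

// parseTags collects every start element in src, in document order.
// It relies on encoding/xml's non-strict mode, which tolerates some
// HTML quirks (auto-closed void elements, HTML entities).
func parseTags(src string) ([]tag, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var tags []tag
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return tags, nil
		}
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			attrs := make(map[string]string, len(se.Attr))
			for _, a := range se.Attr {
				attrs[a.Name.Local] = a.Value
			}
			tags = append(tags, tag{name: se.Name.Local, attrs: attrs})
		}
	}
}

func main() {
	tags, err := parseTags(`<html><body><div id="hello">Hello World!</div></body></html>`)
	if err != nil {
		panic(err)
	}
	for _, t := range tags {
		fmt.Println(t.name, t.attrs)
	}
}
```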

func (Dom) Filter

func (d Dom) Filter(filter ...string) Dom

Filters the Dom by the given parameters, in the order Filter(tagname, attribute-name, attribute-value). For example: d.Filter("div", "class", "main"). To match any tag name, pass "" or "*" as the first argument, e.g. d.Filter("", "class", "main") or d.Filter("*", "class", "main"). You can also filter by tag name alone with d.Filter("div"), or by tag name and attribute name with d.Filter("div", "class"). A filtered Dom can be filtered again with Filter(), e.g. d.Filter("", "class", "main").Filter("span").
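The argument rules for Filter can be expressed as a matching predicate. The sketch below is hypothetical (tag, matches and filterTags are illustrative names, not part of sitescraper) and assumes that an attribute value must match exactly. It only checks a flat list of tags and does not try to model chained Filter calls.

```go
package main

import "fmt"

// tag is a hypothetical, simplified stand-in for an element: a name and its attributes.
type tag struct {
	name  string
	attrs map[string]string
}

// matches reports whether t satisfies Filter-style arguments:
//   filter[0]: tag name ("" or "*" matches any tag)
//   filter[1]: attribute name that must be present
//   filter[2]: required attribute value (exact match is an assumption)
func matches(t tag, filter ...string) bool {
	if len(filter) > 0 && filter[0] != "" && filter[0] != "*" && filter[0] != t.name {
		return false
	}
	if len(filter) > 1 && filter[1] != "" {
		val, ok := t.attrs[filter[1]]
		if !ok {
			return false
		}
		if len(filter) > 2 && val != filter[2] {
			return false
		}
	}
	return true
}

// filterTags returns the tags that satisfy matches.
func filterTags(tags []tag, filter ...string) []tag {
	var out []tag
	for _, t := range tags {
		if matches(t, filter...) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tags := []tag{
		{name: "div", attrs: map[string]string{"class": "main"}},
		{name: "span", attrs: map[string]string{"class": "main"}},
		{name: "div", attrs: map[string]string{"id": "hello"}},
	}
	fmt.Println(len(filterTags(tags, "div")))                  // 2
	fmt.Println(len(filterTags(tags, "*", "class", "main")))   // 2
	fmt.Println(len(filterTags(tags, "div", "class")))         // 1
	fmt.Println(len(filterTags(tags, "div", "class", "main"))) // 1
}
```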

func (Dom) GetAttrValue

func (d Dom) GetAttrValue(attrname string) string

Returns the values of the attribute named attrname for all tags in the Dom, as a string.

func (Dom) GetInnerHTML

func (d Dom) GetInnerHTML() string

Returns the combined inner HTML of all tags in the Dom (or filtered Dom) as a string.

func (Dom) GetText

func (d Dom) GetText() string

Returns the combined text of all tags in the Dom (or filtered Dom) as a string.

type Tag

type Tag struct {
	// contains filtered or unexported fields
}

func (Tag) GetAttrValue

func (t Tag) GetAttrValue(attr string) string

Returns the value of the given attribute as a string.

func (Tag) GetInnerHTML

func (t Tag) GetInnerHTML() string

Returns the inner HTML of the tag as a string.

func (Tag) GetTagName

func (t Tag) GetTagName() string

Returns the tag's name as a string.

func (Tag) GetText

func (t Tag) GetText() string

Returns the plain text (without HTML markup) inside the tag as a string.
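One simple way to get plain text from a tag's inner HTML is to drop everything between < and >. This is a hypothetical sketch (plainText is an illustrative name), not sitescraper's implementation. It does not decode entities such as &amp; and does not skip the contents of <script> or <style> elements.

```go
package main

import (
	"fmt"
	"strings"
)

// plainText is a hypothetical helper: it removes everything from '<' to
// the next '>' and keeps all other characters unchanged. Entities are
// left as-is, and a '>' outside a tag is kept as ordinary text.
func plainText(innerHTML string) string {
	var b strings.Builder
	inTag := false
	for _, r := range innerHTML {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(plainText("Hello <u>World!</u>")) // Hello World!
}
```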
