gopherio

package module
v0.0.0-...-a0a172d
Warning

This package is not in the latest version of its module.

Published: Sep 19, 2025 License: MIT Imports: 7 Imported by: 0

README

gopherio

gopherio is a fast and lightweight HTML parser for Go. It lets you scrape, query, and manipulate HTML with an easy-to-use API.

Documentation

Overview

Package gopherio provides a simple and lightweight way to parse and manipulate HTML in Go. Inspired by cheerio in JavaScript, it gives you a clean API for selecting, traversing, and extracting content from HTML documents.

gopherio is built for developers who need to:

  • scrape or query HTML from web pages
  • extract text, attributes, or structured data
  • modify or inspect DOM trees in memory

It focuses on being fast, minimal, and intuitive, which makes it a good fit for projects where pulling in a full browser engine would be overkill.

Example usage:

doc, _ := gopherio.Load(`<html><body><h1>hello</h1></body></html>`)
title := doc.Find("h1").Text()
fmt.Println(title) // output: hello

gopherio helps you work with HTML in Go the same way cheerio helps in JS.


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document struct {
	// contains filtered or unexported fields
}

func Load

func Load(src any, headers ...map[string]string) (*Document, error)

Load parses HTML from one of several sources: a raw HTML string, a []byte, a URL, or a file. The first parameter must be a string or a []byte. If it is a string, gopherio detects whether it holds a raw HTML snippet, a file path, or a URL. When loading from a URL, you can pass optional HTTP headers as a map[string]string.
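
The exact detection rules are not documented. The sketch below shows one plausible heuristic, assuming that anything starting with `<` is markup, that absolute http(s) URLs are fetched, and that everything else is a file path. The `detectSource` function and `sourceKind` type are hypothetical illustrations, not part of gopherio.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sourceKind labels what a string passed to a Load-like function refers to.
type sourceKind string

const (
	kindHTML sourceKind = "html"
	kindURL  sourceKind = "url"
	kindFile sourceKind = "file"
)

// detectSource is a hypothetical heuristic, not gopherio's actual logic.
// Markup is checked first, then absolute http(s) URLs; anything else is
// treated as a file path.
func detectSource(s string) sourceKind {
	trimmed := strings.TrimSpace(s)
	if strings.HasPrefix(trimmed, "<") {
		return kindHTML
	}
	u, err := url.Parse(trimmed)
	if err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != "" {
		return kindURL
	}
	return kindFile
}

func main() {
	inputs := []string{
		"<html><body><h1>hello</h1></body></html>",
		"https://example.com",
		"index.html",
	}
	for _, s := range inputs {
		fmt.Println(detectSource(s), s)
	}
}
```

Checking for markup first means a snippet such as `<a href="https://example.com">` is never mistaken for a URL.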

Examples:

// from raw html string
doc, _ := gopherio.Load("<html><body><h1>hello</h1></body></html>")

// from []byte
data := []byte("<p>world</p>")
doc, _ = gopherio.Load(data)

// from file
doc, _ = gopherio.Load("index.html")

// from url
doc, _ = gopherio.Load("https://example.com")

// from url with headers
headers := map[string]string{"User-Agent": "gopherio"}
doc, _ = gopherio.Load("https://example.com", headers)

Example (Basic)
package main

import (
	"fmt"

	"github.com/AstroX11/gopherio"
)

func main() {
	doc, _ := gopherio.Load(`<html><body><h1>hello</h1></body></html>`)
	title := doc.Find("h1").Text()
	fmt.Println(title)
}
Output:

hello
Example (Combinators)
package main

import (
	"fmt"

	"github.com/AstroX11/gopherio"
)

func main() {
	html := `
	<div class="wrap">
		<h1>Header</h1>
		<p class="first">para1</p>
		<p>para2</p>
		<span>end</span>
	</div>`
	doc, _ := gopherio.Load(html)

	fmt.Println(doc.Find("div.wrap > h1").Text())
	fmt.Println(doc.Find("h1 + p").Text())
	fmt.Println(doc.Find("h1 ~ span").Text())
}
Output:

Header
para1
end
Example (Groups)
package main

import (
	"fmt"

	"github.com/AstroX11/gopherio"
)

func main() {
	html := `<div><h1>head</h1><p class="a">p1</p><p>p2</p><span>done</span></div>`
	doc, _ := gopherio.Load(html)

	nodes := doc.Find("h1, p.a, span")
	for _, n := range nodes.Nodes() {
		fmt.Println(n.Data)
	}

}
Output:

h1
p
span
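
A selector group is a comma-separated list of selectors whose matches are combined. As a sketch, assuming commas inside attribute brackets, parentheses, or quotes should not split the group, a hypothetical `splitGroups` helper (not part of gopherio) could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// splitGroups is a hypothetical helper that splits a selector group on
// top-level commas. Commas inside [...], (...) or quotes are kept.
func splitGroups(sel string) []string {
	var parts []string
	depth := 0
	var quote rune
	start := 0
	add := func(p string) {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	for i, r := range sel {
		switch {
		case quote != 0:
			if r == quote {
				quote = 0
			}
		case r == '"' || r == '\'':
			quote = r
		case r == '[' || r == '(':
			depth++
		case r == ']' || r == ')':
			depth--
		case r == ',' && depth == 0:
			add(sel[start:i])
			start = i + 1 // ',' is a single byte
		}
	}
	add(sel[start:])
	return parts
}

func main() {
	fmt.Printf("%q\n", splitGroups("h1, p.a, span")) // ["h1" "p.a" "span"]
}
```
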
Example (Pseudo)
package main

import (
	"fmt"

	"github.com/AstroX11/gopherio"
)

func main() {
	html := `
	<ul>
		<li>one</li>
		<li>two</li>
		<li>three</li>
		<li>four</li>
	</ul>`
	doc, _ := gopherio.Load(html)

	fmt.Println(doc.Find("li:first-child").Text())
	fmt.Println(doc.Find("li:last-child").Text())
	fmt.Println(doc.Find("li:nth-child(2)").Text())
	fmt.Println(doc.Find("li:nth-of-type(3)").Text())
}
Output:

one
four
two
three
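
`:nth-child(n)` counts element siblings only, so the whitespace text nodes between the `<li>` tags in this example do not shift positions, while `:nth-of-type(n)` counts only siblings with the same tag name. The sketch below illustrates that counting rule on a hypothetical `sibling` type standing in for real DOM nodes. It handles plain integer positions only, not `an+b` formulas, and is not how gopherio necessarily implements these pseudo-classes.

```go
package main

import "fmt"

// sibling is a hypothetical stand-in for one child node. Text nodes, such as
// the whitespace between tags, have an empty Tag.
type sibling struct {
	Tag  string
	Text string
}

// nthChild returns the 1-based position of siblings[i] among element
// siblings, ignoring text nodes; :nth-child(n) compares against this.
func nthChild(siblings []sibling, i int) int {
	pos := 0
	for _, s := range siblings[:i+1] {
		if s.Tag != "" {
			pos++
		}
	}
	return pos
}

// nthOfType returns the 1-based position of the element siblings[i] among
// siblings with the same tag name; :nth-of-type(n) compares against this.
func nthOfType(siblings []sibling, i int) int {
	pos := 0
	for _, s := range siblings[:i+1] {
		if s.Tag == siblings[i].Tag {
			pos++
		}
	}
	return pos
}

func main() {
	ws := sibling{Text: "\n\t\t"} // whitespace text node
	children := []sibling{
		ws, {Tag: "h1", Text: "Header"},
		ws, {Tag: "p", Text: "para1"},
		ws, {Tag: "p", Text: "para2"},
		ws,
	}
	for i, s := range children {
		if s.Tag == "" {
			continue
		}
		fmt.Printf("%s %s: nth-child=%d nth-of-type=%d\n",
			s.Tag, s.Text, nthChild(children, i), nthOfType(children, i))
	}
}
```

Here `para2` is the third element child but only the second `<p>`, which is exactly where the two pseudo-classes diverge.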
Example (Selectors)
package main

import (
	"fmt"

	"github.com/AstroX11/gopherio"
)

func main() {
	html := `
	<div id="main" class="container">
		<h1>Title</h1>
		<p class="intro">Hello</p>
		<a href="link1">Link 1</a>
		<span class="note">Note</span>
	</div>`
	doc, _ := gopherio.Load(html)

	fmt.Println(doc.Find("div#main.container a[href]").Text())
	fmt.Println(doc.Find(".intro").Text())
	fmt.Println(doc.Find("#main .note").Text())
}
Output:

Link 1
Hello
Note

func (*Document) Doc

func (d *Document) Doc() *html.Node

Doc returns the underlying *html.Node of the document. This allows direct access to the raw parsed DOM tree, useful when you need to inspect or manipulate the document beyond what Document helpers provide.

func (*Document) Find

func (d *Document) Find(selector string) *Selection

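Find searches the document tree using CSS-like selectors. Supported syntax includes:

  • tag (div, h1)
  • id (#main)
  • class (.btn)
  • attributes ([href="x"])
  • descendant chains (div.container a[href])
  • child, adjacent-sibling, and general-sibling combinators (div.wrap > h1, h1 + p, h1 ~ span)
  • compound selectors (div#main.container)
  • selector groups (h1, p.a, span)
  • pseudo-classes (:first-child, :last-child, :nth-child(n), :nth-of-type(n))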

Example:

doc, _ := gopherio.Load(`<div id="main" class="container"><a href="x">link</a></div>`)
doc.Find("div.container a[href]").Text() // link
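
Each compound selector, such as `div#main.container` or `a[href="x"]`, combines a tag with optional id, class, and attribute conditions. A minimal sketch of breaking one compound into those parts, assuming attribute values contain no `]`, might look like this. `compound`, `attrSel`, and `parseCompound` are hypothetical names, not gopherio internals.

```go
package main

import (
	"fmt"
	"strings"
)

// attrSel is one [name] or [name=value] condition.
type attrSel struct {
	Name     string
	Value    string
	HasValue bool // false for presence-only selectors like [href]
}

// compound is a hypothetical parsed form of a selector like div#main.container.
type compound struct {
	Tag     string
	ID      string
	Classes []string
	Attrs   []attrSel
}

// parseCompound splits one compound selector into its parts. It assumes
// attribute values contain no ']' and does not handle pseudo-classes.
func parseCompound(s string) (compound, error) {
	var c compound
	i := 0
	// readName consumes characters up to the next '#', '.', '[' or the end.
	readName := func() string {
		start := i
		for i < len(s) && !strings.ContainsRune("#.[", rune(s[i])) {
			i++
		}
		return s[start:i]
	}
	c.Tag = readName()
	for i < len(s) {
		switch s[i] {
		case '#':
			i++
			c.ID = readName()
		case '.':
			i++
			c.Classes = append(c.Classes, readName())
		case '[':
			end := strings.IndexByte(s[i:], ']')
			if end < 0 {
				return c, fmt.Errorf("unterminated attribute selector in %q", s)
			}
			name, value, hasValue := strings.Cut(s[i+1:i+end], "=")
			c.Attrs = append(c.Attrs, attrSel{
				Name:     strings.TrimSpace(name),
				Value:    strings.Trim(strings.TrimSpace(value), `"'`),
				HasValue: hasValue,
			})
			i += end + 1
		default:
			return c, fmt.Errorf("unexpected %q at offset %d in %q", s[i], i, s)
		}
	}
	return c, nil
}

func main() {
	c, err := parseCompound(`div#main.container[data-x="1"]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```
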

func (*Document) Root

func (d *Document) Root() *html.Node

Root returns the root html node of the document.

type Selection

type Selection struct {
	// contains filtered or unexported fields
}

func NewSelection

func NewSelection(nodes []*html.Node) *Selection

NewSelection creates a Selection from a slice of html.Node pointers.

func (*Selection) After

func (s *Selection) After(content string)

After parses the given HTML string and inserts the resulting node(s) immediately after each element in the selection.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("p").After("<span>after</span>")
fmt.Println(doc.Find("div").Html()) // <p>hi</p><span>after</span>

func (*Selection) Append

func (s *Selection) Append(content string)

Append parses the given HTML string and inserts the resulting node(s) as the last child of each element in the selection.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("div").Append("<span>world</span>")
fmt.Println(doc.Find("div").Html()) // <p>hi</p><span>world</span>

func (*Selection) Attr

func (s *Selection) Attr(key string) string

Attr returns the value of the given attribute from the first node in the selection. If the attribute does not exist or the selection is empty, it returns an empty string.

Example:

doc, _ := gopherio.Load(`<a href="/x">link</a>`)
fmt.Println(doc.Find("a").Attr("href")) // /x

func (*Selection) Attrs

func (s *Selection) Attrs() map[string]string

Attrs returns all attributes of the first node in the selection as a map. If the selection is empty, it returns an empty map.

Example:

doc, _ := gopherio.Load(`<a href="/x" id="link1">link</a>`)
fmt.Println(doc.Find("a").Attrs()) // map[href:/x id:link1]
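
The documented behavior (first node only, empty map for an empty selection) can be sketched on hypothetical `element` and `attribute` types. `attribute` mirrors the `Key` and `Val` fields of `html.Attribute` from golang.org/x/net/html, which the `*html.Node` return types suggest gopherio builds on. Keeping the first occurrence of a repeated key is an assumption, chosen to match the HTML parsing rule that later duplicate attributes are discarded.

```go
package main

import "fmt"

// attribute is a hypothetical stand-in for html.Attribute's Key/Val pair.
type attribute struct{ Key, Val string }

// element is a hypothetical stand-in for an element node.
type element struct {
	Tag   string
	Attrs []attribute
}

// attrsOfFirst returns the attributes of the first element as a map, or an
// empty (non-nil) map when there are no elements.
func attrsOfFirst(sel []element) map[string]string {
	m := make(map[string]string)
	if len(sel) == 0 {
		return m
	}
	for _, a := range sel[0].Attrs {
		if _, seen := m[a.Key]; !seen { // keep the first occurrence
			m[a.Key] = a.Val
		}
	}
	return m
}

func main() {
	links := []element{{Tag: "a", Attrs: []attribute{{"href", "/x"}, {"id", "link1"}}}}
	fmt.Println(attrsOfFirst(links)) // map[href:/x id:link1]
	fmt.Println(attrsOfFirst(nil))   // map[]
}
```

Returning a non-nil empty map lets callers read from or range over the result without a nil check.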

func (*Selection) Before

func (s *Selection) Before(content string)

Before parses the given HTML string and inserts the resulting node(s) immediately before each element in the selection.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("p").Before("<span>before</span>")
fmt.Println(doc.Find("div").Html()) // <span>before</span><p>hi</p>

func (*Selection) Children

func (s *Selection) Children() *Selection

Children returns all direct child elements of the nodes in the selection.

Example:

doc, _ := gopherio.Load(`<div><p>one</p><p>two</p></div>`)
fmt.Println(doc.Find("div").Children().Length()) // 2

func (*Selection) Clone

func (s *Selection) Clone() *Selection

Clone creates a deep copy of all nodes in the selection and returns them as a new selection.

Example:

doc, _ := gopherio.Load(`<div><p>hello</p></div>`)
clone := doc.Find("p").Clone()
fmt.Println(clone.Text()) // hello

func (*Selection) Contains

func (s *Selection) Contains(text string) *Selection

Contains reduces the selection to elements that contain the given text.

Example:

doc, _ := gopherio.Load(`<ul><li>foo</li><li>bar</li></ul>`)
fmt.Println(doc.Find("li").Contains("bar").Text()) // bar

func (*Selection) Each

func (s *Selection) Each(f func(int, *Selection)) *Selection

Each iterates over the nodes in the selection, executing the callback with the index and node wrapped in a new selection. It returns the original selection for chaining.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
doc.Find("li").Each(func(i int, sel *gopherio.Selection) {
	fmt.Println(i, sel.Text())
})
// 0 one
// 1 two

func (*Selection) Empty

func (s *Selection) Empty() bool

Empty returns true if the selection has no nodes.

Example:

doc, _ := gopherio.Load(`<div></div>`)
fmt.Println(doc.Find("p").Empty()) // true

func (*Selection) Eq

func (s *Selection) Eq(index int) *Selection

Eq returns the element at the specified index as a new selection. If the index is out of range, it returns an empty selection.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").Eq(1).Text()) // two
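
The out-of-range rule can be sketched with a small generic helper, `eq`, which is hypothetical and not part of gopherio. It treats negative indices as out of range; the documentation does not say whether gopherio counts negative indices from the end, as jQuery's `.eq` does.

```go
package main

import "fmt"

// eq returns a one-element slice holding items[i], or an empty slice when i
// is out of range (negative indices included, by assumption).
func eq[T any](items []T, i int) []T {
	if i < 0 || i >= len(items) {
		return []T{}
	}
	// The three-index slice caps capacity at 1, so appending to the result
	// allocates a new array instead of overwriting items[i+1].
	return items[i : i+1 : i+1]
}

func main() {
	lis := []string{"one", "two"}
	fmt.Println(eq(lis, 1)) // [two]
	fmt.Println(eq(lis, 5)) // []
}
```
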

func (*Selection) Filter

func (s *Selection) Filter(selector string) *Selection

Filter reduces the selection to elements that match the given selector.

Example:

doc, _ := gopherio.Load(`<ul><li class="x">one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").Filter(".x").Text()) // one

func (*Selection) Find

func (s *Selection) Find(selector string) *Selection

Find searches descendants of the selection using a selector (same as Document.Find).

Example:

doc, _ := gopherio.Load(`<div><p>one</p><p>two</p></div>`)
doc.Find("div").Find("p").Each(func(i int, sel *gopherio.Selection) {
	fmt.Println(sel.Text())
})
// one
// two

func (*Selection) First

func (s *Selection) First() *Selection

First returns the first node in the selection as a new selection. If the selection is empty, it returns an empty selection.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").First().Text()) // one

func (*Selection) Has

func (s *Selection) Has(selector string) *Selection

Has reduces the selection to elements that have at least one descendant matching the given selector.

Example:

doc, _ := gopherio.Load(`<div><p>inside</p></div><div>empty</div>`)
fmt.Println(doc.Find("div").Has("p").Length()) // 1

func (*Selection) Html

func (s *Selection) Html() string

Html returns the inner HTML of every node in the selection, concatenated into a single string.

Example:

doc, _ := gopherio.Load(`<div><b>hi</b></div>`)
fmt.Println(doc.Find("div").Html()) // <b>hi</b>

func (*Selection) Last

func (s *Selection) Last() *Selection

Last returns the last node in the selection as a new selection. If the selection is empty, it returns an empty selection.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").Last().Text()) // two

func (*Selection) Length

func (s *Selection) Length() int

Length returns the number of nodes in the selection.

Example:

doc, _ := gopherio.Load(`<ul><li></li><li></li></ul>`)
fmt.Println(doc.Find("li").Length()) // 2

func (*Selection) Map

func (s *Selection) Map(f func(int, *Selection) string) []string

Map applies the callback to each node in the selection and returns a slice of results.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
texts := doc.Find("li").Map(func(i int, sel *gopherio.Selection) string {
	return sel.Text()
})
fmt.Println(texts) // [one two]

func (*Selection) Next

func (s *Selection) Next() *Selection

Next returns the immediately following sibling elements of the nodes in the selection.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").First().Next().Text()) // two

func (*Selection) Nodes

func (s *Selection) Nodes() []*html.Node

Nodes returns the underlying slice of *html.Node in the selection. This allows direct access to the raw parsed DOM nodes, which is useful when you need to inspect or manipulate nodes beyond what the Selection helpers provide.

func (*Selection) Not

func (s *Selection) Not(selector string) *Selection

Not removes elements that match the given selector from the selection.

Example:

doc, _ := gopherio.Load(`<ul><li class="x">one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").Not(".x").Text()) // two

func (*Selection) Parent

func (s *Selection) Parent() *Selection

Parent returns the parent element of each node in the selection, with duplicates removed.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
fmt.Println(doc.Find("p").Parent().Nodes()[0].Data) // div
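
Parent notes that its result is unique: two `<p>` elements under the same `<div>` yield that `<div>` once. One way to get this, sketched on a hypothetical `node` type rather than gopherio's real types, is a set keyed by node pointer; preserving first-seen order is an assumption of the sketch.

```go
package main

import "fmt"

// node is a hypothetical minimal tree node with a parent pointer.
type node struct {
	Tag    string
	Parent *node
}

// uniqueParents collects the parent of each node, skipping nil parents and
// duplicates while keeping first-seen order.
func uniqueParents(nodes []*node) []*node {
	seen := make(map[*node]bool)
	var out []*node
	for _, n := range nodes {
		p := n.Parent
		if p == nil || seen[p] {
			continue
		}
		seen[p] = true
		out = append(out, p)
	}
	return out
}

func main() {
	div := &node{Tag: "div"}
	ps := []*node{{Tag: "p", Parent: div}, {Tag: "p", Parent: div}}
	parents := uniqueParents(ps)
	fmt.Println(len(parents), parents[0].Tag) // 1 div
}
```
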

func (*Selection) Prepend

func (s *Selection) Prepend(content string)

Prepend parses the given HTML string and inserts the resulting node(s) as the first child of each element in the selection.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("div").Prepend("<span>start</span>")
fmt.Println(doc.Find("div").Html()) // <span>start</span><p>hi</p>

func (*Selection) Prev

func (s *Selection) Prev() *Selection

Prev returns the immediately preceding sibling elements of the nodes in the selection.

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li></ul>`)
fmt.Println(doc.Find("li").Last().Prev().Text()) // one

func (*Selection) Remove

func (s *Selection) Remove()

Remove deletes the nodes in the selection from their parent.

Example:

doc, _ := gopherio.Load(`<div><p>hello</p><p>bye</p></div>`)
doc.Find("p").First().Remove()
fmt.Println(doc.Find("p").Text()) // bye

func (*Selection) ReplaceWith

func (s *Selection) ReplaceWith(content string)

ReplaceWith parses the given HTML string and replaces each element in the selection with the resulting node(s).

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("p").ReplaceWith("<span>hello</span>")
fmt.Println(doc.Find("div").Html()) // <span>hello</span>

func (*Selection) Siblings

func (s *Selection) Siblings() *Selection

Siblings returns all sibling elements of the nodes in the selection (excluding themselves).

Example:

doc, _ := gopherio.Load(`<ul><li>one</li><li>two</li><li>three</li></ul>`)
fmt.Println(doc.Find("li").Eq(1).Siblings().Length()) // 2

func (*Selection) Text

func (s *Selection) Text() string

Text returns the concatenated text content of all nodes in the selection.
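
A sketch of that concatenation on a hypothetical minimal `node` type: text nodes are collected depth-first in document order, and nothing is inserted between nodes. The no-separator behavior is an assumption based on the word "concatenated", not something the documentation shows for multi-node selections.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical minimal DOM node: text nodes carry Data and have an
// empty Tag; element nodes carry a Tag and children.
type node struct {
	Tag      string
	Data     string
	Children []*node
}

// textOf writes, depth-first and in document order, the text of every text
// node under n.
func textOf(n *node, b *strings.Builder) {
	if n.Tag == "" {
		b.WriteString(n.Data)
		return
	}
	for _, c := range n.Children {
		textOf(c, b)
	}
}

// selectionText concatenates the text of every node in the selection with
// no separator between nodes.
func selectionText(sel []*node) string {
	var b strings.Builder
	for _, n := range sel {
		textOf(n, &b)
	}
	return b.String()
}

func main() {
	text := func(s string) *node { return &node{Data: s} }
	lis := []*node{
		{Tag: "li", Children: []*node{text("one")}},
		{Tag: "li", Children: []*node{text("two")}},
	}
	fmt.Println(selectionText(lis)) // onetwo
}
```
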

func (*Selection) Unwrap

func (s *Selection) Unwrap()

Unwrap removes the parent of each element in the selection, moving the element up to take the parent's place in the tree.

Example:

doc, _ := gopherio.Load(`<div><section><p>hi</p></section></div>`)
doc.Find("p").Unwrap()
fmt.Println(doc.Find("div").Html()) // <p>hi</p>
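
Unwrap on a multi-element selection has a subtlety: siblings that share a parent must remove that parent only once, because a naive per-element loop would remove the grandparent on the second call. The sketch below, on a hypothetical `node` type with parent and child links, collects unique parents before splicing each one's children into its place. It illustrates the documented behavior and is not gopherio's implementation.

```go
package main

import "fmt"

// node is a hypothetical minimal tree node with parent and child links.
type node struct {
	Tag      string
	Parent   *node
	Children []*node
}

// appendChild attaches c as the last child of p and returns c.
func appendChild(p, c *node) *node {
	c.Parent = p
	p.Children = append(p.Children, c)
	return c
}

// removeKeepingChildren replaces p, inside its own parent, with p's children.
func removeKeepingChildren(p *node) {
	grand := p.Parent
	for i, c := range grand.Children {
		if c == p {
			spliced := make([]*node, 0, len(grand.Children)-1+len(p.Children))
			spliced = append(spliced, grand.Children[:i]...)
			spliced = append(spliced, p.Children...)
			spliced = append(spliced, grand.Children[i+1:]...)
			grand.Children = spliced
			break
		}
	}
	for _, c := range p.Children {
		c.Parent = grand
	}
	p.Parent, p.Children = nil, nil
}

// unwrap removes the parent of each selected node. Parents are collected
// first, so siblings that share a parent remove it only once.
func unwrap(sel []*node) {
	seen := make(map[*node]bool)
	var parents []*node
	for _, n := range sel {
		p := n.Parent
		if p == nil || p.Parent == nil || seen[p] {
			continue // nothing to unwrap into, or already scheduled
		}
		seen[p] = true
		parents = append(parents, p)
	}
	for _, p := range parents {
		removeKeepingChildren(p)
	}
}

// tags lists the tag names of ns, for printing.
func tags(ns []*node) []string {
	out := make([]string, len(ns))
	for i, n := range ns {
		out[i] = n.Tag
	}
	return out
}

func main() {
	div := &node{Tag: "div"}
	section := appendChild(div, &node{Tag: "section"})
	p := appendChild(section, &node{Tag: "p"})
	unwrap([]*node{p})
	fmt.Println(tags(div.Children), p.Parent == div) // [p] true
}
```
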

func (*Selection) Wrap

func (s *Selection) Wrap(content string)

Wrap wraps each element in the selection inside the given HTML structure.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("p").Wrap("<section class='wrap'></section>")
fmt.Println(doc.Find("div").Html()) // <section class="wrap"><p>hi</p></section>

func (*Selection) WrapAll

func (s *Selection) WrapAll(content string)

WrapAll wraps the entire selection with a single wrapper element.

Example:

doc, _ := gopherio.Load(`<div><p>a</p><p>b</p></div>`)
doc.Find("p").WrapAll("<section></section>")
fmt.Println(doc.Find("div").Html()) // <section><p>a</p><p>b</p></section>

func (*Selection) WrapInner

func (s *Selection) WrapInner(content string)

WrapInner wraps the contents of each element in the selection inside the given HTML structure.

Example:

doc, _ := gopherio.Load(`<div><p>hi</p></div>`)
doc.Find("div").WrapInner("<section></section>")
fmt.Println(doc.Find("div").Html()) // <section><p>hi</p></section>
