gokogiri

package module
v0.0.0-...-7744dc4 Latest
Published: Jul 23, 2020 License: MIT Imports: 2 Imported by: 0

README

Gokogiri

LibXML bindings for the Go programming language.

By Zhigang Chen and Hampton Catlin

This is a major rewrite from v0 in the following places:

  • Separation of XML and HTML
  • Put more burden of memory allocation/deallocation on Go
  • Fragment parsing -- no more deep-copy
  • Serialization
  • Some API adjustments

Installation

# Linux
sudo apt-get install libxml2-dev
# Mac
brew install libxml2

go get github.com/moovweb/gokogiri

Running tests

go test github.com/moovweb/gokogiri/...

Basic example

package main

import (
  "io/ioutil"
  "log"
  "net/http"

  "github.com/moovweb/gokogiri"
)

func main() {
  // fetch and read a web page
  resp, err := http.Get("http://www.google.com")
  if err != nil {
    log.Fatal(err)
  }
  defer resp.Body.Close()

  page, err := ioutil.ReadAll(resp.Body)
  if err != nil {
    log.Fatal(err)
  }

  // parse the web page
  doc, err := gokogiri.ParseHtml(page)
  if err != nil {
    log.Fatal(err)
  }
  // important -- don't forget to free the resources when you're done!
  defer doc.Free()

  // perform operations on the parsed page -- consult the tests for examples
}

Documentation

Overview

The gokogiri package provides a Go interface to the libxml2 library.

It is inspired by the Ruby-based Nokogiri API, and allows one to parse, manipulate, and create HTML and XML documents. Nodes can be selected using either CSS selectors (in much the same fashion as jQuery) or XPath 1.0 expressions, and a simple DOM-like interface allows for building up documents from scratch.

Functions

func ParseHtml

func ParseHtml(content []byte) (doc *html.HtmlDocument, err error)

ParseHtml parses a UTF-8 encoded byte slice and returns an html.HtmlDocument. It uses default parsing options that ignore errors and warnings, making it suitable for the poorly-formed 'tag soup' often found on the web.

If the content is not UTF-8 encoded or you want to customize the parsing options, you should call html.Parse directly.
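Whether content can go straight to ParseHtml or ParseXml can be checked cheaply before parsing. The following is a minimal sketch using only the standard library; needsExplicitEncoding is a hypothetical helper name, not part of gokogiri.

```go
package main

import "unicode/utf8"

// needsExplicitEncoding is a hypothetical helper (not part of gokogiri).
// It reports whether content contains byte sequences that are not valid
// UTF-8, in which case the UTF-8-only convenience functions are a poor fit
// and the encoding should be passed to the lower-level parser explicitly.
func needsExplicitEncoding(content []byte) bool {
	return !utf8.Valid(content)
}
```

Pure ASCII input passes this check, which is fine because ASCII is a subset of UTF-8. The check cannot tell which encoding invalid input actually uses; that still has to come from an HTTP header, a meta tag, or prior knowledge.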

func ParseXml

func ParseXml(content []byte) (doc *xml.XmlDocument, err error)

ParseXml parses a UTF-8 encoded byte slice and returns an xml.XmlDocument. By default the parsing options ignore validation and suppress errors and warnings. This allows one to be liberal in accepting badly-formed documents, but is not standards-compliant.

If the content is not UTF-8 encoded or you want to customize the parsing options, you should call the Parse or ReadFile functions found in the github.com/moovweb/gokogiri/xml package. The xml.StrictParsingOption is conveniently provided for standards-compliant behaviour.
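To get a feel for what "badly-formed" means in practice, a document can be run through a strict tokenizer first. The sketch below deliberately uses Go's standard encoding/xml package rather than gokogiri or libxml2, so its notion of strictness is an approximation and may not match libxml2's in every case; wellFormed is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"io"
)

// wellFormed is a hypothetical helper (not part of gokogiri). It walks the
// token stream with the standard library's strict decoder and returns the
// first syntax error, or nil if the whole document tokenizes cleanly.
func wellFormed(content []byte) error {
	dec := xml.NewDecoder(bytes.NewReader(content))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil // reached the end without a syntax error
		}
		if err != nil {
			return err // e.g. mismatched or unclosed elements
		}
	}
}
```

A document such as `<a><b></a>` is rejected by this check, yet a lenient parse with the default options described above would typically still produce a tree for it.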
