microdata

Version: v0.0.0-...-2274d02
Published: Oct 15, 2020 License: BSD-2-Clause Imports: 8 Imported by: 0

README

Microdata

Microdata is a package for the Go programming language to extract Microdata from HTML5 documents.

HTML Microdata is a markup specification often used in combination with the schema collection to make it easier for search engines to identify and understand content on web pages. One of the most common schemas is the rating shown next to search results. Other schemas describe persons, places, events, products, and so on.

Installation

Build from source:

$ go get -u github.com/damian-szulc/microdata/cmd/microdata

Usage

Parse from URL:

$ microdata https://www.gog.com/game/...
{
  "items": [
    {
      "type": [
        "http://schema.org/Product"
      ],
      "properties": {
        "additionalProperty": [
          {
            "type": [
              "http://schema.org/PropertyValue"
            ],
...

Parse HTML from stdin:

$ cat saved.html | microdata

Format the output with a Go template to return the "price" property:

$ microdata -format '{{with index .Items 0}}{{with index .Properties "offers" 0}}{{with index .Properties "price" 0 }}{{ . }}{{end}}{{end}}{{end}}' https://www.gog.com/game/...
8.99

Features

  • Windows/BSD/Linux supported
  • Format output with Go templates
  • Parse from Stdin

Go Package

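package main

import (
	"encoding/json"
	"log"
	"os"

	"github.com/damian-szulc/microdata"
)

func main() {
	// ParseURL returns a *microdata.Microdata, so declare data from its result.
	data, err := microdata.ParseURL("http://example.com/blogposting")
	if err != nil {
		log.Fatal(err)
	}

	// Pretty-print the extracted items as indented JSON.
	b, err := json.MarshalIndent(data, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	os.Stdout.Write(b)
}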

Documentation

Types

type Item

type Item struct {
	Types      []string    `json:"type"`
	Properties PropertyMap `json:"properties"`
	ID         string      `json:"id,omitempty"`
}

func NewItem

func NewItem() *Item

NewItem returns a new Item.

type Microdata

type Microdata struct {
	Items []*Item `json:"items"`
}

func ParseHTML

func ParseHTML(r io.Reader, contentType string, u *url.URL) (*Microdata, error)

ParseHTML parses the HTML document available in the given reader and returns the microdata. The given url is used to resolve the URLs in the attributes. The given contentType is used to convert the content of r to UTF-8. When the given contentType is equal to "", the content type is detected using `http.DetectContentType`.
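
The two behaviours described for ParseHTML (falling back to content-type sniffing, and resolving attribute URLs against a base URL) can be illustrated with the standard library alone. This is a sketch of the ideas, not the package's actual implementation; detectType and resolve are hypothetical helpers, and the example URLs are made up.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// detectType returns contentType unchanged when it is set; otherwise it
// sniffs the type from the leading bytes of the body.
func detectType(contentType string, body []byte) string {
	if contentType != "" {
		return contentType
	}
	return http.DetectContentType(body)
}

// resolve turns a possibly relative attribute value into an absolute URL
// by resolving it against the base URL of the document.
func resolve(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	body := []byte("<!DOCTYPE html><html><body></body></html>")
	fmt.Println(detectType("", body))

	abs, err := resolve("http://example.com/blog/post", "/img/cover.png")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(abs)
}
```

Passing an explicit contentType is the safer choice when the HTTP response header is available, since sniffing only inspects the first bytes of the document.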

func ParseHTMLTree

func ParseHTMLTree(tree *html.Node, u *url.URL) (*Microdata, error)

ParseHTMLTree parses an HTML document that has already been parsed into a node tree and returns the microdata.

func ParseURL

func ParseURL(urlStr string) (*Microdata, error)

ParseURL parses the HTML document available at the given URL and returns the microdata.

type PropertyMap

type PropertyMap map[string]ValueList

type ValueList

type ValueList []interface{}
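
Because ValueList holds interface{} values, callers need a type switch to read them. The sketch below assumes that a value is either a string or a nested *Item, which matches the nested JSON shown in the usage output but is not stated in this documentation. The types are local stand-ins for the package's types, flatten is a hypothetical helper, and the sample data is made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins mirroring the documented types.
type ValueList []interface{}
type PropertyMap map[string]ValueList

type Item struct {
	Types      []string
	Properties PropertyMap
}

// flatten returns a sorted list of "path=value" entries for every string
// value in the item, descending into nested items with a dotted path.
func flatten(prefix string, it *Item) []string {
	var out []string
	for name, values := range it.Properties {
		for _, v := range values {
			switch val := v.(type) {
			case string:
				out = append(out, prefix+name+"="+val)
			case *Item:
				out = append(out, flatten(prefix+name+".", val)...)
			}
		}
	}
	sort.Strings(out) // map iteration order is random, so sort for stable output
	return out
}

// sample builds hypothetical data: a product with a name and one offer.
func sample() *Item {
	offer := &Item{
		Types:      []string{"http://schema.org/Offer"},
		Properties: PropertyMap{"price": {"8.99"}},
	}
	return &Item{
		Types: []string{"http://schema.org/Product"},
		Properties: PropertyMap{
			"name":   {"Example Game"},
			"offers": {offer},
		},
	}
}

func main() {
	for _, line := range flatten("", sample()) {
		fmt.Println(line)
	}
}
```

Values of any other dynamic type are silently skipped here; a real caller may prefer a default case that reports them.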

Directories

Path Synopsis
cmd
