rabbit

package module

Version: v0.0.0-...-490b20b | Published: Apr 9, 2021 | License: MIT | Imports: 12 | Imported by: 1

Repository

github.com/zzossig/rabbit

README

🐰rabbit

An interpreted language written in Go - XPath 3.1 implementation for HTML

XML Path Language (XPath) 3.1 has been a W3C Recommendation since 21 March 2017. The rabbit language is built for selecting HTML nodes with XPath syntax.

Overview

The Rabbit language is built for HTML, not for XML. Since XPath 3.1 targets XML, it was not possible to implement every concept listed in https://www.w3.org/TR/xpath-31/. In most cases, however, Rabbit is sufficient for selecting HTML nodes.

For example:

//a
//div[@category='web']/preceding::node()[2]
let $abc := ('a', 'b', 'c') return fn:insert-before($abc, 4, 'z')

Basic Usage

// You can chain calls on the XPath object. data is either nil or a []string.
data := rabbit.New().SetDoc("uri/or/filepath.txt").Eval("//a").GetAll()

// If you expect the evaluated result to be a sequence of HTML nodes,
// use NodeAll() instead of DataAll() or GetAll().
nodes := rabbit.New().SetDoc("uri/or/filepath.txt").Eval("//a").NodeAll()

// With error checks after each step.
x := rabbit.New()
x.SetDoc("uri/or/filepath.txt")
if len(x.Errors()) > 0 {
  // ... do something with errors (the x.Errors() type is []error)
}
x.Eval("//a")
if len(x.Errors()) > 0 {
  // ... do something with errors
}
data := x.DataAll()

// Without SetDoc. Since no document is set in the context,
// node-related XPath expressions will not work.
x := rabbit.New()
data := x.Eval("1+1").Data()

// You can test simple XPath expressions using the CLI program.
rabbit.New().SetDoc("uri/or/filepath.txt").CLI()

Features

What is supported

Primary Expressions
- Integer(1)
- Decimal(1.1)
- Double(1e1)
- String("")
- Boolean(true, false)
- Variable($var)
- Context Item(.)
- Placeholder(?)
Functions
- Named Function(built in function - bif)
- Inline Function(custom function)
- Map
- Array
- Arrow operator(=>)
- Simple Map Operator(!)
Path Expressions
- Forward Step(child::, descendant::, ...)
- Reverse Step(parent::, ...)
- Node Test
- Predicate([])
- Abbreviated Syntax(@, ..)
Sequence Expressions(())
Arithmetic Expressions
- Additive(+, -)
- Multiplicative(*, div, idiv, mod)
- Unary(+, -)
String Concatenation Expressions(||)
Comparison Expressions
- Value Compare(eq, ne, lt, le, gt, ge)
- Node Compare(is, <<, >>)
- General Compare(=, !=, <, <=, >, >=)
Logical Expressions(and, or)
For Expressions(for)
Let Expressions(let)
Conditional Expressions(if)
Quantified Expressions(some, every)
Lookup(?)
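
Among the comparison operators listed above, general comparisons (=, !=, ...) differ from value comparisons (eq, ne, ...) in that XPath defines them existentially over sequences: the comparison is true if any pair of items satisfies it. The sketch below illustrates that rule in plain Go, treating every item as a string. It is an assumption-laden illustration of the XPath semantics, not Rabbit's evaluator, and generalEqual and generalNotEqual are hypothetical names.

```go
package main

import "fmt"

// generalEqual sketches XPath's "=" over two sequences of strings:
// true if any item on the left equals any item on the right.
func generalEqual(left, right []string) bool {
	for _, l := range left {
		for _, r := range right {
			if l == r {
				return true
			}
		}
	}
	return false
}

// generalNotEqual sketches XPath's "!=": true if any pair of items differs.
// Note that it is not simply the negation of generalEqual.
func generalNotEqual(left, right []string) bool {
	for _, l := range left {
		for _, r := range right {
			if l != r {
				return true
			}
		}
	}
	return false
}

func main() {
	ab := []string{"a", "b"}
	fmt.Println(generalEqual(ab, []string{"b", "c"})) // true
	fmt.Println(generalEqual(ab, ab))                 // true
	fmt.Println(generalNotEqual(ab, ab))              // true: "a" differs from "b"
	fmt.Println(generalEqual(nil, ab))                // false: empty sequence
}
```

Because both operators can be true for the same operands, a value comparison such as eq is usually the clearer choice when each side is a single item.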

What is not supported

Namespace
Rabbit does not process prefixed tag names or xmlns attributes. An xmlns attribute is not treated as a namespace node, and a prefixed tag does not raise an error when the document declares no namespace for its prefix.
Limited Types
The XPath data model defines many data types; see https://www.w3.org/TR/xpath-datamodel-31/ for the full list. Many of them are not supported in Rabbit, and most values are simplified to strings. Implementing every type would serve little purpose because HTML has no equivalent of XML Schema Definition (XSD).
Limited KindTest
The XPath 3.1 specification defines 10 kinds of KindTest. The namespace-node, processing-instruction, schema-attribute, and schema-element tests are not supported in Rabbit because the parsing engine (/x/net/html) does not recognize them.
Sequence Type Check
In XPath 3.1, you can specify data types in an inline function, for example function($a as xs:string) as xs:string {$a}. This syntax is not part of the Rabbit language. An inline function should look like this: function($a) {$a}.
Node Test with Argument
Node tests with arguments are not supported. For example, element(person), element(person, surgeon), element(*, surgeon), attribute(price), and attribute(*, xs:decimal) are not allowed. The argument-free forms element() and attribute() are supported.
Wildcard Expressions
Only the * wildcard is allowed in Rabbit. NCName:*, *:NCName, and BracedURILiteral* are not supported because Rabbit does not handle namespaces.

Notice

An attribute node is a custom *html.Node type

Rabbit supports attribute nodes. However, the /x/net/html package has no such type (it has only 6 kinds of nodes) and treats an attribute as a field of an element node. To represent an attribute as a node, I had to create a custom *html.Node type. It has the following fields.

Type: html.NodeType(7).
Parent: the node (*html.Node) that contains the attribute.
FirstChild, LastChild: nil.
PrevSibling, NextSibling: the previous or next attribute node (*html.Node) relative to the current one.
Data: the attribute key (string).
DataAtom: the atomized Data (atom.Atom).
Namespace: "" (empty string).
Attr: contains only one html.Attribute item. It holds the key/value pair for the attribute.
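
To make the field layout above concrete, here is a rough Go sketch of how attribute nodes could be derived from an element. It uses simplified stand-in types (Node, Attribute) rather than the real golang.org/x/net/html types so that it runs with the standard library alone, and attributeNodes is a hypothetical helper. It shows the described shape, not Rabbit's actual code.

```go
package main

import "fmt"

// Attribute and Node are simplified stand-ins for html.Attribute and
// html.Node; only the fields discussed above are kept.
type Attribute struct {
	Key, Val string
}

type Node struct {
	Type                     uint32
	Parent                   *Node
	PrevSibling, NextSibling *Node
	Data                     string
	Attr                     []Attribute
}

const (
	elementNode   uint32 = 3 // stand-in for an element node type
	attributeNode uint32 = 7 // the custom attribute type value described above
)

// attributeNodes builds one attribute node per attribute of el. The nodes are
// linked to each other as siblings and point to el as their parent.
func attributeNodes(el *Node) []*Node {
	nodes := make([]*Node, 0, len(el.Attr))
	for _, a := range el.Attr {
		n := &Node{Type: attributeNode, Parent: el, Data: a.Key, Attr: []Attribute{a}}
		if len(nodes) > 0 {
			prev := nodes[len(nodes)-1]
			prev.NextSibling = n
			n.PrevSibling = prev
		}
		nodes = append(nodes, n)
	}
	return nodes
}

func main() {
	a := &Node{Type: elementNode, Data: "a", Attr: []Attribute{{"href", "/home"}, {"class", "nav"}}}
	for _, n := range attributeNodes(a) {
		// The attribute value is read from the single Attr entry.
		fmt.Printf("%s=%q parent=%s\n", n.Data, n.Attr[0].Val, n.Parent.Data)
	}
}
```

With the real package, code that receives a node from Rabbit could presumably check n.Type == html.NodeType(7) and read n.Attr[0].Val in the same way.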

A document that is not well-formed will be transformed

Rabbit uses the /x/net/html package for parsing HTML, so a selected node has the type *html.Node. One thing you should know is that the /x/net/html package wraps a document in html, head, and body tags if it is not well-formed.

For example, if your document looks like this:

<div>
  ...
</div>

the /x/net/html package transforms the document into this internally:

<html>
  <head></head>
  <body>
    <div>
      ...
    </div>
  </body>
</html>

So, in this example, the XPath expression /div returns no result because the root element is html, not div. Keep this in mind to avoid confusion.

Documentation

Types

type XPath

type XPath struct {
	// contains filtered or unexported fields
}

XPath is the base object for evaluating XPath expressions. The xpath field holds the expression saved by the Eval method. The context field holds a document and a context node; the SetDoc function stores a document in it. The evaled field is set when Eval is called and holds an object.Item, a custom data type used in the rabbit language. You can convert an object.Item to a Go data type using the Data or Node methods. The errors field collects the errors raised while parsing and evaluating.

func New

func New() *XPath

New creates a new XPath object.

func (*XPath) CLI

func (x *XPath) CLI()

CLI runs a command-line interface for testing XPath expressions.

func (*XPath) Data

func (x *XPath) Data() interface{}

Data returns the first item of the value returned by DataAll.

func (*XPath) DataAll

func (x *XPath) DataAll() []interface{}

DataAll converts the evaled field to []interface{}.

func (*XPath) Errors

func (x *XPath) Errors() []error

Errors returns the errors field.

func (*XPath) Eval

func (x *XPath) Eval(input string) *XPath

Eval evaluates an XPath expression and saves the result to the evaled field.

func (*XPath) Evals

func (x *XPath) Evals(input string) []*XPath

Evals evaluates an XPath expression and returns a slice of *XPath.

func (*XPath) Get

func (x *XPath) Get() string

func (*XPath) GetAll

func (x *XPath) GetAll() []string

func (*XPath) Node

func (x *XPath) Node() *html.Node

Node returns the first item of the value returned by NodeAll.

func (*XPath) NodeAll

func (x *XPath) NodeAll() []*html.Node

NodeAll converts the evaled field to []*html.Node.

func (*XPath) Raw

func (x *XPath) Raw() object.Item

Raw returns the evaled field.

func (*XPath) SetDoc

func (x *XPath) SetDoc(input string) *XPath

SetDoc sets a document in the context. If no document is set in the context, node-related XPath expressions will not work. The input parameter can be a URL or a local file path.

func (*XPath) SetDocN

func (x *XPath) SetDocN(n *html.Node) *XPath

SetDocN is a version of SetDoc that takes a *html.Node.

func (*XPath) SetDocR

func (x *XPath) SetDocR(r *http.Response) *XPath

SetDocR is a version of SetDoc that takes a *http.Response.

func (*XPath) SetDocS

func (x *XPath) SetDocS(s string) *XPath

SetDocS is another version of SetDoc.

func (*XPath) String

func (x *XPath) String() string

String returns the input field.

Directories

Path	Synopsis
ast
bif
eval
lexer
object
parser
repl
token
util