zek

Package zek (module github.com/miku/zek), version v0.1.7
Warning

This package is not in the latest version of its module.

Published: Nov 23, 2018 License: GPL-3.0 Imports: 11 Imported by: 0

README

zek

Zek is a prototype for creating a Go struct from an XML document.

Skip the fluff, just the code.

Given some XML, run:

$ curl -s https://raw.githubusercontent.com/miku/zek/master/fixtures/e.xml | zek -e -c
// Rss was generated 2018-08-30 20:24:14 by tir on sol.
type Rss struct {
    XMLName xml.Name `xml:"rss"`
    Text    string   `xml:",chardata"`
    Rdf     string   `xml:"rdf,attr"`
    Dc      string   `xml:"dc,attr"`
    Geoscan string   `xml:"geoscan,attr"`
    Media   string   `xml:"media,attr"`
    Gml     string   `xml:"gml,attr"`
    Taxo    string   `xml:"taxo,attr"`
    Georss  string   `xml:"georss,attr"`
    Content string   `xml:"content,attr"`
    Geo     string   `xml:"geo,attr"`
    Version string   `xml:"version,attr"`
    Channel struct {
        Text          string `xml:",chardata"`
        Title         string `xml:"title"`         // ESS New Releases (Display...
        Link          string `xml:"link"`          // http://tinyurl.com/ESSNew...
        Description   string `xml:"description"`   // New releases from the Ear...
        LastBuildDate string `xml:"lastBuildDate"` // Mon, 27 Nov 2017 00:06:35...
        Item          []struct {
            Text        string `xml:",chardata"`
            Title       string `xml:"title"`       // Surficial geology, Aberde...
            Link        string `xml:"link"`        // https://geoscan.nrcan.gc....
            Description string `xml:"description"` // Geological Survey of Cana...
            Guid        struct {
                Text        string `xml:",chardata"` // 304279, 306212, 306175, 3...
                IsPermaLink string `xml:"isPermaLink,attr"`
            } `xml:"guid"`
            PubDate       string   `xml:"pubDate"`      // Fri, 24 Nov 2017 00:00:00...
            Polygon       []string `xml:"polygon"`      // 64.0000 -98.0000 64.0000 ...
            Download      string   `xml:"download"`     // https://geoscan.nrcan.gc....
            License       string   `xml:"license"`      // http://data.gc.ca/eng/ope...
            Author        string   `xml:"author"`       // Geological Survey of Cana...
            Source        string   `xml:"source"`       // Geological Survey of Cana...
            SndSeries     string   `xml:"SndSeries"`    // Bedford Institute of Ocea...
            Publisher     string   `xml:"publisher"`    // Natural Resources Canada,...
            Edition       string   `xml:"edition"`      // prelim., surficial data m...
            Meeting       string   `xml:"meeting"`      // Geological Association of...
            Documenttype  string   `xml:"documenttype"` // serial, open file, serial...
            Language      string   `xml:"language"`     // English, English, English...
            Maps          string   `xml:"maps"`         // 1 map, 5 maps, Publicatio...
            Mapinfo       string   `xml:"mapinfo"`      // surficial geology, surfic...
            Medium        string   `xml:"medium"`       // on-line; digital, digital...
            Province      string   `xml:"province"`     // Nunavut, Northwest Territ...
            Nts           string   `xml:"nts"`          // 066B, 095J; 095N; 095O; 0...
            Area          string   `xml:"area"`         // Aberdeen Lake, Mackenzie ...
            Subjects      string   `xml:"subjects"`
            Program       string   `xml:"program"`       // GEM2: Geo-mapping for Ene...
            Project       string   `xml:"project"`       // Rae Province Project Mana...
            Projectnumber string   `xml:"projectnumber"` // 340521, 343202, 340557, 3...
            Abstract      string   `xml:"abstract"`      // This new surficial geolog...
            Links         string   `xml:"links"`         // Online - En ligne (PDF, 9...
            Readme        string   `xml:"readme"`        // readme | https://geoscan....
            PPIid         string   `xml:"PPIid"`         // 34532, 35096, 35438, 2563...
        } `xml:"item"`
    } `xml:"channel"`
}
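A minimal sketch of consuming generated code with the standard library's encoding/xml. It assumes a trimmed copy of the struct above and a made-up two-item feed, not the e.xml fixture. Elements and attributes that have no matching field are ignored during decoding, so unneeded fields can be dropped from the generated struct.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Rss keeps only a few fields of the generated struct; encoding/xml
// ignores elements and attributes without a matching field.
type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Version string   `xml:"version,attr"`
	Channel struct {
		Title string `xml:"title"`
		Item  []struct {
			Title string `xml:"title"`
			Guid  struct {
				Text        string `xml:",chardata"`
				IsPermaLink string `xml:"isPermaLink,attr"`
			} `xml:"guid"`
		} `xml:"item"`
	} `xml:"channel"`
}

// sample is a made-up feed, not the e.xml fixture.
const sample = `<rss version="2.0"><channel><title>Example feed</title>
<item><title>First</title><guid isPermaLink="false">304279</guid></item>
<item><title>Second</title><guid isPermaLink="false">306212</guid></item>
</channel></rss>`

// parse unmarshals an RSS document into the trimmed struct.
func parse(data string) (Rss, error) {
	var doc Rss
	err := xml.Unmarshal([]byte(data), &doc)
	return doc, err
}

func main() {
	doc, err := parse(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, item := range doc.Channel.Item {
		fmt.Println(item.Title, item.Guid.Text)
	}
}
```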

Online

Try it online at https://www.onlinetool.io/xmltogo/.

About

Upsides:

  • it works fine for non-recursive structures,
  • does not need XSD or DTD,
  • it is relatively convenient to access attributes, children and text,
  • will generate a single struct, which makes for a fairly compact representation,
  • simple user interface,
  • comments with examples,
  • schema inference across multiple files.

Downsides:

  • experimental, early, buggy, unstable prototype,
  • no support for recursive types (similar to Russian Doll strategy, [1])
  • no type inference, everything is accessible as string.

Bugs:

Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values.

https://golang.org/pkg/encoding/xml/#pkg-note-BUG
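The order problem shows up in a plain encoding/xml round trip. A sketch, assuming a struct with one slice per element name, similar in shape to what zek -c emits for repeated children: the interleaving of a and b is lost on decoding, so re-encoding groups the elements by field.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// R has one slice per child element name.
type R struct {
	XMLName xml.Name `xml:"r"`
	A       []string `xml:"a"`
	B       []string `xml:"b"`
}

// roundTrip decodes in into R and encodes it again.
func roundTrip(in string) (string, error) {
	var r R
	if err := xml.Unmarshal([]byte(in), &r); err != nil {
		return "", err
	}
	out, err := xml.Marshal(r)
	return string(out), err
}

func main() {
	out, err := roundTrip(`<r><a>1</a><b>2</b><a>3</a></r>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out) // <r><a>1</a><a>3</a><b>2</b></r>
}
```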

Related projects:

Install

$ go get github.com/miku/zek/cmd/...

Debian and RPM packages:

Usage

$ zek -h
Usage of zek:
  -F    skip formatting
  -c    emit more compact struct
  -d    debug output
  -e    add comments with example
  -j    add JSON tags
  -max-examples int
        limit number of examples (default 10)
  -n string
        use a different name for the top-level struct
  -p    write out an example program
  -s    strict parsing and writing
  -t string
        emit struct for tag matching this name
  -u    filter out duplicated examples
  -version
        show version
  -x int
        max chars for example (default 25)

Examples:

$ cat fixtures/a.xml
<a></a>

$ zek < fixtures/a.xml
type A struct {
    XMLName xml.Name `xml:"a"`
    Text    string   `xml:",chardata"`
}

Debug output dumps the internal tree as JSON to stdout.

$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}

Example program:

package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
}

func main() {
	dec := xml.NewDecoder(os.Stdin)
	var doc A
	if err := dec.Decode(&doc); err != nil {
		log.Fatal(err)
	}
	b, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}

$ zek -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "a"
  },
  "Text": ""
}

More complex example:

$ zek < fixtures/d.xml
type Root struct {
	XMLName xml.Name `xml:"root"`
	Text    string   `xml:",chardata"`
	A       []struct {
		Text string `xml:",chardata"`
		B    []struct {
			Text string `xml:",chardata"`
			C    struct {
				Text string `xml:",chardata"`
			} `xml:"c"`
			D struct {
				Text string `xml:",chardata"`
			} `xml:"d"`
		} `xml:"b"`
	} `xml:"a"`
}

$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "root"
  },
  "Text": "\n\n\n\n",
  "A": [
    {
      "Text": "\n  \n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": {
            "Text": "Hi"
          },
          "D": {
            "Text": ""
          }
        },
        {
          "Text": "\n    \n    \n  ",
          "C": {
            "Text": "World"
          },
          "D": {
            "Text": ""
          }
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": {
            "Text": "Hello"
          },
          "D": {
            "Text": ""
          }
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": {
            "Text": ""
          },
          "D": {
            "Text": "World"
          }
        }
      ]
    }
  ]
}

Annotate with comments:

$ zek -e < fixtures/l.xml
type Records struct {
	XMLName xml.Name `xml:"Records"`
	Text    string   `xml:",chardata"` // \n
	Xsi     string   `xml:"xsi,attr"`
	Record  []struct {
		Text   string `xml:",chardata"`
		Header struct {
			Text       string `xml:",chardata"`
			Status     string `xml:"status,attr"`
			Identifier struct {
				Text string `xml:",chardata"` // oai:ojs.localhost:article...
			} `xml:"identifier"`
			Datestamp struct {
				Text string `xml:",chardata"` // 2009-06-24T14:48:23Z, 200...
			} `xml:"datestamp"`
			SetSpec struct {
				Text string `xml:",chardata"` // eppp:ART, eppp:ART, eppp:...
			} `xml:"setSpec"`
		} `xml:"header"`
		Metadata struct {
			Text    string `xml:",chardata"`
			Rfc1807 struct {
				Text           string `xml:",chardata"`
				Xmlns          string `xml:"xmlns,attr"`
				Xsi            string `xml:"xsi,attr"`
				SchemaLocation string `xml:"schemaLocation,attr"`
				BibVersion     struct {
					Text string `xml:",chardata"` // v2, v2, v2, v2, v2, v2, v...
				} `xml:"bib-version"`
				ID struct {
					Text string `xml:",chardata"` // http://journals.zpid.de/i...
				} `xml:"id"`
				Entry struct {
					Text string `xml:",chardata"` // 2009-06-24T14:48:23Z, 200...
				} `xml:"entry"`
				Organization []struct {
					Text string `xml:",chardata"` // Proceedings of the Worksh...
				} `xml:"organization"`
				Title struct {
					Text string `xml:",chardata"` // Introduction and some Ide...
				} `xml:"title"`
				Type struct {
					Text string `xml:",chardata"`
				} `xml:"type"`
				Author []struct {
					Text string `xml:",chardata"` // KRAMPEN, Günter, CARBON,...
				} `xml:"author"`
				Copyright struct {
					Text string `xml:",chardata"` // Das Urheberrecht liegt be...
				} `xml:"copyright"`
				OtherAccess struct {
					Text string `xml:",chardata"` // url:http://journals.zpid....
				} `xml:"other_access"`
				Keyword struct {
					Text string `xml:",chardata"`
				} `xml:"keyword"`
				Period []struct {
					Text string `xml:",chardata"`
				} `xml:"period"`
				Monitoring struct {
					Text string `xml:",chardata"`
				} `xml:"monitoring"`
				Language struct {
					Text string `xml:",chardata"` // en, en, en, en, en, en, e...
				} `xml:"language"`
				Abstract struct {
					Text string `xml:",chardata"` // After a short description...
				} `xml:"abstract"`
				Date struct {
					Text string `xml:",chardata"` // 2009-06-22 12:12:00, 2009...
				} `xml:"date"`
			} `xml:"rfc1807"`
		} `xml:"metadata"`
		About struct {
			Text string `xml:",chardata"`
		} `xml:"about"`
	} `xml:"Record"`
}

The above struct can be made more compact with the -c flag (available since 0.1.4):

$ zek -c -e < fixtures/l.xml
// Records was generated 2018-08-09 14:10:25 by tir on sol.
type Records struct {
    XMLName xml.Name `xml:"Records"`
    Text    string   `xml:",chardata"` // \n
    Xsi     string   `xml:"xsi,attr"`
    Record  []struct {
        Text   string `xml:",chardata"`
        Header struct {
            Text       string `xml:",chardata"`
            Status     string `xml:"status,attr"`
            Identifier string `xml:"identifier"` // oai:ojs.localhost:article...
            Datestamp  string `xml:"datestamp"`  // 2009-06-24T14:48:23Z, 200...
            SetSpec    string `xml:"setSpec"`    // eppp:ART, eppp:ART, eppp:...
        } `xml:"header"`
        Metadata struct {
            Text    string `xml:",chardata"`
            Rfc1807 struct {
                Text           string   `xml:",chardata"`
                Xmlns          string   `xml:"xmlns,attr"`
                Xsi            string   `xml:"xsi,attr"`
                SchemaLocation string   `xml:"schemaLocation,attr"`
                BibVersion     string   `xml:"bib-version"`  // v2, v2, v2, v2, v2, v2, v...
                ID             string   `xml:"id"`           // http://journals.zpid.de/i...
                Entry          string   `xml:"entry"`        // 2009-06-24T14:48:23Z, 200...
                Organization   []string `xml:"organization"` // Proceedings of the Worksh...
                Title          string   `xml:"title"`        // Introduction and some Ide...
                Type           string   `xml:"type"`
                Author         []string `xml:"author"`       // KRAMPEN, Günter, CARBON,...
                Copyright      string   `xml:"copyright"`    // Das Urheberrecht liegt be...
                OtherAccess    string   `xml:"other_access"` // url:http://journals.zpid....
                Keyword        string   `xml:"keyword"`
                Period         []string `xml:"period"`
                Monitoring     string   `xml:"monitoring"`
                Language       string   `xml:"language"` // en, en, en, en, en, en, e...
                Abstract       string   `xml:"abstract"` // After a short description...
                Date           string   `xml:"date"`     // 2009-06-22 12:12:00, 2009...
            } `xml:"rfc1807"`
        } `xml:"metadata"`
        About string `xml:"about"`
    } `xml:"Record"`
}

Only consider a nested element (here thesis) with -t:

$ zek -t thesis < fixtures/z.xml
type Thesis struct {
	XMLName        xml.Name `xml:"thesis"`
	Text           string   `xml:",chardata"`
	Xmlns          string   `xml:"xmlns,attr"`
	Doc            string   `xml:"doc,attr"`
	Xsi            string   `xml:"xsi,attr"`
	SchemaLocation string   `xml:"schemaLocation,attr"`
	Title          []struct {
		Text string `xml:",chardata"`
	} `xml:"title"`
	Creator []struct {
		Text string `xml:",chardata"`
	} `xml:"creator"`
	Date []struct {
		Text string `xml:",chardata"`
	} `xml:"date"`
	Identifier []struct {
		Text string `xml:",chardata"`
	} `xml:"identifier"`
	Language []struct {
		Text string `xml:",chardata"`
	} `xml:"language"`
	Rights []struct {
		Text string `xml:",chardata"`
	} `xml:"rights"`
	Coverage []struct {
		Text string `xml:",chardata"`
	} `xml:"coverage"`
	Publisher []struct {
		Text string `xml:",chardata"`
	} `xml:"publisher"`
	Contributor []struct {
		Text string `xml:",chardata"`
	} `xml:"contributor"`
	Subject []struct {
		Text string `xml:",chardata"`
	} `xml:"subject"`
	Description []struct {
		Text string `xml:",chardata"`
	} `xml:"description"`
	Source struct {
		Text string `xml:",chardata"`
	} `xml:"source"`
	Type struct {
		Text string `xml:",chardata"`
	} `xml:"type"`
	Relation []struct {
		Text string `xml:",chardata"`
	} `xml:"relation"`
}

Inference across files:

$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}

This is also useful if you deal with archives containing XML files:

$ unzip -p 4082359.zip '*.xml' | zek -e

Given a directory full of zip files, you can combine find, unzip and zek:

$ for i in $(find ftp/b571 -type f -name "*zip"); do unzip -p $i '*xml'; done | zek -e

Another example (tarball with thousands of XML files, seemingly MARC):

$ tar -xOzf /tmp/20180725.125255.tar.gz | zek -e
// OAIPMH was generated 2018-09-26 15:03:29 by tir on sol.
type OAIPMH struct {
        XMLName        xml.Name `xml:"OAI-PMH"`
        Text           string   `xml:",chardata"`
        Xmlns          string   `xml:"xmlns,attr"`
        Xsi            string   `xml:"xsi,attr"`
        SchemaLocation string   `xml:"schemaLocation,attr"`
        ListRecords    struct {
                Text   string `xml:",chardata"`
                Record struct {
                        Text   string `xml:",chardata"`
                        Header struct {
                                Text       string `xml:",chardata"`
                                Identifier struct {
                                        Text string `xml:",chardata"` // aleph-publish:000000001, ...
                                } `xml:"identifier"`
                        } `xml:"header"`
                        Metadata struct {
                                Text   string `xml:",chardata"`
                                Record struct {
                                        Text           string `xml:",chardata"`
                                        Xmlns          string `xml:"xmlns,attr"`
                                        Xsi            string `xml:"xsi,attr"`
                                        SchemaLocation string `xml:"schemaLocation,attr"`
                                        Leader         struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                        } `xml:"leader"`
                                        Controlfield []struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                                Tag  string `xml:"tag,attr"`
                                        } `xml:"controlfield"`
                                        Datafield []struct {
                                                Text     string `xml:",chardata"`
                                                Tag      string `xml:"tag,attr"`
                                                Ind1     string `xml:"ind1,attr"`
                                                Ind2     string `xml:"ind2,attr"`
                                                Subfield []struct {
                                                        Text string `xml:",chardata"` // KM0000002
                                                        Code string `xml:"code,attr"`
                                                } `xml:"subfield"`
                                        } `xml:"datafield"`
                                } `xml:"record"`
                        } `xml:"metadata"`
                } `xml:"record"`
        } `xml:"ListRecords"`
}

Misc

As a side effect, zek seems to be useful for debugging. Example:

This record is emitted by a typical OAI server (OJS, a fairly common one), yet one can quickly spot the flaw in the structure.

Over 30 different structs were generated manually in the course of a few hours (around five minutes per source): https://git.io/vbTDo.

Current size leader: a struct of 1532 lines.

Documentation

Constants

This section is empty.

Variables

var (
	// UppercaseByDefault is used during XML tag name to Go name conversion.
	UppercaseByDefault = []string{"id", "Id", "isbn", "ismn", "json",
		"eissn", "issn", "http", "lccn", "rfc", "rsn", "uri", "url",
		"urn", "xml", "Xml", "zdb"}
	// DefaultTextFieldNames list struct field names for chardata, most preferred first.
	DefaultTextFieldNames = []string{"Text", "Chardata"}
	// DefaultAttributePrefixes are used, if there are name clashes.
	DefaultAttributePrefixes = []string{"Attr", "Attribute"}
)
var Version = "0.1.7"

Functions

func CreateNameFunc added in v0.1.2

func CreateNameFunc(upper []string) func(string) string

CreateNameFunc returns a function that converts a tag into a canonical Go name. Strings in the given list are uppercased in full.
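A rough sketch of what such a name function might do, inferred only from the generated names in the examples above (bib-version becomes BibVersion, other_access becomes OtherAccess, id becomes ID). sketchNameFunc is hypothetical and not zek's implementation: it splits on non-alphanumeric characters, uppercases listed words in full and capitalizes the first letter of every other part. Tags starting with a digit would still need extra handling to yield a valid Go identifier.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sketchNameFunc is a hypothetical stand-in for CreateNameFunc.
func sketchNameFunc(upper []string) func(string) string {
	set := make(map[string]bool, len(upper))
	for _, u := range upper {
		set[u] = true
	}
	return func(tag string) string {
		// Split on anything that is not a letter or digit, e.g. "-" or "_".
		parts := strings.FieldsFunc(tag, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		var b strings.Builder
		for _, p := range parts {
			if set[p] {
				b.WriteString(strings.ToUpper(p))
				continue
			}
			r := []rune(p) // FieldsFunc never returns empty parts
			r[0] = unicode.ToUpper(r[0])
			b.WriteString(string(r))
		}
		return b.String()
	}
}

func main() {
	// A subset of UppercaseByDefault, for brevity.
	name := sketchNameFunc([]string{"id", "url", "xml"})
	for _, tag := range []string{"bib-version", "id", "other_access", "OAI-PMH", "rfc1807"} {
		fmt.Println(tag, "->", name(tag))
	}
}
```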

Types

type Node

type Node struct {
	Name        xml.Name   `json:"name,omitempty"`
	Attr        []xml.Attr `json:"attr,omitempty"`
	Examples    []string   `json:"examples,omitempty"`
	Children    []*Node    `json:"children,omitempty"`
	Freqs       []int      `json:"-"` // Collect number of occurrences of this node within parent.
	MaxExamples int        `json:"-"` // Maximum number of examples to keep, gets passed to children.
	// contains filtered or unexported fields
}

Node represents an element in the XML tree. It keeps track of its name, attributes, child nodes and example chardata, as well as basic statistics, e.g. how often a node has been seen within its parent node.

func (*Node) ByName added in v0.1.2

func (node *Node) ByName(name string) *Node

ByName finds a node in the tree by name. Comparisons start at the current node. First match is returned. If nothing matches, nil is returned.

func (*Node) CreateOrGetChild

func (node *Node) CreateOrGetChild(name xml.Name, attr []xml.Attr) *Node

CreateOrGetChild creates a child if no child with the same tag name exists; otherwise it returns the existing node with that name. The goal is to collect node and attribute information per tag name, not to replicate the XML tree.

func (*Node) End

func (node *Node) End()

End signals the end of an element.

func (*Node) Height

func (node *Node) Height() int

Height returns the height of the tree. A tree with zero nodes has height zero; a single-node tree has height 1.
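The stated convention (an empty tree has height 0, a single node has height 1) translates to a short recursive function. A sketch on a hypothetical stand-in type that holds only child links:

```go
package main

import "fmt"

// node is a hypothetical stand-in holding only the child links.
type node struct {
	Children []*node
}

// height returns 0 for a nil tree and 1 + the tallest subtree otherwise.
func (n *node) height() int {
	if n == nil {
		return 0
	}
	h := 0
	for _, c := range n.Children {
		if ch := c.height(); ch > h {
			h = ch
		}
	}
	return h + 1
}

func main() {
	// root -> a -> b -> c, the nesting depth of the d.xml example above
	tree := &node{Children: []*node{{Children: []*node{{Children: []*node{{}}}}}}}
	fmt.Println(tree.height()) // 4
}
```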

func (*Node) IsMultivalued

func (node *Node) IsMultivalued() bool

IsMultivalued returns true if this node appeared more than once.

func (*Node) ReadFrom

func (node *Node) ReadFrom(r io.Reader) (int64, error)

ReadFrom reads XML from a reader.

func (*Node) ReadFromAll added in v0.1.2

func (node *Node) ReadFromAll(readers []io.Reader) (n int64, err error)

ReadFromAll builds a single node from all readers.

type Stack

type Stack struct {
	sync.Mutex
	// contains filtered or unexported fields
}

Stack is a simple stack for arbitrary types.

func (*Stack) Len

func (s *Stack) Len() int

Len returns the number of items on the stack.

func (*Stack) Peek

func (s *Stack) Peek() interface{}

Peek returns the top element without removing it. It panics if the stack is empty.

func (*Stack) Pop

func (s *Stack) Pop() interface{}

Pop removes and returns the top item of the stack. It panics if the stack is empty.

func (*Stack) Put

func (s *Stack) Put(item interface{})

Put pushes an item onto the stack.
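A stack with this API can be built as a slice guarded by the embedded mutex. A sketch under that assumption, not zek's source; the unexported items field is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Stack is a slice of arbitrary values guarded by an embedded mutex.
type Stack struct {
	sync.Mutex
	items []interface{}
}

// Put pushes an item onto the stack.
func (s *Stack) Put(item interface{}) {
	s.Lock()
	defer s.Unlock()
	s.items = append(s.items, item)
}

// Pop removes and returns the top item; it panics on an empty stack.
func (s *Stack) Pop() interface{} {
	s.Lock()
	defer s.Unlock()
	n := len(s.items)
	if n == 0 {
		panic("pop from empty stack")
	}
	item := s.items[n-1]
	s.items[n-1] = nil // drop the reference so it can be garbage collected
	s.items = s.items[:n-1]
	return item
}

// Peek returns the top item without removing it; it panics on an empty stack.
func (s *Stack) Peek() interface{} {
	s.Lock()
	defer s.Unlock()
	if len(s.items) == 0 {
		panic("peek on empty stack")
	}
	return s.items[len(s.items)-1]
}

// Len returns the number of items on the stack.
func (s *Stack) Len() int {
	s.Lock()
	defer s.Unlock()
	return len(s.items)
}

func main() {
	var s Stack
	s.Put("a")
	s.Put("b")
	fmt.Println(s.Len(), s.Peek(), s.Pop(), s.Len()) // 2 b b 1
}
```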

type StructWriter

type StructWriter struct {
	NameFunc          func(string) string // Turns xml tag names into Go names.
	TextFieldNames    []string            // Field name for chardata.
	AttributePrefixes []string            // In case of a name clash, try these prefixes.
	WithComments      bool                // Annotate struct with comments and examples.
	Banner            string              // Autogenerated note.
	ExampleMaxChars   int                 // Max length of example comment.
	Strict            bool                // Whether to ignore implementation holes.
	WithJSONTags      bool                // Include JSON struct tags.
	Compact           bool                // Emit more compact struct.
	UniqueExamples    bool                // Filter out duplicated examples
	// contains filtered or unexported fields
}

StructWriter turns a node into a Go struct definition; its exported fields configure the output.

func NewStructWriter

func NewStructWriter(w io.Writer) *StructWriter

NewStructWriter returns a StructWriter that writes to the given writer and uses the default list of abbreviations to uppercase in full.

func (*StructWriter) WriteNode

func (sw *StructWriter) WriteNode(node *Node) (err error)

WriteNode writes a node to a writer.
