q

package
v38.0.0+incompatible
This package is not in the latest version of its module.
Published: Jan 23, 2020 License: MIT Imports: 13 Imported by: 2

Documentation

Overview

Package q is the gedcomq parser and engine.

Language Basics

A query is split into expressions. The pipe (|) indicates that the result of one expression is the input to the next expression. For example:

.Individuals | .Name

The starting input is the gedcom.Document itself, which is passed into the first expression (".Individuals" in the example above).

".Individuals" is called an "accessor", denoted by the "." prefix. An accessor will try to find a property or method of that name, returning the value of the property or the result of invoking the method. The above example would return a slice ([]*IndividualNode).

The next expression, ".Name", receives that slice. Since it is a slice, the ".Name" accessor is performed on each of the individual slice members, creating a new slice with the results. In this case IndividualNode has a method called Name that returns a *NameNode. That means that the result of processing the slice will be []*NameNode.
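The way an accessor is applied to each element of a slice can be sketched with the reflect package. This is a minimal illustration, not the package's own implementation: the person type and the access and accessOne helpers are hypothetical, and unlike the real accessor this sketch returns a generic []interface{} rather than a typed slice.

```go
package main

import (
	"fmt"
	"reflect"
)

// person is a hypothetical stand-in for a node type such as IndividualNode.
type person struct {
	GivenName string
}

// Name takes no arguments, so an accessor is able to invoke it.
func (p *person) Name() string {
	return p.GivenName
}

// accessOne resolves name on a single value: first as a zero-argument
// method, then as a struct field (following one level of pointer).
func accessOne(v reflect.Value, name string) (interface{}, error) {
	if !v.IsValid() {
		return nil, fmt.Errorf("cannot access %q on nil", name)
	}
	m := v.MethodByName(name)
	if m.IsValid() && m.Type().NumIn() == 0 && m.Type().NumOut() > 0 {
		return m.Call(nil)[0].Interface(), nil
	}
	if v.Kind() == reflect.Ptr {
		v = v.Elem()
	}
	if v.Kind() == reflect.Struct {
		if f := v.FieldByName(name); f.IsValid() {
			return f.Interface(), nil
		}
	}
	return nil, fmt.Errorf("no property or method named %q", name)
}

// access applies the accessor to input. When input is a slice the accessor
// is applied to every element and the results are collected in a new slice.
func access(input interface{}, name string) (interface{}, error) {
	v := reflect.ValueOf(input)
	if v.Kind() != reflect.Slice {
		return accessOne(v, name)
	}
	results := make([]interface{}, 0, v.Len())
	for i := 0; i < v.Len(); i++ {
		r, err := accessOne(v.Index(i), name)
		if err != nil {
			return nil, err
		}
		results = append(results, r)
	}
	return results, nil
}

func main() {
	people := []*person{{GivenName: "Lucy"}, {GivenName: "Sarah"}}
	names, err := access(people, "Name")
	fmt.Println(names, err)
}
```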

After all of the expressions have been evaluated the result is encoded into JSON and output.

It's important to note that some structures implement the json.Marshaler interface, which controls how the structure is represented in JSON. Many structures also implement fmt.Stringer (the String method), which can be helpful for seeing simpler representations of values.
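As an illustration of the difference between the two interfaces, the sketch below defines a hypothetical name type (not the NameNode from the gedcom package) that implements both json.Marshaler and fmt.Stringer. The field names and the JSON layout are assumptions chosen to resemble the output shown in this documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// name is a hypothetical type used only for this sketch.
type name struct {
	Given, Surname string
}

// String implements fmt.Stringer and gives the short, readable form.
func (n name) String() string {
	return n.Given + " " + n.Surname
}

// MarshalJSON implements json.Marshaler and controls the JSON form.
func (n name) MarshalJSON() ([]byte, error) {
	return json.Marshal(map[string]string{
		"Tag":   "NAME",
		"Value": n.Given + " /" + n.Surname + "/",
	})
}

// encode returns the JSON encoding of v as a string.
func encode(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	n := name{Given: "Sarah", Surname: "Taylor"}
	asJSON, err := encode(n)
	fmt.Println(asJSON, err) // the JSON form, as produced by MarshalJSON
	fmt.Println(n.String())  // the simpler form, as produced by String
}
```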

With the example ".Individuals | .Name" on a document that contains two individuals:

[
  {
    "Nodes": [
      {
        "Tag": "GIVN",
        "Value": "Lucy Alcott"
      },
      {
        "Tag": "SURN",
        "Value": "Chauncey"
      }
    ],
    "Tag": "NAME",
    "Value": "Lucy Alcott /Chauncey/"
  },
  {
    "Nodes": [
      {
        "Tag": "GIVN",
        "Value": "Sarah"
      },
      {
        "Tag": "SURN",
        "Value": "Taylor"
      }
    ],
    "Tag": "NAME",
    "Value": "Sarah /Taylor/"
  }
]

If this is too verbose for you, here is the same output using ".Individuals | .Name | .String":

[
  "Lucy Alcott Chauncey",
  "Sarah Taylor"
]

Functions

Some functions are provided as part of the gedcomq language that exist outside of the gedcom package:

Combine(Slices...)

Combine will combine multiple slices of the same type into a single slice.

First(number)

First returns up to the given number of elements from the start of a slice.

If the input value is not a slice then it is converted into a slice of one element before evaluating. This means that the result will always be a slice. The only exception to this is if the input is nil, then the result will also be nil.

There must be exactly one argument and it must be 0 or greater. If the number is greater than the length of the slice all elements are returned.

Last(number)

Last returns up to the given number of elements from the end of a slice.

If the input value is not a slice then it is converted into a slice of one element before evaluating. This means that the result will always be a slice. The only exception to this is if the input is nil, then the result will also be nil.

There must be exactly one argument and it must be 0 or greater. If the number is greater than the length of the slice all elements are returned.
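The rules shared by First and Last can be sketched as follows. This is only an illustration of the described behaviour: the first, last and toSlice helpers are hypothetical and work on []interface{} for simplicity, whereas the real functions are evaluated by the query engine.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// toSlice wraps a non-slice value in a one-element slice. A nil input
// stays nil, which is the one exception described above.
func toSlice(input interface{}) []interface{} {
	if input == nil {
		return nil
	}
	v := reflect.ValueOf(input)
	if v.Kind() != reflect.Slice {
		return []interface{}{input}
	}
	out := make([]interface{}, v.Len())
	for i := range out {
		out[i] = v.Index(i).Interface()
	}
	return out
}

// first returns up to n elements from the start of the input.
func first(input interface{}, n int) ([]interface{}, error) {
	if n < 0 {
		return nil, errors.New("number must be 0 or greater")
	}
	s := toSlice(input)
	if n > len(s) {
		n = len(s) // asking for more than exists returns everything
	}
	return s[:n], nil
}

// last returns up to n elements from the end of the input.
func last(input interface{}, n int) ([]interface{}, error) {
	if n < 0 {
		return nil, errors.New("number must be 0 or greater")
	}
	s := toSlice(input)
	if n > len(s) {
		n = len(s)
	}
	return s[len(s)-n:], nil
}

func main() {
	letters := []string{"a", "b", "c"}
	head, _ := first(letters, 2)
	tail, _ := last(letters, 2)
	single, _ := first("not a slice", 5)
	fmt.Println(head, tail, single)
}
```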

Length

Length returns an integer with the number of items in the slice.

This value will be 0 or more. If the input is not a slice then 1 will always be returned.

MergeDocumentsAndIndividuals(doc1, doc2)

Merges two documents while also merging similar individuals.

Only(condition)

The Only function returns a new slice that only contains the entities that have returned true from the condition. For example:

.Individuals | Only(.Age > 100)
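In Go terms, Only behaves like a filter. The sketch below is a hypothetical equivalent of the query above; the individual type and the only helper are assumptions made for illustration and are not part of the package.

```go
package main

import "fmt"

// individual is a hypothetical record with just the fields needed here.
type individual struct {
	Name string
	Age  float64
}

// only returns a new slice containing the items for which condition is true.
// The input slice is not modified.
func only(input []individual, condition func(individual) bool) []individual {
	result := []individual{}
	for _, item := range input {
		if condition(item) {
			result = append(result, item)
		}
	}
	return result
}

func main() {
	people := []individual{
		{Name: "A", Age: 101},
		{Name: "B", Age: 42},
	}
	// Comparable to: .Individuals | Only(.Age > 100)
	centenarians := only(people, func(p individual) bool { return p.Age > 100 })
	fmt.Println(centenarians)
}
```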

The Question Mark

"?" is a special function that can be used to show all of the possible next functions and accessors. This is useful when exploring data by creating the query interactively.

For example the following query:

.Individuals | ?

Returns (most items removed for brevity):

[
  ".AddNode",
  ".Age",
  ".AgeAt",
  ...
  ".SurroundingSimilarity",
  ".Tag",
  ".Value",
  "?",
  "Length"
]

Variables

Variables allow more complex logic to be processed in separate discrete steps. They are also useful where the logic would otherwise be duplicated because it could not be referenced from multiple places.

Variables are defined in one of two forms:

Events are .Individuals | .AllEvents
Name is .Individual | .Name

The keywords "are" and "is" do exactly the same thing. They are both offered to make the semantics of reading the expression easier.

Variables can then be referenced in separate expressions. For example the following:

.Individuals | .Name | .String

Could also be written as:

Names are .Individuals | .Name; Names | .String

Or even more verbosely as:

Indi is .Individuals; Names are Indi | .Name; Names | .String

The semicolon (;) is used to separate statements, including variable definitions. The result returned will always be the value of the last statement.
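To make the statement structure concrete, here is a deliberately naive sketch that splits a query into statements and picks out the variable names. It is not how the package's Tokenizer and Parser work: the statement type and splitStatements helper are hypothetical, and splitting on ";" this way would break on a string literal that contains a semicolon.

```go
package main

import (
	"fmt"
	"strings"
)

// statement is a hypothetical pairing of an optional variable name with the
// query text that produces its value.
type statement struct {
	Variable string
	Query    string
}

// splitStatements separates a query on ";" and recognises the
// "Name is ..." and "Name are ..." forms.
func splitStatements(query string) []statement {
	var statements []statement
	for _, part := range strings.Split(query, ";") {
		fields := strings.Fields(part)
		if len(fields) == 0 {
			continue // ignore empty statements such as a trailing ";"
		}
		if len(fields) > 2 && (fields[1] == "is" || fields[1] == "are") {
			statements = append(statements, statement{
				Variable: fields[0],
				Query:    strings.Join(fields[2:], " "),
			})
			continue
		}
		statements = append(statements, statement{Query: strings.Join(fields, " ")})
	}
	return statements
}

func main() {
	for _, s := range splitStatements("Names are .Individuals | .Name; Names | .String") {
		fmt.Printf("variable=%q query=%q\n", s.Variable, s.Query)
	}
}
```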

Available variables will be shown as options with the special Question Mark function.

Data Types

gedcomq does not define strict data types. Instead it will perform an operation as best it can under the conditions provided.

To help simplify things here are general descriptions of how certain data types are handled:

- Numbers can be actual whole or floating-point numbers, or they can also be represented as a string. For example 1.23 and "1.230" are considered equal because they both represent the same numerical value, even though they are in different forms.

- Strings are text of any length (including zero characters). If a string's value represents a number, such as "123" or "4.56", operators will treat it as a number rather than as text. It is also important to note that strings are compared without case sensitivity, and whitespace at the start or end of the string is ignored. For example "John Smith" is considered to be equal to " john SMITH ".

- Slices are an ordered set of items, often also called an "array". The name "slice" was chosen rather than "array" because it is more in line with the naming of types in Go. A slice may contain zero elements, but if it does have items they will almost certainly be of the same type, such as a slice of individuals.

- Objects (sometimes referred to as a "map" or "dictionary") consist of zero or more key-value pairs. The values may be of any type, but the keys are always strings and always unique in that object. Objects may be generic, or they may be a specific type from the gedcom package. If they are a specific type, such as an IndividualNode, they may also have methods available, which can be accessed just like properties.

Operators

gedcomq supports several binary operators that can be used for comparison of values. All operators will return a boolean (true/false) result:

=  (equal)
!= (not equal)

If the left and right both represent numeric values then the values are compared numerically. That is to say 1.23 and "1.2300" are equal.

If either the left or right is not a number then the values are compared without case and any whitespace at the start or end is ignored. This means that "John Smith" is considered to be equal to " john SMITH ", but not equal to "John  Smith" (which has two spaces between the words).

Not equal (!=) returns the exact opposite of equal.

>  (greater than)
>= (greater than or equal)
<  (less than)
<= (less than or equal)

If the left and right both represent numeric values then the values are compared numerically. That is to say 1.2301 is greater than "1.23".

If the left or right does not represent a numeric value then the values are compared as strings using the same case-insensitive rules as "=".

One string is greater than another string by comparing each of the characters. So "Jon" is greater than "John" because "n" is greater than "h".
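The equality and ordering rules described in this section can be sketched with a single comparison function. This is an approximation under stated assumptions, not the package's code: compare, asNumber and normalize are hypothetical helpers, and "represents a numeric value" is taken here to mean "accepted by strconv.ParseFloat", which may differ from the rule the package applies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalize gives the form used for text comparison: surrounding whitespace
// removed and case folded to lower case.
func normalize(v interface{}) string {
	return strings.ToLower(strings.TrimSpace(fmt.Sprint(v)))
}

// asNumber reports whether v represents a numeric value and returns it.
// Assumption: anything strconv.ParseFloat accepts counts as numeric.
func asNumber(v interface{}) (float64, bool) {
	f, err := strconv.ParseFloat(strings.TrimSpace(fmt.Sprint(v)), 64)
	return f, err == nil
}

// compare returns -1, 0 or 1. Both sides are compared numerically when both
// represent numbers; otherwise they are compared as normalized text.
func compare(left, right interface{}) int {
	l, lok := asNumber(left)
	r, rok := asNumber(right)
	if lok && rok {
		switch {
		case l < r:
			return -1
		case l > r:
			return 1
		}
		return 0
	}
	return strings.Compare(normalize(left), normalize(right))
}

func main() {
	fmt.Println(compare(1.23, "1.2300") == 0)              // "=" on two numbers
	fmt.Println(compare("John Smith", " john SMITH ") == 0) // "=" on text
	fmt.Println(compare(1.2301, "1.23") > 0)               // ">" on two numbers
	fmt.Println(compare("Jon", "John") > 0)                // ">" on text
}
```

With this shape, "!=" is the negation of compare(...) == 0, and ">=" and "<=" follow from the sign of the result.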

Creating Objects

Custom objects can be constructed from one or more items. For example:

.Individuals | { name: .Name | .String, born: .Birth | .String }

May output something similar to:

[
  {
    "born": "1863",
    "name": "Charles W Chauncey"
  },
  {
    "born": "12 Dec 1859",
    "name": "Lucy Alcott Chauncey"
  },
  {
    "born": "1831",
    "name": "Sarah Taylor"
  }
]

It's also worth noting that an object can contain zero key-value pairs, such as:

.Individuals | {}

This would output (using the same individuals in the previous example):

[
  {},
  {},
  {}
]

Also see the Examples below.

Outputting In Other Formats

There are several formatters (see the Formatter interface) that allow the result of a query to be output in different ways, such as pretty-printed JSON or CSV.

This can be controlled with the "-format" option with gedcomq, or by instantiating one of the formatter instances in your own code.

Examples

Count all individuals in a document:

.Individuals | Length

result:

3401

Retrieve the basic details of the first 3 individuals:

.Individuals | First(3) | { name: .Name | .String, born: .Birth | .String, died: .Death | .String}

result:

[
  {
    "born": "6 Dec 1636",
    "died": "2 Dec 1713",
    "name": "Gershom Bulkeley"
  },
  {
    "born": "5 Nov 1592",
    "died": "19 Feb 1672",
    "name": "Charles Chauncey"
  },
  {
    "born": "1408",
    "died": "7 May 1479",
    "name": "John Chauncy Esq."
  }
]

Retrieve the names of individuals that have a given name (first name) of "John".

.Individuals | .Name | Only(.GivenName = "John") | .String

result:

[
  "John Chaunce",
  "John Chaunce",
  "John Chance",
  "John Unett",
  "John Chance",
  "John de Chauncy"
]

Find all of the living people with their current age:

.Individuals | Only(.IsLiving) | { name: .Name | .String, age: .Age | .String}

result:

[
  {
    "age": "82y 6m",
    "name": "Robert Walter Chance"
  },
  {
    "age": "~ 90y 10m",
    "name": "Sir Robert Temple Armstrong"
  }
]

Merge two GEDCOM files (full command):

gedcomq -gedcom file1.ged -gedcom file2.ged -format gedcom \
  'MergeDocumentsAndIndividuals(Document1, Document2)' > merged.ged

Constants

const (
	// Special
	TokenEOF = TokenKind("EOF")

	// Ignored
	TokenWhitespace = TokenKind("whitespace")

	// Words
	TokenAccessor = TokenKind("accessor")
	TokenWord     = TokenKind("word")
	TokenNumber   = TokenKind("number")
	TokenString   = TokenKind("string")

	// Operators
	TokenPipe         = TokenKind("|")
	TokenSemiColon    = TokenKind(";")
	TokenQuestionMark = TokenKind("?")
	TokenOpenBracket  = TokenKind("(")
	TokenCloseBracket = TokenKind(")")
	TokenOpenCurly    = TokenKind("{")
	TokenCloseCurly   = TokenKind("}")
	TokenColon        = TokenKind(":")
	TokenComma        = TokenKind(",")
	TokenEqual        = TokenKind("=")
	TokenNot          = TokenKind("!")
	TokenGreaterThan  = TokenKind(">")
	TokenLessThan     = TokenKind("<")
)

Variables

var Functions = map[string]Expression{
	"?":                            &QuestionMarkExpr{},
	"Combine":                      &CombineExpr{},
	"First":                        &FirstExpr{},
	"Last":                         &LastExpr{},
	"Length":                       &LengthExpr{},
	"MergeDocumentsAndIndividuals": &MergeDocumentsAndIndividualsExpr{},
	"Only":                         &OnlyExpr{},
}

Functions is a map of available functions.

See "Functions" in the package documentation for usage and examples.

var Operators = []struct {
	Name     string
	Tokens   []TokenKind
	Function func(left, right interface{}) (bool, error)
}{
	{"!=", []TokenKind{TokenNot, TokenEqual}, notEqual},
	{">=", []TokenKind{TokenGreaterThan, TokenEqual}, greaterThanEqual},
	{"<=", []TokenKind{TokenLessThan, TokenEqual}, lessThanEqual},
	{"=", []TokenKind{TokenEqual}, equal},
	{">", []TokenKind{TokenGreaterThan}, greaterThan},
	{"<", []TokenKind{TokenLessThan}, lessThan},
}

Operators contains the tokens and functions for all operators.

It is important that the operators are ordered so that the operators with the most tokens are read first. This prevents the parser from consuming a shorter operator (such as ">") when it is only the beginning of a longer one (such as ">=").
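The effect of the ordering can be shown with a small first-match sketch. It is a simplification: the real Operators table matches sequences of TokenKind values, while the hypothetical matchOperator helper below matches raw string prefixes, but the ordering concern is the same.

```go
package main

import (
	"fmt"
	"strings"
)

// operators lists the two-character operators before the one-character
// operators that are prefixes of them.
var operators = []string{"!=", ">=", "<=", "=", ">", "<"}

// matchOperator returns the first operator in ops that s starts with.
// Because the first match wins, the order of ops decides the result.
func matchOperator(s string, ops []string) (string, bool) {
	for _, op := range ops {
		if strings.HasPrefix(s, op) {
			return op, true
		}
	}
	return "", false
}

func main() {
	good, _ := matchOperator(">= 100", operators)
	// With the shorter operator listed first, ">=" is wrongly read as ">".
	bad, _ := matchOperator(">= 100", []string{">", ">="})
	fmt.Printf("correct order: %q, wrong order: %q\n", good, bad)
}
```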

Functions

func TypeOfSliceElement

func TypeOfSliceElement(v interface{}) reflect.Type

TypeOfSliceElement returns the type of element from a slice. The input should not be a reflect.Value, but an actual value.

If v is not a slice then nil is returned.

func ValueToPointer

func ValueToPointer(v reflect.Value) reflect.Value

ValueToPointer converts a value to a pointer. If the value is already a pointer then it is passed through.
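The behaviour documented for these two helpers can be approximated with the reflect package. The lower-case functions below are hypothetical re-implementations written from the descriptions above, not the package source; the counter type and the invoke helper exist only to show why converting a value to a pointer is useful (it makes pointer-receiver methods reachable).

```go
package main

import (
	"fmt"
	"reflect"
)

// typeOfSliceElement returns the element type of a slice, or nil when v is
// not a slice.
func typeOfSliceElement(v interface{}) reflect.Type {
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Slice {
		return nil
	}
	return t.Elem()
}

// valueToPointer passes pointers through and wraps any other value in a
// newly allocated pointer holding a copy of it.
func valueToPointer(v reflect.Value) reflect.Value {
	if v.Kind() == reflect.Ptr {
		return v
	}
	p := reflect.New(v.Type())
	p.Elem().Set(v)
	return p
}

// counter is a hypothetical type whose method has a pointer receiver.
type counter struct {
	N int
}

func (c *counter) Get() int {
	return c.N
}

// invoke calls a zero-argument method by name on x, whether x is a value or
// a pointer.
func invoke(x interface{}, name string) (interface{}, bool) {
	m := valueToPointer(reflect.ValueOf(x)).MethodByName(name)
	if !m.IsValid() || m.Type().NumIn() != 0 || m.Type().NumOut() == 0 {
		return nil, false
	}
	return m.Call(nil)[0].Interface(), true
}

func main() {
	fmt.Println(typeOfSliceElement([]string{"a"}))
	fmt.Println(typeOfSliceElement(42) == nil)
	fmt.Println(invoke(counter{N: 7}, "Get"))
}
```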

Types

type AccessorExpr

type AccessorExpr struct {
	Query string
}

AccessorExpr is used to fetch the value of a property or to invoke a method.

The simplest form is ".Foo" where Foo could be a property or method.

When an accessor is used on a slice the accessor is performed on each element, generating a new slice of that returned type.

func (*AccessorExpr) Evaluate

func (e *AccessorExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

Evaluate will automatically handle conversions between pointer and non-pointers to find the property or method and return the value. However, if it is a method it must not take any arguments.

It will return an error if a property or method could not be found by that name.

type BinaryExpr

type BinaryExpr struct {
	Left, Right Expression
	Operator    string
}

BinaryExpr evaluates a binary operator expression.

func (*BinaryExpr) Evaluate

func (e *BinaryExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

type CSVFormatter

type CSVFormatter struct {
	Writer io.Writer
}

func (*CSVFormatter) Header

func (f *CSVFormatter) Header(result interface{}) ([]string, error)

func (*CSVFormatter) Write

func (f *CSVFormatter) Write(result interface{}) error

type CallExpr

type CallExpr struct {
	Function Expression
	Args     []*Statement
}

CallExpr calls a function.

func (*CallExpr) Evaluate

func (e *CallExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

type CombineExpr

type CombineExpr struct{}

CombineExpr will combine multiple slices of the same type into a single slice.

If the slices are not the same type an error will be returned with a nil value.
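A sketch of this behaviour using the reflect package is shown below. The combine helper is hypothetical and takes its slices as ordinary Go arguments; in the package the slices arrive as evaluated query arguments instead.

```go
package main

import (
	"fmt"
	"reflect"
)

// combine appends all slices into one new slice. Every argument must be a
// slice of the same type as the first; otherwise nil and an error are
// returned.
func combine(slices ...interface{}) (interface{}, error) {
	if len(slices) == 0 {
		return nil, nil
	}
	first := reflect.TypeOf(slices[0])
	if first == nil || first.Kind() != reflect.Slice {
		return nil, fmt.Errorf("argument 1 is not a slice")
	}
	result := reflect.MakeSlice(first, 0, 0)
	for i, s := range slices {
		if reflect.TypeOf(s) != first {
			return nil, fmt.Errorf("argument %d is %T, expected %v", i+1, s, first)
		}
		result = reflect.AppendSlice(result, reflect.ValueOf(s))
	}
	return result.Interface(), nil
}

func main() {
	fmt.Println(combine([]int{1, 2}, []int{3}))
	fmt.Println(combine([]int{1}, []string{"a"}))
}
```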

func (*CombineExpr) Evaluate

func (e *CombineExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

type ConstantExpr

type ConstantExpr struct {
	Value string
}

ConstantExpr represents a floating-point number or string.

func (*ConstantExpr) Evaluate

func (e *ConstantExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

type Engine

type Engine struct {
	Statements []*Statement
}

Engine is the compiled query. It is able to evaluate the entire query.

func (*Engine) Evaluate

func (e *Engine) Evaluate(documents []*gedcom.Document) (interface{}, error)

Evaluate executes all of the expressions and returns the final result.

Evaluate expects that there is at least one document provided.

func (*Engine) StatementByVariableName

func (e *Engine) StatementByVariableName(name string) (*Statement, error)

type Expression

type Expression interface {
	// Evaluate should only be run once and is likely to alter the value of
	// input. This means expressions can only be safely run once and previous
	// input values cannot be reused.
	Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)
}

Expression is a single operation. Expressions can be chained together with a pipe (|) in the query.

type FirstExpr

type FirstExpr struct{}

FirstExpr is a function. See Evaluate.

func (*FirstExpr) Evaluate

func (e *FirstExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

Evaluate returns up to the given number of elements from the start of a slice.

If the input value is not a slice then it is converted into a slice of one element before evaluating. This means that the result will always be a slice. The only exception to this is if the input is nil, then the result will also be nil.

There must be exactly one argument and it must be 0 or greater. If the number is greater than the length of the slice all elements are returned.

type Formatter

type Formatter interface {
	Write(result interface{}) error
}

Formatter is used to write the result to a stream.
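Because Formatter has a single method, a custom output format only needs a type with a matching Write method. The sketch below repeats the interface locally so that it compiles on its own; lineFormatter and formatToString are hypothetical and are not among the formatters shipped with the package.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// Formatter mirrors the interface shown above.
type Formatter interface {
	Write(result interface{}) error
}

// lineFormatter is a hypothetical formatter that writes each element of a
// []string on its own line and falls back to fmt for anything else.
type lineFormatter struct {
	Writer io.Writer
}

// Compile-time check that *lineFormatter satisfies Formatter.
var _ Formatter = (*lineFormatter)(nil)

func (f *lineFormatter) Write(result interface{}) error {
	switch r := result.(type) {
	case []string:
		for _, s := range r {
			if _, err := fmt.Fprintln(f.Writer, s); err != nil {
				return err
			}
		}
		return nil
	default:
		_, err := fmt.Fprintln(f.Writer, r)
		return err
	}
}

// formatToString runs the formatter against an in-memory buffer.
func formatToString(result interface{}) (string, error) {
	var b strings.Builder
	err := (&lineFormatter{Writer: &b}).Write(result)
	return b.String(), err
}

func main() {
	var f Formatter = &lineFormatter{Writer: os.Stdout}
	if err := f.Write([]string{"Lucy Alcott Chauncey", "Sarah Taylor"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```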

type GEDCOMFormatter

type GEDCOMFormatter struct {
	Writer io.Writer
}

func (*GEDCOMFormatter) Write

func (f *GEDCOMFormatter) Write(result interface{}) error

type HTMLFormatter

type HTMLFormatter struct {
	Writer io.Writer
}

func (*HTMLFormatter) Write

func (f *HTMLFormatter) Write(result interface{}) error

type JSONFormatter

type JSONFormatter struct {
	Writer io.Writer
}

func (*JSONFormatter) Write

func (f *JSONFormatter) Write(result interface{}) error

type LastExpr

type LastExpr struct{}

LastExpr is a function. See Evaluate.

func (*LastExpr) Evaluate

func (e *LastExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

Evaluate returns up to the given number of elements from the end of a slice.

If the input value is not a slice then it is converted into a slice of one element before evaluating. This means that the result will always be a slice. The only exception to this is if the input is nil, then the result will also be nil.

There must be exactly one argument and it must be 0 or greater. If the number is greater than the length of the slice all elements are returned.

type LengthExpr

type LengthExpr struct{}

LengthExpr is a function. See Evaluate.

func (*LengthExpr) Evaluate

func (e *LengthExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

Evaluate returns an integer with the number of items in the slice. This value will be 0 or more. If the input is not a slice then 1 will always be returned.

type MergeDocumentsAndIndividualsExpr

type MergeDocumentsAndIndividualsExpr struct{}

MergeDocumentsAndIndividualsExpr is a function. See Evaluate.

func (*MergeDocumentsAndIndividualsExpr) Evaluate

func (e *MergeDocumentsAndIndividualsExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

Evaluate merges two documents while also merging similar individuals.

type ObjectExpr

type ObjectExpr struct {
	Data map[string]*Statement
}

ObjectExpr creates an object from keys and values.

func (*ObjectExpr) Evaluate

func (e *ObjectExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

type OnlyExpr

type OnlyExpr struct{}

OnlyExpr is a function. See Evaluate.

func (*OnlyExpr) Evaluate

func (e *OnlyExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

Evaluate returns a new slice that only contains the entities that have returned true from the condition.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser converts the query string into an Engine that can be evaluated.

func NewParser

func NewParser() *Parser

NewParser creates a new parser.

func (*Parser) ParseString

func (p *Parser) ParseString(q string) (engine *Engine, err error)

ParseString returns a new Engine by parsing the query string.

type PrettyJSONFormatter

type PrettyJSONFormatter struct {
	Writer io.Writer
}

func (*PrettyJSONFormatter) Write

func (f *PrettyJSONFormatter) Write(result interface{}) error

type QuestionMarkExpr

type QuestionMarkExpr struct{}

QuestionMarkExpr ("?") is a special function. See Evaluate.

func (*QuestionMarkExpr) Evaluate

func (e *QuestionMarkExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

"?" is a special function that can be used to show all of the possible next functions and accessors. This is useful when exploring data by creating the query interactively.

For example the following query:

.Individuals | ?

Returns (most items removed for brevity):

[
  ".AddNode",
  ".Age",
  ".AgeAt",
  ...
  ".SurroundingSimilarity",
  ".Tag",
  ".Value",
  "?",
  "Length"
]

type Statement

type Statement struct {
	// VariableName must be unique amongst other variables and must not be the
	// name of an existing function. The name is also allowed to be empty, which
	// means that the result cannot be referenced in other expressions.
	VariableName string

	// Expressions are separated by pipes. The result of each evaluated
	// expression is used as the input to the next expression. The input value
	// for the first expression is the gedcom.Document.
	Expressions []Expression
}

Statement represents a single discrete operation in the engine.

func (*Statement) Evaluate

func (v *Statement) Evaluate(engine *Engine, input interface{}) (interface{}, error)

Evaluate executes all of the expressions and returns the final result.

type Token

type Token struct {
	Kind  TokenKind
	Value string
}

type TokenKind

type TokenKind string

type Tokenizer

type Tokenizer struct{}

func NewTokenizer

func NewTokenizer() *Tokenizer

func (*Tokenizer) TokenizeString

func (t *Tokenizer) TokenizeString(s string) *Tokens

type Tokens

type Tokens struct {
	Tokens   []Token
	Position int
}

func (*Tokens) Consume

func (t *Tokens) Consume(expected ...TokenKind) (tokens []Token, err error)

func (*Tokens) Rollback

func (t *Tokens) Rollback(position int, err *error)

type ValueExpr

type ValueExpr struct {
	Value interface{}
}

ValueExpr holds a single value.

It is different from ConstantExpr because it cannot be instantiated from the q language, but acts as a placeholder for prepared values.

func (*ValueExpr) Evaluate

func (e *ValueExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)

type VariableExpr

type VariableExpr struct {
	Name string
}

func (*VariableExpr) Evaluate

func (e *VariableExpr) Evaluate(engine *Engine, input interface{}, args []*Statement) (interface{}, error)
