jsonnlp

Version: v0.0.0-...-be08bdd (latest)
Published: Feb 1, 2021 License: Apache-2.0 Imports: 2 Imported by: 0

README

GoJSONNLP - JSON-NLP Go Code

(C) 2020-2021 by Semiring Inc., Damir Cavar

Package version 0.8.4


Introduction

This repository provides the Go package jsonnlp for reading and writing data that conforms to the JSON-NLP schema. JSON-NLP encodes the output of Natural Language Processing (NLP) pipelines and serves as a kind of middleware format.

JSON-NLP wrappers for the output formats of various NLP pipelines are available.

Many other wrappers and modules likely exist or will be made available.

JSON-NLP processing and validation modules exist for other languages as well.


Installation

Install the jsonnlp Go package using:

go get github.com/SemiringInc/GoJSONNLP

Update to the latest version using:

go get -u github.com/SemiringInc/GoJSONNLP

Visualization of JSON-NLP

There is a visualizer for JSON-NLP available here: https://semiringinc.github.io/JSON-NLP-Viz/


Documentation

Overview

Package jsonnlp provides the data structures to read and generate JSON-NLP. See https://github.com/SemiringInc/JSON-NLP for the JSON Schema specification of the JSON-NLP exchange format.

JSON-NLP encapsulates different Natural Language Processing (NLP) annotations and analyses in one uniform JSON format.

Basic Structure Every JSON-NLP...

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Attribute

type Attribute struct {
	Label string `json:"lab"`
	Value string `json:"val"`
}

Attribute contains an attribute-value pair used in Entity and Relation specifications.

type Clause

type Clause struct {
	ID                   int     `json:"id"`                  // clause ID
	SentenceID           int     `json:"sentenceID"`          // sentence ID
	TokenFrom            int     `json:"tokenFrom,omitempty"` // first token
	TokenTo              int     `json:"tokenTo,omitempty"`   // last token
	Tokens               []int   `json:"tokens,omitempty"`    // list of tokens
	Main                 bool    `json:"main,omitempty"`      // is it a main clause
	Governor             int     `json:"gov,omitempty"`       // the id of the governing clause
	Head                 int     `json:"head,omitempty"`      // token ID of root/head (main verb or predicate head)
	Negation             bool    `json:"neg,omitempty"`       // clause negated
	Tense                string  `json:"tense,omitempty"`     //
	Mood                 string  `json:"mood,omitempty"`      //
	Perfect              bool    `json:"perfect,omitempty"`
	Continuous           bool    `json:"continuous,omitempty"`
	Aspect               string  `json:"aspect,omitempty"`        //
	Voice                string  `json:"voice,omitempty"`         //
	Sentiment            string  `json:"sentiment,omitempty"`     //
	SentimentProbability float64 `json:"sentimentProb,omitempty"` //
}

Clause contains information about clause level properties.
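Most Clause fields carry the `omitempty` tag option. Assuming the tags are consumed by the standard `encoding/json` package, zero values (0, false, "") are dropped on output, so a `tokenFrom` of 0 or a `neg` of false cannot be told apart from a missing key. The sketch below illustrates this with a trimmed local copy of a few Clause fields; the `clause` type and the `encodeClause` helper are hypothetical names introduced here and are not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clause is a trimmed local copy of a few Clause fields, declared here
// only so the sketch runs without importing the package.
type clause struct {
	ID         int  `json:"id"`
	SentenceID int  `json:"sentenceID"`
	TokenFrom  int  `json:"tokenFrom,omitempty"`
	TokenTo    int  `json:"tokenTo,omitempty"`
	Main       bool `json:"main,omitempty"`
	Negation   bool `json:"neg,omitempty"`
}

// encodeClause marshals a clause and returns the JSON text ("" on error).
func encodeClause(c clause) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// TokenFrom is 0 and Negation is false, so both keys are omitted.
	c := clause{ID: 1, SentenceID: 1, TokenFrom: 0, TokenTo: 4, Main: true}
	fmt.Println(encodeClause(c))
	// prints {"id":1,"sentenceID":1,"tokenTo":4,"main":true}
}
```

If token numbering starts at 0 in your data, a consumer probably needs to treat a missing `tokenFrom` as 0 rather than as "unknown".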

type Conll

type Conll struct {
	Data string `json:"data,omitempty"`
}

Conll holds the CoNLL-U format data for the analyses.

type ConstituentParse

type ConstituentParse struct {
	SentenceID        int     `json:"sentenceId"`
	Type              string  `json:"type,omitempty"`
	LabeledBracketing string  `json:"labeledBracketing"`
	Probability       float64 `json:"prob,omitempty"`
	Scopes            []Scope `json:"scopes,omitempty"`
}

ConstituentParse contains the syntactic constituent parse tree.

type Coreference

type Coreference struct {
	ID             int                        `json:"id"`
	Representative CoreferenceRepresentantive `json:"representative"`
	Referents      []CoreferenceReferents     `json:"referents"`
}

Coreference is a coreference between a representative element and a list of referents.

type CoreferenceReferents

type CoreferenceReferents struct {
	Tokens      []int   `json:"tokens"`
	Head        int     `json:"head,omitempty"`
	Probability float64 `json:"prob,omitempty"`
}

CoreferenceReferents contains a list of tokens indicating the referent and a respective head ID. An additional optional probability field indicates the likelihood that this referent refers to some R-expression.

type CoreferenceRepresentantive

type CoreferenceRepresentantive struct {
	Tokens []int `json:"tokens"`
	Head   int   `json:"head,omitempty"`
}

CoreferenceRepresentantive contains a list of tokens and the ID of the head. This is the referent for other referring expressions such as anaphora.

type Dependency

type Dependency struct {
	Label       string  `json:"lab"`
	Governor    int     `json:"gov"`
	Dependent   int     `json:"dep"`
	Probability float64 `json:"prob,omitempty"`
}

Dependency encodes a single dependency relation in JSON-NLP: a label, a governor, a dependent, and an optional probability.

type DependencyTree

type DependencyTree struct {
	SentenceID   int          `json:"sentenceID"`
	Style        string       `json:"style,omitempty"`
	Dependencies []Dependency `json:"dependencies,omitempty"`
	Probability  float64      `json:"prob,omitempty"`
}

DependencyTree is a dependency tree; its structure is redefined compared to the original version of JSON-NLP.
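As an illustration of how Dependency and DependencyTree fit together, the following sketch builds a tree for the two-token sentence "John sleeps" and looks up the dependents of a governor. The types are local copies of the fields listed above, and the token IDs (1 and 2), the style value "universal", and the helper names are assumptions made for the example, not values prescribed by the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the Dependency and DependencyTree fields, declared
// here only so the sketch runs without importing the package.
type dependency struct {
	Label       string  `json:"lab"`
	Governor    int     `json:"gov"`
	Dependent   int     `json:"dep"`
	Probability float64 `json:"prob,omitempty"`
}

type dependencyTree struct {
	SentenceID   int          `json:"sentenceID"`
	Style        string       `json:"style,omitempty"`
	Dependencies []dependency `json:"dependencies,omitempty"`
	Probability  float64      `json:"prob,omitempty"`
}

// buildTree encodes "John sleeps" with hypothetical token IDs 1 and 2.
func buildTree() dependencyTree {
	return dependencyTree{
		SentenceID: 1,
		Style:      "universal", // illustrative value
		Dependencies: []dependency{
			{Label: "nsubj", Governor: 2, Dependent: 1},
		},
	}
}

// dependentsOf returns the token IDs governed by the token gov.
func dependentsOf(tree dependencyTree, gov int) []int {
	var deps []int
	for _, d := range tree.Dependencies {
		if d.Governor == gov {
			deps = append(deps, d.Dependent)
		}
	}
	return deps
}

// encodeTree marshals a tree and returns the JSON text ("" on error).
func encodeTree(tree dependencyTree) string {
	b, err := json.Marshal(tree)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	tree := buildTree()
	fmt.Println(encodeTree(tree))
	fmt.Println(dependentsOf(tree, 2))
}
```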

type Document

type Document struct {
	MetaDocument    Meta               `json:"meta"`
	ID              int                `json:"id"`
	TokenList       []Token            `json:"tokenList,omitempty"`
	Clauses         []Clause           `json:"clauses,omitempty"`
	Sentences       []Sentence         `json:"sentences,omitempty"`
	Paragraphs      []Paragraph        `json:"paragraphs,omitempty"`
	DependencyTrees []DependencyTree   `json:"dependencyTrees,omitempty"`
	Coreferences    []Coreference      `json:"coreferences,omitempty"`
	Constituents    []ConstituentParse `json:"constituents,omitempty"`
	Expressions     []Expression       `json:"expressions,omitempty"`
	Entities        []Entity           `json:"entities,omitempty"`
	Relations       []Relation         `json:"relations,omitempty"`
	Triples         []Triple           `json:"triples,omitempty"`
}

Document is a structure that contains an ID, Meta information, and all the different linguistic annotations.
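A Document keeps tokens in `tokenList`, and other annotation layers refer to them by integer ID. The sketch below decodes a small document and reconstructs the text of a sentence from its token list. It uses trimmed local copies of Token, Sentence, and Document, and it assumes that the integers in `Sentence.Tokens` match `Token.ID`; the helper names `parseDocument` and `sentenceText` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Trimmed local copies of Token, Sentence, and Document, declared here
// only so the sketch runs without importing the package.
type token struct {
	ID   int    `json:"id"`
	Text string `json:"text"`
}

type sentence struct {
	ID     int   `json:"id"`
	Tokens []int `json:"tokens,omitempty"`
}

type document struct {
	ID        int        `json:"id"`
	TokenList []token    `json:"tokenList,omitempty"`
	Sentences []sentence `json:"sentences,omitempty"`
}

// parseDocument decodes one document; unknown keys are ignored.
func parseDocument(s string) (document, error) {
	var d document
	err := json.Unmarshal([]byte(s), &d)
	return d, err
}

// sentenceText joins the token texts of the sentence with the given ID.
// It returns "" if no such sentence exists.
func sentenceText(d document, sentenceID int) string {
	byID := make(map[int]string, len(d.TokenList))
	for _, t := range d.TokenList {
		byID[t.ID] = t.Text
	}
	for _, s := range d.Sentences {
		if s.ID != sentenceID {
			continue
		}
		words := make([]string, 0, len(s.Tokens))
		for _, id := range s.Tokens {
			words = append(words, byID[id])
		}
		return strings.Join(words, " ")
	}
	return ""
}

func main() {
	raw := `{"meta":{},"id":1,
	  "tokenList":[{"id":1,"sentence_id":1,"text":"John"},
	               {"id":2,"sentence_id":1,"text":"sleeps"}],
	  "sentences":[{"id":1,"tokens":[1,2]}]}`
	d, err := parseDocument(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(sentenceText(d, 1)) // prints John sleeps
}
```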

type Entity

type Entity struct {
	ID                   int         `json:"id"`
	Label                string      `json:"label,omitempty"`
	Type                 string      `json:"type"`
	URL                  string      `json:"url"`
	Head                 int         `json:"head,omitempty"`
	TokenFrom            int         `json:"tokenFrom,omitempty"`
	TokenTo              int         `json:"tokenTo,omitempty"`
	Tokens               []int       `json:"tokens,omitempty"`
	TripleID             int         `json:"tripleID,omitempty"`      // reified entity pointer to triple ID
	Sentiment            string      `json:"sentiment,omitempty"`     //
	SentimentProbability float64     `json:"sentimentProb,omitempty"` //
	Count                int         `json:"count,omitempty"`
	Attributes           []Attribute `json:"attributes"`
}

Entity contains detailed information about entities in the sentence or clause.

type Expression

type Expression struct {
	ID          int     `json:"id"`
	Type        string  `json:"type,omitempty"` // "NP"
	Head        int     `json:"head,omitempty"`
	Dependency  string  `json:"dependency,omitempty"` // "nsubj"
	TokenFrom   int     `json:"tokenFrom,omitempty"`  // first token
	TokenTo     int     `json:"tokenTo,omitempty"`    // last token
	Tokens      []int   `json:"tokens"`
	Probability float64 `json:"prob,omitempty"`
}

Expression stores expressions, which mostly correspond to chunks, that is, phrases.

type JSONNLP

type JSONNLP struct {
	MetaData  Meta       `json:"meta,omitempty"`
	Documents []Document `json:"documents,omitempty"`
	CoNLL     Conll      `json:"conll,omitempty"`
}

JSONNLP is a tuple of Meta information and a list of documents.

func (*JSONNLP) FromFile

func (data *JSONNLP) FromFile(filename string)

FromFile reads the JSON-NLP instance from a file.

func (*JSONNLP) FromString

func (data *JSONNLP) FromString(t string)

FromString reads the JSON-NLP instance from a string.

func (*JSONNLP) GetJSON

func (data *JSONNLP) GetJSON() ([]byte, error)

GetJSON returns the JSON-NLP instance as a JSON-encoded byte slice.
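To show the top-level shape that these methods read and write, here is a sketch that round-trips a value through the standard `encoding/json` package using trimmed local copies of JSONNLP, Meta, Document, and Conll. It does not call the package's own methods, and it assumes the struct tags are consumed by `encoding/json`; the `encode` and `decode` helpers are hypothetical. Note that `omitempty` has no effect on struct-typed fields in `encoding/json`, so `meta` and `conll` are emitted even when empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed local copies of the package types, declared here only so the
// sketch runs without importing the package.
type meta struct {
	DCConformsTo string `json:"DC.conformsTo"`
	DCAuthor     string `json:"DC.author"`
	DCCreated    string `json:"DC.created"`
	DCLanguage   string `json:"DC.language,omitempty"`
}

type document struct {
	MetaDocument meta `json:"meta"`
	ID           int  `json:"id"`
}

type conll struct {
	Data string `json:"data,omitempty"`
}

type jsonNLP struct {
	MetaData  meta       `json:"meta,omitempty"`
	Documents []document `json:"documents,omitempty"`
	CoNLL     conll      `json:"conll,omitempty"`
}

// encode marshals the value and returns the JSON text ("" on error).
func encode(j jsonNLP) string {
	b, err := json.Marshal(j)
	if err != nil {
		return ""
	}
	return string(b)
}

// decode parses JSON text into a jsonNLP value.
func decode(s string) (jsonNLP, error) {
	var j jsonNLP
	err := json.Unmarshal([]byte(s), &j)
	return j, err
}

func main() {
	// The nil Documents slice is omitted; the empty structs are not.
	fmt.Println(encode(jsonNLP{}))

	j, err := decode(`{"meta":{"DC.language":"en"},"documents":[{"id":1},{"id":2}]}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(j.Documents), j.MetaData.DCLanguage) // prints 2 en
}
```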

type Meta

type Meta struct {
	DCConformsTo   string     `json:"DC.conformsTo"`
	DCAuthor       string     `json:"DC.author"`             //
	DCCreated      string     `json:"DC.created"`            // "2020-05-28T02:15:19"
	DCDate         string     `json:"DC.date,omitempty"`     // "2020-05-28T02:15:19"
	DCSource       string     `json:"DC.source,omitempty"`   // "NLP1 2.2.3"
	DCLanguage     string     `json:"DC.language,omitempty"` // "en"
	DCCreator      string     `json:"DC.creator,omitempty"`
	DCPublisher    string     `json:"DC.publisher,omitempty"`
	DCTitle        string     `json:"DC.title,omitempty"`
	DCDescription  string     `json:"DC.description,omitempty"`
	DCIdentifier   string     `json:"DC.identifier,omitempty"`
	DCSubject      string     `json:"DC.subject,omitempty"`
	DCContributors string     `json:"DC.contributors,omitempty"`
	DCType         string     `json:"DC.type,omitempty"`
	DCFormat       string     `json:"DC.format,omitempty"`
	DCRelation     string     `json:"DC.relation,omitempty"`
	DCCoverage     string     `json:"DC.coverage,omitempty"`
	DCRights       string     `json:"DC.rights,omitempty"`
	Counts         MetaCounts `json:"counts,omitempty"`
}

Meta contains the common meta information for the entire JSON-NLP or a single document. These are Dublin Core (DC) labels. See the DC documentation for details.

type MetaCounts

type MetaCounts struct {
	Paragraphs int `json:"paragraphs,omitempty"`
	Sentences  int `json:"sentences,omitempty"`
	Clauses    int `json:"clauses,omitempty"`
	Tokens     int `json:"tokens,omitempty"`
}

MetaCounts contains various statistics about the JSON-NLP: the number of paragraphs, sentences, clauses, and tokens.

type Paragraph

type Paragraph struct {
	ID        int   `json:"id"`
	TokenFrom int   `json:"tokenFrom,omitempty"`
	TokenTo   int   `json:"tokenTo,omitempty"`
	Tokens    []int `json:"tokens,omitempty"`
	Sentences []int `json:"sentences,omitempty"`
}

Paragraph contains the information about paragraphs.

type Relation

type Relation struct {
	ID                   int         `json:"id"`
	Label                string      `json:"label"`
	Type                 string      `json:"type"`
	URL                  string      `json:"url"`
	Head                 int         `json:"head,omitempty"`
	TokenFrom            int         `json:"tokenFrom,omitempty"`
	TokenTo              int         `json:"tokenTo,omitempty"`
	Tokens               []int       `json:"tokens,omitempty"`
	Sentiment            string      `json:"sentiment,omitempty"`     //
	SentimentProbability float64     `json:"sentimentProb,omitempty"` //
	Count                int         `json:"count,omitempty"`
	Attributes           []Attribute `json:"attributes"`
}

Relation encodes specific relations that can be specified between entities.

type Scope

type Scope struct {
	ID         int   `json:"id"`
	Governor   []int `json:"gov"`
	Dependents []int `json:"dep,omitempty"`
	Terminals  []int `json:"terminals,omitempty"`
}

Scope indicates the scope relations between a governor and its dependents (potentially including terminals, that is, words).

type Sentence

type Sentence struct {
	ID                   int     `json:"id"`                      // sentence ID
	TokenFrom            int     `json:"tokenFrom,omitempty"`     // first token
	TokenTo              int     `json:"tokenTo,omitempty"`       // last token
	Tokens               []int   `json:"tokens,omitempty"`        // list of tokens in sentence
	Clauses              []int   `json:"clauses,omitempty"`       // list of clauses in sentence
	Type                 string  `json:"type,omitempty"`          // type of sentence: declarative, interrogative, exclamatory, imperative, instructive
	Sentiment            string  `json:"sentiment,omitempty"`     // sentiment type
	SentimentProbability float64 `json:"sentimentProb,omitempty"` //
}

Sentence is a new structure compared to the original JSON-NLP version.

type Token

type Token struct {
	ID                   int           `json:"id"`
	SentenceID           int           `json:"sentence_id"`
	Text                 string        `json:"text"`            // "John",
	Lemma                string        `json:"lemma,omitempty"` // "John",
	XPoS                 string        `json:"xpos,omitempty"`  // "NNP",
	XPoSProbability      float64       `json:"xpos_prob,omitempty"`
	UPoS                 string        `json:"upos,omitempty"` // "PROPN",
	UPoSProbability      float64       `json:"upos_prob,omitempty"`
	EntityIOB            string        `json:"entity_iob,omitempty"` // "B",
	CharacterOffsetBegin int           `json:"characterOffsetBegin,omitempty"`
	CharacterOffsetEnd   int           `json:"characterOffsetEnd,omitempty"`
	PropID               string        `json:"propID,omitempty"`            // PropBank ID
	PropIDProbability    float64       `json:"propIDProbability,omitempty"` // PropBank ID probability
	FrameID              int           `json:"frameID,omitempty"`
	FrameIDProbability   float64       `json:"frameIDProb,omitempty"`
	WordNetID            int           `json:"wordNetID,omitempty"`
	WordNetIDProbability float64       `json:"wordNetIDProb,omitempty"`
	VerbNetID            int           `json:"verbNetID,omitempty"`
	VerbNetIDProbability float64       `json:"verbNetIDProb,omitempty"`
	Lang                 string        `json:"lang,omitempty"`     // "en",
	Features             TokenFeatures `json:"features,omitempty"` //
	Shape                string        `json:"shape,omitempty"`    // "Xxxx",
	Entity               string        `json:"entity,omitempty"`   // "PERSON"
}

Token contains all the token-specific details.

type TokenFeatures

type TokenFeatures struct {
	Overt          bool   `json:"overt,omitempty"`       // is the token overt? Invisible or covert words are words that are omitted in speech, subject to ellipsis, gapping, simple object, topic, or subject drop, etc.
	Stop           bool   `json:"stop,omitempty"`        // is the token a stop-word or not?
	Alpha          bool   `json:"alpha,omitempty"`       //
	Number         int    `json:"number,omitempty"`      // 1 = singular, 2 = dual, 3 or more = plural
	Gender         string `json:"gender,omitempty"`      // male, female, neuter
	Person         int    `json:"person,omitempty"`      // 1, 2, 3
	Tense          string `json:"tense,omitempty"`       // Tense of the token: past, present, future
	Perfect        bool   `json:"perfect,omitempty"`     // Aspect of the token
	Continuous     bool   `json:"continuous,omitempty"`  // is the token indicating continuous = ing
	Progressive    bool   `json:"progressive,omitempty"` // is the token indicating progressive = am + ...ing
	Case           string `json:"case,omitempty"`        // nom, acc, dat, gen, voc, loc, inst, ...
	Human          bool   `json:"human,omitempty"`       // yes/no
	Animate        bool   `json:"animate,omitempty"`     // yes/no
	Negated        bool   `json:"negated,omitempty"`     // word in scope of negation
	Countable      bool   `json:"countable,omitempty"`
	Factive        bool   `json:"factive,omitempty"` // factive verb
	Counterfactive bool   `json:"counterfactive,omitempty"`
	Irregular      bool   `json:"irregular,omitempty"` // irregular verb or noun form
	PhrasalVerb    bool   `json:"phrasalVerb,omitempty"`
	Mood           string `json:"mood,omitempty"` // indicative, imperative, subjunctive
	Foreign        bool   `json:"foreign,omitempty"`
	SpaceAfter     bool   `json:"spaceAfter,omitempty"` // space after token in orig text?
}

TokenFeatures is a data structure that contains all the detailed morphosyntactic token features.
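The `number` feature is an integer rather than a label. The following sketch maps it to a name according to the field comment (1 = singular, 2 = dual, 3 or more = plural). The trimmed `tokenFeatures` type and the helper names are hypothetical, and treating 0 or a missing key as "unspecified" is an assumption, since `omitempty` makes the two indistinguishable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenFeatures is a trimmed local copy of two TokenFeatures fields.
type tokenFeatures struct {
	Overt  bool `json:"overt,omitempty"`
	Number int  `json:"number,omitempty"`
}

// numberLabel follows the field comment: 1 = singular, 2 = dual,
// 3 or more = plural. Anything else is reported as "unspecified",
// which is an assumption of this sketch.
func numberLabel(n int) string {
	switch {
	case n == 1:
		return "singular"
	case n == 2:
		return "dual"
	case n >= 3:
		return "plural"
	default:
		return "unspecified"
	}
}

// decodeFeatures parses the JSON text of a features object.
func decodeFeatures(s string) (tokenFeatures, error) {
	var f tokenFeatures
	err := json.Unmarshal([]byte(s), &f)
	return f, err
}

func main() {
	for _, raw := range []string{`{"number":2}`, `{"overt":true,"number":5}`, `{}`} {
		f, err := decodeFeatures(raw)
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", raw, numberLabel(f.Number))
	}
}
```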

type Triple

type Triple struct {
	ID               int     `json:"id"`
	FromEntity       int     `json:"fromEntity"`
	ToEntity         int     `json:"toEntity"`
	Relation         int     `json:"rel"`
	ClauseID         []int   `json:"clauseID,omitempty"`
	SentenceID       []int   `json:"sentenceID,omitempty"`
	Directional      bool    `json:"directional,omitempty"`
	EventID          int     `json:"eventID,omitempty"`
	TemporalSequence int     `json:"tempSeq,omitempty"`
	Probability      float64 `json:"prob,omitempty"`
	Syntactic        bool    `json:"syntactic,omitempty"`
	Implied          bool    `json:"implied,omitempty"`
	Presupposed      bool    `json:"presupposed,omitempty"`
	Count            int     `json:"count,omitempty"`
}

Triple contains a specific relation between two entities.
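A Triple stores integer references rather than labels. The sketch below resolves a triple into a readable string by looking up its entities and relation. It uses trimmed local copies of Entity, Relation, and Triple, and it assumes that `fromEntity` and `toEntity` refer to `Entity.ID` and that `rel` refers to `Relation.ID`; the `describe` helper and the sample labels are hypothetical.

```go
package main

import "fmt"

// Trimmed local copies of Entity, Relation, and Triple, declared here
// only so the sketch runs without importing the package.
type entity struct {
	ID    int    `json:"id"`
	Label string `json:"label,omitempty"`
}

type relation struct {
	ID    int    `json:"id"`
	Label string `json:"label"`
}

type triple struct {
	ID         int `json:"id"`
	FromEntity int `json:"fromEntity"`
	ToEntity   int `json:"toEntity"`
	Relation   int `json:"rel"`
}

// describe renders a triple as "from -[relation]-> to". It reports
// false if any referenced ID is missing from the given lists.
func describe(tr triple, ents []entity, rels []relation) (string, bool) {
	entLabel := make(map[int]string, len(ents))
	for _, e := range ents {
		entLabel[e.ID] = e.Label
	}
	relLabel := make(map[int]string, len(rels))
	for _, r := range rels {
		relLabel[r.ID] = r.Label
	}
	from, okFrom := entLabel[tr.FromEntity]
	to, okTo := entLabel[tr.ToEntity]
	rel, okRel := relLabel[tr.Relation]
	if !okFrom || !okTo || !okRel {
		return "", false
	}
	return fmt.Sprintf("%s -[%s]-> %s", from, rel, to), true
}

func main() {
	ents := []entity{{ID: 1, Label: "John"}, {ID: 2, Label: "Acme"}}
	rels := []relation{{ID: 1, Label: "works_for"}}
	tr := triple{ID: 1, FromEntity: 1, ToEntity: 2, Relation: 1}
	if s, ok := describe(tr, ents, rels); ok {
		fmt.Println(s) // prints John -[works_for]-> Acme
	}
}
```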
