Documentation
Overview
Package alud derives Universal Dependencies from sentences parsed with Alpino.
Usually, the input is XML in the alpino_ds format.
The output is in the CoNLL-U format, or the Universal Dependencies can be embedded into the alpino_ds format (version 1.10), making them available for XPath queries.
It is also possible to take a user-provided file in the CoNLL-U format and embed it into the alpino_ds format.
When empty heads are reconstructed (resulting in lines with an ID with a dot), the ID of the original line is added in the last field of the CoNLL-U format, in the form CopiedFrom=ID. This information is necessary for correct embedding into the alpino_ds format.
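As an illustration of the CopiedFrom=ID convention described above, the following is a minimal sketch of how a consumer might read that value back from a CoNLL-U token line. It assumes the standard CoNLL-U layout of ten tab-separated fields with "|"-separated attributes in the last field. The helper name copiedFrom and the sample lines are invented for this example and are not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// copiedFrom is a hypothetical helper (not part of alud). It returns the
// value of the CopiedFrom attribute from the last field of a CoNLL-U token
// line, and false if the line is not a ten-field token line or has no such
// attribute.
func copiedFrom(line string) (string, bool) {
	fields := strings.Split(line, "\t")
	if len(fields) != 10 {
		return "", false // comment line, blank line, or malformed line
	}
	// The last field holds attributes separated by "|", or "_" if empty.
	for _, attr := range strings.Split(fields[9], "|") {
		if strings.HasPrefix(attr, "CopiedFrom=") {
			return strings.TrimPrefix(attr, "CopiedFrom="), true
		}
	}
	return "", false
}

func main() {
	// Invented sample lines, for illustration only.
	lines := []string{
		"2\tleest\tlezen\tVERB\t_\t_\t0\troot\t0:root\t_",
		"5.1\tleest\tlezen\tVERB\t_\t_\t_\t_\t2:conj\tCopiedFrom=2",
	}
	for _, line := range lines {
		id := strings.SplitN(line, "\t", 2)[0]
		if orig, ok := copiedFrom(line); ok {
			fmt.Printf("line %s was copied from line %s\n", id, orig)
		}
	}
}
```

Only the second sample line carries the attribute, so only that line is reported.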
----
The package is based on a translation of an XQuery script written by Gosse Bouma.
See Alpino: https://www.let.rug.nl/vannoord/alp/Alpino/
See Universal Dependencies: https://universaldependencies.org/
See CoNLL-U: https://universaldependencies.org/format.html
See XQuery script: https://github.com/gossebouma/lassy2ud
Constants
const (
	OPT_DEBUG                  = 1 << iota // include debug messages in comments
	OPT_DUMMY_OUTPUT                       // include dummy output if parse fails
	OPT_NO_COMMENTS                        // don't include comments
	OPT_NO_DETOKENIZE                      // don't try to restore detokenized sentence
	OPT_NO_ENHANCED                        // skip enhanced dependencies
	OPT_NO_FIX_MISPLACED_HEADS             // don't fix misplaced heads in coordination
	OPT_NO_FIX_PUNCT                       // don't fix punctuation
	OPT_NO_METADATA                        // don't copy metadata to comments
	OPT_PANIC                              // panic on error (for development)
)
Options can be combined with bitwise OR and passed as the last argument to Ud().
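Because the constants are declared with 1 << iota, each option occupies its own bit, so several options fit in a single integer. The sketch below shows the mechanism in isolation. It redeclares the constants locally, in the documented order, so that it runs without the package. The helper hasOption is invented for this example, and no call to Ud() is shown because its full signature is not given here.

```go
package main

import "fmt"

// Local copies of the documented option constants, in the same order, so
// each name gets the same bit as in the listing above.
const (
	OPT_DEBUG = 1 << iota
	OPT_DUMMY_OUTPUT
	OPT_NO_COMMENTS
	OPT_NO_DETOKENIZE
	OPT_NO_ENHANCED
	OPT_NO_FIX_MISPLACED_HEADS
	OPT_NO_FIX_PUNCT
	OPT_NO_METADATA
	OPT_PANIC
)

// hasOption is a hypothetical helper that reports whether flag is set in
// the combined options value.
func hasOption(options, flag int) bool {
	return options&flag != 0
}

func main() {
	// Combine two options with bitwise OR: skip comments and skip
	// enhanced dependencies.
	options := OPT_NO_COMMENTS | OPT_NO_ENHANCED

	fmt.Println(options)                             // 20 (4 | 16)
	fmt.Println(hasOption(options, OPT_NO_ENHANCED)) // true
	fmt.Println(hasOption(options, OPT_DEBUG))       // false
}
```

With the real package, the combined value would be passed as the last argument to Ud(), and 0 would presumably mean that no options are set.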
Variables
This section is empty.
Functions
func Alpino
Alpino inserts the given Universal Dependencies into the alpino_ds format.
If conllu is "", the UD information from alpino_doc is used instead.
The conllu format is not checked for correctness. Garbage in, garbage out.
The value from auto is copied to the output.
func Ud
Ud derives Universal Dependencies from a parsed sentence in the alpino_ds format.
If sentid is "", it is derived from the filename.