ccda

command module
v0.5.0
Published: Jan 17, 2023 License: MIT Imports: 16 Imported by: 0

README

CCDA

Compile compact(ed) deterministic automata using graph files.

Usage

Use `ccda [IN] OUT` to compile the graph file IN to the compact(ed) automaton OUT. If IN is omitted, the graph file is read from stdin instead.

For rewrite mode, use `ccda -R [-start] [-end] IN [FILE...]`. In rewrite mode, ccda reads the compact(ed) automaton from IN and rewrites the input from FILE... to stdout. If no files are given, the input is read from stdin. If -start and/or -end are given, the rewriting process also handles the start and/or end context of rewritten sequences.
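The two invocation forms above can be assembled programmatically. The following Go sketch only builds and prints the argument lists. The helper names `compileArgs` and `rewriteArgs` and all file names are hypothetical and not part of ccda; only the flags `-R`, `-start` and `-end` come from the usage text.

```go
package main

import (
	"fmt"
	"os/exec"
)

// compileArgs builds the arguments for compile mode: ccda [IN] OUT.
// An empty in means that the graph file is read from stdin.
// (Hypothetical helper, not part of ccda.)
func compileArgs(in, out string) []string {
	if in == "" {
		return []string{out}
	}
	return []string{in, out}
}

// rewriteArgs builds the arguments for rewrite mode:
// ccda -R [-start] [-end] IN [FILE...].
// (Hypothetical helper, not part of ccda.)
func rewriteArgs(start, end bool, automaton string, files ...string) []string {
	args := []string{"-R"}
	if start {
		args = append(args, "-start")
	}
	if end {
		args = append(args, "-end")
	}
	args = append(args, automaton)
	return append(args, files...)
}

func main() {
	// The file names are made up; the commands are printed, not executed.
	compile := exec.Command("ccda", compileArgs("rules.graph", "rules.ccda")...)
	rewrite := exec.Command("ccda", rewriteArgs(true, false, "rules.ccda", "input.txt")...)
	fmt.Println(compile.Args) // [ccda rules.graph rules.ccda]
	fmt.Println(rewrite.Args) // [ccda -R -start rules.ccda input.txt]
}
```

To actually run a command, call `Run` or `Output` on it; with no FILE arguments the input would have to be supplied through the command's `Stdin` field.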

Graph file syntax

# Lines starting with `#` are comments; empty lines are ignored.

# State names
# 0 denotes the initial state; there cannot be more than one initial state.
# State names are strings without any whitespace. They cannot contain `@` and/or `\`
# and cannot start with `_`.
# Final states have to be marked using `#final NAME REWRITE`.

# Final states
# To mark the state NAME final and set its rewrite string use:
#final NAME final string data ...

# Transitions
# A transition leads from one state to the next.
# Use valid state names to reference states.
# To denote a transition from SRC state to DST state accepting EXPR use:
SRC DST EXPR
# To denote an empty (automatic) transition from SRC to DST use:
SRC DST
# Note that any leading and/or trailing whitespace around SRC, DST and EXPR
# is ignored.

# Replacements (macros)
# To denote the replacement of XXX with YYY in all EXPR use:
#define XXX YYY

# Special symbols
# The following symbols have a special meaning in EXPR:
# - `.()[]*+?\`
# To use any of the above symbols literally in an expression,
# you have to escape them using `\`. This includes symbols
# in macros and within square brackets (`[...]`).

# Include graph files
# To include the contents of another file use:
#include /path/to/file

# Renaming state names
# To rename state names (or parts of state names) use:
#rename OLD NEW

##################################################
# Expressions (assuming a final state named `1`) #
##################################################

# Character classes
# Accepted language: A|B|...|Z|a|b|...|z
0 1 [A-Za-z]

# Negated character classes
# Accepts any sequence of characters without a or b
0 1 [^ab]*

# Dot accepts anything
# Accepts any sequence of characters.
0 1 .*

# One or more matches
# Accepted language: (a|b|...|z)(a|b|...|z)*
0 1 ([a-z])+

# Zero or more matches
# Accepted language: (a|b|...|z)*
0 1 ([a-z])*

# Optional matches
# Accepted language: (0|1|...|9)+((.(0|1|...|9)+)|ε)
0 1 [0-9]+(.[0-9]+)?

# Combination of expressions
# Accepted language: (abc)*(0|1|...|9)+
0 1 (abc)*[0-9]+

# Empty transitions
# Accepted language: (a|b|...|z)*
0 1 [a-z]
1 0
0 1

# Macros
# Accepted language: (a|b|...|z)(0|1|...|9)+
#define <d> [0-9]
#define <l> [a-z]
0 2 <l>
2 1 <d>+

# Dictionaries
# Accepted language: abc|def
0 1 @dict
@dict abc
@dict def

# Accepted language: [0-9](abc|def)
0 1 [0-9]@dict
@dict abc
@dict def

# Accepted language: (abc|def)[0-9]
0 1 @dict[0-9]
@dict abc
@dict def

# Accepted language: (abc|def)ghi
0 1 (@dict)ghi
@dict abc
@dict def

# Escape syntax:
# Accepted language: ([|])*
0 1 ([\[\]])*

# Escape sequences
# Accepted language: iä🦖
0 1 \x69\u00e4\U0001F996
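To make the line-level rules above concrete, here is a minimal Go sketch that sorts the lines of a graph file into kinds. It is not ccda's parser. It assumes that a directive keyword (`final`, `define`, `include`, `rename`) follows `#` without a space, that every other `#` line is a comment, and that a line starting with `@` is a dictionary entry; the type `Line` and the function `classify` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Line is one classified line of a graph file (hypothetical type).
type Line struct {
	Kind   string   // "skip", "final", "define", "include", "rename", "dict" or "transition"
	Fields []string // whitespace-separated operands
}

// directives lists the `#` keywords named in the syntax description.
var directives = []string{"final", "define", "include", "rename"}

// classify sorts one raw line into a kind. Surrounding whitespace is
// dropped because the line is split into whitespace-separated fields.
func classify(raw string) (Line, error) {
	f := strings.Fields(raw)
	if len(f) == 0 {
		return Line{Kind: "skip"}, nil // empty line
	}
	if strings.HasPrefix(f[0], "#") {
		for _, d := range directives {
			if f[0] == "#"+d {
				return Line{Kind: d, Fields: f[1:]}, nil
			}
		}
		return Line{Kind: "skip"}, nil // ordinary comment
	}
	if strings.HasPrefix(f[0], "@") {
		return Line{Kind: "dict", Fields: f}, nil // assumed dictionary entry
	}
	if len(f) < 2 {
		return Line{}, fmt.Errorf("transition needs SRC and DST: %q", raw)
	}
	// Two fields form an empty transition; further fields belong to EXPR.
	return Line{Kind: "transition", Fields: f}, nil
}

func main() {
	for _, raw := range []string{"# a comment", "#final 1 out", "0 1 [a-z]", "1 0", "@dict abc"} {
		l, err := classify(raw)
		fmt.Println(l.Kind, l.Fields, err)
	}
}
```

Splitting on whitespace keeps the sketch short, but it would also split an EXPR or rewrite string that contains spaces; a real parser would have to decide how to treat those.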

Documentation

Overview

This package compiles graph files describing a non-deterministic automaton to a compact(ed) deterministic automaton.
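The step from a non-deterministic to a deterministic automaton can be illustrated with the textbook subset construction. The Go sketch below is not ccda's implementation, and it leaves out the compaction step entirely. As simplifications it uses integer states 0..63 stored in bit sets instead of state names, and every type and function name in it is hypothetical. The example automaton mirrors the "Empty transitions" graph (`0 1 [a-z]`, `1 0`, `0 1`).

```go
package main

import "fmt"

// nfa is a small non-deterministic automaton with states 0..63.
// State 0 is the initial state; sets of states are stored as bit sets.
type nfa struct {
	final uint64                  // bit set of final states
	eps   map[int]uint64          // empty (automatic) transitions
	step  map[int]map[byte]uint64 // transitions that consume one byte
}

// dfa is the deterministic result; each state is a set of nfa states.
type dfa struct {
	start uint64
	final map[uint64]bool
	trans map[uint64]map[byte]uint64
}

// closure adds every state reachable through empty transitions to set.
func (n nfa) closure(set uint64) uint64 {
	for {
		next := set
		for s := 0; s < 64; s++ {
			if set&(1<<s) != 0 {
				next |= n.eps[s]
			}
		}
		if next == set {
			return set
		}
		set = next
	}
}

// determinize runs the subset construction over the given alphabet.
func (n nfa) determinize(alphabet []byte) dfa {
	d := dfa{n.closure(1), map[uint64]bool{}, map[uint64]map[byte]uint64{}}
	queue := []uint64{d.start}
	for len(queue) > 0 {
		set := queue[0]
		queue = queue[1:]
		if _, done := d.trans[set]; done {
			continue // this set of states was already expanded
		}
		d.trans[set] = map[byte]uint64{}
		d.final[set] = set&n.final != 0
		for _, c := range alphabet {
			var next uint64
			for s := 0; s < 64; s++ {
				if set&(1<<s) != 0 {
					next |= n.step[s][c]
				}
			}
			if next != 0 {
				next = n.closure(next)
				d.trans[set][c] = next
				queue = append(queue, next)
			}
		}
	}
	return d
}

// accepts reports whether d consumes all of input and ends in a final state.
func (d dfa) accepts(input string) bool {
	state := d.start
	for i := 0; i < len(input); i++ {
		next, ok := d.trans[state][input[i]]
		if !ok {
			return false
		}
		state = next
	}
	return d.final[state]
}

// exampleNFA mirrors the graph lines `0 1 [a-z]`, `1 0` and `0 1`
// with state 1 as the final state.
func exampleNFA() (nfa, []byte) {
	n := nfa{
		final: 1 << 1,
		eps:   map[int]uint64{0: 1 << 1, 1: 1 << 0},
		step:  map[int]map[byte]uint64{0: {}},
	}
	var alphabet []byte
	for c := byte('a'); c <= 'z'; c++ {
		n.step[0][c] = 1 << 1
		alphabet = append(alphabet, c)
	}
	return n, alphabet
}

func main() {
	n, alphabet := exampleNFA()
	d := n.determinize(alphabet)
	fmt.Println(len(d.trans), d.accepts(""), d.accepts("abc"), d.accepts("ab1"))
	// Output: 1 true true false
}
```

Both states of the example collapse into a single deterministic state that loops on every lowercase letter, which matches the accepted language (a|b|...|z)*.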

Usage

ccda [IN] OUT
Compiles the input graph file IN and writes the automaton to OUT. If IN is omitted, the graph file is read from stdin.

ccda -R [-start] [-end] IN [FILE...]
Reads the compact(ed) automaton from IN and rewrites FILE... to stdout using the automaton as rewrite lexicon. If no files are given, the input is read from stdin.

Graph file syntax

Lines starting with `#` are comments; empty lines are ignored.

# State names
# 0 denotes the initial state; there cannot be more than one initial state.
# State names are strings without any whitespace. They cannot contain `@` and/or `\`
# and cannot start with `_`.
# Final states have to be marked using `#final NAME REWRITE`.

# Final states
# To mark the state NAME final and set its rewrite string use:
#final NAME final string data ...

# Transitions
# A transition leads from one state to the next.
# Use valid state names to reference states.
# To denote a transition from SRC state to DST state accepting EXPR use:
SRC DST EXPR
# To denote an empty (automatic) transition from SRC to DST use:
SRC DST
# Note that any leading and/or trailing whitespace around SRC, DST and EXPR
# is ignored.

# Replacements (macros)
# To denote the replacement of XXX with YYY in all EXPR use:
#define XXX YYY

# Special symbols
# The following symbols have a special meaning in EXPR:
# - `.()[]*+?\`
# To use any of the above symbols literally in an expression,
# you have to escape them using `\`. This includes symbols
# in macros and within square brackets (`[...]`).

# Include graph files
# To include the contents of another file use:
#include /path/to/file

# Renaming state names
# To rename state names (or parts of state names) use:
#rename OLD NEW

##################################################
# Expressions (assuming a final state named `1`) #
##################################################

# Character classes
# Accepted language: A|B|...|Z|a|b|...|z
0 1 [A-Za-z]

# Negated character classes
# Accepts any sequence of characters without a or b
0 1 [^ab]*

# Dot accepts anything
# Accepts any sequence of characters.
0 1 .*

# One or more matches
# Accepted language: (a|b|...|z)(a|b|...|z)*
0 1 ([a-z])+

# Zero or more matches
# Accepted language: (a|b|...|z)*
0 1 [a-z]*

# Optional matches
# Accepted language: (0|1|...|9)+((.(0|1|...|9)+)|ε)
0 1 [0-9]+(.[0-9]+)?

# Combination of expressions
# Accepted language: (abc)*(0|1|...|9)+
0 1 (abc)*[0-9]+

# Empty transitions
# Accepted language: (a|b|...|z)*
0 1 [a-z]
1 0
0 1

# Macros
# Accepted language: (a|b|...|z)(0|1|...|9)+
#define <d> [0-9]
#define <l> [a-z]
0 2 <l>
2 1 <d>+

# Dictionaries
# Accepted language: abc|def
0 1 @dict
@dict abc
@dict def

# Accepted language: [0-9](abc|def)
0 1 [0-9]@dict
@dict abc
@dict def

# Accepted language: (abc|def)[0-9]
0 1 @dict[0-9]
@dict abc
@dict def

# Accepted language: (abc|def)ghi
0 1 (@dict)ghi
@dict abc
@dict def

# Escape syntax:
# Accepted language: ([|])*
0 1 ([\[\]])*

# Escape sequences:
# Accepted language: iä🦖
0 1 \x69\u00e4\U0001F996

# Unicode classes:
# Unicode classes can be used in [...] expressions or directly in
# normal expressions. Use \pN to refer to the Unicode class N or
# \p{NAME} to refer to the Unicode class NAME.
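The `\pN` and `\p{NAME}` notation also exists in Go's `regexp` package, so its behaviour there can serve as a rough illustration. Whether ccda resolves class names in exactly the same way is an assumption that the text above does not confirm. The helper `fullMatch` is hypothetical and anchors the expression so that it has to match the whole input.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether expr matches all of s. It uses Go's regexp
// package, not ccda, and anchors expr at both ends. (Hypothetical helper.)
func fullMatch(expr, s string) bool {
	return regexp.MustCompile(`^(?:` + expr + `)$`).MatchString(s)
}

func main() {
	// \pN: class with the one-letter name N (numbers).
	fmt.Println(fullMatch(`\pN+`, "2023")) // true
	// \p{NAME}: class with a longer name, here the Greek script.
	fmt.Println(fullMatch(`\p{Greek}+`, "\u03b1\u03b2\u03b3")) // true
	// Classes inside a [...] expression: letters or numbers.
	fmt.Println(fullMatch(`[\pL\pN]+`, "abc123")) // true
	// A letter is not a member of the class N.
	fmt.Println(fullMatch(`\pN+`, "12a")) // false
}
```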
