Documentation
Overview ¶
Package mcfile defines a per-file structure [MCFile] that holds all relevant per-file information. This includes:
- file path info
- file content (UTF-8, of course)
- file type information (MIME and more)
- the results of markup-specific file analysis (in the most analysable case, i.e. XML, this comprises tokens, gtokens, gelms, gtree)
For a discussion of tree walk functions, see `doc_wfn.go`
Note that if we do not get an explicit XML DOCTYPE declaration, there is some educated guesswork required.
The first workflow was based on XML, and comprises: `text => XML tokens => GTokens => GTags => GTree`
First, package `gparse` gets as far as the `GToken`s, which can only be in a list: they have no tree structure. Then package `gtree` handles the rest.
XML analysis starts off with tokenization (by the stdlib), so it makes sense to then have separate steps for making `GToken`s, `GTag`s, and the `GTree`.
MKDN and HTML analyses use higher-level libraries that deliver CSTs (Concrete Syntax Trees, i.e. parse trees). We choose to do this processing in `package gparse` rather than in `package gtree`.
MKDN gets a tree of `yuin/goldmark/ast/Node`, and HTML gets a tree of `golang.org/x/net/html/Node`. Since a CST is delivered fully-formed, it makes sense to have a Step 1 that attaches to each node its `GToken` and `GTag`, and then a Step 2 that builds a `GTree`.
There are three major types of `MCFile`, corresponding to how we process the file content:
- "XML"
  - (§1) Use stdlib `encoding/xml` to get `[]XU.XToken`
  - (§1) Convert `[]XU.XToken` to `[]gparse.GToken`
  - (§2) Build `GTree`
- "MKDN"
  - (§1) Use `yuin/goldmark` to get tree of `yuin/goldmark/ast/Node`
  - (§1) From each Node make a `MkdnToken` (in a list?) incl. `GToken` and `GTag`
  - (§2) Build `GTree`
- "HTML"
  - (§1) Use `golang.org/x/net/html` to get a tree of `html.Node`
  - (§1) From each Node make a `HtmlToken` (in a list?) incl. `GToken` and `GTag`
  - (§2) Build `GTree`
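As an illustration of the first XML step, here is a minimal sketch of getting a flat token list from stdlib `encoding/xml`. The type `flatToken` and the function `tokenize` are hypothetical stand-ins, not the package's `XU.XToken` or `gparse.GToken`; the point is only that the stdlib decoder yields tokens in document order with no tree structure.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// flatToken is a hypothetical stand-in for a GToken: it records
// only the kind of token and its tag name or text.
type flatToken struct {
	Kind string // "start", "end" or "text"
	Text string
}

// tokenize returns the tokens of src as a flat list, in document order.
// Whitespace-only character data is skipped.
func tokenize(src string) ([]flatToken, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []flatToken
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			out = append(out, flatToken{Kind: "start", Text: t.Name.Local})
		case xml.EndElement:
			out = append(out, flatToken{Kind: "end", Text: t.Name.Local})
		case xml.CharData:
			if s := strings.TrimSpace(string(t)); s != "" {
				out = append(out, flatToken{Kind: "text", Text: s})
			}
		}
	}
}

func main() {
	toks, err := tokenize(`<topic id="t1"><title>Hello</title></topic>`)
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Println(t.Kind, t.Text)
	}
}
```

Any tree has to be built in a later step by matching start and end tokens, which is the role the text assigns to `GTree` construction.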
In general, all Go files in this protocol stack should be organised as:
- struct definition
- constructors (named `New*`)
- printf stuff (Raw(), Echo(), String())
Some characteristic methods:
- Raw() returns the original string passed from the Go XML parser (with whitespace trimmed)
- Echo() returns a string of the item in normalised form, although be aware that the presence of terminating newlines is not treated uniformly
- String() returns a string suitable for runtime monitoring and debugging
NOTE The use of shorthand in variable names: Doc, Elm, Att.
NOTE We use `godoc2md`, so we can use Markdown in these code comments.
Index ¶
- Variables
- func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)
- func DumpGElm(p AST.Node) string
- func KidsAsSlice(p AST.Node) []AST.Node
- func ListKids(p AST.Node) string
- func NormalizeTextLeaves(rootNode AST.Node)
- type Contentity
- func (p *Contentity) DoBlockList() *Contentity
- func (p *Contentity) DoEntitiesList() error
- func (p *Contentity) DoGLinks() *Contentity
- func (p *Contentity) DoTableOfContents() *Contentity
- func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)
- func (p *Contentity) ExecuteStages() *Contentity
- func (p *Contentity) GatherLinks() error
- func (p *Contentity) GatherXmlGLinks() *Contentity
- func (p *Contentity) IsDir() bool
- func (p *Contentity) IsDirlike() bool
- func (p *Contentity) L(level LL, format string, a ...interface{})
- func (p *Contentity) LogPrefix(mid string) string
- func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)
- func (p *Contentity) ProcessEntities_() error
- func (p *Contentity) RefineDirectives() error
- func (p Contentity) String() string
- func (p *Contentity) SubstituteEntities() error
- func (p *Contentity) TallyTags()
- func (p *Contentity) WrapError(s string, e error)
- type ContentityEngine
- type ContentityError
- type ContentityFS
- func (p *ContentityFS) AsSlice() []*Contentity
- func (p *ContentityFS) DirCount() int
- func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)
- func (p *ContentityFS) FileCount() int
- func (p *ContentityFS) ItemCount() int
- func (p *ContentityFS) RootAbsPath() string
- func (p *ContentityFS) RootContentity() *RootContentity
- func (p *ContentityFS) Size() int
- type ContentityStage
- type Flags
- type GLink
- type GLinks
- type LL
- type LinkInfo
- type LinkInfos
- type LogInfo
- type NodeStringser
- type RootContentity
- type StringTally
Constants ¶
This section is empty.
Variables ¶
var GlobalAttCount int
var GlobalTagCount int
var LwDitaAttsForGLinks = []string{
"name",
"href",
"id",
"idref",
"idrefs",
"conref",
"data-conref",
"keys",
"data-keys",
"keyref",
"data-keyref",
}
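The variable name suggests that these are the attributes inspected when gathering links. A minimal sketch of such a check follows; the helper `isLinkAtt` and the local copy `linkAtts` are hypothetical and not part of the package.

```go
package main

import "fmt"

// linkAtts mirrors the contents of the package variable LwDitaAttsForGLinks.
var linkAtts = []string{
	"name", "href", "id", "idref", "idrefs", "conref",
	"data-conref", "keys", "data-keys", "keyref", "data-keyref",
}

// isLinkAtt reports whether attName appears in linkAtts.
// (Hypothetical helper, not part of the package.)
func isLinkAtt(attName string) bool {
	for _, a := range linkAtts {
		if a == attName {
			return true
		}
	}
	return false
}

func main() {
	for _, att := range []string{"href", "class", "data-keyref"} {
		fmt.Printf("%-12s link-related: %t\n", att, isLinkAtt(att))
	}
}
```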
Functions ¶
func AddInXName ¶
func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)
func NormalizeTextLeaves ¶
Types ¶
type Contentity ¶
type Contentity struct {
	// Nord provides hierarchical structure, only.
	ON.Nord
	// ContentityRow includes all fields that get persisted
	// to the DB. It contains the field Raw (deeply embedded),
	// and also an FSItem that contains an Errer.
	m5db.ContentityRow
	// LogInfo is (the index of the Contentity in
	// the larger slice) + (the processing stage ID)
	LogInfo
	// ParserResults is parseutils.ParserResults_ffs
	// (ffs = file format -specific = "html" or "mkdn" but not
	// "xml" cos Go's XML parser does not produce a tree structure)
	ParserResults interface{}

	GTokens []*gtoken.GToken
	GTags   []*gtree.GTag
	*gtree.GTree // maybe not need GRootTag or RootOfASTptr

	GTknsWriter, GTreeWriter, GEchoWriter io.Writer

	GLinks
	// GEnts is "ENTITY" directives (both with "%" and without).
	GEnts map[string]*gparse.GEnt
	// DElms is "ELEMENT" directives.
	DElms map[string]*gtree.GTag

	TagTally StringTally
	AttTally StringTally
}
Contentity is awesome. It includes a ContentityRow, which includes an FSItem, which includes an Errer.
func NewContentity ¶
func NewContentity(aPath string) *Contentity
NewContentity returns a Contentity Nord (i.e. a node with content and ordered children) that can NOT be the root of a Contentity tree. If there is an error, it is returned in the embedded Errer.
It should accept either an absolute or a relative filepath, altho relative is preferred, for various reasons, mainly because of the preferences of the path and filepath stdlibs.
TODO: Maybe it needs two boolean arguments:
- One to say whether to be strict about security (using os.Root and Valid/Local), and
- One to say whether to follow symlinks.
These two flags might have some interesting interactions. Since this func could (but does not) use os.Root, these can be left as calling options, rather than implementing higher security using funcs io/fs.ValidPath and path/filepath.IsLocal.
We want everything to be in a nice tree of Nords, which means that we have to create Contentities for directories too (where `Raw_type == SU.Raw_type_DIRLIKE`), so we have to handle that case too.
func (*Contentity) DoBlockList ¶
func (p *Contentity) DoBlockList() *Contentity
DoBlockList makes a list of all the nodes that are blocks, so that they can be traversed for rendering, and targeted for references.
func (*Contentity) DoEntitiesList ¶
func (p *Contentity) DoEntitiesList() error
DoEntitiesList collects all entity definitions. Note that each Token has been normalized.
- rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
- rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true
func (*Contentity) DoTableOfContents ¶
func (p *Contentity) DoTableOfContents() *Contentity
DoTableOfContents makes a ToC.
func (*Contentity) DoValidation ¶
func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)
DoValidation: TODO: If there is no DOCTYPE, make a guess based on Filext, but this must not be fatal.
func (*Contentity) ExecuteStages ¶
func (p *Contentity) ExecuteStages() *Contentity
ExecuteStages processes a Contentity to completion in an isolated thread, and can easily be converted to run as a goroutine. Summary:
- st0_Init()
- st1_Read()
- st2_Tree()
- st3_Refs()
- st4_Done() (not currently called, but will work on all input files at once !)
An interesting question is, how can we indicate an error and terminate a thread prematurely ? The method currently chosen is to use interface github.com/fbaube/miscutils/Errer. This has to be checked for at the start of a func. But then we can chain functions by writing them left-to-right. Winning!
(If functions accept and return a ptr+error pair then they chain right-to-left, which is a big fail for readability.)
We could also pass in a `Context` and use its cancellation capability. Yet another way might be simply to `panic`, and so this function already has code to catch panics.
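The chaining pattern described above can be sketched as follows. This is a simplified illustration, not the package's code: the type `item` stands in for Contentity, its `err` field plays the role of the embedded Errer, and the stage names and the simulated failure are invented.

```go
package main

import (
	"errors"
	"fmt"
)

// item is a hypothetical stand-in for Contentity; err plays the
// role of the embedded Errer.
type item struct {
	err   error
	trace []string // names of the stages that actually ran
}

func (p *item) hasError() bool { return p.err != nil }

// Each stage checks for a prior error first, then does its work.
func (p *item) read() *item {
	if p.hasError() {
		return p
	}
	p.trace = append(p.trace, "read")
	return p
}

func (p *item) tree() *item {
	if p.hasError() {
		return p
	}
	p.trace = append(p.trace, "tree")
	p.err = errors.New("tree: malformed input") // simulated failure
	return p
}

func (p *item) refs() *item {
	if p.hasError() {
		return p
	}
	p.trace = append(p.trace, "refs")
	return p
}

// execute chains the stages left-to-right and converts a panic
// in any stage into a stored error.
func (p *item) execute() (out *item) {
	defer func() {
		if r := recover(); r != nil {
			p.err = fmt.Errorf("panic: %v", r)
			out = p
		}
	}()
	return p.read().tree().refs()
}

func main() {
	it := new(item).execute()
	fmt.Println(it.trace, it.err)
}
```

Because every stage returns its receiver, a failed stage turns all later stages into no-ops, and the caller inspects the stored error once at the end of the chain.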
func (*Contentity) GatherLinks ¶
func (p *Contentity) GatherLinks() error
GatherLinks is: @conref to reuse block-level content, @keyref to reuse phrase-level content.
TODO Each type of link (i.e. elm/att where it occurs) has to be categorised.
TODO Each format of link target has to be categorised.
- Cross ref : <xref> : <a href> : [link](/URI "title")
- Key def : <keydef> : <div data-class="keydef"> : <div data-class="keydef"> in HDITA syntax
- Map : <map> : <nav> : See Example of an MDITA map (20)
- Topic ref : <topicref> : <a href> inside a <li> : [link](/URI "title") inside a list item
TODO Stuff to get:
- XDITA map
  - topicref @href (w @format)
  - task @id
- HDITA
  - article @id
  - span @data-keyref
  - p @data-conref
- MDITA
  - has YAML "id"
  - uses <p @data-conref>
  - uses <span @data-keyref>
  - uses MD [link_text](link_target.dita)
  - uses
- XDITA
  - topic @id
  - ph @keyref
  - image @href
  - p @id
  - video/source @value
  - section @id
  - p @conref
func (*Contentity) GatherXmlGLinks ¶
func (p *Contentity) GatherXmlGLinks() *Contentity
GatherXmlGLinks is: XmlItems is (DOCS) IDs & IDREFs, (DTDs) Elm defs (incl. Att defs) & Ent defs *xmlfile.XmlItems // *IDinfo
func (p *MCFile) GatherXmlGLinks() *MCFile {
func (*Contentity) IsDir ¶
func (p *Contentity) IsDir() bool
func (*Contentity) IsDirlike ¶
func (p *Contentity) IsDirlike() bool
func (*Contentity) L ¶
func (p *Contentity) L(level LL, format string, a ...interface{})
func (*Contentity) LogPrefix ¶
func (p *Contentity) LogPrefix(mid string) string
func (*Contentity) NewEntitiesList ¶
func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)
NewEntitiesList collects all entity definitions. Note that each Token is normalized.
- rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
- rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true
Called only by ProcessEntities.
func (*Contentity) ProcessEntities_ ¶
func (p *Contentity) ProcessEntities_() error
func (*Contentity) RefineDirectives ¶
func (p *Contentity) RefineDirectives() error
RefineDirectives scans to patch Directives with correct keyword.
func (Contentity) String ¶
func (p Contentity) String() string
String is developer output. Hafta dump: FU.InputFile, FU.OutputFiles, GTree, GRefs, *XmlFileMeta, *XmlItems, *DitaInfo
func (*Contentity) SubstituteEntities ¶
func (p *Contentity) SubstituteEntities() error
SubstituteEntities does replacement in Entities for simple (single-token) entity references, i.e. that begin with "%" or "&".
func (*Contentity) TallyTags ¶
func (p *Contentity) TallyTags()
func (*Contentity) WrapError ¶
func (p *Contentity) WrapError(s string, e error)
type ContentityEngine ¶
type ContentityEngine struct {
// contains filtered or unexported fields
}
ContentityEngine tracks the (oops, global) state of a ContentityFS tree being assembled, for example when a directory is specified for recursive analysis.
FIXME: ID assignment should be offloaded to the DB ?
var CntyEng *ContentityEngine = new(ContentityEngine)
CntyEng is a package global, which is dodgy and not re-entrant. The solution probably involves currying.
NOTE: Is the call to new(..) unnecessary? This variable should NOT be reinitialized for every new ContentityFS.
type ContentityError ¶
type ContentityError struct {
	PE fs.PathError
	*Contentity
}
ContentityError is Contentity + SrcLoc (in source code) + PathError struct { Op, Path string; Err error }
Maybe use the format pkg.filename.methodname.Lnn ¶
In code where package `mcfile` is not available, try a fileutils.PathPropsError
func NewContentityError ¶
func NewContentityError(ermsg string, op string, cty *Contentity) ContentityError
func WrapAsContentityError ¶
func WrapAsContentityError(e error, op string, cty *Contentity) ContentityError
func (ContentityError) Error ¶
func (ce ContentityError) Error() string
func (*ContentityError) String ¶
func (ce *ContentityError) String() string
type ContentityFS ¶
type ContentityFS struct {
	// FS will be set from func [os.DirFS]
	fs.FS
	// contains filtered or unexported fields
}
ContentityFS is an instance of an fs.FS where every node is an mcfile.Contentity.
Note that directories ARE included in the tree, because the instances of [orderednodes.Nord] in each Contentity must properly interconnect in forming a complete tree.
Note that the file system is stored as a tree AND as a slice AND as a map. If any of these is modified without also modifying the others to match, there WILL be problems. For that reason, [asSlice] and [asMapOfAbsFP] are unexported instance variables that are accessible only via getters.
It ain't bulletproof tho. In any case, users of a ContentityFS should feel free to use the functions on the embedded [Nord] ordered nodes.
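A minimal sketch of this "unexported fields plus getters" idea, reduced to a slice and a map: all names below except `asSlice`, `asMapOfAbsFP`, `AsSlice` and `Size` are invented, and returning a copy from `AsSlice` is a design choice of this sketch, not a documented property of ContentityFS.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a Contentity.
type entry struct{ absPath string }

// store keeps the same entries in two views that must stay in sync.
type store struct {
	asSlice      []*entry
	asMapOfAbsFP map[string]*entry
}

func newStore() *store {
	return &store{asMapOfAbsFP: map[string]*entry{}}
}

// add is the single mutation point, so both views are always updated together.
func (s *store) add(e *entry) {
	s.asSlice = append(s.asSlice, e)
	s.asMapOfAbsFP[e.absPath] = e
}

// AsSlice returns a copy, so callers cannot reorder or truncate the
// internal slice behind the map's back.
func (s *store) AsSlice() []*entry {
	return append([]*entry(nil), s.asSlice...)
}

// Lookup finds an entry by absolute filepath.
func (s *store) Lookup(absPath string) (*entry, bool) {
	e, ok := s.asMapOfAbsFP[absPath]
	return e, ok
}

// Size returns the number of entries.
func (s *store) Size() int { return len(s.asSlice) }

func main() {
	s := newStore()
	s.add(&entry{absPath: "/docs/a.dita"})
	s.add(&entry{absPath: "/docs/b.md"})
	_, ok := s.Lookup("/docs/b.md")
	fmt.Println(s.Size(), ok)
}
```

As the text says, this is not bulletproof: the entries themselves are still shared pointers, so a caller can mutate an entry's path and desynchronise the map.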
func NewContentityFS ¶
func NewContentityFS(aPath string, okayFilexts []string) (*ContentityFS, error)
NewContentityFS proceeds as follows:
- initialize
- create an os.DirFS
- FIXME: an os.Root
- walk the DirFS, creating Contentities and appending them to a slice
- process the list to identify and make parent/child links
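The walk-then-link steps above can be sketched as two passes over an `fs.FS`. This is an illustration under assumptions: the type `node` stands in for Contentity, and an in-memory `fstest.MapFS` replaces `os.DirFS` so that the example is self-contained.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// node is a hypothetical stand-in for a Contentity.
type node struct {
	path   string
	isDir  bool
	parent *node
	kids   []*node
}

// buildTree walks fsys and appends every entry to a slice (pass 1),
// then links each entry to its parent directory (pass 2).
func buildTree(fsys fs.FS) ([]*node, error) {
	var all []*node
	byPath := map[string]*node{}
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		n := &node{path: p, isDir: d.IsDir()}
		all = append(all, n)
		byPath[p] = n
		return nil
	})
	if err != nil {
		return nil, err
	}
	for _, n := range all {
		if n.path == "." {
			continue // the root has no parent
		}
		if par, ok := byPath[path.Dir(n.path)]; ok {
			n.parent = par
			par.kids = append(par.kids, n)
		}
	}
	return all, nil
}

// sampleFS returns a small in-memory file system for the demo.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"docs/a.dita": {Data: []byte("<topic/>")},
		"docs/b.md":   {Data: []byte("# B")},
	}
}

func main() {
	all, err := buildTree(sampleFS())
	if err != nil {
		panic(err)
	}
	for _, n := range all {
		fmt.Println(n.path, n.isDir, len(n.kids))
	}
}
```

Directories are kept as nodes in their own right, which is what lets the second pass find a parent for every non-root entry.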
The path argument should probably be an absolute filepath, because a relative filepath might cause problems, although this is the opposite of the advice for lower-level items.
It uses the global [CntyFS], which precludes re-entrancy and concurrency.
Note that when we use os.DirFS, it appears to make no difference whether path
- is relative or absolute
- ends with a trailing slash or not
- is a directory or a symlink to a directory
The only error returns for this func are:
- a bad path, rejected by func FU.NewFilepaths
- the path is not a directory (although it can be a symlink to a directory ?)
- TBD: What happens if os.Root barfs on something ?
ContentityFS does not embed Errer and cannot itself return an error. FIXME: change this ?
TODO: Maybe it needs two boolean arguments:
- One to say whether to be strict about security (using os.Root and Valid/Local), and
- One to say whether to follow symlinks.
These two flags might have some interesting interactions. OTOH since this func can (and does?) use os.Root, it can easily (and should probably) also default to higher security using funcs io/fs.ValidPath and path/filepath.IsLocal.
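For reference, a minimal sketch of what a stricter path check with the two named funcs could look like. The helper `isSafeRelPath` is hypothetical; `fs.ValidPath` and `filepath.IsLocal` are the real stdlib funcs (the latter needs Go 1.20 or later). The sketch does not use os.Root.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// isSafeRelPath reports whether p is acceptable both as an io/fs path
// (unrooted, slash-separated, no "." or ".." elements) and as a local
// OS path that cannot escape the directory it is evaluated in.
// (Hypothetical helper, not part of the package.)
func isSafeRelPath(p string) bool {
	return fs.ValidPath(filepath.ToSlash(p)) && filepath.IsLocal(p)
}

func main() {
	for _, p := range []string{"docs/a.dita", "../secret", "/etc/passwd"} {
		fmt.Printf("%-14q safe: %t\n", p, isSafeRelPath(p))
	}
}
```

Both checks are purely lexical: neither touches the file system, so neither says anything about symlinks, which is where the second proposed flag would come in.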
Accumulated NewContentity errors are counted in the field ContentityFS.nErrors.
func (*ContentityFS) AsSlice ¶
func (p *ContentityFS) AsSlice() []*Contentity
func (*ContentityFS) DirCount ¶
func (p *ContentityFS) DirCount() int
func (*ContentityFS) DoForEvery ¶
func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)
func (*ContentityFS) FileCount ¶
func (p *ContentityFS) FileCount() int
func (*ContentityFS) ItemCount ¶
func (p *ContentityFS) ItemCount() int
func (*ContentityFS) RootAbsPath ¶
func (p *ContentityFS) RootAbsPath() string
func (*ContentityFS) RootContentity ¶
func (p *ContentityFS) RootContentity() *RootContentity
func (*ContentityFS) Size ¶
func (p *ContentityFS) Size() int
type ContentityStage ¶
type ContentityStage func(*Contentity) *Contentity
type GLink ¶
type GLink struct {
	// IsRefnc - else is Refnt (Referents are much more numerous)
	IsRefnc bool
	// IsExtl - else is Intl (which are more numerous)
	IsExtl bool
	// AddressMode is "http", "key", "idref", "uri"
	AddressMode string
	// Att is the XML attribute - id, idref, href, xref, keyref, etc.
	Att string
	// Tag is the tag that has this link-related attribute of interest
	Tag string
	// Link_raw as read in during parsing
	Link_raw string
	// RelFP can be a URI or the resolution of a keyref.
	// "" if target is in same file; NOTE This is relative to the
	// sourcing file, NOT to the current working directory during parsing!
	RelFP string
	// AbsFP can be a URI or the resolution of a keyref.
	// "" if target is in same file
	AbsFP FU.AbsFilePath
	// TopicID iff present (but isn't it mandatory ?)
	TopicID string
	// FragID is peeled off from Raw
	FragID string
	// Resolved is used to narrow in on difficult cases
	Resolved bool
	// LinkedFrom is the GTag where the GLink is defined
	LinkedFrom *gtree.GTag
	// Original can be nil: it is the tag where the GLink is resolved to,
	// i.e. the REFERENT, and is quite possibly in another file, which we
	// hope we also have available in memory.
	Original *gtree.GTag
}
GLink summarizes a link (or key) (or reference) found in markup content. It is either URI-based (`href conref id`) or key-based (`key keyref`). It applies to all LwDITA formats, but not all fields apply to all LwDITA formats.
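To make the relationship between `Link_raw`, `RelFP` and `FragID` concrete, here is one plausible way to peel a fragment off a raw URI-based link. The function `splitLink` is hypothetical and the package may well do this differently; it only illustrates the field comments above, in particular that the file part is empty when the target is in the same file.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLink splits a raw link at the first "#" into a file part and a
// fragment part. The file part is "" when the target is in the same file.
// (Hypothetical helper, not part of the package.)
func splitLink(raw string) (relFP, fragID string) {
	relFP, fragID, _ = strings.Cut(raw, "#")
	return relFP, fragID
}

func main() {
	for _, raw := range []string{"intro.dita#intro/p1", "#sec2", "other.md"} {
		fp, frag := splitLink(raw)
		fmt.Printf("raw=%q relFP=%q fragID=%q\n", raw, fp, frag)
	}
}
```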
type GLinks ¶
type GLinks struct {
	// OwnerP points back to the owning struct, so that
	// `GLink`s can be processed easily as simple data structures.
	OwnerP interface{}
	// KeyRefncs are outgoing key-based links/references
	KeyRefncs []*GLink // (Extl|Intl)KeyReferences
	// KeyRefnts are unique key-based definitions that are possible
	// referents (resolution targets) of same or other files' [KeyRefncs]
	KeyRefnts []*GLink // (Extl|Intl)KeyDefs
	// UriRefncs are outgoing URI-based links/references
	UriRefncs []*GLink // (Extl|Intl)UriReferences
	// UriRefnts are unique URI-based definitions that are possible
	// referents (resolution targets) of same or other files' [UriRefncs]
	UriRefnts []*GLink // (Extl|Intl)UriDefs
}
GLinks is used for (1) intra-file ref resolution, (2) inter-file ptr resolution, (3) ToC generation.
type LinkInfos ¶
type LinkInfos struct {
	Conrefs  []LinkInfo
	Keyrefs  []LinkInfo
	Datarefs []LinkInfo
	// contains filtered or unexported fields
}
LinkInfos is: @conref to reuse block-level content, @keyref to reuse phrase-level content.
TODO Each type of link (i.e. elm/att where it occurs) has to be categorised.
TODO Each format of link target has to be categorised.
- Cross ref : <xref> : <a href> : [link](/URI "title")
- Key def : <keydef> : <div data-class="keydef"> : <div data-class="keydef"> in HDITA syntax
- Map : <map> : <nav> : See Example of an MDITA map (20)
- Topic ref : <topicref> : <a href> inside a <li> : [link](/URI "title") inside a list item
TODO Stuff to get:
- XDITA map
  - topicref @href (w @format)
  - task @id
- HDITA
  - article @id
  - span @data-keyref
  - p @data-conref
- MDITA
  - has YAML "id"
  - uses <p @data-conref>
  - uses <span @data-keyref>
  - uses MD [link_text](link_target.dita)
  - uses
- XDITA
  - topic @id
  - ph @keyref
  - image @href
  - p @id
  - video/source @value
  - section @id
  - p @conref
In GFile: LinkInfos:
type LogInfo ¶
LogInfo exists mainly to provide a grep'able string: for example "(01:4a)", where 01 is the index of the Contentity and 4a is the processing stage. This is obv a candidate for replacement by stdlib's slog.
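A minimal sketch of producing such a grep'able prefix. The function `grepTag` is hypothetical; the zero-padded two-digit index follows the "(01:4a)" example, and the stage ID is passed through as-is.

```go
package main

import "fmt"

// grepTag formats a Contentity index and a processing stage ID
// in the "(01:4a)" style. (Hypothetical helper, not part of the package.)
func grepTag(index int, stage string) string {
	return fmt.Sprintf("(%02d:%s)", index, stage)
}

func main() {
	fmt.Println(grepTag(1, "4a"), "parsed OK")
}
```

With a fixed-width prefix like this, all log lines for one file can be pulled out with a literal search such as `grep -F "(01:"`.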
The io.Writer field W exists outside of the github.com/fbaube/mlog logging subsystem and should only be used if `mlog` is not.
type NodeStringser ¶
type RootContentity ¶
type RootContentity Contentity
RootContentity makes assignments to/from root node explicit.
func NewRootContentity ¶
func NewRootContentity(aRootPath string) *RootContentity
NewRootContentity returns a RootContentity Nord (i.e. node with ordered children) that can be the root of a new Contentity tree. It requires that argument aRootPath is an absolute filepath and is a directory.
type StringTally ¶
var GlobalAttTally StringTally
var GlobalTagTally StringTally
func (StringTally) StringSortedValues ¶
func (st StringTally) StringSortedValues() string
Source Files
- contentity.go
- contentity_new.go
- contentity_newroot.go
- contentityengine.go
- contentityerror.go
- contentityfs.go
- contentityfs_mapfs.go
- contentityfs_new.go
- doc.go
- doc_wfn.go
- getglinks-mkdn.go
- getglinks-xml.go
- glink.go
- handlewalkerrarg.go
- log.go
- mkdn-textleaves.go
- nodestringser.go
- pathexclusions.go
- seterror.go
- st-exec.go
- st0-init.go
- st1-read.go
- st2-tree.go
- st3-refs.go
- st4-done.go
- tallytags.go
- utils-mkdn.go
- validation.go
- xmldoentities.go
- xmlprocentities.go
- xmlprocmeta.go