README ¶
mdlint
mdlint
is a Markdown linter. It is designed to enforce specific rules about
how Markdown is to be written in the Fuchsia Source Tree. This linter is
designed to parse Hoedown syntax, as used
on the fuchsia.dev site.
Using mdlint
Configure, and build
fx set core.x64 --with //tools/mdlint:host # or similar
fx build
Example invocation running specific rules over //docs
, and reporting all
findings:
fx mdlint --root-dir docs \
--enable no-extra-space-on-right \
--enable casing-of-anchors \
--enable bad-lists \
--enable verify-internal-links
Example invocation running all rules over //docs
, and only reporting findings
within Markdown documents whose filenames match docs/contribute/governance
:
fx mdlint --root-dir docs \
--enable all \
--filter-filenames docs/contribute/governance
Testing
Configure
fx set core.x64 --with //tools/mdlint:tests # or similar
Then test
fx test mdlint_core_tests
fx test mdlint_rules_tests
Implementation
The linter parses Markdown files successively, typically all files under a root directory.
Each Markdown file is read as a stream of UTF-8 characters (runes), which is then tokenized into a stream of tokens. We recognize specific patterns from this token stream, giving rise to a stream of patterns. This layered processing is similar to how streaming XML parsers are structured, and offers hook points for linting rules to operate at various levels of abstraction.
Tokenization
Because Markdown attaches important meaning to whitespace characters (e.g. leading space to form a list element), and certain constructs' meaning depend on their context (e.g. links, or section headers), the tokenization differs slightly from what is typically done for more standard programming languages.
Tokenization segments streams of runes into meaningful chunks, named tokens.
All whitespace runes are considered tokens, and are preserved in the token
stream. For instance, the text Hello, World!
would consist of three tokens: a
text token (Hello,
), a whitespace token (
, and lastly followed by a text
token (World!
).
Certain tokens are classified and tokenized differently depending on their
preceding context. Consider for instance a sentence (with a parenthesis)
which
is simply text tokens separated by whitespace tokens, as opposed to a [sentence](with-a-link)
where instead need to identify both the link
(sentence
) and it's corresponding URL (with-a-link
). Other similar examples
are headings, which are denoted by a series of pound runes (#
) at the start of
a line, or heading anchors {#like-so}
, which may only appear on a heading
line.
Recognition
Once a Markdown document has been tokenized, the stream of token is then pattern
matched and recognized into a stream of patterns. As an example, depending on
placement, the text [Example]
could be a link's text, a link's cross
reference, both a link's test and its cross reference, or the start of a cross
reference definition.
Implementation wise, the recognition work is done in the recognizer
which
bridges the LintRuleOverTokens
rule to a LintRuleOverPatterns
rule.
Rules
There are two sets of rules supported, rules over tokens, and rules over patterns. Both of these have common behavior which we describe first.
Common behavior
All rules are invoked:
- On start, i.e. when the linter starts.
- On document start, i.e. when the linter starts to parse a new document.
- On document end, i.e. when the linter completes the parsing of a new document.
- On end, i.e. when the linter completes.
Over tokens
Rules over tokens are additionally invoked after a document starts to parse, and before a document completes:
- On each token, i.e. as the name suggests.
Over patterns
Rules over patterns are additionally invoked after a document starts to parse, and before a document completes, for every pattern encountered. A non-exhaustive list includes:
- When a link using a cross reference is used.
- When a link using a URL is used.
- On the definition of a cross reference.
Defining a new rule
Each rule should be defined in its own file named example_rule.go
.
Rules should include a description, which by convention is placed in
the test file. The convention is to follow the pattern:
package rules
import (
"go.fuchsia.dev/fuchsia/tools/mdlint/core"
)
func init() {
// or core.RegisterLintRuleOverPatterns(...)
core.RegisterLintRuleOverTokens(exampleRuleName, newExampleRule)
}
const exampleRuleName = "example-rule"
type exampleRule struct {
...
}
var _ core.LintRuleOverTokens = (*exampleRule)(nil) // or core.LintRuleOverPatterns
func newExampleRule(reporter core.Reporter) core.LintRuleOverTokens {
return &exampleRule{ ... }
}
// followed by the implementation
Testing a rule
Rules should be tested using sample Markdown documents, with the help of the provided testing utilities:
// Description of the rule, with details of the checks provided.
func TestExampleRule_firstCase(t *testing.T) {
ruleTestCase{
files: map[string]string{
"first.md": `Sample Markdown document
Use a «marker» to denote expected warnings.
You can place markers on whitespace, for instance« »
denotes an expected warning on a non-trimmed line.`,
"second.md": `Another Markdown document here.`,
},
// or runOverPatterns
}.runOverTokens(t, newExampleRule)
}
In multi-files tests, we rely non the non-deterministic iteration order of maps
to ensure that rules do not rely on a specific file order for their correctness.
Consider running new tests multiple times using the go test
flag count
to
verify the robustness of your rule.
Documentation ¶
There is no documentation for this package.