chroma

v2.0.0-alpha3
Published: Mar 1, 2022 License: MIT Imports: 19 Imported by: 138

README

Chroma — A general purpose syntax highlighter in pure Go

NOTE: As Chroma has just been released, its API is still in flux. That said, the high-level interface should not change significantly.

Chroma takes source code and other structured text and converts it into syntax highlighted HTML, ANSI-coloured text, etc.

Chroma is based heavily on Pygments, and includes translators for Pygments lexers and styles.

Table of Contents

  1. Table of Contents
  2. Supported languages
  3. Try it
  4. Using the library
    1. Quick start
    2. Identifying the language
    3. Formatting the output
    4. The HTML formatter
  5. More detail
    1. Lexers
    2. Formatters
    3. Styles
  6. Command-line interface
  7. What's missing compared to Pygments?

Supported languages

Prefix Language
A ABAP, ABNF, ActionScript, ActionScript 3, Ada, Angular2, ANTLR, ApacheConf, APL, AppleScript, Arduino, Awk
B Ballerina, Base Makefile, Bash, Batchfile, BibTeX, Bicep, BlitzBasic, BNF, Brainfuck
C C, C#, C++, Caddyfile, Caddyfile Directives, Cap'n Proto, Cassandra CQL, Ceylon, CFEngine3, cfstatement, ChaiScript, Cheetah, Clojure, CMake, COBOL, CoffeeScript, Common Lisp, Coq, Crystal, CSS, Cython
D D, Dart, Diff, Django/Jinja, Docker, DTD, Dylan
E EBNF, Elixir, Elm, EmacsLisp, Erlang
F Factor, Fish, Forth, Fortran, FSharp
G GAS, GDScript, Genshi, Genshi HTML, Genshi Text, Gherkin, GLSL, Gnuplot, Go, Go HTML Template, Go Text Template, GraphQL, Groff, Groovy
H Handlebars, Haskell, Haxe, HCL, Hexdump, HLB, HTML, HTTP, Hy
I Idris, Igor, INI, Io
J J, Java, JavaScript, JSON, Julia, Jungle
K Kotlin
L Lighttpd configuration file, LLVM, Lua
M Mako, markdown, Mason, Mathematica, Matlab, MiniZinc, MLIR, Modula-2, MonkeyC, MorrowindScript, Myghty, MySQL
N NASM, Newspeak, Nginx configuration file, Nim, Nix
O Objective-C, OCaml, Octave, OnesEnterprise, OpenEdge ABL, OpenSCAD, Org Mode
P PacmanConf, Perl, PHP, PHTML, Pig, PkgConfig, PL/pgSQL, plaintext, Pony, PostgreSQL SQL dialect, PostScript, POVRay, PowerShell, Prolog, PromQL, Protocol Buffer, Puppet, Python 2, Python
Q QBasic
R R, Racket, Ragel, Raku, react, ReasonML, reg, reStructuredText, Rexx, Ruby, Rust
S SAS, Sass, Scala, Scheme, Scilab, SCSS, Smalltalk, Smarty, Snobol, Solidity, SPARQL, SQL, SquidConf, Standard ML, Stylus, Svelte, Swift, SYSTEMD, systemverilog
T TableGen, TASM, Tcl, Tcsh, Termcap, Terminfo, Terraform, TeX, Thrift, TOML, TradingView, Transact-SQL, Turing, Turtle, Twig, TypeScript, TypoScript, TypoScriptCssData, TypoScriptHtmlData
V VB.net, verilog, VHDL, VimL, vue
W WDTE
X XML, Xorg
Y YAML, YANG
Z Zig

I will attempt to keep this section up to date, but an authoritative list can be displayed with chroma --list.

Try it

Try out various languages and styles on the Chroma Playground.

Using the library

Chroma, like Pygments, has the concepts of lexers, formatters and styles.

Lexers convert source text into a stream of tokens, styles specify how token types are mapped to colours, and formatters convert tokens and styles into formatted output.

A package exists for each of these, containing a global Registry variable with all of the registered implementations. There are also helper functions for using the registry in each package, such as looking up lexers by name or matching filenames, etc.

In all cases, if a lexer, formatter or style can not be determined, nil will be returned. In this situation you may want to default to the Fallback value in each respective package, which provides sane defaults.

Quick start

A convenience function exists that can be used to simply format some source text, without any effort:

err := quick.Highlight(os.Stdout, someSourceCode, "go", "html", "monokai")

Identifying the language

To highlight code, you'll first have to identify what language the code is written in. There are three primary ways to do that:

  1. Detect the language from its filename.

    lexer := lexers.Match("foo.go")
    
  2. Explicitly specify the language by its Chroma syntax ID (a full list is available from lexers.Names()).

    lexer := lexers.Get("go")
    
  3. Detect the language from its content.

    lexer := lexers.Analyse("package main\n\nfunc main()\n{\n}\n")
    

In all cases, nil will be returned if the language can not be identified.

if lexer == nil {
  lexer = lexers.Fallback
}

At this point, it should be noted that some lexers can be extremely chatty. To mitigate this, you can use the coalescing lexer to coalesce runs of identical token types into a single token:

lexer = chroma.Coalesce(lexer)
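
To make the effect of coalescing concrete, here is a minimal standalone sketch that does not use Chroma itself. The token type and the coalesce function are hypothetical stand-ins, and the sketch only illustrates the idea of merging adjacent tokens of the same type; it is not how chroma.Coalesce is implemented.

```go
package main

import "fmt"

// token is a hypothetical stand-in for a lexer token: a type name plus text.
type token struct {
	Type  string
	Value string
}

// coalesce merges each run of adjacent tokens that share a type into one token.
func coalesce(tokens []token) []token {
	var out []token
	for _, t := range tokens {
		if n := len(out); n > 0 && out[n-1].Type == t.Type {
			// Same type as the previous token: extend it instead of appending.
			out[n-1].Value += t.Value
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	in := []token{{"Text", "a"}, {"Text", "b"}, {"Punctuation", "("}, {"Text", "c"}}
	for _, t := range coalesce(in) {
		fmt.Printf("%s %q\n", t.Type, t.Value)
	}
}
```

Fewer, larger tokens mean less per-token overhead in the formatter, which is the reason to wrap a chatty lexer.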

Formatting the output

Once a language is identified you will need to pick a formatter and a style (theme).

style := styles.Get("swapoff")
if style == nil {
  style = styles.Fallback
}
formatter := formatters.Get("html")
if formatter == nil {
  formatter = formatters.Fallback
}

Then obtain an iterator over the tokens:

contents, err := ioutil.ReadAll(r)
iterator, err := lexer.Tokenise(nil, string(contents))

And finally, format the tokens from the iterator:

err = formatter.Format(w, style, iterator)

The HTML formatter

By default the html registered formatter generates standalone HTML with embedded CSS. More flexibility is available through the formatters/html package.

Firstly, the output generated by the formatter can be customised with the following constructor options:

  • Standalone() - generate standalone HTML with embedded CSS.
  • WithClasses() - use classes rather than inlined style attributes.
  • ClassPrefix(prefix) - prefix each generated CSS class.
  • TabWidth(width) - Set the rendered tab width, in characters.
  • WithLineNumbers() - Render line numbers (style with LineNumbers).
  • LinkableLineNumbers() - Make the line numbers linkable and be a link to themselves.
  • HighlightLines(ranges) - Highlight lines in these ranges (style with LineHighlight).
  • LineNumbersInTable() - Use a table for formatting line numbers and code, rather than spans.

If WithClasses() is used, the corresponding CSS can be obtained from the formatter with:

formatter := html.New(html.WithClasses(true))
err := formatter.WriteCSS(w, style)

More detail

Lexers

See the Pygments documentation for details on implementing lexers. Most concepts apply directly to Chroma, but see existing lexer implementations for real examples.

In many cases lexers can be automatically converted directly from Pygments by using the included Python 3 script pygments2chroma.py. I use something like the following:

python3 _tools/pygments2chroma.py \
  pygments.lexers.jvm.KotlinLexer \
  > lexers/k/kotlin.go \
  && gofmt -s -w lexers/k/kotlin.go

See notes in pygments-lexers.txt for a list of lexers, and notes on some of the issues importing them.

Formatters

Chroma supports HTML output, as well as terminal output in 8 colour, 256 colour, and true-colour.

A noop formatter is included that outputs the token text only, and a tokens formatter outputs raw tokens. The latter is useful for debugging lexers.

Styles

Chroma styles use the same syntax as Pygments.

All Pygments styles have been converted to Chroma using the _tools/style.py script.

When you work with one of Chroma's styles, know that the chroma.Background token type provides the default style for tokens. It does so by defining a foreground color and background color.

For example, this gives each token name not defined in the style a default color of #f8f8f2 and uses #000000 for the highlighted code block's background:

chroma.Background: "#f8f8f2 bg:#000000",

Also, token types in a style file are hierarchical. For instance, when CommentSpecial is not defined, Chroma uses the token style from Comment. So when several comment tokens use the same color, you'll only need to define Comment and override the one that has a different color.
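
The fallback behaviour described above can be sketched without Chroma as a walk from a specific token type to its parent. The parent table, the type names as strings, and the lookup function are assumptions made for illustration. The sketch simply returns the first entry it finds and does not try to reproduce how Chroma combines entries.

```go
package main

import "fmt"

// parent maps a hypothetical token type name to its more general parent.
var parent = map[string]string{
	"CommentSpecial": "Comment",
	"CommentSingle":  "Comment",
}

// lookup walks from the most specific token type up through its parents and
// returns the first style entry it finds, or the Background entry otherwise.
func lookup(style map[string]string, tokenType string) string {
	for t := tokenType; t != ""; t = parent[t] {
		if entry, ok := style[t]; ok {
			return entry
		}
	}
	return style["Background"]
}

func main() {
	style := map[string]string{
		"Background":    "#f8f8f2 bg:#000000",
		"Comment":       "#75715e",
		"CommentSingle": "italic #75715e",
	}
	fmt.Println(lookup(style, "CommentSpecial")) // no own entry: uses Comment
	fmt.Println(lookup(style, "CommentSingle"))  // has its own entry
	fmt.Println(lookup(style, "Keyword"))        // nothing defined: uses Background
}
```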

For a quick overview of the available styles and how they look, check out the Chroma Style Gallery.

Command-line interface

A command-line interface to Chroma is included.

Binaries are available to install from the releases page.

The CLI can be used as a preprocessor to colorise the output of less(1); see the documentation for the LESSOPEN environment variable.

The --fail flag can be used to suppress output and return with exit status 1 to facilitate falling back to some other preprocessor in case chroma does not resolve a specific lexer to use for the given file. For example:

export LESSOPEN='| p() { chroma --fail "$1" || cat "$1"; }; p "%s"'

Replace cat with your favourite fallback preprocessor.

When invoked as .lessfilter, the --fail flag is automatically turned on under the hood for easy integration with lesspipe shipping with Debian and derivatives; for that setup the chroma executable can be just symlinked to ~/.lessfilter.

What's missing compared to Pygments?

  • Quite a few lexers, for various reasons (pull-requests welcome):
    • Pygments lexers for complex languages often include custom code to handle certain aspects, such as Raku's ability to nest code inside regular expressions. These require time and effort to convert.
    • I mostly only converted languages I had heard of, to reduce the porting cost.
  • Some more esoteric features of Pygments are omitted for simplicity.
  • Though the Chroma API supports content detection, very few languages implement it. I have plans to implement a statistical analyser at some point, but not enough time.

Documentation

Overview

Package chroma takes source code and other structured text and converts it into syntax highlighted HTML, ANSI-coloured text, etc.

Chroma is based heavily on Pygments, and includes translators for Pygments lexers and styles.

For more information, go here: https://github.com/alecthomas/chroma

Index

Constants

const (
	Whitespace = TextWhitespace

	Date = LiteralDate

	String          = LiteralString
	StringAffix     = LiteralStringAffix
	StringBacktick  = LiteralStringBacktick
	StringChar      = LiteralStringChar
	StringDelimiter = LiteralStringDelimiter
	StringDoc       = LiteralStringDoc
	StringDouble    = LiteralStringDouble
	StringEscape    = LiteralStringEscape
	StringHeredoc   = LiteralStringHeredoc
	StringInterpol  = LiteralStringInterpol
	StringOther     = LiteralStringOther
	StringRegex     = LiteralStringRegex
	StringSingle    = LiteralStringSingle
	StringSymbol    = LiteralStringSymbol

	Number            = LiteralNumber
	NumberBin         = LiteralNumberBin
	NumberFloat       = LiteralNumberFloat
	NumberHex         = LiteralNumberHex
	NumberInteger     = LiteralNumberInteger
	NumberIntegerLong = LiteralNumberIntegerLong
	NumberOct         = LiteralNumberOct
)

Aliases.

Variables

var ANSI2RGB = map[string]string{
	"#ansiblack":     "000000",
	"#ansidarkred":   "7f0000",
	"#ansidarkgreen": "007f00",
	"#ansibrown":     "7f7fe0",
	"#ansidarkblue":  "00007f",
	"#ansipurple":    "7f007f",
	"#ansiteal":      "007f7f",
	"#ansilightgray": "e5e5e5",

	"#ansidarkgray":  "555555",
	"#ansired":       "ff0000",
	"#ansigreen":     "00ff00",
	"#ansiyellow":    "ffff00",
	"#ansiblue":      "0000ff",
	"#ansifuchsia":   "ff00ff",
	"#ansiturquoise": "00ffff",
	"#ansiwhite":     "ffffff",

	"#black":     "000000",
	"#darkred":   "7f0000",
	"#darkgreen": "007f00",
	"#brown":     "7f7fe0",
	"#darkblue":  "00007f",
	"#purple":    "7f007f",
	"#teal":      "007f7f",
	"#lightgray": "e5e5e5",

	"#darkgray":  "555555",
	"#red":       "ff0000",
	"#green":     "00ff00",
	"#yellow":    "ffff00",
	"#blue":      "0000ff",
	"#fuchsia":   "ff00ff",
	"#turquoise": "00ffff",
	"#white":     "ffffff",
}

ANSI2RGB maps ANSI colour names, as supported by Chroma, to hex RGB values.

var (
	// ErrNotSerialisable is returned if a lexer contains Rules that cannot be serialised.
	ErrNotSerialisable = fmt.Errorf("not serialisable")
)

Serialisation of Chroma rules to XML. The format is:

<rules>
  <state name="$STATE">
    <rule [pattern="$PATTERN"]>
      [<$EMITTER ...>]
      [<$MUTATOR ...>]
    </rule>
  </state>
</rules>

eg. Include("String") would become:

<rule>
  <include state="String" />
</rule>

[null, null, {"kind": "include", "state": "String"}]

eg. Rule{`\d+`, Text, nil} would become:

<rule pattern="\\d+">
  <token type="Text"/>
</rule>

eg. Rule{`"`, String, Push("String")}

<rule pattern="\"">
  <token type="String" />
  <push state="String" />
</rule>

eg. Rule{`(\w+)(\n)`, ByGroups(Keyword, Whitespace), nil},

<rule pattern="(\\w+)(\\n)">
  <bygroups>
    <token type="Keyword" />
    <token type="Whitespace" />
  </bygroups>
</rule>

var (
	StandardTypes = map[TokenType]string{
		Background:       "bg",
		PreWrapper:       "chroma",
		Line:             "line",
		LineNumbers:      "ln",
		LineNumbersTable: "lnt",
		LineHighlight:    "hl",
		LineTable:        "lntable",
		LineTableTD:      "lntd",
		CodeLine:         "cl",
		Text:             "",
		Whitespace:       "w",
		Error:            "err",
		Other:            "x",

		Keyword:            "k",
		KeywordConstant:    "kc",
		KeywordDeclaration: "kd",
		KeywordNamespace:   "kn",
		KeywordPseudo:      "kp",
		KeywordReserved:    "kr",
		KeywordType:        "kt",

		Name:                 "n",
		NameAttribute:        "na",
		NameBuiltin:          "nb",
		NameBuiltinPseudo:    "bp",
		NameClass:            "nc",
		NameConstant:         "no",
		NameDecorator:        "nd",
		NameEntity:           "ni",
		NameException:        "ne",
		NameFunction:         "nf",
		NameFunctionMagic:    "fm",
		NameProperty:         "py",
		NameLabel:            "nl",
		NameNamespace:        "nn",
		NameOther:            "nx",
		NameTag:              "nt",
		NameVariable:         "nv",
		NameVariableClass:    "vc",
		NameVariableGlobal:   "vg",
		NameVariableInstance: "vi",
		NameVariableMagic:    "vm",

		Literal:     "l",
		LiteralDate: "ld",

		String:          "s",
		StringAffix:     "sa",
		StringBacktick:  "sb",
		StringChar:      "sc",
		StringDelimiter: "dl",
		StringDoc:       "sd",
		StringDouble:    "s2",
		StringEscape:    "se",
		StringHeredoc:   "sh",
		StringInterpol:  "si",
		StringOther:     "sx",
		StringRegex:     "sr",
		StringSingle:    "s1",
		StringSymbol:    "ss",

		Number:            "m",
		NumberBin:         "mb",
		NumberFloat:       "mf",
		NumberHex:         "mh",
		NumberInteger:     "mi",
		NumberIntegerLong: "il",
		NumberOct:         "mo",

		Operator:     "o",
		OperatorWord: "ow",

		Punctuation: "p",

		Comment:            "c",
		CommentHashbang:    "ch",
		CommentMultiline:   "cm",
		CommentPreproc:     "cp",
		CommentPreprocFile: "cpf",
		CommentSingle:      "c1",
		CommentSpecial:     "cs",

		Generic:           "g",
		GenericDeleted:    "gd",
		GenericEmph:       "ge",
		GenericError:      "gr",
		GenericHeading:    "gh",
		GenericInserted:   "gi",
		GenericOutput:     "go",
		GenericPrompt:     "gp",
		GenericStrong:     "gs",
		GenericSubheading: "gu",
		GenericTraceback:  "gt",
		GenericUnderline:  "gl",
	}
)

Functions

func Marshal

func Marshal(l *RegexLexer) ([]byte, error)

Marshal a RegexLexer to XML.

func SplitTokensIntoLines

func SplitTokensIntoLines(tokens []Token) (out [][]Token)

SplitTokensIntoLines splits tokens containing newlines in two.
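
A standalone sketch of this kind of splitting is shown below. The token type and splitIntoLines function are hypothetical, and details such as dropping empty pieces are choices made for this sketch, not a statement about Chroma's exact behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical stand-in for a lexer token.
type token struct {
	Type  string
	Value string
}

// splitIntoLines groups tokens by line, splitting any token whose text
// contains a newline so that each piece lands on the line it belongs to.
func splitIntoLines(tokens []token) [][]token {
	var out [][]token
	var line []token
	for _, t := range tokens {
		for strings.Contains(t.Value, "\n") {
			parts := strings.SplitAfterN(t.Value, "\n", 2)
			// The head (up to and including the newline) closes the current line.
			line = append(line, token{t.Type, parts[0]})
			out = append(out, line)
			line = nil
			// The tail is carried over to the next line.
			t.Value = parts[1]
		}
		if t.Value != "" {
			line = append(line, t)
		}
	}
	if len(line) > 0 {
		out = append(out, line)
	}
	return out
}

func main() {
	lines := splitIntoLines([]token{{"Comment", "// a\n// b\n"}, {"Keyword", "func"}})
	for i, l := range lines {
		fmt.Printf("line %d: %q\n", i+1, l)
	}
}
```

Splitting by line is what makes per-line features such as line numbers and highlighted line ranges straightforward for a formatter.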

func Stringify

func Stringify(tokens ...Token) string

Stringify returns the raw string for a set of tokens.

func Words

func Words(prefix, suffix string, words ...string) string

Words creates a regex that matches any of the given literal words.

Types

type Analyser

type Analyser interface {
	AnalyseText(text string) float32
}

Analyser determines how appropriate this lexer is for the given text.

type Colour

type Colour int32

Colour represents an RGB colour.

func MustParseColour

func MustParseColour(colour string) Colour

MustParseColour is like ParseColour except it panics if the colour is in an invalid format.

func NewColour

func NewColour(r, g, b uint8) Colour

NewColour creates a Colour directly from RGB values.

func ParseColour

func ParseColour(colour string) Colour

ParseColour parses a colour in the forms #rgb, #rrggbb, #ansi<colour>, or #<colour>. It returns an "unset" colour if the input is invalid.
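
For the two hex forms, parsing might look roughly like the standalone sketch below. The parseHex function is hypothetical, it ignores the named ANSI forms, and expanding #rgb by doubling each digit is an assumption borrowed from the usual CSS convention.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHex parses "#rgb" or "#rrggbb" and reports whether the input was valid.
func parseHex(s string) (r, g, b uint8, ok bool) {
	if !strings.HasPrefix(s, "#") {
		return 0, 0, 0, false
	}
	hex := s[1:]
	if len(hex) == 3 {
		// Expand shorthand: "#abc" is treated as "#aabbcc".
		hex = string([]byte{hex[0], hex[0], hex[1], hex[1], hex[2], hex[2]})
	}
	if len(hex) != 6 {
		return 0, 0, 0, false
	}
	n, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, 0, 0, false
	}
	// Each component is one byte of the 24-bit value.
	return uint8(n >> 16), uint8(n >> 8), uint8(n), true
}

func main() {
	for _, s := range []string{"#f8f8f2", "#0af", "nope"} {
		r, g, b, ok := parseHex(s)
		fmt.Println(s, r, g, b, ok)
	}
}
```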

func (Colour) Blue

func (c Colour) Blue() uint8

Blue component of colour.

func (Colour) Brighten

func (c Colour) Brighten(factor float64) Colour

Brighten returns a copy of this colour with its brightness adjusted.

If factor is negative, the colour is darkened.

Uses approach described here (http://www.pvladov.com/2012/09/make-color-lighter-or-darker.html).

func (Colour) BrightenOrDarken

func (c Colour) BrightenOrDarken(factor float64) Colour

BrightenOrDarken brightens a colour if its brightness is < 0.5, or darkens it if its brightness is > 0.5.

func (Colour) Brightness

func (c Colour) Brightness() float64

Brightness of the colour (roughly) in the range 0.0 to 1.0

func (Colour) Distance

func (c Colour) Distance(e2 Colour) float64

Distance between this colour and another.

This uses the approach described here (https://www.compuphase.com/cmetric.htm). This is not as accurate as LAB et al., but is *vastly* simpler and sufficient for our needs.
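
The low-cost approximation from that article weights the red and blue differences by the mean red level. A standalone sketch of the formula is given below; the rgb type and distance function are hypothetical names, and the result should be read as a relative measure for comparing colours, not an absolute unit.

```go
package main

import (
	"fmt"
	"math"
)

// rgb is a hypothetical 8-bit-per-channel colour.
type rgb struct{ R, G, B uint8 }

// distance computes the weighted "redmean" colour difference: red and blue
// differences are weighted by the average red level, green by a constant 4.
func distance(a, c rgb) float64 {
	rmean := (int64(a.R) + int64(c.R)) / 2
	dr := int64(a.R) - int64(c.R)
	dg := int64(a.G) - int64(c.G)
	db := int64(a.B) - int64(c.B)
	sum := (((512 + rmean) * dr * dr) >> 8) + 4*dg*dg + (((767 - rmean) * db * db) >> 8)
	return math.Sqrt(float64(sum))
}

func main() {
	red, orange, blue := rgb{255, 0, 0}, rgb{255, 128, 0}, rgb{0, 0, 255}
	fmt.Println("red vs orange:", distance(red, orange))
	fmt.Println("red vs blue:  ", distance(red, blue))
}
```

A measure like this is useful for picking the nearest entry in a limited palette, for example when reducing colours for terminal output.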

func (Colour) GoString

func (c Colour) GoString() string

func (Colour) Green

func (c Colour) Green() uint8

Green component of colour.

func (Colour) IsSet

func (c Colour) IsSet() bool

IsSet returns true if the colour is set.

func (Colour) Red

func (c Colour) Red() uint8

Red component of colour.

func (Colour) String

func (c Colour) String() string

type Colours

type Colours []Colour

Colours is an orderable set of colours.

func (Colours) Len

func (c Colours) Len() int

func (Colours) Less

func (c Colours) Less(i, j int) bool

func (Colours) Swap

func (c Colours) Swap(i, j int)

type CompiledRule

type CompiledRule struct {
	Rule
	Regexp *regexp2.Regexp
	// contains filtered or unexported fields
}

A CompiledRule is a Rule with a pre-compiled regex.

Note that regular expressions are lazily compiled on first use of the lexer.

type CompiledRules

type CompiledRules map[string][]*CompiledRule

CompiledRules is a map of state name to the sequence of compiled rules in that state.

type Config

type Config struct {
	// Name of the lexer.
	Name string `xml:"name,omitempty"`

	// Shortcuts for the lexer
	Aliases []string `xml:"alias,omitempty"`

	// File name globs
	Filenames []string `xml:"filename,omitempty"`

	// Secondary file name globs
	AliasFilenames []string `xml:"alias_filename,omitempty"`

	// MIME types
	MimeTypes []string `xml:"mime_type,omitempty"`

	// Regex matching is case-insensitive.
	CaseInsensitive bool `xml:"case_insensitive,omitempty"`

	// Regex matches all characters.
	DotAll bool `xml:"dot_all,omitempty"`

	// Regex does not match across lines ($ matches EOL).
	//
	// Defaults to multiline.
	NotMultiline bool `xml:"not_multiline,omitempty"`

	// Make sure that the input ends with a newline. This
	// is required for some lexers that consume input linewise.
	EnsureNL bool `xml:"ensure_nl,omitempty"`

	// Priority of lexer.
	//
	// If this is 0 it will be treated as a default of 1.
	Priority float32 `xml:"priority,omitempty"`
}

Config for a lexer.

type Emitter

type Emitter interface {
	// Emit tokens for the given regex groups.
	Emit(groups []string, state *LexerState) Iterator
}

An Emitter takes group matches and returns tokens.

func ByGroupNames

func ByGroupNames(emitters map[string]Emitter) Emitter

ByGroupNames emits a token for each named matching group in the rule's regex.

func ByGroups

func ByGroups(emitters ...Emitter) Emitter

ByGroups emits a token for each matching group in the rule's regex.

func Using

func Using(lexer string) Emitter

Using returns an Emitter that uses a given Lexer reference for parsing and emitting.

The referenced lexer must be stored in the same LexerRegistry.

func UsingByGroup

func UsingByGroup(sublexerNameGroup, codeGroup int, emitters ...Emitter) Emitter

UsingByGroup emits tokens for the matched groups in the regex using a "sublexer". Used when lexing code blocks where the name of a sublexer is contained within the block, for example on a Markdown text block or SQL language block.

The sublexer is looked up by name, using the captured value from the matched sublexerNameGroup.

If a lexer is found for the captured sublexerNameGroup, tokens for the matched codeGroup are emitted using that lexer. Otherwise, tokens are emitted from the passed emitters.

Example:

var Markdown = internal.Register(MustNewLexer(
	&Config{
		Name:      "markdown",
		Aliases:   []string{"md", "mkd"},
		Filenames: []string{"*.md", "*.mkd", "*.markdown"},
		MimeTypes: []string{"text/x-markdown"},
	},
	// Rules are supplied as a function so that their generation is deferred.
	func() Rules {
		return Rules{
			"root": {
				// Group 2 names the sublexer; group 4 is the code it lexes.
				{"^(```)(\\w+)(\\n)([\\w\\W]*?)(^```$)",
					UsingByGroup(
						2, 4,
						String, String, String, Text, String,
					),
					nil,
				},
			},
		}
	},
))

See the lexers/m/markdown.go for the complete example.

Note: panics if the number of emitters does not equal the number of matched groups in the regex.

func UsingLexer

func UsingLexer(lexer Lexer) Emitter

UsingLexer returns an Emitter that uses a given Lexer for parsing and emitting.

This Emitter is not serialisable.

func UsingSelf

func UsingSelf(stateName string) Emitter

UsingSelf is like Using, but uses the current Lexer.

type EmitterFunc

type EmitterFunc func(groups []string, state *LexerState) Iterator

EmitterFunc is a function that is an Emitter.

func (EmitterFunc) Emit

func (e EmitterFunc) Emit(groups []string, state *LexerState) Iterator

Emit tokens for groups.

type Emitters

type Emitters []Emitter

func (Emitters) MarshalXML

func (b Emitters) MarshalXML(e *xml.Encoder, start xml.StartElement) error

func (*Emitters) UnmarshalXML

func (b *Emitters) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

type Formatter

type Formatter interface {
	// Format returns a formatting function for tokens.
	//
	// If the iterator panics, the Formatter should recover.
	Format(w io.Writer, style *Style, iterator Iterator) error
}

A Formatter for Chroma lexers.

func RecoveringFormatter

func RecoveringFormatter(formatter Formatter) Formatter

RecoveringFormatter wraps a formatter with panic recovery.

type FormatterFunc

type FormatterFunc func(w io.Writer, style *Style, iterator Iterator) error

A FormatterFunc is a Formatter implemented as a function.

Guards against iterator panics.

func (FormatterFunc) Format

func (f FormatterFunc) Format(w io.Writer, s *Style, it Iterator) (err error)

type Iterator

type Iterator func() Token

An Iterator across tokens.

EOF will be returned at the end of the Token stream.

If an error occurs within an Iterator, it may propagate this in a panic. Formatters should recover.
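
The function-as-iterator pattern can be sketched in a few lines of standalone Go. All names here (token, eof, literator, concat, collect) are hypothetical and only mirror the shape of the API described in this section; they are not Chroma's implementation.

```go
package main

import "fmt"

// token is a hypothetical stand-in for a lexer token.
type token struct {
	Type  string
	Value string
}

// eof is the sentinel returned once an iterator is exhausted.
var eof = token{Type: "EOF"}

// iterator yields the next token on each call.
type iterator func() token

// literator turns a fixed list of tokens into an iterator.
func literator(tokens ...token) iterator {
	return func() token {
		if len(tokens) == 0 {
			return eof
		}
		t := tokens[0]
		tokens = tokens[1:]
		return t
	}
}

// concat drains each iterator in turn before moving on to the next.
func concat(its ...iterator) iterator {
	return func() token {
		for len(its) > 0 {
			if t := its[0](); t != eof {
				return t
			}
			its = its[1:]
		}
		return eof
	}
}

// collect consumes an iterator into a slice.
func collect(it iterator) []token {
	var out []token
	for t := it(); t != eof; t = it() {
		out = append(out, t)
	}
	return out
}

func main() {
	it := concat(
		literator(token{"Keyword", "func"}, token{"Text", " "}),
		literator(token{"Name", "main"}),
	)
	fmt.Println(collect(it))
}
```

Because an iterator is just a closure, wrappers that filter, merge or remap tokens can be composed without materialising the whole token stream first.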

func Concaterator

func Concaterator(iterators ...Iterator) Iterator

Concaterator concatenates tokens from a series of iterators.

func Literator

func Literator(tokens ...Token) Iterator

Literator converts a sequence of literal Tokens into an Iterator.

func (Iterator) Tokens

func (i Iterator) Tokens() []Token

Tokens consumes all tokens from the iterator and returns them as a slice.

type Lexer

type Lexer interface {
	// Config describing the features of the Lexer.
	Config() *Config
	// Tokenise returns an Iterator over tokens in text.
	Tokenise(options *TokeniseOptions, text string) (Iterator, error)
	// SetRegistry sets the registry this Lexer is associated with.
	//
	// The registry should be used by the Lexer if it needs to look up other
	// lexers.
	SetRegistry(registry *LexerRegistry) Lexer
	// SetAnalyser sets a function the Lexer should use for scoring how
	// likely a fragment of text is to match this lexer, between 0.0 and 1.0.
	// A value of 1 indicates high confidence.
	//
	// Lexers may ignore this if they implement their own analysers.
	SetAnalyser(analyser func(text string) float32) Lexer
	// AnalyseText scores how likely a fragment of text is to match
	// this lexer, between 0.0 and 1.0. A value of 1 indicates high confidence.
	AnalyseText(text string) float32
}

A Lexer for tokenising source code.

func Coalesce

func Coalesce(lexer Lexer) Lexer

Coalesce is a Lexer interceptor that collapses runs of common types into a single token.

func DelegatingLexer

func DelegatingLexer(root Lexer, language Lexer) Lexer

DelegatingLexer combines two lexers to handle the common case of a language embedded inside another, such as PHP inside HTML or PHP inside plain text.

It takes two lexers as arguments: a root lexer and a language lexer. First everything is scanned using the language lexer, which must return "Other" for unrecognised tokens. Then all "Other" tokens are lexed using the root lexer. Finally, these two sets of tokens are merged.

The lexers from the template lexer package use this base lexer.

func RemappingLexer

func RemappingLexer(lexer Lexer, mapper func(Token) []Token) Lexer

RemappingLexer remaps a token to a set of, potentially empty, tokens.

func TypeRemappingLexer

func TypeRemappingLexer(lexer Lexer, mapping TypeMapping) Lexer

TypeRemappingLexer remaps types of tokens coming from a parent Lexer.

eg. Map "defvaralias" tokens of type NameVariable to NameFunction:

mapping := TypeMapping{
	{NameVariable, NameFunction, []string{"defvaralias"}},
}
lexer = TypeRemappingLexer(lexer, mapping)

type LexerMutator

type LexerMutator interface {
	// MutateLexer can be implemented to mutate the lexer itself.
	//
	// Rules are the lexer rules, state is the state key for the rule the mutator is associated with.
	MutateLexer(rules CompiledRules, state string, rule int) error
}

A LexerMutator is an additional interface that a Mutator can implement to modify the lexer when it is compiled.

type LexerRegistry

type LexerRegistry struct {
	Lexers Lexers
	// contains filtered or unexported fields
}

LexerRegistry is a registry of Lexers.

func NewLexerRegistry

func NewLexerRegistry() *LexerRegistry

NewLexerRegistry creates a new LexerRegistry of Lexers.

func (*LexerRegistry) Analyse

func (l *LexerRegistry) Analyse(text string) Lexer

Analyse text content and return the "best" lexer.

func (*LexerRegistry) Get

func (l *LexerRegistry) Get(name string) Lexer

Get a Lexer by name, alias or file extension.

func (*LexerRegistry) Match

func (l *LexerRegistry) Match(filename string) Lexer

Match returns the first lexer matching filename.

func (*LexerRegistry) MatchMimeType

func (l *LexerRegistry) MatchMimeType(mimeType string) Lexer

MatchMimeType attempts to find a lexer for the given MIME type.

func (*LexerRegistry) Names

func (l *LexerRegistry) Names(withAliases bool) []string

Names of all lexers, optionally including aliases.

func (*LexerRegistry) Register

func (l *LexerRegistry) Register(lexer Lexer) Lexer

Register a Lexer with the LexerRegistry.

type LexerState

type LexerState struct {
	Lexer    *RegexLexer
	Registry *LexerRegistry
	Text     []rune
	Pos      int
	Rules    CompiledRules
	Stack    []string
	State    string
	Rule     int
	// Group matches.
	Groups []string
	// Named Group matches.
	NamedGroups map[string]string
	// Custom context for mutators.
	MutatorContext map[interface{}]interface{}
	// contains filtered or unexported fields
}

LexerState contains the state for a single lex.

func (*LexerState) Get

func (l *LexerState) Get(key interface{}) interface{}

Get mutator context.

func (*LexerState) Iterator

func (l *LexerState) Iterator() Token

Iterator returns the next Token from the lexer.

func (*LexerState) Set

func (l *LexerState) Set(key interface{}, value interface{})

Set mutator context.

type Lexers

type Lexers []Lexer

Lexers is a slice of lexers sortable by name.

func (Lexers) Len

func (l Lexers) Len() int

func (Lexers) Less

func (l Lexers) Less(i, j int) bool

func (Lexers) Swap

func (l Lexers) Swap(i, j int)

type Mutator

type Mutator interface {
	// Mutate the lexer state machine as it is processing.
	Mutate(state *LexerState) error
}

A Mutator modifies the behaviour of the lexer.

func Combined

func Combined(states ...string) Mutator

Combined creates a new anonymous state from the given states, and pushes that state.

func Mutators

func Mutators(modifiers ...Mutator) Mutator

Mutators applies a set of Mutators in order.

func Pop

func Pop(n int) Mutator

Pop state from the stack when rule matches.

func Push

func Push(states ...string) Mutator

Push states onto the stack.

type MutatorFunc

type MutatorFunc func(state *LexerState) error

A MutatorFunc is a Mutator that mutates the lexer state machine as it is processing.

func (MutatorFunc) Mutate

func (m MutatorFunc) Mutate(state *LexerState) error

type PrioritisedLexers

type PrioritisedLexers []Lexer

PrioritisedLexers is a slice of lexers sortable by priority.

func (PrioritisedLexers) Len

func (l PrioritisedLexers) Len() int

func (PrioritisedLexers) Less

func (l PrioritisedLexers) Less(i, j int) bool

func (PrioritisedLexers) Swap

func (l PrioritisedLexers) Swap(i, j int)

type RegexLexer

type RegexLexer struct {
	// contains filtered or unexported fields
}

RegexLexer is the default lexer implementation used in Chroma.

func MustNewLexer

func MustNewLexer(config *Config, rulesFunc func() Rules) *RegexLexer

MustNewLexer creates a new Lexer with deferred rules generation or panics.

func MustNewXMLLexer

func MustNewXMLLexer(from fs.FS, path string) *RegexLexer

MustNewXMLLexer constructs a new RegexLexer from an XML file or panics.

func NewLexer

func NewLexer(config *Config, rulesFunc func() Rules) (*RegexLexer, error)

NewLexer creates a new regex-based Lexer.

"rules" is a state machine transition map. Each key is a state. Values are sets of rules that match input, optionally modify lexer state, and output tokens.

func NewXMLLexer

func NewXMLLexer(from fs.FS, path string) (*RegexLexer, error)

NewXMLLexer creates a new RegexLexer from a serialised RegexLexer.

func Unmarshal

func Unmarshal(data []byte) (*RegexLexer, error)

Unmarshal a RegexLexer from XML.

func (*RegexLexer) AnalyseText

func (r *RegexLexer) AnalyseText(text string) float32

func (*RegexLexer) Config

func (r *RegexLexer) Config() *Config

func (*RegexLexer) MustRules

func (r *RegexLexer) MustRules() Rules

MustRules is like Rules() but will panic on error.

func (*RegexLexer) Rules

func (r *RegexLexer) Rules() (Rules, error)

Rules in the Lexer.

func (*RegexLexer) SetAnalyser

func (r *RegexLexer) SetAnalyser(analyser func(text string) float32) Lexer

SetAnalyser sets the analyser function used to perform content inspection.

func (*RegexLexer) SetConfig

func (r *RegexLexer) SetConfig(config *Config) *RegexLexer

SetConfig replaces the Config for this Lexer.

func (*RegexLexer) SetRegistry

func (r *RegexLexer) SetRegistry(registry *LexerRegistry) Lexer

SetRegistry sets the registry the lexer will use to look up other lexers if necessary.

func (*RegexLexer) String

func (r *RegexLexer) String() string

func (*RegexLexer) Tokenise

func (r *RegexLexer) Tokenise(options *TokeniseOptions, text string) (Iterator, error)

func (*RegexLexer) Trace

func (r *RegexLexer) Trace(trace bool) *RegexLexer

Trace enables debug tracing.

type Rule

type Rule struct {
	Pattern string
	Type    Emitter
	Mutator Mutator
}

A Rule is the fundamental matching unit of the Regex lexer state machine.
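
To show how rules, states and a push/pop stack fit together, here is a deliberately small standalone state-machine lexer. It uses the standard regexp package and hypothetical types (token, rule) and state names. It is a sketch of the general technique, not Chroma's RegexLexer.

```go
package main

import (
	"fmt"
	"regexp"
)

// token is a hypothetical stand-in for a lexer token.
type token struct {
	Type  string
	Value string
}

// rule matches a pattern at the current position, emits a token type, and
// optionally pushes a new state or pops the current one.
type rule struct {
	re   *regexp.Regexp
	typ  string
	push string
	pop  bool
}

func r(pattern, typ, push string, pop bool) rule {
	// Anchor every pattern so it only matches at the current position.
	return rule{regexp.MustCompile(`^(?:` + pattern + `)`), typ, push, pop}
}

// rules maps each state name to its ordered list of rules.
var rules = map[string][]rule{
	"root": {
		r(`"`, "String", "string", false), // opening quote enters "string"
		r(`\d+`, "Number", "", false),
		r(`\s+`, "Whitespace", "", false),
		r(`[=+*/-]`, "Operator", "", false),
		r(`\w+`, "Name", "", false),
	},
	"string": {
		r(`"`, "String", "", true), // closing quote returns to the previous state
		r(`[^"]+`, "String", "", false),
	},
}

// lex runs the state machine over text, trying the rules of the state on top
// of the stack in order and taking the first that matches.
func lex(text string) []token {
	var out []token
	stack := []string{"root"}
	for pos := 0; pos < len(text); {
		matched := false
		for _, rl := range rules[stack[len(stack)-1]] {
			m := rl.re.FindString(text[pos:])
			if m == "" {
				continue
			}
			out = append(out, token{rl.typ, m})
			pos += len(m)
			if rl.push != "" {
				stack = append(stack, rl.push)
			}
			if rl.pop && len(stack) > 1 {
				stack = stack[:len(stack)-1]
			}
			matched = true
			break
		}
		if !matched {
			// No rule applies: emit one byte as an error token and move on.
			out = append(out, token{"Error", text[pos : pos+1]})
			pos++
		}
	}
	return out
}

func main() {
	for _, t := range lex(`x = "hi there" 42`) {
		fmt.Printf("%-10s %q\n", t.Type, t.Value)
	}
}
```

Rule order within a state matters: the number rule is listed before the name rule because `\w+` would otherwise also consume digits.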

func Default

func Default(mutators ...Mutator) Rule

Default returns a Rule that applies a set of Mutators.

func Include

func Include(state string) Rule

Include the given state.

func (Rule) MarshalXML

func (r Rule) MarshalXML(e *xml.Encoder, _ xml.StartElement) error

func (*Rule) UnmarshalXML

func (r *Rule) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

type Rules

type Rules map[string][]Rule

Rules maps from state to a sequence of Rules.

func (Rules) Clone

func (r Rules) Clone() Rules

Clone returns a clone of the Rules.

func (Rules) MarshalXML

func (r Rules) MarshalXML(e *xml.Encoder, _ xml.StartElement) error

func (Rules) Merge

func (r Rules) Merge(rules Rules) Rules

Merge creates a clone of "r" then merges "rules" into the clone.

func (Rules) Rename

func (r Rules) Rename(oldRule, newRule string) Rules

Rename clones the rules, then renames oldRule to newRule in the clone.
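
The copy-on-write behaviour of Clone, Merge and Rename can be sketched without importing the package. The Rule type below is a simplified stand-in, and the detail that Merge replaces a whole state when both maps define it is an assumption, not something the documentation above states.

```go
package main

import "fmt"

// Rule is a simplified stand-in for chroma.Rule: only the pattern is kept.
type Rule struct {
	Pattern string
}

// Rules maps a state name to its ordered rules, like chroma.Rules.
type Rules map[string][]Rule

// Clone copies the map and every rule slice, so the copy can be edited freely.
func (r Rules) Clone() Rules {
	out := make(Rules, len(r))
	for state, rules := range r {
		out[state] = append([]Rule(nil), rules...)
	}
	return out
}

// Merge clones r, then copies each state of other into the clone.
// Assumption: a state present in both maps is replaced as a whole.
func (r Rules) Merge(other Rules) Rules {
	out := r.Clone()
	for state, rules := range other.Clone() {
		out[state] = rules
	}
	return out
}

// Rename clones r, then moves the rules stored under oldName to newName.
func (r Rules) Rename(oldName, newName string) Rules {
	out := r.Clone()
	out[newName] = out[oldName]
	delete(out, oldName)
	return out
}

func main() {
	base := Rules{"root": {{Pattern: `\s+`}}, "string": {{Pattern: `"`}}}
	merged := base.Merge(Rules{"root": {{Pattern: `\d+`}}})
	renamed := base.Rename("string", "dqs")

	fmt.Println(merged["root"][0].Pattern, base["root"][0].Pattern)
	fmt.Println(len(renamed["dqs"]), len(renamed["string"]), len(base["string"]))
}
```

In this sketch the receiver is never modified, which is what makes it safe to derive one lexer's rules from another's.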

func (*Rules) UnmarshalXML

func (r *Rules) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

type SerialisableEmitter

type SerialisableEmitter interface {
	Emitter
	EmitterKind() string
}

SerialisableEmitter is an Emitter that can be serialised and deserialised to/from JSON.

type SerialisableMutator

type SerialisableMutator interface {
	Mutator
	MutatorKind() string
}

SerialisableMutator is a Mutator that can be serialised and deserialised.

type Style

type Style struct {
	Name string
	// contains filtered or unexported fields
}

A Style definition.

See http://pygments.org/docs/styles/ for details. Semantics are intended to be identical.

func MustNewStyle

func MustNewStyle(name string, entries StyleEntries) *Style

MustNewStyle creates a new style or panics.

func NewStyle

func NewStyle(name string, entries StyleEntries) (*Style, error)

NewStyle creates a new style definition.

func (*Style) Builder

func (s *Style) Builder() *StyleBuilder

Builder creates a mutable builder from this Style.

The builder can then be safely modified. This is a cheap operation.

func (*Style) Get

func (s *Style) Get(ttype TokenType) StyleEntry

Get a style entry. Will try sub-category or category if an exact match is not found, and finally return the Background.

func (*Style) Has

func (s *Style) Has(ttype TokenType) bool

Has checks if an exact style entry match exists for a token type.

This is distinct from Get() which will merge parent tokens.
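
The difference between Has and Get can be illustrated with a small standalone sketch. It follows the first-match reading of the Get description (exact type, then sub-category, then category, then Background). The style map type is hypothetical, entries are plain strings, and deriving parents by integer division is an assumption based on the token numbering; the merging of parent entries that Get performs is not reproduced here.

```go
package main

import "fmt"

// TokenType mirrors chroma.TokenType; only the values used below are declared.
type TokenType int

const (
	Background          TokenType = -1
	Keyword             TokenType = 1000
	LiteralString       TokenType = 3100
	LiteralStringDouble TokenType = 3108
)

// style is a hypothetical stand-in for the entries held by a Style.
type style map[TokenType]string

// has reports only an exact match.
func (s style) has(t TokenType) bool {
	_, ok := s[t]
	return ok
}

// get tries the exact type, its sub-category, its category, then Background.
func (s style) get(t TokenType) string {
	for _, candidate := range []TokenType{t, t / 100 * 100, t / 1000 * 1000} {
		if entry, ok := s[candidate]; ok {
			return entry
		}
	}
	return s[Background]
}

func main() {
	s := style{Background: "bg:#ffffff", LiteralString: "#0000ff"}
	fmt.Println(s.has(LiteralStringDouble), s.get(LiteralStringDouble)) // false #0000ff
	fmt.Println(s.get(Keyword))                                         // bg:#ffffff
}
```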

func (*Style) Types

func (s *Style) Types() []TokenType

Types that are styled.

type StyleBuilder

type StyleBuilder struct {
	// contains filtered or unexported fields
}

A StyleBuilder is a mutable structure for building styles.

Once built, a Style is immutable.

func NewStyleBuilder

func NewStyleBuilder(name string) *StyleBuilder

func (*StyleBuilder) Add

func (s *StyleBuilder) Add(ttype TokenType, entry string) *StyleBuilder

Add an entry to the Style map.

See http://pygments.org/docs/styles/#style-rules for details.

func (*StyleBuilder) AddAll

func (s *StyleBuilder) AddAll(entries StyleEntries) *StyleBuilder

func (*StyleBuilder) AddEntry

func (s *StyleBuilder) AddEntry(ttype TokenType, entry StyleEntry) *StyleBuilder

func (*StyleBuilder) Build

func (s *StyleBuilder) Build() (*Style, error)

func (*StyleBuilder) Get

func (s *StyleBuilder) Get(ttype TokenType) StyleEntry

type StyleEntries

type StyleEntries map[TokenType]string

StyleEntries mapping TokenType to colour definition.

type StyleEntry

type StyleEntry struct {
	// Hex colours.
	Colour     Colour
	Background Colour
	Border     Colour

	Bold      Trilean
	Italic    Trilean
	Underline Trilean
	NoInherit bool
}

A StyleEntry in the Style map.

func ParseStyleEntry

func ParseStyleEntry(entry string) (StyleEntry, error)

ParseStyleEntry parses a Pygments style entry.

func (StyleEntry) Inherit

func (s StyleEntry) Inherit(ancestors ...StyleEntry) StyleEntry

Inherit styles from ancestors.

Ancestors should be provided from oldest to newest.
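
One plausible reading of "oldest to newest" is that the newest ancestor is consulted first and older ancestors only fill what is still unset. The sketch below encodes that reading with a reduced, hypothetical entry type (a string colour and a single Trilean field); the precedence order and the early stop on NoInherit are assumptions, not documented behaviour.

```go
package main

import "fmt"

// Trilean mirrors the three inheritance states documented for StyleEntry.
type Trilean uint8

const (
	Pass Trilean = iota
	Yes
	No
)

// entry is a reduced, hypothetical StyleEntry.
type entry struct {
	Colour    string
	Bold      Trilean
	NoInherit bool
}

// inherit fills unset fields of e from ancestors, newest (last) first.
func (e entry) inherit(ancestors ...entry) entry {
	out := e
	for i := len(ancestors) - 1; i >= 0; i-- {
		if out.NoInherit {
			return out // assumption: NoInherit stops all inheritance
		}
		a := ancestors[i]
		if out.Colour == "" {
			out.Colour = a.Colour
		}
		if out.Bold == Pass {
			out.Bold = a.Bold
		}
	}
	return out
}

func main() {
	oldest := entry{Colour: "#111111", Bold: Yes}
	newest := entry{Colour: "#222222"}
	fmt.Printf("%+v\n", entry{}.inherit(oldest, newest))
}
```

Here the colour comes from the newest ancestor, while Bold is still Pass after the newest ancestor and is therefore taken from the oldest one.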

func (StyleEntry) IsZero

func (s StyleEntry) IsZero() bool

func (StyleEntry) String

func (s StyleEntry) String() string

func (StyleEntry) Sub

func (s StyleEntry) Sub(e StyleEntry) StyleEntry

Sub subtracts e from s where elements match.

type Token

type Token struct {
	Type  TokenType `json:"type"`
	Value string    `json:"value"`
}

Token output to formatter.

var EOF Token

EOF is returned by lexers at the end of input.
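
Because EOF is a sentinel value, consuming a token stream is a loop that stops at the first EOF. The sketch below uses local stand-ins: Iterator is assumed to be a function returning the next Token, and fromSlice and drain are hypothetical helper names.

```go
package main

import "fmt"

// Local stand-ins for the chroma types involved.
type TokenType int

type Token struct {
	Type  TokenType
	Value string
}

// EOF is the zero Token, matching EOFType = 0.
var EOF Token

// Iterator is assumed to be a function yielding the next token, or EOF.
type Iterator func() Token

// fromSlice (hypothetical) returns an Iterator over a fixed set of tokens.
func fromSlice(tokens ...Token) Iterator {
	return func() Token {
		if len(tokens) == 0 {
			return EOF
		}
		t := tokens[0]
		tokens = tokens[1:]
		return t
	}
}

// drain (hypothetical) collects tokens until the iterator returns EOF.
func drain(it Iterator) []Token {
	var out []Token
	for t := it(); t != EOF; t = it() {
		out = append(out, t)
	}
	return out
}

func main() {
	it := fromSlice(Token{Type: 1000, Value: "func"}, Token{Type: 8000, Value: " "})
	fmt.Println(len(drain(it))) // 2
}
```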

func Tokenise

func Tokenise(lexer Lexer, options *TokeniseOptions, text string) ([]Token, error)

Tokenise text using lexer, returning tokens as a slice.

func (*Token) Clone

func (t *Token) Clone() Token

Clone returns a clone of the Token.

func (*Token) GoString

func (t *Token) GoString() string

func (*Token) String

func (t *Token) String() string

type TokenType

type TokenType int

TokenType is the type of token to highlight.

It is also an Emitter, emitting a single token of itself.

const (
	// Default background style.
	Background TokenType = -1 - iota
	// PreWrapper style.
	PreWrapper
	// Line style.
	Line
	// Line numbers in output.
	LineNumbers
	// Line numbers in output when in table.
	LineNumbersTable
	// Line highlight style.
	LineHighlight
	// Line numbers table wrapper style.
	LineTable
	// Line numbers table TD wrapper style.
	LineTableTD
	// Code line wrapper style.
	CodeLine
	// Input that could not be tokenised.
	Error
	// Other is used by the Delegate lexer to indicate which tokens should be handled by the delegate.
	Other
	// No highlighting.
	None
	// Used as an EOF marker / nil token
	EOFType TokenType = 0
)

Meta token types.

const (
	Keyword TokenType = 1000 + iota
	KeywordConstant
	KeywordDeclaration
	KeywordNamespace
	KeywordPseudo
	KeywordReserved
	KeywordType
)

Keywords.

const (
	Name TokenType = 2000 + iota
	NameAttribute
	NameBuiltin
	NameBuiltinPseudo
	NameClass
	NameConstant
	NameDecorator
	NameEntity
	NameException
	NameFunction
	NameFunctionMagic
	NameKeyword
	NameLabel
	NameNamespace
	NameOperator
	NameOther
	NamePseudo
	NameProperty
	NameTag
	NameVariable
	NameVariableAnonymous
	NameVariableClass
	NameVariableGlobal
	NameVariableInstance
	NameVariableMagic
)

Names.

const (
	Literal TokenType = 3000 + iota
	LiteralDate
	LiteralOther
)

Literals.

const (
	LiteralString TokenType = 3100 + iota
	LiteralStringAffix
	LiteralStringAtom
	LiteralStringBacktick
	LiteralStringBoolean
	LiteralStringChar
	LiteralStringDelimiter
	LiteralStringDoc
	LiteralStringDouble
	LiteralStringEscape
	LiteralStringHeredoc
	LiteralStringInterpol
	LiteralStringName
	LiteralStringOther
	LiteralStringRegex
	LiteralStringSingle
	LiteralStringSymbol
)

Strings.

const (
	LiteralNumber TokenType = 3200 + iota
	LiteralNumberBin
	LiteralNumberFloat
	LiteralNumberHex
	LiteralNumberInteger
	LiteralNumberIntegerLong
	LiteralNumberOct
)

Number literals.

const (
	Operator TokenType = 4000 + iota
	OperatorWord
)

Operators.

const (
	Comment TokenType = 6000 + iota
	CommentHashbang
	CommentMultiline
	CommentSingle
	CommentSpecial
)

Comments.

const (
	CommentPreproc TokenType = 6100 + iota
	CommentPreprocFile
)

Preprocessor "comments".

const (
	Generic TokenType = 7000 + iota
	GenericDeleted
	GenericEmph
	GenericError
	GenericHeading
	GenericInserted
	GenericOutput
	GenericPrompt
	GenericStrong
	GenericSubheading
	GenericTraceback
	GenericUnderline
)

Generic tokens.

const (
	Text TokenType = 8000 + iota
	TextWhitespace
	TextSymbol
	TextPunctuation
)

Text.

const (
	Punctuation TokenType = 5000 + iota
)

Punctuation.

func (TokenType) Category

func (t TokenType) Category() TokenType

func (TokenType) Emit

func (t TokenType) Emit(groups []string, _ *LexerState) Iterator

func (TokenType) EmitterKind

func (t TokenType) EmitterKind() string

func (TokenType) InCategory

func (t TokenType) InCategory(other TokenType) bool

func (TokenType) InSubCategory

func (t TokenType) InSubCategory(other TokenType) bool

func (TokenType) MarshalText

func (t TokenType) MarshalText() ([]byte, error)

func (TokenType) MarshalXML

func (t TokenType) MarshalXML(e *xml.Encoder, start xml.StartElement) error

func (TokenType) Parent

func (t TokenType) Parent() TokenType

func (TokenType) String

func (i TokenType) String() string

func (TokenType) SubCategory

func (t TokenType) SubCategory() TokenType
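
The constant blocks above number token types in thousands (categories) and hundreds (sub-categories), so the category helpers can be understood as integer arithmetic. The lower-case functions below are a standalone sketch under that assumption, not the package's code; the constant values are the ones the iota blocks above produce.

```go
package main

import "fmt"

type TokenType int

// Values as produced by the const blocks in the documentation above.
const (
	Literal             TokenType = 3000
	LiteralString       TokenType = 3100
	LiteralStringDouble TokenType = 3108
	LiteralNumber       TokenType = 3200
)

// category rounds down to the nearest thousand, e.g. 3108 -> 3000.
func category(t TokenType) TokenType { return t / 1000 * 1000 }

// subCategory rounds down to the nearest hundred, e.g. 3108 -> 3100.
func subCategory(t TokenType) TokenType { return t / 100 * 100 }

// inCategory reports whether both types share the same thousand block.
func inCategory(t, other TokenType) bool { return t/1000 == other/1000 }

// inSubCategory reports whether both types share the same hundred block.
func inSubCategory(t, other TokenType) bool { return t/100 == other/100 }

func main() {
	t := LiteralStringDouble
	fmt.Println(category(t) == Literal, subCategory(t) == LiteralString) // true true
	fmt.Println(inCategory(t, LiteralNumber), inSubCategory(t, LiteralNumber)) // true false
}
```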

func (*TokenType) UnmarshalText

func (t *TokenType) UnmarshalText(data []byte) error

func (*TokenType) UnmarshalXML

func (t *TokenType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

type TokeniseOptions

type TokeniseOptions struct {
	// State to start tokenisation in. Defaults to "root".
	State string
	// Nested tokenisation.
	Nested bool

	// If true, all EOLs are converted into LF
	// by replacing CRLF and CR
	EnsureLF bool
}

TokeniseOptions contains options for tokenisers.
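
The effect described for EnsureLF (CRLF and lone CR both become LF) can be reproduced with the standard library alone. The helper below is a sketch of that normalisation, not the package's implementation; ensureLF is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// eolReplacer lists CRLF before CR so a CRLF pair becomes one LF, not two.
var eolReplacer = strings.NewReplacer("\r\n", "\n", "\r", "\n")

// ensureLF converts every CRLF or CR line ending in text to LF.
func ensureLF(text string) string {
	return eolReplacer.Replace(text)
}

func main() {
	fmt.Printf("%q\n", ensureLF("a\r\nb\rc\n")) // "a\nb\nc\n"
}
```

Normalising first means lexer patterns only need to match "\n".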

type Trilean

type Trilean uint8

Trilean value for StyleEntry value inheritance.

const (
	Pass Trilean = iota
	Yes
	No
)

Trilean states.

func (Trilean) Prefix

func (t Trilean) Prefix(s string) string

Prefix returns s with "no" as a prefix if the Trilean is No.
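
A standalone sketch of how Prefix could turn a Trilean into a Pygments-style keyword such as "bold" or "nobold". The Yes and No cases follow the description above; returning an empty string for Pass is an assumption.

```go
package main

import "fmt"

type Trilean uint8

const (
	Pass Trilean = iota
	Yes
	No
)

// Prefix returns s for Yes and "no"+s for No.
// Assumption: Pass contributes nothing, so it returns "".
func (t Trilean) Prefix(s string) string {
	switch t {
	case Yes:
		return s
	case No:
		return "no" + s
	default:
		return ""
	}
}

func main() {
	fmt.Printf("%q %q %q\n", Yes.Prefix("bold"), No.Prefix("bold"), Pass.Prefix("bold"))
	// "bold" "nobold" ""
}
```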

func (Trilean) String

func (t Trilean) String() string

type TypeMapping

type TypeMapping []struct {
	From, To TokenType
	Words    []string
}

TypeMapping defines type maps for the TypeRemappingLexer.
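
How a TypeMapping might be applied to a token stream can be sketched as follows. The remap function is hypothetical and works on a slice rather than wrapping a Lexer; treating an empty Words list as "match every token of type From" is an assumption.

```go
package main

import "fmt"

type TokenType int

// Values as produced by the Name const block above.
const (
	Name        TokenType = 2000
	NameBuiltin TokenType = 2002
)

type Token struct {
	Type  TokenType
	Value string
}

// TypeMapping has the same shape as chroma.TypeMapping.
type TypeMapping []struct {
	From, To TokenType
	Words    []string
}

// remap (hypothetical) returns a copy of tokens with the first matching
// mapping applied to each token.
func remap(tokens []Token, mapping TypeMapping) []Token {
	out := make([]Token, len(tokens))
	copy(out, tokens)
	for i, tok := range out {
		for _, m := range mapping {
			if tok.Type == m.From && (len(m.Words) == 0 || contains(m.Words, tok.Value)) {
				out[i].Type = m.To
				break
			}
		}
	}
	return out
}

// contains reports whether words includes w.
func contains(words []string, w string) bool {
	for _, candidate := range words {
		if candidate == w {
			return true
		}
	}
	return false
}

func main() {
	tokens := []Token{{Type: Name, Value: "print"}, {Type: Name, Value: "x"}}
	mapping := TypeMapping{{From: Name, To: NameBuiltin, Words: []string{"print"}}}
	for _, t := range remap(tokens, mapping) {
		fmt.Println(int(t.Type), t.Value)
	}
}
```

This is the kind of post-processing that lets a lexer emit a generic Name token and have selected words promoted to a more specific type afterwards.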

Directories

Path Synopsis
_tools
svg
Package svg contains an SVG formatter.
Package quick provides simple, no-configuration source code highlighting.
