semtok

package
v0.20.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 28, 2025 License: BSD-3-Clause Imports: 1 Imported by: 0

README

The [LSP](https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#textDocument_semanticTokens)
specifies semantic tokens as a way of telling clients about language-specific
properties of pieces of code in a file being edited.

The client asks for a set of semantic tokens and modifiers. This note describe which ones
gopls will return, and under what circumstances. Gopls has no control over how the client
converts semantic tokens into colors (or some other visible indication). In vscode it
is possible to modify the color a theme uses by setting the `editor.semanticTokenColorCustomizations`
object. We provide a little [guidance](#Colors) later.

There are 22 semantic tokens, with 10 possible modifiers. The protocol allows each semantic
token to be used with any of the 1024 subsets of possible modifiers, but most combinations
don't make intuitive sense (although `async documentation` has a certain appeal).

The 22 semantic tokens are `namespace`, `type`, `class`, `enum`, `interface`,
		`struct`, `typeParameter`, `parameter`, `variable`, `property`, `enumMember`,
		`event`, `function`, `method`, `macro`, `keyword`, `modifier`, `comment`,
		`string`, `number`, `regexp`, `operator`.

The 10 modifiers are `declaration`, `definition`, `readonly`, `static`,
		`deprecated`, `abstract`, `async`, `modification`, `documentation`, `defaultLibrary`.

The authoritative lists are in the [specification](https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#semanticTokenTypes)

For the implementation to work correctly the client and server have to agree on the ordering
of the tokens and of the modifiers. Gopls, therefore, will only send tokens and modifiers
that the client has asked for. This document says what gopls would send if the client
asked for everything. By default, vscode asks for everything.

Gopls sends 11 token types for `.go` files and 1 for `.*tmpl` files.
Nothing is sent for any other kind of file.
This all could change. (When Go has generics, gopls will return `typeParameter`.)

For `.*tmpl` files gopls sends `macro`, and no modifiers, for each `{{`...`}}` scope.

## Semantic tokens for Go files

There are two contrasting guiding principles that might be used to decide what to mark
with semantic tokens. All clients already do some kind of syntax marking. E.g., vscode
uses a TextMate grammar. The minimal principle would send semantic tokens only for those
language features that cannot be reliably found without parsing Go and looking at types.
The maximal principle would attempt to convey as much as possible about the Go code,
using all available parsing and type information.

There is much to be said for returning minimal information, but the minimal principle is
not well-specified. Gopls has no way of knowing what the clients know about the Go program
being edited. Even in vscode the TextMate grammars can be more or less elaborate
and change over time. (Nonetheless, a minimal implementation would not return `keyword`,
`number`, `comment`, or `string`.)

The maximal position isn't particularly well-specified either. To chose one example, a
format string might have formatting codes (`%-[4].6f`), escape sequences (`\U00010604`), and regular
characters. Should these all be distinguished? One could even imagine distinguishing
different runes by their Unicode language assignment, or some other Unicode property, such as
being [confusable](http://www.unicode.org/Public/security/10.0.0/confusables.txt). While gopls does not fully adhere to such distinctions,
it does recognizes formatting directives within strings, decorating them with "format" modifiers,
providing more precise semantic highlighting in format strings.

Semantic tokens are returned for identifiers, keywords, operators, comments, and literals.
(Semantic tokens do not cover the file. They are not returned for
white space or punctuation, and there is no semantic token for labels.)
The following describes more precisely what gopls
does, with a few notes on possible alternative choices.
The references to *object* refer to the
```types.Object``` returned by the type checker. The references to *nodes* refer to the
```ast.Node``` from the parser.

1. __`keyword`__ All Go [keywords](https://golang.org/ref/spec#Keywords) are marked `keyword`.
1. __`namespace`__ All package names are marked `namespace`. In an import, if there is an
alias, it would be marked. Otherwise the last component of the import path is marked.
1. __`type`__ Objects of type ```types.TypeName``` are marked `type`. It also reports
a modifier for the top-level constructor of the object's type, one of:
`interface`, `struct`, `signature`, `pointer`, `array`, `map`, `slice`, `chan`, `string`, `number`, `bool`, `invalid`.
1. __`parameter`__ The formal arguments in ```ast.FuncDecl``` and ```ast.FuncType``` nodes are marked `parameter`.
1. __`variable`__  Identifiers in the
scope of ```const``` are modified with `readonly`. ```nil``` is usually a `variable` modified with both
`readonly` and `defaultLibrary`. (```nil``` is a predefined identifier; the user can redefine it,
in which case it would just be a variable, or whatever.) Identifiers of type ```types.Variable``` are,
not surprisingly, marked `variable`. Identifiers being defined (node ```ast.GenDecl```) are modified
by `definition` and, if appropriate, `readonly`. Receivers (in method declarations) are
`variable`.
1. __`method`__ Methods are marked at their definition (```func (x foo) bar() {}```) or declaration
in an ```interface```. Methods are not marked where they are used.
In ```x.bar()```, ```x``` will be marked
either as a `namespace` if it is a package name, or as a `variable` if it is an interface value,
so distinguishing ```bar``` seemed superfluous.
1. __`function`__ Bultins (```types.Builtin```) are modified with `defaultLibrary`
(e.g., ```make```, ```len```, ```copy```). Identifiers whose
object is ```types.Func``` or whose node is ```ast.FuncDecl``` are `function`.
1. __`comment`__ Comments and struct tags. (Perhaps struct tags should be `property`?)
1. __`string`__ Strings. Could add modifiers for e.g., escapes or format codes.
1. __`number`__ Numbers. Should the ```i``` in ```23i``` be handled specially?
1. __`operator`__ Assignment operators, binary operators, ellipses (```...```), increment/decrement
operators, sends (```<-```), and unary operators.

Gopls will send the modifier `deprecated` if it finds a comment
```// deprecated``` in the godoc.

The unused tokens for Go code are `class`, `enum`, `interface`,
		`struct`, `typeParameter`, `property`, `enumMember`,
		`event`, `macro`, `modifier`,
		`regexp`

## Colors

These comments are about vscode.

The documentation has a [helpful](https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#custom-textmate-scope-mappings)
description of which semantic tokens correspond to scopes in TextMate grammars. Themes seem
to use the TextMate scopes to decide on colors.

Some examples of color customizations are [here](https://medium.com/@danromans/how-to-customize-semantic-token-colorization-with-visual-studio-code-ac3eab96141b).

## Note

While a file is being edited it may temporarily contain either
parsing errors or type errors. In this case gopls cannot determine some (or maybe any)
of the semantic tokens. To avoid weird flickering it is the responsibility
of clients to maintain the semantic token information
in the unedited part of the file, and they do.

Documentation

Overview

The semtok package provides an encoder for LSP's semantic tokens.

Index

Constants

This section is empty.

Variables

TokenModifiers is a slice of modifiers gopls will return as its server capabilities.

TokenTypes is a slice of types gopls will return as its server capabilities.

Functions

func Encode

func Encode(
	tokens []Token,
	encodeType map[Type]bool,
	encodeModifier map[Modifier]bool) []uint32

Encode returns the LSP encoding of a sequence of tokens. encodeType and encodeModifier maps control which types and modifiers are excluded in the response. If a type or modifier maps to false, it will be omitted from the output.

Types

type Modifier added in v0.17.0

type Modifier string
const (
	// LSP 3.18 standard modifiers
	// As with TokenTypes, clients get only the modifiers they request.
	//
	// The section below defines a subset of modifiers in standard modifiers
	// that gopls understand.
	ModDefaultLibrary Modifier = "defaultLibrary" // for predeclared symbols
	ModDefinition     Modifier = "definition"     // for the declaring identifier of a symbol
	ModReadonly       Modifier = "readonly"       // for constants (TokVariable)

	// non-standard modifiers
	//
	// Since the type of a symbol is orthogonal to its kind,
	// (e.g. a variable can have function type),
	// we use modifiers for the top-level type constructor.
	ModArray     Modifier = "array"
	ModBool      Modifier = "bool"
	ModChan      Modifier = "chan"
	ModFormat    Modifier = "format" // for format string directives such as "%s"
	ModInterface Modifier = "interface"
	ModMap       Modifier = "map"
	ModNumber    Modifier = "number"
	ModPointer   Modifier = "pointer"
	ModSignature Modifier = "signature" // for function types
	ModSlice     Modifier = "slice"
	ModString    Modifier = "string"
	ModStruct    Modifier = "struct"
)

type Token

type Token struct {
	Line, Start uint32
	Len         uint32
	Type        Type
	Modifiers   []Modifier
}

A Token provides the extent and semantics of a token.

type Type added in v0.18.0

type Type string
const (
	// These are the tokens defined by LSP 3.18, but a client is
	// free to send its own set; any tokens that the server emits
	// that are not in this set are simply not encoded in the bitfield.
	TokComment   Type = "comment"       // for a comment
	TokFunction  Type = "function"      // for a function
	TokKeyword   Type = "keyword"       // for a keyword
	TokLabel     Type = "label"         // for a control label (LSP 3.18)
	TokMacro     Type = "macro"         // for text/template tokens
	TokMethod    Type = "method"        // for a method
	TokNamespace Type = "namespace"     // for an imported package name
	TokNumber    Type = "number"        // for a numeric literal
	TokOperator  Type = "operator"      // for an operator
	TokParameter Type = "parameter"     // for a parameter variable
	TokString    Type = "string"        // for a string literal
	TokType      Type = "type"          // for a type name (plus other uses)
	TokTypeParam Type = "typeParameter" // for a type parameter
	TokVariable  Type = "variable"      // for a var or const

)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL