
package utf8internal

Published: Apr 25, 2019 | License: BSD-3-Clause | Module:


Package utf8internal contains low-level UTF-8-related constants, tables, etc., that are used internally by the text package.



const (
	LoCB = 0x80 // 1000 0000
	HiCB = 0xBF // 1011 1111
)

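The default lowest and highest continuation bytes. Every byte after the first in a multi-byte UTF-8 sequence lies in this range, although the second byte may be restricted further (see AcceptRanges).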
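As a quick illustration of what this range means: a byte is a continuation byte exactly when its top two bits are 10, which is the same as lying in [LoCB, HiCB]. Because utf8internal is an internal package of golang.org/x/text, the sketch below copies the two constants locally; isContinuation is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Local copies of the documented constants; utf8internal cannot be
// imported from outside golang.org/x/text.
const (
	LoCB = 0x80 // 1000 0000
	HiCB = 0xBF // 1011 1111
)

// isContinuation (hypothetical helper) reports whether b lies in the
// default continuation-byte range [LoCB, HiCB].
func isContinuation(b byte) bool {
	return LoCB <= b && b <= HiCB
}

func main() {
	// The range check agrees with the bit-pattern test b&0xC0 == 0x80
	// for every possible byte value.
	for i := 0; i < 256; i++ {
		b := byte(i)
		if isContinuation(b) != (b&0xC0 == 0x80) {
			fmt.Println("mismatch at", b)
		}
	}
	fmt.Println(isContinuation(0x80), isContinuation(0xBF), isContinuation(0xC0)) // true true false
}
```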

const (
	// ASCII identifies a UTF-8 byte as ASCII.
	ASCII = as

	// FirstInvalid indicates a byte is invalid as a first byte of a UTF-8
	// sequence.
	FirstInvalid = xx

	// SizeMask is a mask for the size bits. Use x&SizeMask to get the size.
	SizeMask = 7

	// AcceptShift is the right-shift count applied to a first-byte info value
	// to get the index into the AcceptRanges table. See AcceptRanges.
	AcceptShift = 4
)

Constants used to extract information about the first byte of a UTF-8 sequence.
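A small sketch of how a first-byte info value might be split apart with these constants. The values of the unexported as and xx constants are not shown in this documentation; the sketch ASSUMES 0xF0 and 0xF1, as in the standard library's unicode/utf8 package, and describe is a hypothetical helper.

```go
package main

import "fmt"

// ASSUMPTION: as and xx take the values used by unicode/utf8.
const (
	as = 0xF0 // ASCII (size 1)
	xx = 0xF1 // invalid first byte (size 1)

	ASCII        = as
	FirstInvalid = xx
	SizeMask     = 7
	AcceptShift  = 4
)

// describe (hypothetical helper) splits a first-byte info value x into the
// sequence size (low bits) and the AcceptRanges index (high nibble).
// ok is false for ASCII and FirstInvalid, which are sentinel values
// rather than size/index pairs.
func describe(x uint8) (size, acceptIndex int, ok bool) {
	if x == ASCII || x == FirstInvalid {
		return 1, 0, false
	}
	return int(x & SizeMask), int(x >> AcceptShift), true
}

func main() {
	// 0x13 is assumed to be the entry for lead byte 0xE0: a 3-byte
	// sequence whose second byte must satisfy AcceptRanges[1].
	fmt.Println(describe(0x13)) // 3 1 true
	// 0x44 is assumed to be the entry for lead byte 0xF4.
	fmt.Println(describe(0x44)) // 4 4 true
	fmt.Println(describe(ASCII)) // 1 0 false
}
```

Treating ASCII and FirstInvalid as sentinels matters: their low bits do not both equal the size 1 (0xF0&SizeMask is 0), so a caller has to test for them before masking.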


var AcceptRanges = [...]AcceptRange{
	0: {LoCB, HiCB},
	1: {0xA0, HiCB},
	2: {LoCB, 0x9F},
	3: {0x90, HiCB},
	4: {LoCB, 0x8F},
}

AcceptRanges is an array of AcceptRange values. For a given byte sequence b

	AcceptRanges[First[b[0]]>>AcceptShift]

gives the value of AcceptRange for the multi-byte UTF-8 sequence starting at b[0].

var First = [256]uint8{ /* 256 elements not displayed */ }


First is information about the first byte in a UTF-8 sequence.

type AcceptRange

type AcceptRange struct {
	Lo uint8 // lowest value for second byte.
	Hi uint8 // highest value for second byte.
}

AcceptRange gives the range of valid values for the second byte in a UTF-8 sequence, for any value of First that is not ASCII or FirstInvalid.
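A sketch of why the second-byte range varies. For most lead bytes it is the full continuation range, but four lead bytes need tighter bounds to exclude overlong encodings, UTF-16 surrogates, and code points above U+10FFFF. rangeFor and secondByteOK are hypothetical helpers: rangeFor maps the lead byte directly to its AcceptRanges entry, standing in for the First lookup, and that mapping is assumed to match unicode/utf8.

```go
package main

import "fmt"

type AcceptRange struct {
	Lo uint8 // lowest value for second byte.
	Hi uint8 // highest value for second byte.
}

var AcceptRanges = [...]AcceptRange{
	0: {0x80, 0xBF},
	1: {0xA0, 0xBF},
	2: {0x80, 0x9F},
	3: {0x90, 0xBF},
	4: {0x80, 0x8F},
}

// rangeFor (hypothetical) maps a multi-byte lead byte straight to its
// AcceptRange; the package itself derives the index from First.
func rangeFor(lead byte) AcceptRange {
	switch lead {
	case 0xE0:
		return AcceptRanges[1] // reject overlong 3-byte forms (below U+0800)
	case 0xED:
		return AcceptRanges[2] // reject surrogates U+D800..U+DFFF
	case 0xF0:
		return AcceptRanges[3] // reject overlong 4-byte forms (below U+10000)
	case 0xF4:
		return AcceptRanges[4] // reject code points above U+10FFFF
	default:
		return AcceptRanges[0] // any continuation byte
	}
}

// secondByteOK (hypothetical) reports whether second is allowed after lead.
func secondByteOK(lead, second byte) bool {
	r := rangeFor(lead)
	return r.Lo <= second && second <= r.Hi
}

func main() {
	fmt.Println(secondByteOK(0xE0, 0xA0)) // true: E0 A0 80 encodes U+0800
	fmt.Println(secondByteOK(0xE0, 0x9F)) // false: would be an overlong form
	fmt.Println(secondByteOK(0xED, 0xA0)) // false: would encode U+D800
	fmt.Println(secondByteOK(0xF4, 0x90)) // false: would exceed U+10FFFF
}
```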

Documentation was rendered with GOOS=linux and GOARCH=amd64.
