Documentation ¶
Overview ¶
Implements the lowest level of processing of PS/PDF files. The tokenizer is also usable with Type1 font files. To read PDF objects, see the higher-level package pdf/parser.
Index ¶
- func IsAsciiWhitespace(ch byte) bool
- func IsHexChar(c byte) (uint8, bool)
- type Fl
- type Kind
- type Token
- type Tokenizer
- func (pr Tokenizer) Bytes() []byte
- func (pr Tokenizer) CurrentPosition() int
- func (pr Tokenizer) HasEOLBeforeToken() bool
- func (pr Tokenizer) IsEOF() bool
- func (pr *Tokenizer) NextToken() (Token, error)
- func (pr Tokenizer) PeekPeekToken() (Token, error)
- func (pr Tokenizer) PeekToken() (Token, error)
- func (tk *Tokenizer) Reset(data []byte)
- func (tk *Tokenizer) ResetFromReader(src io.Reader)
- func (tk *Tokenizer) SetPosition(pos int)
- func (pr *Tokenizer) SkipBytes(n int) []byte
- func (pr *Tokenizer) StreamPosition() int
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsAsciiWhitespace ¶
IsAsciiWhitespace returns true if `ch` is one of the ASCII whitespace characters.
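For reference, the PDF specification (7.2.2) defines six ASCII whitespace characters. A minimal sketch of such a check (an illustration, not this package's actual implementation) could look like:

```go
package main

import "fmt"

// isAsciiWhitespace reports whether ch is one of the six ASCII
// whitespace characters defined by the PDF specification (7.2.2):
// NUL, HT, LF, FF, CR and SP. Illustrative sketch only.
func isAsciiWhitespace(ch byte) bool {
	switch ch {
	case 0x00, 0x09, 0x0A, 0x0C, 0x0D, 0x20:
		return true
	}
	return false
}

func main() {
	fmt.Println(isAsciiWhitespace(' '))  // true
	fmt.Println(isAsciiWhitespace('a'))  // false
}
```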
Types ¶
type Kind ¶
type Kind uint8
Kind is the kind of token.
const (
	EOF Kind
	Float
	Integer
	String
	StringHex
	Name
	StartArray
	EndArray
	StartDic
	EndDic
	Other      // includes commands in content streams
	StartProc  // only valid in PostScript files
	EndProc    // idem
	CharString // PS only: binary stream, introduced by an integer and a RD or -| command
)
type Token ¶
type Token struct {
// Additional value found in the data
// Note that it is a copy of the source bytes.
Value []byte
Kind Kind
}
Token represents a basic piece of information. `Value` must be interpreted according to `Kind`; this interpretation is left to the parsing packages.
func Tokenize ¶
Tokenize consumes all the input, splitting it into tokens. When performance matters, prefer the iterating method `NextToken` of the Tokenizer type.
func (Token) Int ¶
Int returns the integer value of the token, also accepting float values and rounding them.
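The documented rounding behavior can be illustrated with a hypothetical helper (not the package's actual code; rounding to the nearest integer is an assumption here, since the doc only says floats are rounded):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// tokenInt sketches the documented behavior of Token.Int: integer
// values are parsed directly, while float values are accepted and
// rounded to the nearest integer. Hypothetical helper, for illustration.
func tokenInt(value string) (int, error) {
	if i, err := strconv.Atoi(value); err == nil {
		return i, nil
	}
	f, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return 0, err
	}
	return int(math.Round(f)), nil
}

func main() {
	i, _ := tokenInt("42")
	fmt.Println(i) // 42
	j, _ := tokenInt("3.7")
	fmt.Println(j) // 4
}
```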
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer is a PS/PDF tokenizer.
It handles PS features like Procs and CharStrings: strict parsers should check for such tokens and return an error if needed.
Comments are ignored.
The tokenizer can't handle streams and inline image data on its own.
Regarding exponential numbers, 7.3.3 Numeric Objects states: "A conforming writer shall not use the PostScript syntax for numbers with non-decimal radices (such as 16#FFFE) or in exponential format (such as 6.02E23)." Nonetheless, numbers in exponential format do appear in practice, so the tokenizer supports them (there is no confusion with other types, so no compromise is needed).
func NewTokenizer ¶
NewTokenizer returns a tokenizer working on the given input.
func NewTokenizerFromReader ¶
NewTokenizerFromReader supports tokenizing an input stream, without knowing its length. The tokenizer will call Read and buffer the data. The error from the Read method is discarded: the internal buffer is simply not grown. See `SetPosition`, `SkipBytes` and `Bytes` for more information on the behavior in this mode.
func (Tokenizer) Bytes ¶
Bytes returns a slice of the bytes, starting from the current position. When using an io.Reader, only the current internal buffer is returned.
func (Tokenizer) CurrentPosition ¶
CurrentPosition returns the position in the input. It may be used to go back later if needed, using `SetPosition`.
func (Tokenizer) HasEOLBeforeToken ¶
HasEOLBeforeToken reports whether an end-of-line marker occurs before the next token.
func (*Tokenizer) NextToken ¶
NextToken reads a token and advances the position (consuming the token). If EOF is reached, no error is returned; an `EOF` token is returned instead.
func (Tokenizer) PeekPeekToken ¶
PeekPeekToken reads the token after the next one, but does not advance the position. It returns a cached value, so it is a very cheap call.
func (Tokenizer) PeekToken ¶
PeekToken reads a token but does not advance the position. It returns a cached value, so it is a very cheap call. If the error is not nil, the returned Token is guaranteed to be zero.
func (*Tokenizer) Reset ¶ added in v1.0.1
Reset allows re-using the internal buffers allocated by the tokenizer.
func (*Tokenizer) ResetFromReader ¶ added in v1.0.1
ResetFromReader allows re-using the internal buffers allocated by the tokenizer.
func (*Tokenizer) SetPosition ¶
SetPosition sets the position of the tokenizer in the input data.
Most of the time, `NextToken` should be preferred, but this method may be used for example to go back to a saved position.
When using an io.Reader as source, no additional buffering is performed.
func (*Tokenizer) SkipBytes ¶
SkipBytes skips the next `n` bytes and returns them. This method is useful to handle inline data. If `n` is too large, it is truncated: no additional buffering is done.
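The documented truncation amounts to clamping `n` to the data actually available; a sketch with hypothetical names, not the package's real implementation:

```go
package main

import "fmt"

// skipBytes sketches the documented truncation: it returns the next n
// bytes starting at pos and advances pos, clamping n to the data that
// is actually available. Hypothetical stand-in for Tokenizer.SkipBytes.
func skipBytes(data []byte, pos *int, n int) []byte {
	if *pos+n > len(data) {
		n = len(data) - *pos // truncate: no additional buffering
	}
	out := data[*pos : *pos+n]
	*pos += n
	return out
}

func main() {
	data := []byte("BI ... ID binarydata EI")
	pos := 0
	fmt.Printf("%q\n", skipBytes(data, &pos, 5))   // "BI .."
	fmt.Printf("%q\n", skipBytes(data, &pos, 100)) // rest of the data, truncated
}
```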
func (*Tokenizer) StreamPosition ¶
StreamPosition returns the position of the beginning of a stream, taking whitespace into account. See 7.3.8.1 - General.