Documentation ¶
Overview ¶
Implements the lowest level of processing of PS/PDF files. The tokenizer is also usable with Type1 font files. To read PDF objects, see the higher-level package pdf/parser.
Index ¶
- func IsAsciiWhitespace(ch byte) bool
- func IsHexChar(c byte) (uint8, bool)
- type Fl
- type Kind
- type Token
- type Tokenizer
- func (pr Tokenizer) Bytes() []byte
- func (pr Tokenizer) CurrentPosition() int
- func (pr Tokenizer) HasEOLBeforeToken() bool
- func (pr Tokenizer) IsEOF() bool
- func (pr *Tokenizer) NextToken() (Token, error)
- func (pr Tokenizer) PeekPeekToken() (Token, error)
- func (pr Tokenizer) PeekToken() (Token, error)
- func (tk *Tokenizer) Reset(data []byte)
- func (tk *Tokenizer) ResetFromReader(src io.Reader)
- func (tk *Tokenizer) SetPosition(pos int)
- func (pr *Tokenizer) SkipBytes(n int) []byte
- func (pr *Tokenizer) StreamPosition() int
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsAsciiWhitespace ¶
IsAsciiWhitespace returns true if `ch` is one of the ASCII whitespace characters.
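For reference, the PDF specification (7.2.2) defines six ASCII whitespace characters. A minimal sketch of such a check (an illustration, not this package's actual implementation) could look like:

```go
package main

import "fmt"

// isAsciiWhitespace reports whether ch is one of the six ASCII
// whitespace characters defined by the PDF specification (7.2.2):
// NUL, HT, LF, FF, CR and SP. Illustrative sketch only.
func isAsciiWhitespace(ch byte) bool {
	switch ch {
	case 0x00, 0x09, 0x0A, 0x0C, 0x0D, 0x20:
		return true
	}
	return false
}

func main() {
	fmt.Println(isAsciiWhitespace(' '))  // true
	fmt.Println(isAsciiWhitespace('a'))  // false
}
```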
Types ¶
type Kind ¶
type Kind uint8
Kind is the kind of token.
const (
	EOF Kind
	Float
	Integer
	String
	StringHex
	Name
	StartArray
	EndArray
	StartDic
	EndDic
	Other      // includes commands in content streams
	StartProc  // only valid in PostScript files
	EndProc    // idem
	CharString // PS only: binary stream, introduced by an integer and a RD or -| command
)
type Token ¶
type Token struct {
// Additional value found in the data
// Note that it is a copy of the source bytes.
Value []byte
Kind Kind
}
Token represents a basic piece of information. `Value` must be interpreted according to `Kind`; this interpretation is left to the parsing packages.
func Tokenize ¶
Tokenize consumes all the input, splitting it into tokens. When performance matters, prefer the iterating method `NextToken` of the Tokenizer type.
func (Token) Int ¶
Int returns the integer value of the token, also accepting float values and rounding them.
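The documented rounding behavior can be illustrated with a hypothetical helper (not the package's actual code; rounding to the nearest integer is an assumption here, since the doc only says floats are rounded):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// tokenInt sketches the documented behavior of Token.Int: integer
// values are parsed directly, while float values are accepted and
// rounded to the nearest integer. Hypothetical helper, for illustration.
func tokenInt(value string) (int, error) {
	if i, err := strconv.Atoi(value); err == nil {
		return i, nil
	}
	f, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return 0, err
	}
	return int(math.Round(f)), nil
}

func main() {
	i, _ := tokenInt("42")
	fmt.Println(i) // 42
	j, _ := tokenInt("3.7")
	fmt.Println(j) // 4
}
```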
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer is a PS/PDF tokenizer.
It handles PS features like Procs and CharStrings: strict parsers should check for such tokens and return an error if needed.
Comments are ignored.
The tokenizer can't handle streams and inline image data on its own.
Regarding exponential numbers, 7.3.3 Numeric Objects states: "A conforming writer shall not use the PostScript syntax for numbers with non-decimal radices (such as 16#FFFE) or in exponential format (such as 6.02E23)." Nonetheless, numbers in exponential format do appear in practice, so the tokenizer supports them (there is no confusion with other types, so no compromise is needed).
func NewTokenizer ¶
NewTokenizer returns a tokenizer working on the given input.
func NewTokenizerFromReader ¶
NewTokenizerFromReader supports tokenizing an input stream, without knowing its length. The tokenizer will call Read and buffer the data. The error from the Read method is discarded: the internal buffer is simply not grown. See `SetPosition`, `SkipBytes` and `Bytes` for more information on the behavior in this mode.
func (Tokenizer) Bytes ¶
Bytes returns a slice of the bytes, starting from the current position. When using an io.Reader, only the current internal buffer is returned.
func (Tokenizer) CurrentPosition ¶
CurrentPosition returns the position in the input. It may be used to go back later if needed, using `SetPosition`.
func (Tokenizer) HasEOLBeforeToken ¶
HasEOLBeforeToken reports whether an end-of-line marker occurs before the next token.
func (*Tokenizer) NextToken ¶
NextToken reads a token and advances the position (consuming the token). If EOF is reached, no error is returned; an `EOF` token is returned instead.
func (Tokenizer) PeekPeekToken ¶
PeekPeekToken reads the token after the next one, but does not advance the position. It returns a cached value, so it is a very cheap call.
func (Tokenizer) PeekToken ¶
PeekToken reads a token but does not advance the position. It returns a cached value, so it is a very cheap call. If the error is not nil, the returned Token is guaranteed to be zero.
func (*Tokenizer) Reset ¶ added in v1.0.1
Reset allows re-using the internal buffers allocated by the tokenizer.
func (*Tokenizer) ResetFromReader ¶ added in v1.0.1
ResetFromReader allows re-using the internal buffers allocated by the tokenizer.
func (*Tokenizer) SetPosition ¶
SetPosition sets the position of the tokenizer in the input data.
Most of the time, `NextToken` should be preferred, but this method may be used for example to go back to a saved position.
When using an io.Reader as source, no additional buffering is performed.
func (*Tokenizer) SkipBytes ¶
SkipBytes skips the next `n` bytes and returns them. This method is useful to handle inline data. If `n` is too large, it is truncated: no additional buffering is done.
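The documented truncation amounts to clamping `n` to the data actually available; a sketch with hypothetical names, not the package's real implementation:

```go
package main

import "fmt"

// skipBytes sketches the documented truncation: it returns the next n
// bytes starting at pos and advances pos, clamping n to the data that
// is actually available. Hypothetical stand-in for Tokenizer.SkipBytes.
func skipBytes(data []byte, pos *int, n int) []byte {
	if *pos+n > len(data) {
		n = len(data) - *pos // truncate: no additional buffering
	}
	out := data[*pos : *pos+n]
	*pos += n
	return out
}

func main() {
	data := []byte("BI ... ID binarydata EI")
	pos := 0
	fmt.Printf("%q\n", skipBytes(data, &pos, 5))   // "BI .."
	fmt.Printf("%q\n", skipBytes(data, &pos, 100)) // rest of the data, truncated
}
```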
func (*Tokenizer) StreamPosition ¶
StreamPosition returns the position of the beginning of a stream, taking whitespace into account. See 7.3.8.1 - General.