Documentation
Overview ¶
Package lex tokenises PDF input. It handles the lexical layer of PDF objects (whitespace, comments, names, numbers, strings, array and dictionary delimiters, and the stream/endstream/obj/endobj/R/null/true/false keywords) but does not assemble higher-level structures. The parser layered above it turns token streams into Object trees.
Constants ¶
This section is empty.
Variables ¶
var ErrUnexpectedEOF = errors.New("pdfdisassembler/lex: unexpected EOF")
ErrUnexpectedEOF indicates that the lexer ran out of bytes mid-token.
Functions ¶
func IsDelimiter ¶
IsDelimiter reports whether c is a PDF delimiter character (§7.2.2).
func IsRegular ¶
IsRegular reports whether c is a regular character (neither whitespace nor delimiter).
func IsWhitespace ¶
IsWhitespace reports whether c is a PDF whitespace character (§7.2.2).
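The three character classes can be sketched as follows. This is a minimal illustration of the classes as the PDF specification defines them, not the package's source: the lower-case names and the `byte` parameter type are assumptions, since the exported signatures are not shown here.

```go
package main

import "fmt"

// isWhitespace is a hypothetical stand-in for IsWhitespace. PDF defines six
// whitespace bytes: NUL, HT, LF, FF, CR and SP.
func isWhitespace(c byte) bool {
	switch c {
	case 0x00, '\t', '\n', '\f', '\r', ' ':
		return true
	}
	return false
}

// isDelimiter is a hypothetical stand-in for IsDelimiter. PDF defines ten
// delimiter bytes: ( ) < > [ ] { } / %
func isDelimiter(c byte) bool {
	switch c {
	case '(', ')', '<', '>', '[', ']', '{', '}', '/', '%':
		return true
	}
	return false
}

// isRegular is a hypothetical stand-in for IsRegular: every byte that is
// neither whitespace nor a delimiter.
func isRegular(c byte) bool {
	return !isWhitespace(c) && !isDelimiter(c)
}

func main() {
	for _, c := range []byte("/Type [1 2]") {
		fmt.Printf("%q ws=%t delim=%t regular=%t\n",
			c, isWhitespace(c), isDelimiter(c), isRegular(c))
	}
}
```

Because the classes are disjoint and together cover every byte value, a tokeniser can end a name, number, or keyword at the first byte that is not regular.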
Types ¶
type Kind ¶
type Kind int
Kind identifies a token's lexical category.
const (
	// EOF marks end of input.
	EOF Kind = iota
	// Name is a PDF name without the leading slash.
	Name
	// Integer is a literal integer (no decimal point, optional sign).
	Integer
	// Real is a literal real number (has a decimal point or 'e' exponent;
	// PDF does not actually allow exponents but we accept them).
	Real
	// LitString is a parenthesised literal string with escapes already
	// resolved.
	LitString
	// HexString is an angle-bracketed hex string with hex pairs already
	// decoded to bytes.
	HexString
	// ArrayStart is the '[' token.
	ArrayStart
	// ArrayEnd is the ']' token.
	ArrayEnd
	// DictStart is the '<<' token.
	DictStart
	// DictEnd is the '>>' token.
	DictEnd
	// Keyword is any unquoted identifier: true, false, null, obj, endobj,
	// stream, endstream, R, xref, trailer, startxref, n, f.
	Keyword
)
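The Integer/Real distinction described above can be sketched as a check on the literal text. This is an assumption-laden illustration rather than the package's code: `numberKind` is a hypothetical helper, it returns a string instead of a Kind, it assumes the input is already a well-formed numeric token, and treating an upper-case 'E' like 'e' is an added assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// numberKind is a hypothetical helper that classifies an already-validated
// numeric literal. A decimal point or an exponent marker makes it Real;
// anything else is Integer. Accepting upper-case 'E' is an assumption.
func numberKind(lit string) string {
	if strings.ContainsAny(lit, ".eE") {
		return "Real"
	}
	return "Integer"
}

func main() {
	for _, lit := range []string{"-42", "+7", "3.14", "-.5", "1e5"} {
		fmt.Printf("%-5s %s\n", lit, numberKind(lit))
	}
}
```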
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer converts a byte slice into a stream of Tokens. It is not safe for concurrent use.
func (*Lexer) ReadStreamData ¶
ReadStreamData consumes raw stream bytes of the given length, starting at the current position. It honours the spec's EOL handling: a single LF or CRLF *immediately* after the "stream" keyword is part of the keyword line, not the stream content. Callers should call this after the "stream" keyword token has been consumed.
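The EOL rule can be illustrated with a small free function. This is a sketch under stated assumptions, not the method's real signature (which is not shown here): `readStreamData`, its `(buf, pos, length)` parameters, its three return values, and `errShort` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errShort is a hypothetical error for input that ends before length bytes.
var errShort = errors.New("unexpected EOF in stream data")

// readStreamData is a hypothetical sketch. pos must point just past the
// "stream" keyword. It skips exactly one EOL marker (CRLF or LF), then
// returns the next length bytes and the position following them.
func readStreamData(buf []byte, pos, length int) ([]byte, int, error) {
	if pos < 0 || pos > len(buf) || length < 0 {
		return nil, pos, errShort
	}
	// Only a single marker is skipped; further CR or LF bytes are content.
	if pos+1 < len(buf) && buf[pos] == '\r' && buf[pos+1] == '\n' {
		pos += 2
	} else if pos < len(buf) && buf[pos] == '\n' {
		pos++
	}
	if pos+length > len(buf) {
		return nil, pos, errShort
	}
	return buf[pos : pos+length], pos + length, nil
}

func main() {
	src := []byte("stream\r\nhello\nendstream")
	data, next, err := readStreamData(src, len("stream"), 5)
	fmt.Printf("%q %d %v\n", data, next, err)
}
```

Skipping only one marker matters because stream content is binary: a second LF after the keyword line is the first byte of the data, not padding.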
func (*Lexer) SkipWhitespace ¶
func (l *Lexer) SkipWhitespace()
SkipWhitespace advances over PDF whitespace and comments.
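A minimal sketch of this skipping loop, assuming the usual PDF rule that a comment runs from '%' to the end of the line. The free function `skipWhitespace`, working on a slice and an index, is hypothetical; the real method operates on the Lexer's internal state.

```go
package main

import "fmt"

// isWhitespace reports whether c is one of the six PDF whitespace bytes.
func isWhitespace(c byte) bool {
	switch c {
	case 0x00, '\t', '\n', '\f', '\r', ' ':
		return true
	}
	return false
}

// skipWhitespace is a hypothetical sketch: it returns the index of the first
// byte at or after pos that is neither whitespace nor part of a comment.
func skipWhitespace(buf []byte, pos int) int {
	for pos < len(buf) {
		c := buf[pos]
		switch {
		case isWhitespace(c):
			pos++
		case c == '%':
			// A comment extends up to, but not including, the next CR or LF;
			// the EOL byte itself is consumed as whitespace on the next pass.
			for pos < len(buf) && buf[pos] != '\n' && buf[pos] != '\r' {
				pos++
			}
		default:
			return pos
		}
	}
	return pos
}

func main() {
	src := []byte("  % a comment\n  /Name")
	i := skipWhitespace(src, 0)
	fmt.Printf("next token starts at %d: %q\n", i, src[i:])
}
```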
type Token ¶
type Token struct {
	Kind   Kind
	Bytes  []byte
	Offset int64 // byte offset in the input where this token started
}
Token is a single lexical unit. Bytes carries the token payload; its meaning depends on Kind:
- Name, Keyword: ASCII name body, no leading slash
- Integer, Real: literal digits
- LitString, HexString: decoded bytes
- ArrayStart, ArrayEnd, DictStart, DictEnd, EOF: empty
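To make "decoded bytes" concrete for HexString, the sketch below decodes the text between '<' and '>' using the PDF rules that whitespace is ignored and a missing final digit is treated as 0. `decodeHexString` and `hexVal` are hypothetical helpers; whether this package handles odd-length or malformed input the same way is an assumption.

```go
package main

import "fmt"

// isWhitespace reports whether c is one of the six PDF whitespace bytes.
func isWhitespace(c byte) bool {
	switch c {
	case 0x00, '\t', '\n', '\f', '\r', ' ':
		return true
	}
	return false
}

// hexVal converts one hex digit to its value.
func hexVal(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10, true
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// decodeHexString is a hypothetical sketch that decodes the body of a hex
// string (without the angle brackets) into raw bytes.
func decodeHexString(body []byte) ([]byte, error) {
	out := make([]byte, 0, (len(body)+1)/2)
	var hi byte
	haveHi := false
	for _, c := range body {
		if isWhitespace(c) {
			continue // whitespace inside a hex string is ignored
		}
		v, ok := hexVal(c)
		if !ok {
			return nil, fmt.Errorf("invalid hex digit %q", c)
		}
		if haveHi {
			out = append(out, hi<<4|v)
			haveHi = false
		} else {
			hi, haveHi = v, true
		}
	}
	if haveHi {
		out = append(out, hi<<4) // odd digit count: final digit is padded with 0
	}
	return out, nil
}

func main() {
	b, err := decodeHexString([]byte("48 65 6C6c6F"))
	fmt.Printf("%q %v\n", b, err)
}
```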