lexer

package
v1.2.0 Latest

Published: Sep 7, 2024 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Copyright © 2024 AntoninoAdornetto

The c.go file is responsible for satisfying the `LexicalTokenizer` interface in the `lexer.go` file. Its methods implement a strict rule set for handling single and multi line comments in c-like languages. If an issue annotation is located, the result is a slice of tokens that describes the action item contained in the comment. If a comment does not contain an issue annotation, all tokens built from the remaining comment bytes are ignored and removed from the `DraftTokens` slice.
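For illustration, suppose the issue annotation is `@TODO` and the lexer encounters this comment (a rough sketch; the exact token stream is implementation-specific):

	// @TODO fix the off-by-one error in the parser

The draft tokens would, roughly, include a TOKEN_SINGLE_LINE_COMMENT_START for `//`, a TOKEN_COMMENT_ANNOTATION for `@TODO`, TOKEN_COMMENT_TITLE and TOKEN_COMMENT_DESCRIPTION tokens for the remaining text, and a TOKEN_SINGLE_LINE_COMMENT_END at the line break.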

The lexer.go file is responsible for creating a `Base` Lexer, consuming and iterating through bytes of source code, and determining which `Target` Lexer to use for the Tokenization process.

Base Lexer: The name Lexer may be a bit misleading for the `Base` Lexer. There is no strict rule set baked into its receiver methods. However, the `Base` Lexer has the very important role of sharing byte consumption methods with `Target` Lexers. For example, we don't want to rewrite .next(), .peek() or .nextLexeme() multiple times for each `Target` Lexer, since the logic for those methods is not specific to any one `Target` Lexer and won't change.
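A minimal sketch of what these shared consumption helpers could look like, written as if inside the package and using only the exported `Lexer` fields documented below (`peekSketch` and `nextSketch` are hypothetical names; the real unexported implementations may differ):

	func peekSketch(l *Lexer) byte {
		if l.Current >= len(l.Src) {
			return 0 // treat end of input as a zero byte
		}
		return l.Src[l.Current]
	}

	func nextSketch(l *Lexer) byte {
		b := peekSketch(l)
		l.Current++ // consume the byte
		if b == NEWLINE {
			l.Line++ // keep Line accurate for token metadata
		}
		return b
	}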

Target Lexer: Simply put, a `Target` Lexer is the Lexer that handles the Tokenization rule set. For this application, we are only concerned with tokenizing single and multi line comments. More specifically, single and multi line comments that contain an issue annotation.

`Target` Lexers are created via the `NewTargetLexer` function. The `Base` Lexer is passed to the function, via dependency injection, and is stored within each `Target` Lexer so that targets can access the shared byte consumption methods. `Target` Lexers must satisfy the methods contained in the `LexicalTokenizer` interface. I know I mentioned we are only concerned with comments in source code, but you will notice a requirement for a `String` method in the interface. We must account for strings to combat an edge case. Let me explain: if we are lexing a Python string that contains a hash character "#" (comment notation symbol), our lexer could very well explode. The same could be said for C or Go strings that contain one or more forward slashes "/". String tokens are not persisted, just consumed until the closing delimiter is located.
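A rough sketch of the job a `String` implementation performs, again written as if inside the package (`consumeStringSketch` and its escape handling are assumptions, not the package's actual code):

	func consumeStringSketch(l *Lexer, delim byte) error {
		for l.Current < len(l.Src) {
			b := l.Src[l.Current]
			l.Current++
			switch b {
			case BACKWARD_SLASH:
				l.Current++ // skip the escaped byte so an escaped delimiter does not close the string
			case delim:
				return nil // closing delimiter found; nothing is persisted
			}
		}
		return errors.New("unterminated string literal") // assumes the standard errors package
	}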

Lastly, it's important to mention how `Target` Lexers are created. When instantiating a new `Base` Lexer, the source code file path is provided. This path is used to read the file extension. If the file extension is .c, .go, .cpp, .h, etc., we return a `Target` Lexer that supports c-like comment syntax, since those languages all denote single and multi line comments with the same notation. For .py files, we would return a Lexer that handles hash-style comment syntax, and so on.
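A hypothetical sketch of that dispatch (the extension-to-lexer mapping below is an assumption; `NewTargetLexer` is the authoritative implementation, and the sketch assumes the standard path/filepath and errors packages):

	func newTargetLexerSketch(base *Lexer) (LexicalTokenizer, error) {
		switch filepath.Ext(base.FileName) {
		case ".c", ".h", ".cpp", ".go":
			return &Clexer{Base: base}, nil // c-like notation: // and /* */
		case ".py", ".sh":
			return &ShellLexer{Base: base}, nil // hash notation: #
		default:
			return nil, errors.New("unsupported file extension")
		}
	}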

Index

Constants

View Source
const (
	ASTERISK       byte = '*'
	BACK_TICK      byte = '`'
	BACKWARD_SLASH byte = '\\'
	FORWARD_SLASH  byte = '/'
	HASH           byte = '#'
	QUOTE          byte = '\''
	DOUBLE_QUOTE   byte = '"'
	NEWLINE        byte = '\n'
	TAB            byte = '\t'
	OPEN_PARAN     byte = '('
	CLOSE_PARAN    byte = ')'
	WHITESPACE     byte = ' '
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Clexer added in v1.0.0

type Clexer struct {
	Base        *Lexer  // holds shared byte consumption methods
	DraftTokens []Token // Unvalidated tokens
	// contains filtered or unexported fields
}

func (*Clexer) AnalyzeToken added in v1.0.0

func (c *Clexer) AnalyzeToken() error

func (*Clexer) Comment added in v1.0.0

func (c *Clexer) Comment() error

func (*Clexer) String added in v1.0.0

func (c *Clexer) String(delim byte) error

@TEST_TODO Test the CLexer String func

type Comment

type Comment struct {
	TokenAnnotationIndex int
	Title, Description   string
	TokenStartIndex      int   // location of the first token
	TokenEndIndex        int   // location of the last token
	AnnotationPos        []int // start/end index of the annotation
	IssueNumber          int   // will contain a non-zero value if the comment has been reported
	LineNumber           int
	NotationStartIndex   int // index of where the comment starts
	NotationEndIndex     int // index of where the comment ends
}

type CommentManager added in v1.0.0

type CommentManager struct {
	Comments []Comment
}

func BuildComments added in v1.0.0

func BuildComments(tokens []Token) (CommentManager, error)

type Lexer

type Lexer struct {
	FilePath   string
	FileName   string
	Src        []byte  // source code bytes
	Tokens     []Token // comment tokens after lexical analysis is complete
	Start      int     // byte index
	Current    int     // byte index, used in conjunction with Start to construct tokens
	Line       int     // Line number
	Annotation []byte  // issue annotation to search for within comments
	// contains filtered or unexported fields
}

func NewLexer

func NewLexer(annotation, src []byte, filePath string, flags U8) *Lexer

func (*Lexer) AnalyzeTokens

func (base *Lexer) AnalyzeTokens(target LexicalTokenizer) ([]Token, error)

type LexicalTokenizer added in v1.0.0

type LexicalTokenizer interface {
	AnalyzeToken() error
	String(delim byte) error
	Comment() error
	// contains filtered or unexported methods
}

AnalyzeToken - checks the current byte from [Lexer.peek()] and determines how the following bytes should be processed.

String - tokens from the string method are not stored; it exists to prevent lexing comment notation that appears inside a string.

Comment - the bread and butter of our target lexers. Handles processing single & multi line comments.

processLexeme - transforms the lexeme into a token and appends it to the draft tokens contained in the target lexer struct.

func NewTargetLexer added in v1.0.0

func NewTargetLexer(base *Lexer) (LexicalTokenizer, error)
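A minimal end-to-end sketch of the intended flow, using only the exported API documented on this page (the import path, file name, flag choice, and annotation value are assumptions, not prescriptions):

	package main

	import (
		"fmt"
		"log"
		"os"

		lexer "example.com/yourmodule/lexer" // hypothetical import path
	)

	func main() {
		src, err := os.ReadFile("main.c")
		if err != nil {
			log.Fatal(err)
		}

		// scan main.c for comments annotated with @TODO
		base := lexer.NewLexer([]byte("@TODO"), src, "main.c", lexer.FLAG_SCAN)

		target, err := lexer.NewTargetLexer(base) // a Clexer, based on the .c extension
		if err != nil {
			log.Fatal(err)
		}

		tokens, err := base.AnalyzeTokens(target)
		if err != nil {
			log.Fatal(err)
		}

		manager, err := lexer.BuildComments(tokens)
		if err != nil {
			log.Fatal(err)
		}

		for _, com := range manager.Comments {
			fmt.Printf("line %d: %s\n", com.LineNumber, com.Title)
		}
	}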

type ShellLexer added in v1.2.0

type ShellLexer struct {
	Base        *Lexer  // holds shared byte consumption methods
	DraftTokens []Token // Unvalidated tokens
	// contains filtered or unexported fields
}

func (*ShellLexer) AnalyzeToken added in v1.2.0

func (sh *ShellLexer) AnalyzeToken() error

func (*ShellLexer) Comment added in v1.2.0

func (sh *ShellLexer) Comment() error

func (*ShellLexer) String added in v1.2.0

func (sh *ShellLexer) String(delim byte) error

type Token

type Token struct {
	Type   TokenType
	Lexeme []byte // token value
	Line   int    // Line number
	Start  int    // Starting byte index of the token in Lexer Src slice
	End    int    // Ending byte index of the token in Lexer Src slice
}

func NewToken added in v1.0.0

func NewToken(tokenType TokenType, lexeme []byte, lexer *Lexer) Token

type TokenType

type TokenType = uint16
const (
	TOKEN_SINGLE_LINE_COMMENT_START TokenType = 1 << iota
	TOKEN_SINGLE_LINE_COMMENT_END
	TOKEN_MULTI_LINE_COMMENT_START
	TOKEN_MULTI_LINE_COMMENT_END
	TOKEN_COMMENT_ANNOTATION
	TOKEN_ISSUE_NUMBER
	TOKEN_COMMENT_TITLE
	TOKEN_COMMENT_DESCRIPTION
	TOKEN_SINGLE_LINE_COMMENT
	TOKEN_MULTI_LINE_COMMENT
	TOKEN_OPEN_PARAN
	TOKEN_CLOSE_PARAN
	TOKEN_HASH
	TOKEN_UNKNOWN
	TOKEN_EOF
)
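Because the values are distinct bit flags, a token's Type can be matched against several types at once (a usage sketch, not package code):

	func isCommentStart(t Token) bool {
		mask := TOKEN_SINGLE_LINE_COMMENT_START | TOKEN_MULTI_LINE_COMMENT_START
		return t.Type&mask != 0
	}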

type U8 added in v1.0.0

type U8 uint8
const (
	FLAG_PURGE U8 = 1 << iota
	FLAG_SCAN
)
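These are likewise bit flags, so they compose with bitwise OR (what each flag does is defined by the package; this only shows composition, with annotation, src, and filePath as placeholder variables):

	base := lexer.NewLexer(annotation, src, filePath, lexer.FLAG_PURGE|lexer.FLAG_SCAN)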
