PHP Parser

A high-performance, comprehensive PHP parser implementation in Go with full PHP 8.4 syntax support. This parser provides complete lexical analysis, syntax parsing, and AST generation capabilities for PHP code.
Chinese Documentation (中文文档) | Documentation | Examples
Features
Core Capabilities
- Full PHP 8.4 Compatibility: Complete syntax support including modern PHP features
- High Performance: Optimized lexer and parser with benchmark support
- Complete AST: Rich Abstract Syntax Tree with 150+ node types
- Error Recovery: Robust error handling with partial parsing capabilities
- Position Tracking: Precise line/column information for all tokens and nodes
- Visitor Pattern: Comprehensive AST traversal and transformation utilities
Components
- Lexer: State-machine based tokenizer with 11 parsing states
- Parser: Recursive descent parser with Pratt parsing for expressions
- AST: Interface-based node system with visitor pattern support
- CLI Tool: Feature-rich command-line interface for parsing and analysis
- Examples: 5 comprehensive examples demonstrating different use cases
Use Cases
- Static Analysis: Code quality tools, linters, security scanners
- Development Tools: IDEs, language servers, code formatters
- Documentation: API documentation generators, code visualization
- Refactoring: Automated code transformation and migration tools
- Testing: Code coverage analysis, mutation testing
- Transpilation: PHP-to-PHP transformations, version compatibility
Quick Start
Installation
go get github.com/wudi/php-parser
Basic Usage
package main

import (
	"fmt"

	"github.com/wudi/php-parser/lexer"
	"github.com/wudi/php-parser/parser"
)

func main() {
	code := `<?php
function hello($name) {
    return "Hello, " . $name . "!";
}
echo hello("World");
?>`

	// Tokenize the source, then parse the token stream into an AST.
	l := lexer.New(code)
	p := parser.New(l)
	program := p.ParseProgram()

	// Report every collected parse error and stop.
	if len(p.Errors()) > 0 {
		for _, err := range p.Errors() {
			fmt.Printf("Error: %s\n", err)
		}
		return
	}

	fmt.Printf("Parsed %d statements\n", len(program.Body))
	fmt.Printf("AST: %s\n", program.String())
}
Build and use the CLI tool:
# Build the parser
go build -o php-parser ./cmd/php-parser
# Parse a PHP file
./php-parser example.php
# Show tokens and AST
./php-parser -tokens -ast example.php
# Output as JSON
./php-parser -format json example.php
# Parse from stdin
echo '<?php echo "Hello"; ?>' | ./php-parser
Architecture
Project Structure
php-parser/
├── ast/                    # Abstract Syntax Tree implementation
│   ├── node.go             # 150+ AST node types (6K+ lines)
│   ├── kind.go             # AST node type constants
│   ├── visitor.go          # Visitor pattern utilities
│   └── builder.go          # AST construction helpers
├── lexer/                  # Lexical analyzer
│   ├── lexer.go            # Main lexer with state machine (1.5K+ lines)
│   ├── token.go            # PHP token definitions (150+ tokens)
│   └── states.go           # Lexer state management
├── parser/                 # Syntax parser
│   ├── parser.go           # Recursive descent parser (7K+ lines)
│   ├── pool.go             # Parser pooling for concurrency
│   └── testdata/           # Test cases and fixtures
├── cmd/                    # Command-line interface
│   └── php-parser/         # CLI implementation (244 lines)
├── examples/               # Usage examples and tutorials
│   ├── basic-parsing/      # Fundamental parsing concepts
│   ├── ast-visitor/        # Visitor pattern examples
│   ├── token-extraction/   # Lexical analysis
│   ├── error-handling/     # Error recovery examples
│   └── code-analysis/      # Static analysis tools
├── errors/                 # Error handling utilities
└── scripts/                # Development and testing scripts
Code Statistics: 18,500+ lines of Go code across 16 source files with 29 test files
Core Components
Lexer (Tokenizer)
- 11 Parsing States: Including ST_IN_SCRIPTING, ST_DOUBLE_QUOTES, ST_HEREDOC
- 150+ Token Types: Complete PHP 8.4 token compatibility
- State Machine: Handles complex PHP syntax like string interpolation
- Shebang Support: Recognizes executable PHP files
- Position Tracking: Line, column, and offset information
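To make the state-machine idea concrete, here is a minimal, self-contained sketch of a two-state tokenizer that switches between inline HTML and scripting mode and records line, column, and offset for each token. It is not this library's lexer: the names State, Token, Position, and Tokenize, the token kinds, and the two-state model are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// State is a lexer mode. Only two modes are modeled here; the real lexer
// is described as having 11. All names in this sketch are hypothetical.
type State int

const (
	StateInitial   State = iota // outside PHP tags: raw inline HTML
	StateScripting              // inside <?php ... ?>
)

// Position records where a token starts (line and column are 1-based).
type Position struct {
	Line, Column, Offset int
}

// Token is a lexeme with its kind and starting position.
type Token struct {
	Kind  string
	Value string
	Pos   Position
}

func isIdentByte(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

// Tokenize walks src once, choosing its scanning rules from the current state.
func Tokenize(src string) []Token {
	var toks []Token
	state := StateInitial
	line, col, i := 1, 1, 0

	// advance consumes n bytes and keeps line and column in sync.
	advance := func(n int) {
		for k := 0; k < n; k++ {
			if src[i] == '\n' {
				line++
				col = 1
			} else {
				col++
			}
			i++
		}
	}

	for i < len(src) {
		pos := Position{Line: line, Column: col, Offset: i}
		switch state {
		case StateInitial:
			end := strings.Index(src[i:], "<?php")
			switch {
			case end == -1: // no more PHP: the rest is inline HTML
				toks = append(toks, Token{"INLINE_HTML", src[i:], pos})
				advance(len(src) - i)
			case end > 0: // HTML up to the next open tag
				toks = append(toks, Token{"INLINE_HTML", src[i : i+end], pos})
				advance(end)
			default: // at an open tag: switch to scripting mode
				toks = append(toks, Token{"OPEN_TAG", "<?php", pos})
				advance(5)
				state = StateScripting
			}
		case StateScripting:
			c := src[i]
			switch {
			case strings.HasPrefix(src[i:], "?>"): // back to HTML mode
				toks = append(toks, Token{"CLOSE_TAG", "?>", pos})
				advance(2)
				state = StateInitial
			case c == ' ' || c == '\t' || c == '\n' || c == '\r':
				advance(1)
			case c == '$' || isIdentByte(c):
				j := i + 1
				for j < len(src) && isIdentByte(src[j]) {
					j++
				}
				kind := "IDENT"
				if c == '$' {
					kind = "VARIABLE"
				}
				toks = append(toks, Token{kind, src[i:j], pos})
				advance(j - i)
			default:
				toks = append(toks, Token{"PUNCT", string(c), pos})
				advance(1)
			}
		}
	}
	return toks
}

func main() {
	for _, t := range Tokenize("<h1>Hi</h1>\n<?php echo $name; ?>") {
		fmt.Printf("%d:%d %-12s %q\n", t.Pos.Line, t.Pos.Column, t.Kind, t.Value)
	}
}
```

A production lexer would add further states for constructs such as double-quoted strings and heredocs, where the rules for what counts as a token change again.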
Parser (Syntax Analyzer)
- Recursive Descent: Clean, maintainable parsing architecture
- Pratt Parsing: Elegant operator precedence handling (14 levels)
- Error Recovery: Continues parsing after errors to find multiple issues
- 50+ Parse Functions: Comprehensive PHP syntax coverage
- Alternative Syntax: Full support for if:...endif; style constructs
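As a rough illustration of how Pratt parsing resolves operator precedence, the sketch below parses a small arithmetic subset and prints a fully parenthesized form. It is a standalone toy, not this library's parser: the three precedence levels, the helper names (pratt, parseExpr, Parse), and the string output are assumptions chosen to keep the example short, and malformed input is not handled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// precedence maps each binary operator to its binding power (higher binds
// tighter). This is a tiny illustrative subset, not the library's table.
var precedence = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}

// rightAssoc marks operators that group from the right, as ** does in PHP.
var rightAssoc = map[string]bool{"**": true}

// tokenize splits src into numbers, operators, and parentheses.
func tokenize(src string) []string {
	var toks []string
	for i := 0; i < len(src); {
		c := rune(src[i])
		switch {
		case unicode.IsSpace(c):
			i++
		case unicode.IsDigit(c):
			j := i
			for j < len(src) && unicode.IsDigit(rune(src[j])) {
				j++
			}
			toks = append(toks, src[i:j])
			i = j
		case strings.HasPrefix(src[i:], "**"):
			toks = append(toks, "**")
			i += 2
		default:
			toks = append(toks, string(c))
			i++
		}
	}
	return toks
}

type pratt struct {
	toks []string
	pos  int
}

func (p *pratt) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *pratt) next() string {
	t := p.peek()
	p.pos++
	return t
}

// parseExpr parses an operand, then keeps absorbing binary operators whose
// precedence is strictly greater than minPrec.
func (p *pratt) parseExpr(minPrec int) string {
	left := p.next() // prefix position: a number or "("
	if left == "(" {
		left = p.parseExpr(0)
		p.next() // consume ")"
	}
	for {
		op := p.peek()
		prec, ok := precedence[op]
		if !ok || prec <= minPrec {
			return left
		}
		p.next()
		// A right-associative operator lets the same precedence recur on its right.
		nextMin := prec
		if rightAssoc[op] {
			nextMin = prec - 1
		}
		right := p.parseExpr(nextMin)
		left = "(" + left + " " + op + " " + right + ")"
	}
}

// Parse returns src with every binary operation explicitly parenthesized.
func Parse(src string) string {
	p := &pratt{toks: tokenize(src)}
	return p.parseExpr(0)
}

func main() {
	for _, src := range []string{"1 + 2 * 3", "2 ** 3 ** 2", "1 - 2 - 3"} {
		fmt.Printf("%-12s => %s\n", src, Parse(src))
	}
}
```

The only difference between left- and right-associative operators is the minimum precedence passed to the recursive call, which is why adding more levels mostly means adding table entries.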
AST (Abstract Syntax Tree)
- Interface-Based: Clean separation between node types
- 150+ Node Types: Matching PHP's official zend_ast.h constants
- Visitor Pattern: Easy traversal and transformation
- JSON Serialization: Export AST for external tools
- Position Preservation: All nodes retain source location
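The following sketch shows one common way an interface-based AST and a visitor fit together in Go. The Node and Visitor interfaces, the Walk function, and the four node types are hypothetical stand-ins invented for this example; the actual interfaces in the ast package may look different.

```go
package main

import "fmt"

// Node is a minimal AST node interface (hypothetical, for illustration).
type Node interface {
	Kind() string
	Children() []Node
}

// Visitor is called for each node; returning false skips that node's children.
type Visitor interface {
	Visit(n Node) bool
}

// VisitorFunc adapts a plain function to the Visitor interface.
type VisitorFunc func(Node) bool

func (f VisitorFunc) Visit(n Node) bool { return f(n) }

// Walk traverses the tree depth-first, parent before children.
func Walk(v Visitor, n Node) {
	if n == nil || !v.Visit(n) {
		return
	}
	for _, c := range n.Children() {
		Walk(v, c)
	}
}

// A few illustrative node types.
type Program struct{ Body []Node }

func (p *Program) Kind() string     { return "Program" }
func (p *Program) Children() []Node { return p.Body }

type FunctionDecl struct {
	Name string
	Body []Node
}

func (f *FunctionDecl) Kind() string     { return "FunctionDecl" }
func (f *FunctionDecl) Children() []Node { return f.Body }

type Echo struct{ Args []Node }

func (e *Echo) Kind() string     { return "Echo" }
func (e *Echo) Children() []Node { return e.Args }

type StringLiteral struct{ Value string }

func (s *StringLiteral) Kind() string     { return "StringLiteral" }
func (s *StringLiteral) Children() []Node { return nil }

// CountKinds tallies how many nodes of each kind appear under root.
func CountKinds(root Node) map[string]int {
	counts := map[string]int{}
	Walk(VisitorFunc(func(n Node) bool {
		counts[n.Kind()]++
		return true
	}), root)
	return counts
}

func main() {
	tree := &Program{Body: []Node{
		&FunctionDecl{Name: "hello", Body: []Node{
			&Echo{Args: []Node{&StringLiteral{Value: "Hello"}}},
		}},
		&Echo{Args: []Node{&StringLiteral{Value: "World"}}},
	}}
	fmt.Println(CountKinds(tree))
}
```

Because traversal lives in Walk rather than in each node type, an analysis such as CountKinds needs no changes when new node types are added, as long as they implement Children.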
PHP 8.4 Language Support
Operators
- Arithmetic: +, -, *, /, %, ** (power)
- Assignment: =, +=, -=, *=, /=, %=, **=, .=, ??=, etc.
- Comparison: ==, ===, !=, !==, <, <=, >, >=, <=> (spaceship)
- Logical: &&, ||, !, and, or, xor
- Bitwise: &, |, ^, ~, <<, >>
- Null Coalescing: ??, ??=
Language Constructs
- Control Structures: if, else, elseif, while, for, foreach, switch
- Alternative Syntax: if:...endif;, while:...endwhile;, switch:...endswitch;
- Functions: Parameters, return types, references, variadic
- Classes: Properties, methods, constants, inheritance, interfaces
- Namespaces: Full namespace support with use statements
- Special: __halt_compiler(), match expressions, attributes
Modern PHP Features
- Typed Properties: private int $id, public ?string $name
- Union Types: int|string, Foo|Bar|null
- Intersection Types: Foo&Bar
- Match Expressions: Pattern matching with match()
- Attributes: #[Route('/api')] syntax
- Nullsafe Operator: $user?->getProfile()?->getName()
- Class Constants with Visibility: private const SECRET = 'value'
Examples
The examples/ directory contains comprehensive demonstrations:
Learn fundamental parsing concepts and AST examination.
cd examples/basic-parsing && go run main.go
Implement custom visitors for code analysis and traversal.
cd examples/ast-visitor && go run main.go
Explore lexical analysis and token statistics.
cd examples/token-extraction && go run main.go
Understand parser error recovery and reporting.
cd examples/error-handling && go run main.go
Build static analysis tools with metrics and quality assessment.
cd examples/code-analysis && go run main.go
Each example includes:
- Runnable Code: Complete working examples
- Documentation: Detailed README with explanations
- Real PHP Samples: Realistic code for testing
- Progressive Complexity: From beginner to advanced
Testing
Running Tests
# Run all tests
go test ./...
# Run with verbose output
go test ./... -v
# Run specific component tests
go test ./lexer -v
go test ./parser -v
go test ./ast -v
# Run benchmarks
go test ./parser -bench=.
go test ./parser -bench=. -benchmem
# Run specific test cases
go test ./parser -run=TestParsing_TryCatchWithStatements
go test ./parser -run=TestParsing_ClassMethodsWithVisibility
Test Coverage
The project maintains comprehensive test coverage with:
- 29 Test Files: Covering all major components
- 200+ Test Cases: Including edge cases and error conditions
- Real-world Testing: Compatibility tests against major PHP frameworks
- Benchmark Tests: Performance validation and optimization
Compatibility Testing
Test against popular PHP codebases:
# Test with WordPress
go run scripts/test_folder.go /path/to/wordpress-develop
# Test with Laravel
go run scripts/test_folder.go /path/to/framework
# Test with Symfony
go run scripts/test_folder.go /path/to/symfony
Benchmarks
- Simple Statements: ~1.6μs per parse
- Complex Expressions: ~4μs per parse
- Large Files: Efficient memory usage with streaming
- Concurrent Parsing: Parser pool support for parallel processing
Memory Usage
- Low Footprint: Minimal memory allocation per parse
- Streaming: Large file support without loading entire content
- Pool Pattern: Reusable parser instances for server applications
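The pool pattern mentioned above can be sketched with the standard library's sync.Pool. This is a generic illustration, not the contents of parser/pool.go: the Parser type here is a stand-in that only splits on whitespace, and Reset, ParseWithPool, and ParseAll are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Parser is a stand-in for a real parser that owns reusable buffers.
type Parser struct {
	tokens []string
	errors []string
}

// Reset prepares the parser for new input while keeping allocated capacity.
func (p *Parser) Reset(src string) {
	p.tokens = append(p.tokens[:0], strings.Fields(src)...)
	p.errors = p.errors[:0]
}

// Parse returns the number of whitespace-separated tokens (a toy result).
func (p *Parser) Parse() int { return len(p.tokens) }

// parserPool hands out idle parsers and creates new ones on demand.
var parserPool = sync.Pool{
	New: func() any { return new(Parser) },
}

// ParseWithPool borrows a parser, uses it, and returns it to the pool.
func ParseWithPool(src string) int {
	p := parserPool.Get().(*Parser)
	defer parserPool.Put(p)
	p.Reset(src)
	return p.Parse()
}

// ParseAll parses all sources concurrently, one goroutine per source.
func ParseAll(sources []string) []int {
	results := make([]int, len(sources))
	var wg sync.WaitGroup
	for i, src := range sources {
		wg.Add(1)
		go func(i int, src string) {
			defer wg.Done()
			results[i] = ParseWithPool(src) // each goroutine writes its own slot
		}(i, src)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(ParseAll([]string{"<?php echo 1;", "<?php", "a b c d"}))
}
```

The important detail is that Reset must clear all per-parse state before reuse; otherwise errors or tokens from a previous input could leak into the next result.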
Development
Requirements
- Go 1.21+
- Optional: PHP 8.4 for compatibility testing
Development Commands
# Build CLI tool
go build -o php-parser ./cmd/php-parser
# Run all tests
go test ./...
# Run performance benchmarks
go test ./parser -bench=. -run=^$
# Run benchmarks with memory allocation statistics
go test ./parser -bench=. -benchmem
# Compatibility testing
go run scripts/test_folder.go /path/to/php/codebase
# Clean build artifacts
go clean
rm -f php-parser main
Contributing
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Write tests for your changes
- Ensure all tests pass: go test ./...
- Format the code: go fmt ./...
- Submit a pull request
Development Guidelines
- Follow Go coding standards (gofmt, Effective Go)
- Maintain compatibility with the official PHP implementation
- Add comprehensive tests for new features
- Reference PHP's official grammar at /php-src/Zend/zend_language_parser.y
- Update documentation for new features
Use Cases in Production
- Code Quality: Detect anti-patterns, complexity metrics
- Security Scanning: Find vulnerabilities, injection risks
- Style Checking: Enforce coding standards and conventions
- IDEs: Syntax highlighting, auto-completion, error detection
- Language Servers: LSP implementation for editor support
- Code Formatters: Consistent code styling and formatting
- API Generators: Extract documentation from code and comments
- Code Visualization: Generate class diagrams, call graphs
- Dependency Analysis: Track code relationships and coupling
Migration and Refactoring
- Version Upgrades: PHP version compatibility transformations
- Framework Migrations: Automated code pattern updates
- Code Modernization: Apply modern PHP practices and syntax
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- PHP Team: For the official PHP language specification
- Go Community: For excellent tooling and ecosystem
- Contributors: Everyone who has contributed to this project
Built with ❤️ in Go | PHP 8.4 Compatible | Production Ready