Code Search
A powerful semantic code search tool that combines ripgrep's speed with tree-sitter's code understanding to find and extract complete code blocks based on search patterns.
Features
- Semantic Code Search: Finds and extracts entire functions, classes, structs, and other code structures rather than just matching lines
- Intelligent Ranking: Ranks results by relevance using TF-IDF, BM25, or a hybrid mode
- Multi-Language Support: Works with Rust, JavaScript, TypeScript, Python, Go, C/C++, Java, Ruby, and PHP
- Smart Extraction: Ensures complete, usable code blocks with proper context
- Dual Mode: Works as a CLI tool or as an MCP server
- Frequency-based Search: Advanced mode with stemming and stopword removal for more accurate results
- AST Parsing: Leverages tree-sitter to understand code structure across languages
Installation
Prerequisites
- Install Rust and Cargo (if not already installed):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
From Source
- Clone the repository:
git clone https://github.com/yourusername/code-search.git
cd code-search
- Build the project:
cargo build --release
- (Optional) Install globally:
cargo install --path .
Basic Usage
CLI Mode
# Basic search
code-search --path <DIRECTORY_PATH> --query <SEARCH_PATTERN>
# Search for "setTools" in the current directory
code-search --path . --query setTools
# Search for "impl" in the src directory
code-search --path ./src --query impl
Output Example
File: src/models.rs
Lines: 10-25
```rust
struct SearchResult {
    pub file: String,
    pub lines: (usize, usize),
    pub node_type: String,
    pub code: String,
    pub matched_by_filename: Option<bool>,
    pub rank: Option<usize>,
    pub score: Option<f64>,
    pub tfidf_score: Option<f64>,
    pub bm25_score: Option<f64>,
    pub tfidf_rank: Option<usize>,
    pub bm25_rank: Option<usize>,
}
```
Advanced Usage
Search Modes
# Find files only (no code blocks)
code-search --path . --query search --files-only
# Include files whose names match the query
code-search --path . --query search --include-filenames
# Use frequency-based search with stemming (better for large codebases)
code-search --path . --query search --frequency
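To illustrate what stemming and stopword removal do to a query, here is a minimal sketch. The stopword list and the suffix-stripping rules are simplified assumptions made for this example; the tool's actual stemmer and stopword set are not documented here and are likely more thorough.

```rust
use std::collections::HashMap;

/// Hypothetical, deliberately tiny stopword list (assumption for this sketch).
const STOPWORDS: &[&str] = &["a", "an", "and", "the", "of", "to", "in", "for", "is"];

/// Naive suffix stripping; a stand-in for a real stemming algorithm.
fn stem(word: &str) -> String {
    for suffix in ["ing", "ed", "es", "s"] {
        if let Some(base) = word.strip_suffix(suffix) {
            // Leave short words alone to avoid over-stemming.
            if base.len() >= 3 {
                return base.to_string();
            }
        }
    }
    word.to_string()
}

/// Lowercases, splits on non-alphanumeric characters, drops stopwords, stems.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .filter(|t| !STOPWORDS.contains(&t.as_str()))
        .map(|t| stem(&t))
        .collect()
}

/// Counts how often each stemmed term occurs in the text.
fn term_frequencies(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for term in tokenize(text) {
        *counts.entry(term).or_insert(0) += 1;
    }
    counts
}
```

With these rules, "Searching" and "search" reduce to the same term, so a query for one can match code that mentions the other.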
Ranking Options
# Use TF-IDF ranking
code-search --path . --query search --reranker tfidf
# Use BM25 ranking
code-search --path . --query search --reranker bm25
# Use hybrid ranking (default)
code-search --path . --query search --reranker hybrid
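As a rough illustration of what a BM25 reranker computes, the sketch below scores pre-tokenized code blocks against query terms. The function names, the `k1 = 1.2` and `b = 0.75` parameters, and the IDF variant are assumptions (common textbook choices), not necessarily what the tool uses. How the hybrid mode combines TF-IDF and BM25 is not described in this README, so it is left out.

```rust
/// Scores each tokenized document against the query terms with Okapi BM25.
/// k1 and b are assumed defaults, not values taken from the tool.
fn bm25_scores(docs: &[Vec<&str>], query: &[&str]) -> Vec<f64> {
    let (k1, b) = (1.2, 0.75);
    if docs.is_empty() {
        return Vec::new();
    }
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n;
    if avg_len == 0.0 {
        // All documents are empty: nothing can match.
        return vec![0.0; docs.len()];
    }

    docs.iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query
                .iter()
                .map(|term| {
                    // Term frequency in this document.
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    // Number of documents containing the term.
                    let df = docs.iter().filter(|d| d.contains(term)).count() as f64;
                    let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / avg_len))
                })
                .sum::<f64>()
        })
        .collect()
}

/// Returns document indices ordered from highest to lowest score.
fn rank_by_score(scores: &[f64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    order.sort_by(|&a, &b| {
        scores[b]
            .partial_cmp(&scores[a])
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    order
}
```

BM25 dampens repeated occurrences of a term and normalizes by block length, so a long block that mentions the query once does not outrank a short block that is mostly about it.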
Search Limits
# Limit to 10 results
code-search --path . --query search --max-results 10
# Limit to 10KB of content
code-search --path . --query search --max-bytes 10240
# Limit to 500 tokens (for AI usage)
code-search --path . --query search --max-tokens 500
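One plausible way such limits could be applied to a ranked result list is sketched below. The `Block` and `Limits` types and the whitespace-based token estimate are hypothetical; the tool's real token counting and cut-off behavior may differ.

```rust
/// Simplified stand-in for a search result (hypothetical type).
struct Block {
    file: String,
    code: String,
}

/// Optional limits mirroring --max-results, --max-bytes, and --max-tokens.
struct Limits {
    max_results: Option<usize>,
    max_bytes: Option<usize>,
    max_tokens: Option<usize>,
}

/// Rough token estimate: whitespace-separated words. Real tokenizers differ.
fn approx_tokens(code: &str) -> usize {
    code.split_whitespace().count()
}

/// Keeps ranked blocks, in order, until the next one would exceed any limit.
fn apply_limits(blocks: Vec<Block>, limits: &Limits) -> Vec<Block> {
    let mut kept = Vec::new();
    let (mut bytes, mut tokens) = (0usize, 0usize);
    for block in blocks {
        if limits.max_results.map_or(false, |m| kept.len() >= m) {
            break;
        }
        let next_bytes = bytes + block.code.len();
        let next_tokens = tokens + approx_tokens(&block.code);
        if limits.max_bytes.map_or(false, |m| next_bytes > m) {
            break;
        }
        if limits.max_tokens.map_or(false, |m| next_tokens > m) {
            break;
        }
        bytes = next_bytes;
        tokens = next_tokens;
        kept.push(block);
    }
    kept
}
```

Because results are already ranked, stopping at the first block that does not fit keeps the most relevant blocks and never returns a truncated one.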
Custom Ignore Patterns
# Ignore specific file types
code-search --path . --query search --ignore "*.py" --ignore "*.js"
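For intuition, the sketch below shows how a pattern such as `*.py` can be matched against a file name. It supports only the `*` wildcard and matches the bare file name; both are simplifying assumptions, and the tool's actual glob semantics (for example, gitignore-style path matching) may be richer.

```rust
/// Matches `text` against `pattern`, where `*` matches any run of characters.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position to resume from after the most recent `*`:
    // (pattern index just past the star, text index the star has consumed up to).
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((bp, bt)) = backtrack {
            // Let the last `*` swallow one more character and retry.
            pi = bp;
            ti = bt + 1;
            backtrack = Some((bp, bt + 1));
        } else {
            return false;
        }
    }
    // Any pattern characters left over must all be `*`.
    p[pi..].iter().all(|&c| c == '*')
}

/// Returns true if the file name matches any of the ignore patterns.
fn is_ignored(file_name: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| glob_match(p, file_name))
}
```

With the patterns from the command above, `is_ignored("app.js", &["*.py", "*.js"])` is true while `is_ignored("lib.rs", &["*.py", "*.js"])` is false.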
Supported Languages
Currently, the tool supports:
- Rust (.rs)
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Python (.py)
- Go (.go)
- C (.c, .h)
- C++ (.cpp, .cc, .cxx, .hpp, .hxx)
- Java (.java)
- Ruby (.rb)
- PHP (.php)
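The extension list above can be expressed as a simple lookup. The function names here are hypothetical and the sketch returns only a display name; in the tool itself, the corresponding step selects a tree-sitter grammar.

```rust
use std::path::Path;

/// Maps a file extension (without the dot) to the language listed above.
fn language_for_extension(ext: &str) -> Option<&'static str> {
    match ext {
        "rs" => Some("Rust"),
        "js" | "jsx" => Some("JavaScript"),
        "ts" | "tsx" => Some("TypeScript"),
        "py" => Some("Python"),
        "go" => Some("Go"),
        "c" | "h" => Some("C"),
        "cpp" | "cc" | "cxx" | "hpp" | "hxx" => Some("C++"),
        "java" => Some("Java"),
        "rb" => Some("Ruby"),
        "php" => Some("PHP"),
        _ => None,
    }
}

/// Looks up the language from a file path; files without an extension yield None.
fn language_for_path(path: &str) -> Option<&'static str> {
    Path::new(path)
        .extension()?
        .to_str()
        .and_then(language_for_extension)
}
```

Files whose extension is not in the list return `None`, which corresponds to files the tool cannot parse into code blocks.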
Architecture
Code Search works in the following way:
- Search: Uses ripgrep to quickly find files containing the search pattern
- Parse: For each matching file, parses it with tree-sitter to build an AST
- Extract: Identifies the smallest code block (function, class, etc.) that contains the match
- Rank: Ranks results using TF-IDF, BM25, or a hybrid approach
- Output: Returns complete, properly formatted code blocks
Components
- CLI Interface: Handles user input and displays results
- Search Engine: Wraps ripgrep for efficient file searching
- Language Parser: Uses tree-sitter for language-specific parsing
- Code Extractor: Identifies code structures in the parse tree
- Result Ranker: Analyzes and sorts results by relevance
Server Mode
Code Search can also run as an MCP (Model Context Protocol) server that exposes a search_code tool:
# Start the server
code-search server
When running as a server, it implements the ServerHandler trait from the MCP Rust SDK, providing:
- initialize - Handles client connection and capabilities negotiation
- handle_method - Processes method calls like list_tools, call_tool, etc.
- shutdown - Handles graceful server shutdown
MCP Tool: search_code
Input schema:
{
"path": "Directory path to search in",
"query": ["Query patterns to search for"],
"files_only": false
}
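A client needs to send arguments in this shape. The sketch below builds such a payload using only the standard library; the `SearchCodeArgs` type and its helper are hypothetical, and a real client would more likely use a JSON library such as `serde_json`. How the arguments are wrapped in an MCP tool-call request is not covered here.

```rust
/// Arguments for the `search_code` tool, mirroring the schema above.
/// The type name is hypothetical; only the field names come from the schema.
struct SearchCodeArgs {
    path: String,
    query: Vec<String>,
    files_only: bool,
}

/// Wraps `s` in quotes and escapes characters that JSON strings require escaped.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl SearchCodeArgs {
    /// Serializes the arguments as a compact JSON object.
    fn to_json(&self) -> String {
        let query: Vec<String> = self.query.iter().map(|q| json_string(q)).collect();
        format!(
            "{{\"path\":{},\"query\":[{}],\"files_only\":{}}}",
            json_string(&self.path),
            query.join(","),
            self.files_only
        )
    }
}
```

Note that `query` is an array, so several patterns can be passed in a single call.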
For Developers
Project Structure
The project is organized into the following directories:
- src/ - Source code for the application
  - search/ - Code search implementation modules
  - language.rs - Language-specific parsing
  - models.rs - Data structures
  - ranking.rs - Result ranking algorithms
  - cli.rs - Command-line interface
- tests/ - Test files and utilities
  - mocks/ - Mock data files for testing
Building and Testing
# Build in debug mode
cargo build
# Build in release mode
cargo build --release
# Run all tests
cargo test
# Run specific test
cargo test test_search_single_term
Adding Support for New Languages
To add support for a new programming language:
- Add the tree-sitter grammar as a dependency in Cargo.toml:
[dependencies]
tree-sitter-newlang = "0.20"
- Update the get_language function in src/language.rs:
pub fn get_language(extension: &str) -> Option<Language> {
    match extension {
        // ... existing languages
        "nl" => Some(tree_sitter_newlang::language()),
        _ => None,
    }
}
- Update the is_acceptable_parent function to identify code structures for the language:
pub fn is_acceptable_parent(node: &Node, extension: &str) -> bool {
    let node_type = node.kind();
    match extension {
        // ... existing languages
        "nl" => {
            matches!(
                node_type,
                "function_declaration" | "class_declaration" | "other_structure"
            )
        },
        _ => false,
    }
}
How It Works
- The tool scans files using ripgrep's highly efficient search algorithm
- For each match, it parses the file with tree-sitter to build an AST
- It identifies the smallest AST node that:
  - Contains the matching line
  - Represents a complete code block (function, class, struct, etc.)
- Extracts and ranks these code blocks based on relevance to the query
- Presents the results with appropriate formatting and context
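The "smallest enclosing block" step can be sketched without tree-sitter. The `SyntaxNode` type below is a simplified stand-in for a real parse-tree node, and the list of block kinds in `is_block` is a hypothetical example; the tool's own logic works on tree-sitter nodes and may differ in detail.

```rust
/// Simplified stand-in for a syntax-tree node (not the tree-sitter `Node` type).
struct SyntaxNode {
    kind: &'static str,
    start_line: usize, // inclusive
    end_line: usize,   // inclusive
    children: Vec<SyntaxNode>,
}

/// Node kinds treated as complete code blocks (hypothetical list).
fn is_block(kind: &str) -> bool {
    matches!(kind, "function_item" | "struct_item" | "impl_item")
}

/// Returns the smallest block-like node whose line range contains `line`.
fn smallest_enclosing_block(node: &SyntaxNode, line: usize) -> Option<&SyntaxNode> {
    if line < node.start_line || line > node.end_line {
        return None;
    }
    // Prefer a deeper match: a child never spans more lines than its parent.
    for child in &node.children {
        if let Some(found) = smallest_enclosing_block(child, line) {
            return Some(found);
        }
    }
    // No descendant qualifies, so fall back to this node if it is a block.
    if is_block(node.kind) {
        Some(node)
    } else {
        None
    }
}
```

For a match on a line inside a method, this returns the method rather than the surrounding `impl`; a match on a line that sits directly in the `impl` body returns the whole `impl`.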
Troubleshooting
- No matches found: Verify your search pattern and check if there are matches using the regular ripgrep tool
- File parsing errors: Some files may have syntax errors or use language features not supported by the tree-sitter grammar
- Missing code blocks: Update the is_acceptable_parent function in src/language.rs to support additional node types
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request