llm
A CLI tool for using local and remote LLMs directly from the shell, with streaming, interactivity, and OpenRouter-style reasoning token support.
go install github.com/kir-gadjello/llm@latest
Usage
llm "your message"
llm -p "system prompt" "user message"
llm -C "user message" # start chat and send immediately
echo "data" | llm "analyze this"
llm -c # interactive chat
llm history # browse recent chats
llm search "query" # search history
Session Mode
Wrap your shell in an AI harness. Type ?? before any question to invoke the LLM with full terminal context:
llm session
# or: llm --session
?? why did that last command fail
?? show me all git branches sorted by date
The LLM receives structured command history with outputs and exit codes. For better context tracking, enable shell integration:
# Add to ~/.zshrc, ~/.bashrc, or ~/.config/fish/config.fish
source <(llm integration zsh) # or bash, fish
Shell integration uses OSC 133 sequences to parse command boundaries, providing clean structured history instead of raw terminal output.
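For reference, the OSC 133 marks are plain escape sequences (shown here as a `printf` sketch; the integration script presumably emits them via your shell's prompt hooks):

```shell
# OSC 133 semantic-prompt marks: ESC ] 133 ; X BEL
printf '\033]133;A\007'      # A: prompt start
printf '\033]133;B\007'      # B: prompt end / user command starts
printf '\033]133;C\007'      # C: command execution / output starts
printf '\033]133;D;0\007'    # D;<code>: command finished with exit code 0
```

A terminal (or a harness reading the PTY) can split the byte stream on these marks to recover each command, its output, and its exit code.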
History & Session Management
Search History: Query your local conversation database using full-text search (requires FTS5 support):
llm search "database migration"
llm search "user:optimization" # Filter by role
Resume Session: Continue a conversation from a specific session UUID (found via search):
llm resume <uuid> "continue explaining the previous point"
Shell Assistant
Generate and execute shell commands using natural language. Detects your shell and OS for precise syntax:
llm -s "find all typescript files excluding node_modules"
# Generates: find . -type f -name "*.ts" ! -path "*/node_modules/*"
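You can sanity-check the generated command on a throwaway tree before trusting it:

```shell
# Build a small sample tree and run the generated find invocation.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules/pkg"
touch "$tmp/src/app.ts" "$tmp/node_modules/pkg/index.ts" "$tmp/readme.md"
cd "$tmp"
find . -type f -name "*.ts" ! -path "*/node_modules/*"
# → ./src/app.ts
```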
Interactive Menu
- Execute: Run the command immediately
- Revise: Refine with more natural language
- Describe: Get a detailed explanation
- Copy: Copy to clipboard
YOLO Mode: Skip confirmation for automation:
llm -s -y "git branch --show-current"
Shell History Context: Include recent commands for better context-aware generation:
llm -s -H "undo that" # includes last 20 commands
llm -s -H5 "fix the error" # last 5 commands
Reasoning Models
Control reasoning token generation for models that support it (OpenAI o-series, Grok, etc.):
llm -m o1 --reasoning-high "explain quantum entanglement"
llm -m grok-2 -n "simple task" # disable reasoning
llm -R2048 "complex analysis" # specific token budget
llm --reasoning-exclude # use reasoning but exclude it from output
Clipboard Integration
pbcopy < file.txt && llm -x "review this"
llm -x --context-order append "context after prompt"
Image Preview in Terminal
If your terminal supports inline images (iTerm2, WezTerm, Kitty, Alacritty, Windows Terminal), the CLI can preview images sent to the model for confirmation. This is enabled by default and can be disabled with --no-image-log.
# Preview included images (e.g., via -f)
llm -f diagram.png "explain this diagram"
# Disable preview
llm -f diagram.png --no-image-log "explain this diagram"
Behavior:
- Automatic detection via `ITERM_SESSION_ID`, `TERM_PROGRAM`, and `TERM` heuristics for iTerm2, WezTerm, Kitty, Alacritty, and Windows Terminal.
- Unsupported terminals silently skip the preview (no errors).
- Images are resized to a maximum height of 400px, preserving aspect ratio, before display.
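The scaling rule amounts to simple proportional arithmetic (a sketch; the tool's actual resampling may differ):

```shell
# Cap height at 400px and scale width to preserve aspect ratio.
max_h=400
w=1200; h=800                 # example source dimensions
if [ "$h" -gt "$max_h" ]; then
  w=$(( w * max_h / h ))
  h=$max_h
fi
echo "${w}x${h}"              # → 600x400
```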
Context Formatting
Control how files are presented to the model:
# Show relative paths (default)
llm -f main.go
# Hide filenames
llm -f main.go --show-filenames none
# Use XML format instead of Markdown
llm -f main.go --context-format xml
Constrained Generation
JSON schema support for llama.cpp and compatible backends:
llm -J '{"type": "string", "enum": ["yes", "no"]}' "is pi > e"
Piped Input
By default, piped stdin content is wrapped with <context> tags for clarity:
cat file.txt | llm "summarize this"
# Sends: <context>\n{file contents}\n</context>\n\nsummarize this
# Use custom wrapper tag
echo "data" | llm -w "input" "analyze"
# Disable wrapping
echo "data" | llm -w "" "analyze"
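The assembled prompt can be reproduced with plain `printf` (a sketch mirroring the wrapping shown above; exact whitespace may differ in the tool):

```shell
# Reconstruct what the model receives for: cat file | llm "analyze"
body="data"
tag="context"
printf '<%s>\n%s\n</%s>\n\n%s\n' "$tag" "$body" "$tag" "analyze"
```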
File Context & Auto-Selection
Include code files and let the LLM intelligently select relevant files based on your query.
Manual File Selection
# Include specific files
llm -f src/main.go -f lib/utils.go "explain the architecture"
# Include directories (walks and loads all files)
llm -f src/ "review the codebase"
# Use glob patterns
llm -f "src/**/*.go" "find all error handling"
Git Integration with @ Syntax
Reference files from git context directly in your prompt:
# Review staged changes
llm "@staged review these changes before commit"
# Analyze uncommitted modifications
llm "@dirty what bugs might these changes introduce"
# Examine last commit
llm "@last explain what this commit does"
# Reference specific files with @ prefix
llm "@src/main.go @lib/config.go how do these interact"
# Combine git aliases with manual files
llm -f README.md "@staged document these changes"
Git Aliases:
- `@staged`: files in the git staging area (`git diff --name-only --cached`)
- `@dirty`: modified but unstaged files (`git diff --name-only`)
- `@last`: files changed in the last commit (`git diff-tree --no-commit-id --name-only -r HEAD`)
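The underlying git commands can be tried directly in a throwaway repository:

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git config user.email t@example.com && git config user.name t
git commit -q --allow-empty -m init
echo hello > a.txt && git add a.txt
git diff --name-only --cached                     # @staged → a.txt
git commit -qm "add a"
echo more >> a.txt
git diff --name-only                              # @dirty → a.txt
git diff-tree --no-commit-id --name-only -r HEAD  # @last → a.txt
```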
Auto-Selection Mode (-A)
Let the LLM automatically select relevant files using a repository map:
# Auto-select files based on query
llm -A "find the session parser implementation"
# Combine auto-selection with manual files
llm -A -f config.yaml "how is authentication configured"
# Works with git syntax too
llm -A "@staged ensure these changes don't break auth"
How Auto-Mode Works:
- Generates a structural map of your repository using tree-sitter
- Sends the map + your query to an LLM (configurable model)
- LLM returns relevant file paths as JSON
- Files are loaded and included in context
- Shows selected files:
reviewed: main.go, auth.go, session.go
Binary File Handling
Binary files are automatically detected and replaced with [Binary File] placeholders to avoid sending garbage to the LLM.
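The detection heuristic isn't specified here; a common approach, shown as a sketch (not the tool's actual code), is to scan the leading bytes for a NUL:

```shell
# Heuristic sketch: a NUL byte in the first 8000 bytes → treat as binary.
is_binary() {
  head -c 8000 -- "$1" | od -An -tx1 | grep -q ' 00'
}
printf 'plain text\n' > text.txt
printf 'PNG\000\001\002' > image.bin
is_binary text.txt  || echo "text.txt: text"
is_binary image.bin && echo "image.bin: [Binary File]"
```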
Debugging & Diagnostics
System Check: Verify installation health, config paths, and FTS5 support:
llm doctor
Performance Metrics: Show Time-To-First-Token (TTFT) and generation speed (TPS, tokens per second):
llm --vt "count to 100"
Dry Run: Preview prompt assembly, token estimation, and API parameters without making a network request:
llm --dry -f src/main.go "explain this"
Configuration
Create ~/.llmterm.yaml for model profiles with inheritance:
default: grok
piped_input_wrapper: "data"

models:
  # Base configuration for OpenRouter
  _openrouter:
    api_base: https://openrouter.ai/api/v1
    extra_body:
      include_reasoning: true
      stream_options:
        include_usage: true

  # Model aliases that extend base configs
  grok:
    extend: _openrouter
    model: x-ai/grok-4.1-fast
    reasoning_effort: low

  # Full model names can also be aliases
  x-ai/grok-4.1-fast:
    extend: grok

  codex:
    extend: _openrouter
    model: gpt-5.1-codex
    reasoning_effort: high

  local-llama:
    model: llama-3-8b-Instruct-q6
    api_base: http://localhost:8080/v1

# File context and auto-selection settings
context:
  auto_selector_model: "x-ai/grok-4.1-fast" # Model for -A flag (empty = use main model)
  max_file_size_kb: 1024 # Limit for non-image files (default 1 MB)
  max_image_size_kb: 10240 # Limit for images (default 10 MB)
  max_repo_files: 1000 # Max files in repo map
  ignored_dirs: [".git", "node_modules", "dist", "vendor", "__pycache__"]
  debug_truncate_files: 10 # Truncate file context in debug output
Fallback Configuration
You can configure global fallback models and precise error handling policies for resiliency.
Global Fallback Models:
If a specific model configuration does not define its own fallback list, these models will be tried in order.
fallback_models:
- grok
- gpt-4o-mini
Fallback Settings: Control which error codes trigger a fallback attempt and which ones abort immediately.
fallback_settings:
  default: deny
  allow:
    - "427" # Token limit / Header too large
    - "429" # Rate limit
    - "5*"  # Server errors
  deny:
    - "400" # Bad Request
    - "401" # Unauthorized
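Assuming the codes are matched with shell-style globs (as `"5*"` suggests; the tool's actual matcher may differ), the policy above behaves like this sketch:

```shell
# Sketch: decide whether an HTTP status triggers a fallback attempt.
should_fallback() {
  case "$1" in
    400|401)    return 1 ;;  # deny: abort immediately
    427|429|5*) return 0 ;;  # allow: try the next fallback model
    *)          return 1 ;;  # default: deny
  esac
}
should_fallback 503 && echo "503: fallback"
should_fallback 400 || echo "400: abort"
```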
Use with -m <profile>. CLI flags override config values.
Config inheritance:
- Use `extend: <parent>` to inherit from another profile
- Child values override parent values
- `extra_body` maps are deep-merged
- Create aliases for full model names (like `x-ai/grok-4.1-fast`)
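As an illustration, the `grok` profile from the example config above resolves to roughly:

```yaml
# Effective configuration for grok after extend: _openrouter is applied
api_base: https://openrouter.ai/api/v1   # inherited from _openrouter
model: x-ai/grok-4.1-fast                # set by grok
reasoning_effort: low                    # set by grok
extra_body:                              # deep-merged map
  include_reasoning: true
  stream_options:
    include_usage: true
```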
Profile parameters:
- `model`: actual model name sent to the API
- `api_base`, `api_key`: endpoint configuration
- `reasoning_effort`: none, low, medium, high, xhigh
- `reasoning_max_tokens`: integer token budget
- `reasoning_exclude`: exclude reasoning from the response
- `verbosity`: low, medium, high
- `context_order`: prepend, append (for clipboard content)
- `extra_body`: arbitrary JSON fields for the API request
Top-level parameters:
- `default`: default model profile to use
- `piped_input_wrapper`: wrapper tag for piped stdin (default: "context"; an empty string disables wrapping)
Context configuration (`context`):
- `auto_selector_model`: model used for `-A` auto-selection (empty = use main model)
- `max_file_size_kb`: maximum size of non-image files to load (default: 1024 KB)
- `max_image_size_kb`: maximum size of image files to load (default: 10240 KB)
- `max_repo_files`: maximum number of files to include in the repo map (default: 1000)
- `ignored_dirs`: directories to skip when generating repo maps
- `debug_truncate_files`: number of lines to show per file in debug output (default: 10)
Compatibility
OpenAI-compatible endpoints only. Tested and working with:
- llama.cpp server
- tabbyAPI
- Groq API
- OpenRouter (with reasoning token support)