Vypher CLI
Vypher is a command-line tool designed to scan directories for Personally Identifiable Information (PII) and Protected Health Information (PHI) within files. It focuses on identifying sensitive data related to finance and healthcare.
Installation
Via go install (requires Go 1.20+):
go install github.com/vypher-io/cli@latest
Or build from source:
git clone https://github.com/vypher-io/cli.git
cd cli
go build -o vypher
macOS via Homebrew:
brew install vypher-io/tap/vypher
Windows via Scoop:
scoop bucket add vypher-io https://github.com/vypher-io/scoop-bucket
scoop install vypher
Docker (no install required):
docker run --rm -v "$(pwd)":/scan pseudocoding/vypher scan --target /scan
Testing
To run the unit tests for the project:
go test ./...
For verbose output:
go test -v ./...
To run tests with race condition detection:
go test -race ./...
Usage
Scanning a Directory
To scan the current directory:
./vypher scan
To scan a specific directory:
./vypher scan --target /path/to/scan
Vypher supports multiple output formats: console (default), JSON, and SARIF.
Console Output:
./vypher scan -t ./src
Findings are reported with the file path and line number for each match:
File: /path/to/src/users.go
- [Email] jo****om (Line: 42)
- [SSN] 12****89 (Line: 87)
JSON Output:
./vypher scan -t ./src -o json
SARIF Output (for GitHub Security, VS Code integration):
./vypher scan -t ./src -o sarif
Excluding Files
Use the --exclude flag with glob patterns to skip files or directories:
./vypher scan -t ./src --exclude "*_test.go" --exclude "*.log"
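As a rough illustration of glob-based exclusion, the sketch below matches each pattern against a file's base name using Go's standard `filepath.Match`. Whether Vypher matches base names, full paths, or both is not stated here, so treat that choice as an assumption; `isExcluded` is a hypothetical helper, not part of Vypher's API.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isExcluded reports whether the base name of path matches any glob pattern.
// Hypothetical helper: Vypher's actual matching rules may differ.
func isExcluded(path string, patterns []string) bool {
	base := filepath.Base(path)
	for _, p := range patterns {
		// filepath.Match returns an error only for malformed patterns;
		// this sketch treats a malformed pattern as a non-match.
		if ok, err := filepath.Match(p, base); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*_test.go", "*.log"}
	for _, f := range []string{"src/users.go", "src/users_test.go", "logs/app.log"} {
		fmt.Printf("%-20s excluded=%v\n", f, isExcluded(f, patterns))
	}
}
```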
Filtering by Rule Tags
Use --rules with a comma-separated list of tags to scan only for specific rule categories:
./vypher scan -t ./src --rules finance,phi
Available tags:
| Tag | Patterns Included |
|---|---|
| finance | Credit Card, SSN, IBAN, Bitcoin, Ethereum, Solana |
| crypto | Bitcoin, Ethereum, Solana |
| pii | Credit Card, SSN, Email, Phone, DOB |
| healthcare | MRN, ICD-10 |
| phi | MRN, DOB, ICD-10 |
| communication | Email, Phone |
| government | SSN |
Examples:
# Scan for finance-only patterns (cards, SSN, IBAN, crypto wallets)
./vypher scan -t ./src --rules finance
# Scan for crypto wallet addresses only
./vypher scan -t ./src --rules crypto
# Scan for healthcare data only (MRN, ICD-10, DOB)
./vypher scan -t ./src --rules healthcare,phi
# Scan for general PII (emails, phones, SSNs, cards, DOB)
./vypher scan -t ./src --rules pii
# Combine multiple tags
./vypher scan -t ./src --rules finance,healthcare -o sarif --fail-on-match
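To make the effect of combining tags concrete, here is a minimal sketch that expands a comma-separated `--rules` value into the union of pattern names, using the tag table above as data. The names `tagPatterns` and `resolveRules` are hypothetical, and silently ignoring unknown tags is an assumption rather than documented Vypher behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// tagPatterns mirrors the tag table in this README.
var tagPatterns = map[string][]string{
	"finance":       {"Credit Card", "SSN", "IBAN", "Bitcoin", "Ethereum", "Solana"},
	"crypto":        {"Bitcoin", "Ethereum", "Solana"},
	"pii":           {"Credit Card", "SSN", "Email", "Phone", "DOB"},
	"healthcare":    {"MRN", "ICD-10"},
	"phi":           {"MRN", "DOB", "ICD-10"},
	"communication": {"Email", "Phone"},
	"government":    {"SSN"},
}

// resolveRules expands a comma-separated tag list into the de-duplicated
// union of pattern names, preserving first-seen order.
// Unknown tags are ignored (an assumption for this sketch).
func resolveRules(rules string) []string {
	seen := map[string]bool{}
	var out []string
	for _, tag := range strings.Split(rules, ",") {
		for _, p := range tagPatterns[strings.TrimSpace(tag)] {
			if !seen[p] {
				seen[p] = true
				out = append(out, p)
			}
		}
	}
	return out
}

func main() {
	// healthcare and phi overlap on MRN and ICD-10; each appears once.
	fmt.Println(resolveRules("healthcare,phi"))
}
```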
Limiting Scan Depth
Use --max-depth to limit how deep the scanner recurses:
./vypher scan -t ./src --max-depth 3
CI/CD Integration
Use --fail-on-match to exit with code 1 when issues are found:
./vypher scan -t ./src --fail-on-match
Because most CI systems treat a non-zero exit code as a failed step, this lets a pipeline block changes that introduce matches.
Configuration File
Create a .vypher.yaml file to define default scan settings:
exclude:
- "*_test.go"
- "*.log"
rules:
- finance
- phi
output: sarif
max_depth: 5
fail_on_match: true
Load it with the --config flag:
./vypher scan --config .vypher.yaml -t ./src
CLI flags always override config file values.
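The precedence rule can be pictured as a merge in which only flags the user actually passed replace config values. The sketch below is illustrative only: `Settings`, `Overrides`, and `merge` are hypothetical names, and using nil pointers to mean "flag not set" is one possible design, not a description of Vypher's internals.

```go
package main

import "fmt"

// Settings holds a few of the options shown in the sample .vypher.yaml.
// Hypothetical type for illustration.
type Settings struct {
	Output      string
	MaxDepth    int
	FailOnMatch bool
}

// Overrides carries only the flags given on the command line.
// A nil field means the flag was not set.
type Overrides struct {
	Output      *string
	MaxDepth    *int
	FailOnMatch *bool
}

// merge returns cfg with every explicitly set flag applied on top.
func merge(cfg Settings, flags Overrides) Settings {
	if flags.Output != nil {
		cfg.Output = *flags.Output
	}
	if flags.MaxDepth != nil {
		cfg.MaxDepth = *flags.MaxDepth
	}
	if flags.FailOnMatch != nil {
		cfg.FailOnMatch = *flags.FailOnMatch
	}
	return cfg
}

func main() {
	cfg := Settings{Output: "sarif", MaxDepth: 5, FailOnMatch: true}
	out := "json" // as if the user passed -o json
	fmt.Printf("%+v\n", merge(cfg, Overrides{Output: &out}))
}
```

Only `Output` changes in this example; `MaxDepth` and `FailOnMatch` keep their config file values because no corresponding flag was set.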
Default Ignored Directories
By default, Vypher ignores the following directories:
.git
node_modules
vendor
.venv
__pycache__
dist
build
.next
.nuxt
out
Default Ignored Files
The following files are skipped by default because their checksums, hashes, and other generated content tend to produce false positives:
package-lock.json
yarn.lock
pnpm-lock.yaml
bun.lockb
go.sum
Cargo.lock
composer.lock
Gemfile.lock
poetry.lock
*.lock
Detected Patterns
Vypher ships with 11 built-in detection patterns:
| # | Pattern | Description | Tags | Validation |
|---|---|---|---|---|
| 1 | Credit Card | 13-16 digit card numbers | finance, pii | Luhn ✓, Proximity |
| 2 | SSN | US Social Security Numbers (XXX-XX-XXXX) | finance, pii, government | Proximity |
| 3 | Email | Email addresses | pii, communication | N/A |
| 4 | Phone | US/International phone numbers | pii, communication | N/A |
| 5 | IBAN | International Bank Account Numbers | finance | N/A |
| 6 | MRN | Medical Record Numbers (6-12 digits) | healthcare, phi | N/A |
| 7 | DOB | Date of Birth near keywords | pii, phi | N/A |
| 8 | ICD-10 | ICD-10 medical diagnosis codes | healthcare, phi | N/A |
| 9 | Bitcoin | P2PKH, P2SH, Bech32 wallet addresses | finance, crypto | Proximity |
| 10 | Ethereum | 0x-prefixed 40 hex char addresses | finance, crypto | Proximity |
| 11 | Solana | Base58 32-44 char wallet addresses | finance, crypto | Proximity |
Credit Card Validation
Credit card numbers detected by regex are validated using the Luhn algorithm to reduce false positives, as recommended by PCI DSS.
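For readers unfamiliar with the Luhn check, the following is a generic implementation of the algorithm, not Vypher's source code. It assumes the candidate has already been stripped of spaces and dashes; `luhnValid` is a hypothetical name.

```go
package main

import "fmt"

// luhnValid reports whether digits passes the Luhn checksum.
// The input must contain only ASCII digits; anything else is rejected.
func luhnValid(digits string) bool {
	if len(digits) == 0 {
		return false
	}
	sum := 0
	double := false // the rightmost digit is never doubled
	for i := len(digits) - 1; i >= 0; i-- {
		c := digits[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as adding the two digits of the product
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(luhnValid("4111111111111111")) // widely used test card number
	fmt.Println(luhnValid("4111111111111112")) // last digit altered
}
```

A random 16-digit sequence passes this check only about one time in ten, which is why the checksum removes most regex-only false positives.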
Keyword Proximity
SSN and Credit Card matches are annotated with a keyword proximity indicator when relevant keywords (e.g., "ssn", "social", "credit", "card") are found within ±50 characters of the match. This helps distinguish high-confidence detections from potential false positives.
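The proximity check can be sketched as a case-insensitive keyword search in a fixed window around the match. This version counts the window in bytes rather than characters and uses the hypothetical helper name `hasKeywordNearby`; how Vypher measures the window and which keywords it ships with beyond the examples above are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// hasKeywordNearby reports whether any keyword occurs, case-insensitively,
// within window bytes before or after the match at text[start:end].
// It assumes 0 <= start <= end <= len(text).
func hasKeywordNearby(text string, start, end, window int, keywords []string) bool {
	lo := start - window
	if lo < 0 {
		lo = 0
	}
	hi := end + window
	if hi > len(text) {
		hi = len(text)
	}
	// Join the text before and after the match with a space so that a
	// keyword cannot be formed across the removed match.
	context := strings.ToLower(text[lo:start] + " " + text[end:hi])
	for _, kw := range keywords {
		if strings.Contains(context, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

func main() {
	keywords := []string{"ssn", "social", "credit", "card"}
	text := "customer SSN: 123-45-6789 on file"
	start := strings.Index(text, "123-45-6789")
	end := start + len("123-45-6789")
	fmt.Println(hasKeywordNearby(text, start, end, 50, keywords))
}
```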
Performance
Vypher scans files in parallel using a worker pool sized automatically to the number of CPU cores. The directory walk that collects files is sequential; file reading and pattern matching then run concurrently, which improves throughput on large codebases.
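The two-phase design described here (sequential walk, concurrent scan) can be sketched with Go's standard library as follows. This is an illustration of the general pattern, not Vypher's code: `collectFiles`, `scanAll`, and `result` are hypothetical names, and the workers only read each file and record its size where a real scanner would run pattern matching.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// result records what a worker learned about one file. A real scanner
// would carry pattern matches instead of a byte count.
type result struct {
	path string
	size int
	err  error
}

// collectFiles walks root sequentially and returns every regular file path.
func collectFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// scanAll reads the files concurrently with one worker per CPU core.
// Results arrive in completion order, not input order.
func scanAll(files []string) []result {
	jobs := make(chan string)
	results := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				data, err := os.ReadFile(path)
				results <- result{path: path, size: len(data), err: err}
			}
		}()
	}
	// Feed jobs, then close results once every worker has finished.
	go func() {
		for _, f := range files {
			jobs <- f
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()
	var out []result
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	files, err := collectFiles(".")
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	for _, r := range scanAll(files) {
		fmt.Println(r.path, r.size, r.err)
	}
}
```

Keeping the walk sequential keeps traversal simple, while file reads and matching are the part that benefits from running on several cores.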
Disclaimer
This tool uses regex-based pattern matching and may produce false positives. It is intended as an aid for developers and security professionals, not as a guaranteed solution for compliance.