Documentation
Overview ¶
Package sensitive provides a high-performance, rule-based sensitive data detection library for Go. It scans text for credit card numbers (PAN), email addresses, phone numbers, Japanese My Number, and other confidential information, returning the position, type, and confidence level of each finding.
Detection is the core focus. Masking is optional and provided as a thin helper in the mask sub-package. Users can implement their own masking logic using the detection results.
Architecture ¶
The library uses a multi-stage filtering pipeline to minimize scan cost:
- Hint-based pre-filter: Each Detector provides hint byte sequences. bytes.Contains (SIMD-optimized in the Go runtime) quickly rules out lines that cannot contain a match (~15ns per line).
- Detector.Scan: Only called on data that passes the hint filter. Uses dedicated parsers and domain rule validation (BIN check, Luhn, check digits, etc.) rather than regular expressions.
- Result merging: Overlapping findings are deduplicated and sorted by confidence.
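The pre-filter stage can be illustrated with a short, self-contained sketch. It does not use the library's own types: hintedDetector and candidates are hypothetical names, and the "@" hint for email is an assumption made for illustration. The sketch only shows how bytes.Contains can decide which detectors are worth running.

```go
package main

import (
	"bytes"
	"fmt"
)

// hintedDetector is a hypothetical stand-in for a detector: a name plus the
// byte sequences that must appear in the input for a match to be possible.
type hintedDetector struct {
	name  string
	hints [][]byte
}

// candidates returns the names of the detectors whose hints occur in data.
// A detector with no hints is always selected.
func candidates(data []byte, detectors []hintedDetector) []string {
	var out []string
	for _, d := range detectors {
		if len(d.hints) == 0 {
			out = append(out, d.name)
			continue
		}
		for _, h := range d.hints {
			if bytes.Contains(data, h) {
				out = append(out, d.name)
				break
			}
		}
	}
	return out
}

func main() {
	detectors := []hintedDetector{
		{name: "email", hints: [][]byte{[]byte("@")}},
		{name: "jwt", hints: [][]byte{[]byte("eyJ")}},
	}
	// Only the email detector would be invoked for this input.
	fmt.Println(candidates([]byte("contact admin@example.com"), detectors)) // prints [email]
}
```

Because the check is a plain substring search, its cost stays small compared with running every parser on every line.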
Usage ¶
scanner := sensitive.NewScanner(sensitive.WithAll())
findings := scanner.ScanString("card is 4532-0151-1283-0366")
for _, f := range findings {
	fmt.Printf("Found %s at [%d:%d] confidence=%.2f\n",
		f.DetectorName, f.Start, f.End, f.Confidence)
}
Example (MaskingPipeline) ¶
This example demonstrates how to use the mask package with the scanner to detect and mask sensitive data in a single pipeline.
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
	"github.com/nao1215/sensitive/mask"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
	text := "user tanaka@example.com paid with 4532015112830366"
	findings := scanner.ScanString(text)
	masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
		detector.NamePAN:   mask.Last4,
		detector.NameEmail: mask.Partial,
	})
	fmt.Println(masked)
}
Output: user t*****@example.com paid with ************0366
Index ¶
- type Detector
- type DetectorName
- type Finding
- type Option
- func WithABARouting() Option
- func WithACHTrace() Option
- func WithAWSKey() Option
- func WithAll() Option
- func WithBTC() Option
- func WithBankAccount() Option
- func WithCVV() Option
- func WithCardExpiry() Option
- func WithDetector(d Detector) Option
- func WithETH() Option
- func WithEmail() Option
- func WithIBAN() Option
- func WithIPAddr() Option
- func WithJPPhone() Option
- func WithJWT() Option
- func WithMerchantID() Option
- func WithMinConfidence(threshold float64) Option
- func WithMyNumber() Option
- func WithPAN() Option
- func WithPaymentToken() Option
- func WithSWIFTBIC() Option
- func WithSortByPosition() Option
- func WithUKSortCode() Option
- func WithoutDedup() Option
- type Scanner
- type SensitiveKind
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Detector ¶
Detector is the interface that each sensitive data detector must implement. This is a type alias for detector.Detector.
type DetectorName ¶
type DetectorName = detector.DetectorName
DetectorName is the type-safe identifier for a detector. This is a type alias for detector.DetectorName.
type Finding ¶
Finding represents a single instance of detected sensitive data. This is a type alias for detector.Finding.
type Option ¶
type Option func(*Scanner)
Option is a function that configures a Scanner. Options are passed to NewScanner to enable specific detectors.
func WithABARouting ¶
func WithABARouting() Option
WithABARouting enables US ABA routing transit number detection. ABA routing numbers are 9-digit identifiers with a checksum used for ACH transfers, wire transfers, and check processing.
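As an illustration of the checksum mentioned above, here is a minimal sketch of the standard ABA weighting rule (weights 3, 7, 1 repeated across the nine digits, sum divisible by 10). The function name abaChecksumValid is hypothetical, and the library's detector may apply further rules that are not shown here.

```go
package main

import "fmt"

// abaChecksumValid reports whether s consists of nine ASCII digits that
// satisfy the ABA checksum:
// 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) must be divisible by 10.
func abaChecksumValid(s string) bool {
	if len(s) != 9 {
		return false
	}
	weights := [9]int{3, 7, 1, 3, 7, 1, 3, 7, 1}
	sum := 0
	for i := 0; i < 9; i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return false
		}
		sum += weights[i] * int(c-'0')
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(abaChecksumValid("021000021")) // prints true
	fmt.Println(abaChecksumValid("021000022")) // prints false
}
```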
func WithACHTrace ¶
func WithACHTrace() Option
WithACHTrace enables ACH trace number detection. ACH trace numbers are 15-digit identifiers used to track ACH transactions. Detection requires context keywords (e.g., "ACH", "trace") nearby.
func WithAWSKey ¶
func WithAWSKey() Option
WithAWSKey enables AWS Access Key ID detection. AWS keys always start with "AKIA" (long-term) or "ASIA" (temporary STS).
func WithAll ¶
func WithAll() Option
WithAll enables all built-in detectors. This is a convenience option equivalent to enabling each detector individually.
Example ¶
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithAll())
	text := "user tanaka@example.com paid with 4532015112830366 from 192.168.1.1"
	findings := scanner.ScanString(text)
	fmt.Printf("found %d sensitive item(s)\n", len(findings))
}
Output: found 3 sensitive item(s)
func WithBTC ¶
func WithBTC() Option
WithBTC enables Bitcoin address detection. BTC detection supports P2PKH (prefix '1'), P2SH (prefix '3'), Bech32 SegWit v0 (prefix 'bc1q'), and Bech32m Taproot v1 (prefix 'bc1p'). Addresses are validated using Base58Check (double SHA-256) or Bech32/Bech32m polynomial checksums.
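The Base58Check step for legacy (P2PKH and P2SH) addresses can be sketched with the standard library alone. This is an illustrative sketch, not the library's implementation: base58CheckValid is a hypothetical name, and Bech32/Bech32m validation is out of scope here.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"math/big"
	"strings"
)

const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58CheckValid decodes a Base58 string and verifies that its last four
// bytes equal the first four bytes of SHA-256(SHA-256(payload)).
func base58CheckValid(addr string) bool {
	n := new(big.Int)
	radix := big.NewInt(58)
	for _, r := range addr {
		idx := strings.IndexRune(base58Alphabet, r)
		if idx < 0 {
			return false // character outside the Base58 alphabet
		}
		n.Mul(n, radix)
		n.Add(n, big.NewInt(int64(idx)))
	}
	// Each leading '1' encodes one leading zero byte.
	zeros := 0
	for zeros < len(addr) && addr[zeros] == '1' {
		zeros++
	}
	decoded := append(make([]byte, zeros), n.Bytes()...)
	if len(decoded) != 25 { // version byte + 20-byte hash + 4-byte checksum
		return false
	}
	payload, checksum := decoded[:21], decoded[21:]
	first := sha256.Sum256(payload)
	second := sha256.Sum256(first[:])
	return bytes.Equal(second[:4], checksum)
}

func main() {
	// The address of the Bitcoin genesis block coinbase output.
	fmt.Println(base58CheckValid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
	// The same address with its final character altered.
	fmt.Println(base58CheckValid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))
}
```

The checksum makes accidental matches on arbitrary alphanumeric strings very unlikely, which is why a checksum-validated hit can be reported with high confidence.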
func WithBankAccount ¶
func WithBankAccount() Option
WithBankAccount enables bank account number detection (context-based). This is a weak detector that looks for digit sequences near banking keywords (e.g., "口座番号", "bank account"). Confidence is intentionally lower (0.5-0.65) because bank account numbers have no universal format or checksum.
func WithCVV ¶
func WithCVV() Option
WithCVV enables CVV/CVC/CID detection. Card verification values are 3-4 digit security codes on payment cards. Detection requires context keywords (e.g., "CVV", "security code") nearby.
func WithCardExpiry ¶
func WithCardExpiry() Option
WithCardExpiry enables payment card expiration date detection. Expiry dates are formatted as MM/YY or MM/YYYY. Detection requires context keywords (e.g., "exp", "expiry", "有効期限") nearby.
func WithDetector ¶
func WithDetector(d Detector) Option
WithDetector adds a custom Detector to the Scanner. This allows users to extend the Scanner with their own detection logic. If d is nil, the option is a no-op (the nil detector is silently ignored).
customDetector := detector.NewRegex(
	"internal_id",
	regexp.MustCompile(`PROJ-\d{6}`),
	[][]byte{[]byte("PROJ-")},
	0.8,
)
scanner := sensitive.NewScanner(sensitive.WithDetector(customDetector))
Example ¶
package main

import (
	"fmt"
	"regexp"

	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
)

func main() {
	// Register a custom detector for internal project IDs.
	projectID := detector.NewRegex(
		detector.DetectorName("project_id"),
		regexp.MustCompile(`PROJ-\d{6}`),
		[][]byte{[]byte("PROJ-")},
		0.8,
	)
	scanner := sensitive.NewScanner(sensitive.WithDetector(projectID))
	findings := scanner.ScanString("assigned to PROJ-123456")
	for _, f := range findings {
		fmt.Printf("type=%s raw=%s\n", f.DetectorName, f.RawValue)
	}
}
Output: type=project_id raw=PROJ-123456
func WithETH ¶
func WithETH() Option
WithETH enables Ethereum address detection. ETH detection recognizes 42-character addresses (0x + 40 hex chars). Mixed-case addresses are validated against the EIP-55 checksum using Keccak-256.
func WithEmail ¶
func WithEmail() Option
WithEmail enables email address detection. Email detection uses '@' as a pivot point and scans forward/backward to identify the local part and domain, then validates the structure.
func WithIBAN ¶
func WithIBAN() Option
WithIBAN enables International Bank Account Number detection. IBAN detection validates the country code, length, and MOD 97 check digit.
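The MOD 97 step can be sketched as follows. This is a minimal illustration of the check-digit rule only: the name ibanMod97Valid is hypothetical, the input is assumed to be upper case without spaces, and the country code and per-country length checks described above are deliberately omitted.

```go
package main

import "fmt"

// ibanMod97Valid applies the MOD 97 check: move the first four characters to
// the end, map letters A-Z to 10-35, and require the resulting number to
// leave remainder 1 when divided by 97.
func ibanMod97Valid(iban string) bool {
	if len(iban) < 5 || len(iban) > 34 {
		return false
	}
	rearranged := iban[4:] + iban[:4]
	rem := 0
	for i := 0; i < len(rearranged); i++ {
		c := rearranged[i]
		switch {
		case c >= '0' && c <= '9':
			rem = (rem*10 + int(c-'0')) % 97
		case c >= 'A' && c <= 'Z':
			// A letter contributes two decimal digits (10..35).
			rem = (rem*100 + int(c-'A') + 10) % 97
		default:
			return false
		}
	}
	return rem == 1
}

func main() {
	fmt.Println(ibanMod97Valid("GB82WEST12345698765432")) // prints true
	fmt.Println(ibanMod97Valid("GB83WEST12345698765432")) // prints false
}
```

Reducing modulo 97 after every character keeps the running value small, so no big-integer arithmetic is needed.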
func WithJPPhone ¶
func WithJPPhone() Option
WithJPPhone enables Japanese phone number detection. It recognizes landline (03-xxxx-xxxx), mobile (090-xxxx-xxxx), IP phone (050-xxxx-xxxx), and toll-free (0120-xxx-xxx) formats.
func WithJWT ¶
func WithJWT() Option
WithJWT enables JSON Web Token detection. JWT detection looks for the characteristic "eyJ" prefix (which is how the base64url encoding of a JSON header beginning with {" typically starts) and validates the three-part structure (header.payload.signature).
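A structural check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the library's code: looksLikeJWT is a hypothetical name, and the sketch requires a non-empty signature, so unsecured tokens with an empty third segment would be rejected.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

// looksLikeJWT checks the compact shape header.payload.signature: exactly
// three non-empty base64url segments, with a header that decodes to a JSON
// object. It does not verify the signature.
func looksLikeJWT(token string) bool {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false
	}
	for _, p := range parts {
		if p == "" {
			return false
		}
		if _, err := base64.RawURLEncoding.DecodeString(p); err != nil {
			return false
		}
	}
	header, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return false
	}
	var obj map[string]interface{}
	return json.Unmarshal(header, &obj) == nil && obj != nil
}

func main() {
	// Header {"alg":"HS256","typ":"JWT"}, payload {"sub":"1234567890"},
	// and a placeholder signature.
	token := "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.c2ln"
	fmt.Println(looksLikeJWT(token))       // prints true
	fmt.Println(looksLikeJWT("not.a.jwt")) // prints false
}
```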
func WithMerchantID ¶
func WithMerchantID() Option
WithMerchantID enables merchant ID and terminal ID detection. MIDs are typically 15 alphanumeric characters and TIDs are 8 digits. Detection requires context keywords (e.g., "merchant ID", "TID") nearby.
func WithMinConfidence ¶
func WithMinConfidence(threshold float64) Option
WithMinConfidence sets the minimum confidence threshold for reported findings. Findings with confidence below the threshold are filtered out after detection and deduplication. This allows callers to select a strict mode (e.g., 0.8 for high-confidence only) or a loose mode (e.g., 0.4 to include medium-confidence matches).
A value of 0 (the default) disables filtering and returns all findings.
// Strict mode: only high-confidence findings.
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.8))

// Loose mode: include medium-confidence and above.
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.4))
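Conceptually, the threshold acts as a simple post-filter over the detection results. The sketch below uses a hypothetical, reduced finding type and a hypothetical filterByConfidence helper to show the documented behaviour: findings below the threshold are dropped, and a threshold of 0 keeps everything.

```go
package main

import "fmt"

// finding is a hypothetical, reduced stand-in for the library's Finding.
type finding struct {
	name       string
	confidence float64
}

// filterByConfidence keeps the findings whose confidence is at or above
// threshold. A threshold of 0 keeps every finding.
func filterByConfidence(in []finding, threshold float64) []finding {
	out := make([]finding, 0, len(in))
	for _, f := range in {
		if f.confidence >= threshold {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	all := []finding{
		{name: "email", confidence: 1.0},
		{name: "bank_account", confidence: 0.5},
	}
	fmt.Println(len(filterByConfidence(all, 0.8))) // prints 1
	fmt.Println(len(filterByConfidence(all, 0)))   // prints 2
}
```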
Example ¶
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	// WithMinConfidence filters out findings below the threshold.
	// BankAccount detections have lower confidence (0.50-0.65),
	// while Email detections have high confidence (1.00).
	scanner := sensitive.NewScanner(
		sensitive.WithEmail(),
		sensitive.WithBankAccount(),
		sensitive.WithMinConfidence(0.8),
	)
	findings := scanner.ScanString("user tanaka@example.com bank account 12345678")
	for _, f := range findings {
		fmt.Printf("type=%s confidence=%.2f\n", f.DetectorName, f.Confidence)
	}
}
Output: type=email confidence=1.00
func WithMyNumber ¶
func WithMyNumber() Option
WithMyNumber enables Japanese My Number (individual number) detection. My Number is a 12-digit number whose last digit is a check digit. The detector verifies the check digit to reduce false positives.
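For reference, the publicly specified check digit rule can be sketched as below. The function name myNumberValid is hypothetical and the code is an illustration of the rule, not the library's detector: the first eleven digits are weighted from the right with 2 to 7 and then 2 to 6, and the check digit is 11 minus the weighted sum modulo 11, with results of 10 or 11 mapped to 0.

```go
package main

import "fmt"

// myNumberValid reports whether s is 12 ASCII digits whose final digit
// matches the My Number check digit computed from the first 11 digits.
func myNumberValid(s string) bool {
	if len(s) != 12 {
		return false
	}
	for i := 0; i < 12; i++ {
		if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	sum := 0
	for n := 1; n <= 11; n++ {
		p := int(s[11-n] - '0') // n-th digit of the body, counted from the right
		q := n + 1              // weights 2..7 for n = 1..6
		if n >= 7 {
			q = n - 5 // weights 2..6 for n = 7..11
		}
		sum += p * q
	}
	check := 11 - sum%11
	if check >= 10 { // remainders 0 and 1 map to check digit 0
		check = 0
	}
	return int(s[11]-'0') == check
}

func main() {
	fmt.Println(myNumberValid("123456789018")) // prints true
	fmt.Println(myNumberValid("123456789019")) // prints false
}
```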
func WithPAN ¶
func WithPAN() Option
WithPAN enables credit card number (PAN) detection. PAN detection uses BIN prefix matching and the Luhn algorithm to validate detected numbers, providing high-confidence results.
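The Luhn step can be illustrated with a short sketch. The helper name luhnValid is hypothetical, the input is assumed to contain digits only (separators already stripped), and BIN prefix matching is not shown.

```go
package main

import "fmt"

// luhnValid reports whether digits passes the Luhn check: starting from the
// rightmost digit, every second digit is doubled (subtracting 9 when the
// result exceeds 9) and the total must be divisible by 10.
func luhnValid(digits string) bool {
	if len(digits) == 0 {
		return false
	}
	sum := 0
	double := false
	for i := len(digits) - 1; i >= 0; i-- {
		c := digits[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(luhnValid("4532015112830366")) // prints true
	fmt.Println(luhnValid("4532015112830367")) // prints false
}
```

Luhn alone accepts roughly one in ten random digit strings, which is why combining it with a BIN prefix check gives a much stronger signal.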
func WithPaymentToken ¶
func WithPaymentToken() Option
WithPaymentToken enables payment processor token detection. Detects API tokens from Stripe (sk_live_, pk_live_, tok_, etc.), PayPal (PAYID-), and Square (sq0idp-, sq0csp-).
func WithSWIFTBIC ¶
func WithSWIFTBIC() Option
WithSWIFTBIC enables SWIFT/BIC code detection. SWIFT/BIC codes are 8 or 11 character identifiers used for international wire transfers. Detection validates the format and ISO 3166-1 country code.
func WithSortByPosition ¶
func WithSortByPosition() Option
WithSortByPosition configures the Scanner to return findings sorted by their byte offset (Start position, ascending) instead of the default confidence-descending order. This is useful when the caller needs to process findings in the order they appear in the original text.
func WithUKSortCode ¶
func WithUKSortCode() Option
WithUKSortCode enables UK bank sort code detection. Sort codes are 6-digit numbers in XX-XX-XX format that identify bank branches.
func WithoutDedup ¶
func WithoutDedup() Option
WithoutDedup disables the default deduplication of overlapping findings. By default, when two findings overlap in byte position, only the one with the highest confidence is kept. With this option, all findings are returned, which is useful when the caller needs to see every detection from every detector, even if they overlap.
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithoutDedup())
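One way to picture the default deduplication is a greedy pass over findings ordered by confidence. The sketch below is an assumption about how such a step could work, not the library's actual algorithm: span, overlaps, and dedup are hypothetical names, and byte ranges are treated as half-open [start, end).

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical, reduced stand-in for a finding.
type span struct {
	name       string
	start, end int // half-open byte range [start, end)
	confidence float64
}

// overlaps reports whether two half-open ranges share at least one byte.
func overlaps(a, b span) bool { return a.start < b.end && b.start < a.end }

// dedup visits spans from highest to lowest confidence and keeps a span only
// if it does not overlap one that was already kept.
func dedup(in []span) []span {
	sorted := append([]span(nil), in...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].confidence > sorted[j].confidence
	})
	var kept []span
	for _, s := range sorted {
		clash := false
		for _, k := range kept {
			if overlaps(s, k) {
				clash = true
				break
			}
		}
		if !clash {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	in := []span{
		{"bank_account", 17, 36, 0.5},
		{"pan", 17, 36, 1.0},
		{"email", 0, 10, 1.0},
	}
	for _, s := range dedup(in) {
		fmt.Println(s.name) // prints pan, then email
	}
}
```

With deduplication disabled, all three entries in this example would be returned, including the lower-confidence bank_account hit on the same bytes as the PAN.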
type Scanner ¶
type Scanner struct {
// contains filtered or unexported fields
}
Scanner scans text for sensitive data using registered Detectors. It implements a multi-stage filtering pipeline:
- Empty data check: Immediately returns if the input is empty.
- Hint-based pre-filter: Uses bytes.Contains with each Detector's hints to quickly skip detectors that cannot match the input. ASCII letters in hints are matched case-insensitively. Hints must be exhaustive for the detector's domain (i.e., every possible match must contain at least one hint byte sequence), or empty/nil to always run the detector. Non-exhaustive hints will cause silent detection misses.
- Detector.Scan: Runs only the detectors whose hints matched.
- Result merging: By default, deduplicates overlapping findings (keeping the highest confidence) and sorts by confidence (descending). Use WithSortByPosition to sort by byte offset instead, or WithoutDedup to keep all findings including overlapping ones.
Create a Scanner using NewScanner with the desired options.
func NewScanner ¶
NewScanner creates a new Scanner with the given options. Each option enables a specific detector or adds a custom one. If the same detector is registered more than once (e.g., by combining WithAll with an individual option like WithPAN), the duplicate is silently removed so that each detector runs at most once.
// Enable all built-in detectors
scanner := sensitive.NewScanner(sensitive.WithAll())

// Enable only PAN and email detection
scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
Example ¶
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
	findings := scanner.ScanString("user tanaka@example.com paid with 4532015112830366")
	for _, f := range findings {
		fmt.Printf("type=%s raw=%s confidence=%.2f\n", f.DetectorName, f.RawValue, f.Confidence)
	}
}
Output:
type=pan raw=4532015112830366 confidence=1.00
type=email raw=tanaka@example.com confidence=1.00
Example (WithAll) ¶
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithAll())
	findings := scanner.ScanString("key: AKIAIOSFODNN7EXAMPLE")
	for _, f := range findings {
		fmt.Printf("type=%s raw=%s confidence=%.2f\n", f.DetectorName, f.RawValue, f.Confidence)
	}
}
Output: type=awskey raw=AKIAIOSFODNN7EXAMPLE confidence=0.95
func (*Scanner) Scan ¶
Scan examines the given byte slice for sensitive data and returns all findings. The multi-stage filtering pipeline ensures that detectors are only invoked when their hint sequences are found in the data, minimizing scan cost.
By default the returned findings are deduplicated (overlapping findings are merged, keeping the highest confidence) and sorted by confidence in descending order. Findings with the same confidence are ordered by byte offset (ascending), then by detector name for full determinism. Use WithSortByPosition to sort by byte offset (ascending) instead. Use WithoutDedup to receive all findings including overlapping ones from different detectors.
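The two orderings described above can be expressed as comparison functions. This sketch uses a hypothetical, reduced finding type and hypothetical helper names (sortDefault, sortByPosition, names); it illustrates the documented ordering rules and is not taken from the library's source.

```go
package main

import (
	"fmt"
	"sort"
)

// finding is a hypothetical, reduced stand-in for the library's Finding.
type finding struct {
	name       string
	start      int
	confidence float64
}

// sortDefault orders by confidence (descending), then byte offset
// (ascending), then detector name, so the result is fully deterministic.
func sortDefault(fs []finding) {
	sort.Slice(fs, func(i, j int) bool {
		a, b := fs[i], fs[j]
		if a.confidence != b.confidence {
			return a.confidence > b.confidence
		}
		if a.start != b.start {
			return a.start < b.start
		}
		return a.name < b.name
	})
}

// sortByPosition orders by byte offset (ascending) only.
func sortByPosition(fs []finding) {
	sort.SliceStable(fs, func(i, j int) bool { return fs[i].start < fs[j].start })
}

// names joins the detector names in their current order.
func names(fs []finding) string {
	out := ""
	for i, f := range fs {
		if i > 0 {
			out += ","
		}
		out += f.name
	}
	return out
}

func main() {
	fs := []finding{{"awskey", 0, 0.95}, {"pan", 34, 1.0}, {"email", 10, 1.0}}
	sortDefault(fs)
	fmt.Println(names(fs)) // prints email,pan,awskey
	sortByPosition(fs)
	fmt.Println(names(fs)) // prints awskey,email,pan
}
```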
Example ¶
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN())
	data := []byte("payment for card 4532-0151-1283-0366 amount $99.99")
	findings := scanner.Scan(data)
	for _, f := range findings {
		fmt.Printf("found %s at position [%d:%d]\n", f.DetectorName, f.Start, f.End)
	}
}
Output: found pan at position [17:36]
func (*Scanner) ScanLines ¶
func (s *Scanner) ScanLines(r io.Reader, fn func(lineNum int, line []byte, findings []Finding)) error
ScanLines reads from r line by line and calls fn for each line that contains at least one finding. This is the recommended API for scanning log files and other line-oriented text streams, as it processes data incrementally without loading the entire input into memory.
lineNum is 1-based. The line parameter is the raw line bytes (without the trailing newline). findings contains all detections for that line.
fn is only called for lines that contain findings. Lines with no sensitive data are silently skipped.
Returns the first error encountered while reading from r, or nil if the entire input was processed successfully.
f, err := os.Open("access.log")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
err = scanner.ScanLines(f, func(lineNum int, line []byte, findings []sensitive.Finding) {
	fmt.Printf("line %d: found %d sensitive values\n", lineNum, len(findings))
})
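The incremental, line-by-line behaviour described above can be approximated with bufio.Scanner from the standard library. The sketch below is not the library's implementation: scanLines, linesWithAt, and the '@'-counting detect function are hypothetical stand-ins, and the 1 MiB line limit is an arbitrary choice for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// scanLines reads r line by line and calls fn only for lines where detect
// reports at least one hit. detect stands in for the real detection step.
func scanLines(r io.Reader, detect func(line []byte) int, fn func(lineNum int, line []byte, count int)) error {
	sc := bufio.NewScanner(r)
	// Raise the default 64 KiB token limit so longer log lines are accepted.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	lineNum := 0
	for sc.Scan() {
		lineNum++ // 1-based; lines without hits are still counted
		// sc.Bytes() is only valid until the next call to Scan, so fn must
		// copy the line if it needs to retain it.
		line := sc.Bytes()
		if n := detect(line); n > 0 {
			fn(lineNum, line, n)
		}
	}
	return sc.Err()
}

// linesWithAt returns the 1-based numbers of the lines that contain '@'.
func linesWithAt(input string) ([]int, error) {
	var hits []int
	err := scanLines(strings.NewReader(input),
		func(line []byte) int { return bytes.Count(line, []byte("@")) },
		func(lineNum int, _ []byte, _ int) { hits = append(hits, lineNum) })
	return hits, err
}

func main() {
	hits, err := linesWithAt("normal log line\nuser tanaka@example.com logged in\nanother safe line\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hits) // prints [2]
}
```

Only one line is held in memory at a time, which is the property that makes this style of API suitable for large log files.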
Example ¶
package main

import (
	"fmt"
	"strings"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithEmail())
	input := "normal log line\nuser tanaka@example.com logged in\nanother safe line\n"
	err := scanner.ScanLines(strings.NewReader(input), func(lineNum int, _ []byte, findings []sensitive.Finding) {
		for _, f := range findings {
			fmt.Printf("line %d: %s=%s\n", lineNum, f.DetectorName, f.RawValue)
		}
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
Output: line 2: email=tanaka@example.com
func (*Scanner) ScanReader ¶
ScanReader reads all data from r and scans it for sensitive data. This is a convenience method for cases where the full content fits in memory. For large inputs or streaming use cases, prefer Scanner.ScanLines.
f, err := os.Open("access.log")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
findings, err := scanner.ScanReader(f)
Example ¶
package main

import (
	"fmt"
	"strings"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
	r := strings.NewReader("user tanaka@example.com paid with 4532015112830366")
	findings, err := scanner.ScanReader(r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range findings {
		fmt.Printf("type=%s raw=%s\n", f.DetectorName, f.RawValue)
	}
}
Output:
type=pan raw=4532015112830366
type=email raw=tanaka@example.com
func (*Scanner) ScanString ¶
ScanString is a convenience method that scans a string for sensitive data. It converts the string to a byte slice and calls Scanner.Scan.
Example ¶
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithEmail())
	findings := scanner.ScanString("contact admin@example.com for support")
	fmt.Printf("found %d email(s)\n", len(findings))
	if len(findings) > 0 {
		fmt.Printf("email: %s\n", findings[0].RawValue)
	}
}
Output:
found 1 email(s)
email: admin@example.com
type SensitiveKind ¶
type SensitiveKind = detector.SensitiveKind
SensitiveKind categorizes a finding into a broad semantic group (financial, PII, credential). This is a type alias for detector.SensitiveKind.
Directories
| Path | Synopsis |
|---|---|
| detector | Package detector provides individual sensitive data detector implementations for the sensitive library. |
| mask | Package mask provides optional masking helpers for the sensitive library. |
