sensitive

Version: v0.0.2
Published: Feb 8, 2026 License: MIT Imports: 5 Imported by: 0

README

sensitive

sensitive is a Go library that detects sensitive data in text. It scans for credit card numbers, email addresses, Japanese phone numbers, Japanese My Number, JWTs, AWS access keys, IBANs, IP addresses, Bitcoin addresses, and Ethereum addresses, returning the position, type, and confidence level of each match. It also includes international and fintech-focused detectors such as SWIFT/BIC, US ABA routing numbers, UK sort codes, payment tokens, card CVV/expiry, and ACH trace numbers. Masking is available as an optional helper, but detection is the core focus.

The library has zero external dependencies and relies only on the Go standard library.

Requirements

  • Go Version: 1.24 or later
  • Operating Systems (tested on):
    • Linux
    • macOS
    • Windows

Installation

go get github.com/nao1215/sensitive

Quick Start

Create a Scanner, choose which detectors to enable, call ScanString, and optionally mask findings:

package main

import (
    "fmt"

    "github.com/nao1215/sensitive"
    "github.com/nao1215/sensitive/detector"
    "github.com/nao1215/sensitive/mask"
)

func main() {
    scanner := sensitive.NewScanner(sensitive.WithAll())
    text := "user tanaka@example.com paid with 4532015112830366"
    findings := scanner.ScanString(text)

    for _, f := range findings {
        fmt.Printf("type=%s raw=%s confidence=%.2f\n",
            f.DetectorName, f.RawValue, f.Confidence)
    }

    masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
        detector.NamePAN:   mask.Last4,
        detector.NameEmail: mask.Partial,
    })
    fmt.Println(masked)
}

Output (order may vary):

type=pan raw=4532015112830366 confidence=1.00
type=email raw=tanaka@example.com confidence=1.00
user t*****@example.com paid with ************0366

WithAll() turns on every built-in detector. If you only care about specific types, pick them individually:

scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())

Caution on WithAll(): it enables every built-in detector, including the context-based weak detectors (WithBankAccount, WithACHTrace, WithMerchantID, WithCVV, WithCardExpiry). These detectors rely on nearby keywords rather than checksums and may produce false positives. In strict or financial-audit scenarios, where a false positive is costly, avoid WithAll() and enable only the detectors you need.

Note: NewScanner() with no options creates a scanner with zero detectors, so Scan will always return an empty result. You must pass at least one With*() option to enable detection.

Common mistakes:

// Mistake 1: No detectors — always returns empty results.
scanner := sensitive.NewScanner()
findings := scanner.ScanString("4532015112830366") // findings is empty!

// Mistake 2: WithAll() in strict mode produces noise from weak detectors.
// Use specific options instead.
scanner = sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())

Supported Detectors

Option | Detects | Validation
WithPAN() | Credit card numbers (Visa, Mastercard, Amex, JCB, Discover, Diners, UnionPay) | BIN prefix + Luhn algorithm
WithEmail() | Email addresses | Structure + known TLD check
WithJPPhone() | Japanese phone numbers (mobile, landline, IP phone, toll-free, M2M/IoT, service) | Prefix classification + digit count
WithMyNumber() | Japanese My Number (12-digit individual number) | MOD 11 check digit
WithJWT() | JSON Web Tokens | Header decode + alg key check
WithAWSKey() | AWS Access Key IDs (AKIA... / ASIA...) | Prefix + 20-char alphanumeric
WithIBAN() | International Bank Account Numbers | Country code + MOD 97 check digit
WithIPAddr() | IPv4 and IPv6 addresses | net.ParseIP + octet range
WithSWIFTBIC() | SWIFT/BIC codes | Format + country code validation
WithABARouting() | US ABA routing numbers | Prefix range + checksum
WithUKSortCode() | UK sort codes (XX-XX-XX) | Pattern + boundary checks
WithCVV() | Card verification values (CVV/CVC/CID) | Context keyword + digit length (context-based, weaker)
WithCardExpiry() | Card expiration dates | Context keyword + MM/YY validation (context-based, weaker)
WithPaymentToken() | Payment processor tokens (Stripe/PayPal/Square) | Prefix + minimum body length
WithBankAccount() | Bank account numbers (context-based) | Context keyword + digit range (context-based, weaker)
WithACHTrace() | ACH trace numbers | Context keyword + prefix range (context-based, weaker)
WithMerchantID() | Merchant/terminal IDs | Context keyword + format (context-based, weaker)
WithBTC() | Bitcoin addresses (P2PKH, P2SH, Bech32, Bech32m/Taproot) | Base58Check (double SHA-256) / Bech32 polynomial checksum
WithETH() | Ethereum addresses (0x + 40 hex) | EIP-55 mixed-case checksum (Keccak-256)
WithAll() | All of the above |

Benchmarks

Measurement conditions:

  • Command: go test -bench BenchmarkScanner -benchmem -benchtime 3s -count 5 -run '^$'
  • Go version: 1.24 (linux/amd64)
  • GOMAXPROCS: 16
  • CPU: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
  • Commit: b7e0cdc

To reproduce, run the command above. Use -count 5 and take the median for stable results. Benchmark numbers are environment-sensitive. Expect variation across Go versions, CPUs, and background load, and refresh results periodically if you publish them for compliance or audit purposes.

Per-detector benchmarks (single detector enabled)

Benchmark | ns/op | B/op | allocs/op
PAN | 286.7 | 944 | 16
Email | 188.2 | 288 | 9
JPPhone | 171.3 | 464 | 8
MyNumber | 142.0 | 392 | 6
JWT | 1001 | 1208 | 25
AWSKey | 147.1 | 280 | 8
IBAN | 205.7 | 226 | 6
IPAddr | 209.8 | 312 | 10
SWIFTBIC | 176.1 | 288 | 9
ABARouting | 132.7 | 376 | 6
UKSortCode | 128.4 | 248 | 8
CVV | 289.6 | 568 | 18
CardExpiry | 261.4 | 456 | 16
PaymentToken | 276.7 | 688 | 20
BankAccount | 435.1 | 760 | 22
ACHTrace | 325.9 | 480 | 17
MerchantID | 343.4 | 568 | 18
BTC | 514.5 | 328 | 7
ETH | 2118 | 329 | 7

Multi-detector and edge-case benchmarks

Benchmark | Description
BenchmarkScannerNoMatch | All detectors enabled, input with no sensitive data. Note: detectors with nil hints (IBAN, SWIFT/BIC, ABA, MyNumber) always run regardless of input content.
BenchmarkScannerAllDetectors | All detectors enabled, input containing email + PAN + IP
BenchmarkScannerEmptyInput | All detectors enabled, nil input
BenchmarkScannerLargeInput | All detectors enabled, ~4KB log block with no sensitive data
BenchmarkScannerHintMatchNoDetection | All detectors enabled, hints match but no valid sensitive data found
BenchmarkScannerFullWidthInput | All detectors enabled, full-width digit input requiring normalization

Scanning Streams

For log files and other line-oriented input, use ScanLines to process data incrementally without loading the entire content into memory. The callback is invoked only for lines that contain findings:

f, err := os.Open("access.log")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

scanner := sensitive.NewScanner(sensitive.WithAll())
err = scanner.ScanLines(f, func(lineNum int, line []byte, findings []sensitive.Finding) {
    for _, finding := range findings {
        fmt.Printf("line %d: %s (%s)\n", lineNum, finding.DetectorName, finding.RawValue)
    }
})
if err != nil {
    log.Fatal(err)
}

If the entire content fits in memory, ScanReader is a simpler alternative:

f, err := os.Open("data.txt")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

findings, err := scanner.ScanReader(f)

Confidence Filtering

Use WithMinConfidence to control the strictness of detection. Findings below the threshold are filtered out:

// Strict mode: only high-confidence findings (>= 0.8).
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.8))

// Loose mode: include medium-confidence and above (>= 0.4).
scanner = sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.4))

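This is useful for suppressing noise from lower-confidence context-based detectors such as BankAccount (0.50-0.65), MerchantID, and ACHTrace (0.70-0.75) while keeping strong checksum-validated results. Note that CVV and CardExpiry report 0.85, so a threshold of 0.8 does not remove them; filter those by detector name if needed.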
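
As a rough illustration of what a threshold does, the sketch below filters a hand-written list of findings. The finding struct, the detector labels, and the filterByConfidence helper are hypothetical stand-ins rather than the library's types; the confidence values are taken from the ranges quoted in this README.

```go
package main

import "fmt"

// finding is a hypothetical stand-in for sensitive.Finding.
type finding struct {
	Detector   string
	Confidence float64
}

// filterByConfidence keeps findings whose confidence is >= threshold.
func filterByConfidence(in []finding, threshold float64) []finding {
	var out []finding
	for _, f := range in {
		if f.Confidence >= threshold {
			out = append(out, f)
		}
	}
	return out
}

// sampleFindings uses illustrative labels with confidence values quoted in this README.
func sampleFindings() []finding {
	return []finding{
		{"pan", 1.00},
		{"cvv", 0.85},
		{"merchant_id", 0.70},
		{"bank_account", 0.50},
	}
}

func main() {
	for _, th := range []float64{0.8, 0.4} {
		kept := filterByConfidence(sampleFindings(), th)
		fmt.Printf("threshold %.1f keeps %d finding(s)\n", th, len(kept))
	}
}
```

With these sample values, a threshold of 0.8 keeps two findings and 0.4 keeps all four.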

Classifying Findings by Kind

Each finding has a Kind() method that returns a broad semantic category (financial, pii, or credential), enabling downstream classification without switching on all detector names:

for _, f := range findings {
    switch f.Kind() {
    case detector.KindFinancial:
        // PAN, IBAN, ABA routing, sort code, CVV, card expiry, etc.
    case detector.KindPII:
        // email, phone, My Number, IP address
    case detector.KindCredential:
        // JWT, AWS key, payment token
    }
}

Working with Findings

Each Finding contains the detector name, byte offsets, confidence score (0.0-1.0), the raw matched string, and a Detail struct with detector-specific information.

Note: Start and End are byte offsets, not rune (character) offsets. For multi-byte UTF-8 text (e.g., Japanese), use the byte positions directly when slicing []byte data.
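
The difference between byte and rune offsets can be seen with the standard library alone. In this sketch, strings.Index stands in for a detector reporting Start, and runeOffset is a hypothetical helper for callers that need character positions (for example, to highlight text in a UI).

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeOffset converts a byte offset into text to a rune (character) offset.
// byteOff must fall on a rune boundary.
func runeOffset(text string, byteOff int) int {
	return utf8.RuneCountInString(text[:byteOff])
}

func main() {
	text := "連絡先 tanaka@example.com"
	raw := "tanaka@example.com"

	start := strings.Index(text, raw) // byte offset, analogous to Finding.Start
	end := start + len(raw)           // byte offset, analogous to Finding.End

	fmt.Println(start)                   // 10: three 3-byte characters plus a space
	fmt.Println(runeOffset(text, start)) // 4
	fmt.Println(text[start:end])         // tanaka@example.com
}
```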

Context-based detectors (WithBankAccount, WithACHTrace, WithMerchantID, WithCVV, WithCardExpiry) rely on nearby keywords rather than checksums, so they are more prone to false positives than checksum-validated detectors. Confidence scores vary by detector: WithBankAccount returns 0.50-0.65, WithMerchantID and WithACHTrace return 0.70-0.75, and WithCVV and WithCardExpiry return 0.85.

Checking the detector type

for _, f := range findings {
    if f.IsPAN() {
        // handle credit card
    }
    if f.IsEmail() {
        // handle email
    }
}

There is also a generic Is method that takes a detector name constant:

if f.Is(detector.NamePAN) { ... }

Confidence levels

Confidence is a float between 0.0 and 1.0. When you do not need the exact score, use Level() to get a categorical assessment:

switch f.Level() {
case detector.ConfidenceHigh:   // >= 0.8
case detector.ConfidenceMedium: // >= 0.4
case detector.ConfidenceLow:    // < 0.4
}
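
The thresholds in the comments above amount to a simple mapping. The following is a minimal sketch of that mapping with a hypothetical levelOf helper returning plain strings; the library's own Level() returns its detector.Confidence* constants instead.

```go
package main

import "fmt"

// levelOf maps a confidence score to a coarse label using the thresholds
// listed above: >= 0.8 is high, >= 0.4 is medium, anything lower is low.
func levelOf(confidence float64) string {
	switch {
	case confidence >= 0.8:
		return "high"
	case confidence >= 0.4:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	for _, c := range []float64{1.00, 0.85, 0.65, 0.39} {
		fmt.Printf("%.2f -> %s\n", c, levelOf(c))
	}
}
```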

Getting detector-specific details

Every finding carries a Detail field. Instead of type-asserting it yourself, use the typed accessor methods. Each returns a pointer and a boolean indicating success:

scanner := sensitive.NewScanner(sensitive.WithPAN())
findings := scanner.ScanString("4532015112830366")

if detail, ok := findings[0].PANDetail(); ok {
    fmt.Println(detail.Brand)  // "Visa"
    fmt.Println(detail.Last4)  // "0366"
    fmt.Println(detail.Luhn)   // true
}

The available accessors and their fields:

Method | Fields
PANDetail() | Brand, BIN, Last4, Luhn, Length
EmailDetail() | Local, Domain
JPPhoneDetail() | PhoneType (JPPhoneTypeMobile, JPPhoneTypeLandline, JPPhoneTypeIPPhone, JPPhoneTypeTollFree, JPPhoneTypeM2M, JPPhoneTypeService)
JWTDetail() | Algorithm (e.g. HS256, RS256)
AWSKeyDetail() | KeyType (AWSKeyTypeLongTerm or AWSKeyTypeTemporary)
IBANDetail() | CountryCode (ISO 3166-1 alpha-2)
IPAddrDetail() | Version (4 or 6)
MyNumberDetail() | CheckDigitValid
BTCDetail() | AddressType (BTCAddressP2PKH, BTCAddressP2SH, BTCAddressBech32, BTCAddressBech32m)
ETHDetail() | EIP55 (bool, whether EIP-55 checksum validated)

Masking

The mask sub-package provides five masking strategies:

Strategy | Example
Redact | 4532015112830366 -> ****************
Last4 | 4532015112830366 -> ************0366
First1Last4 | 4532015112830366 -> 4***********0366
Partial | tanaka@example.com -> t*****@example.com
Hash | 4532015112830366 -> a8f5f167 (SHA-256 prefix)
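
Because masking is an optional helper, callers can also write their own. The sketch below reimplements four of the strategies with the standard library, assuming ASCII input; the function names and the handling of very short inputs are assumptions for illustration, not the mask package's behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// redact replaces every byte with '*'.
func redact(s string) string { return strings.Repeat("*", len(s)) }

// last4 keeps the final four bytes and masks the rest.
// Inputs of four bytes or fewer are fully redacted (an assumption of this sketch).
func last4(s string) string {
	if len(s) <= 4 {
		return redact(s)
	}
	return strings.Repeat("*", len(s)-4) + s[len(s)-4:]
}

// first1Last4 keeps the first byte and the final four bytes.
func first1Last4(s string) string {
	if len(s) <= 5 {
		return redact(s)
	}
	return s[:1] + strings.Repeat("*", len(s)-5) + s[len(s)-4:]
}

// partialEmail keeps the first byte of the local part and the whole domain.
func partialEmail(s string) string {
	at := strings.LastIndexByte(s, '@')
	if at < 1 {
		return redact(s)
	}
	return s[:1] + strings.Repeat("*", at-1) + s[at:]
}

func main() {
	pan := "4532015112830366"
	fmt.Println(redact(pan))
	fmt.Println(last4(pan))
	fmt.Println(first1Last4(pan))
	fmt.Println(partialEmail("tanaka@example.com"))
}
```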

Use mask.Mask to apply different strategies per detector:

import (
    "github.com/nao1215/sensitive"
    "github.com/nao1215/sensitive/detector"
    "github.com/nao1215/sensitive/mask"
)

scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
text := "user tanaka@example.com paid with 4532015112830366"
findings := scanner.ScanString(text)

masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
    detector.NamePAN:   mask.Last4,
    detector.NameEmail: mask.Partial,
})

fmt.Println(masked)
// user t*****@example.com paid with ************0366

If you want the same strategy for everything, use mask.MaskAll:

masked := mask.MaskAll(text, findings, mask.Redact)
// user ****************** paid with ****************

Custom Detectors

You can add your own detectors. The simplest way is detector.NewRegex, which wraps a compiled regular expression:

import (
    "regexp"

    "github.com/nao1215/sensitive"
    "github.com/nao1215/sensitive/detector"
)

ticketDetector := detector.NewRegex(
    detector.DetectorName("ticket_id"),
    regexp.MustCompile(`TICKET-\d{4}`),
    [][]byte{[]byte("TICKET-")},   // hint for pre-filtering
    0.9,                            // fixed confidence
)

scanner := sensitive.NewScanner(
    sensitive.WithPAN(),
    sensitive.WithDetector(ticketDetector),
)

The hints parameter is important for performance. The scanner uses bytes.Contains to check hints before calling Scan, so a good hint lets the scanner skip the regex entirely for inputs that cannot match.
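
The effect of a hint can be sketched without the library: check cheap byte-sequence hints first and run the regular expression only when one of them is present. The names below (hintMatched, scanTickets) are hypothetical, and the matching here is case-sensitive, whereas the Scanner documentation states that ASCII letters in hints are matched case-insensitively.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

var (
	ticketRe    = regexp.MustCompile(`TICKET-\d{4}`)
	ticketHints = [][]byte{[]byte("TICKET-")}
)

// hintMatched reports whether data contains at least one hint.
// An empty hint list means "always run the detector".
func hintMatched(data []byte, hints [][]byte) bool {
	if len(hints) == 0 {
		return true
	}
	for _, h := range hints {
		if bytes.Contains(data, h) {
			return true
		}
	}
	return false
}

// scanTickets runs the regex only when the cheap pre-filter passes.
func scanTickets(data []byte) []string {
	if !hintMatched(data, ticketHints) {
		return nil // the regex is skipped entirely
	}
	var out []string
	for _, m := range ticketRe.FindAll(data, -1) {
		out = append(out, string(m))
	}
	return out
}

func main() {
	fmt.Println(scanTickets([]byte("see TICKET-1234 for details"))) // [TICKET-1234]
	fmt.Println(scanTickets([]byte("nothing to see here")))         // []
}
```

A hint must appear in every possible match; otherwise the pre-filter would skip inputs that the regex could have matched.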

For more complex logic, implement the Detector interface directly:

type Detector interface {
    Name() detector.DetectorName
    Hints() [][]byte
    Scan(data []byte) []detector.Finding
}

Full-Width Digit Support

Japanese text often uses full-width digits (０-９). Detectors that parse digit sequences directly (PAN, JPPhone, MyNumber, ABA routing, BankAccount) normalize full-width digits to half-width before detection, so a phone number written as ０９０-１２３４-５６７８ or a bank account number written as 口座番号 １２３４５６７８ is correctly recognized. IBAN and UK sort code do not normalize full-width digits because their formats are primarily used in Western contexts where full-width encoding is uncommon. The other context-based detectors (CVV, CardExpiry, ACHTrace, MerchantID) also do not normalize full-width digits. The utility function is also available for direct use:

normalized, _ := detector.NormalizeFullWidthDigits([]byte("０９０-１２３４-５６７８")) // second result (posMap) ignored here
fmt.Println(string(normalized)) // 090-1234-5678
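
For readers curious how such a normalization can work, here is a minimal standard-library sketch. Full-width digits occupy U+FF10 to U+FF19 and take three bytes each in UTF-8, so the normalized output is shorter than the input; the position map shown here (normalized byte index to original byte offset) is an assumption about what a caller would need to translate offsets back, not a description of the library's posMap.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// normalizeFullWidthDigits replaces full-width digits (U+FF10..U+FF19) with
// ASCII digits. pos[i] is the byte offset in src that produced out[i].
func normalizeFullWidthDigits(src []byte) (out []byte, pos []int) {
	for i := 0; i < len(src); {
		r, size := utf8.DecodeRune(src[i:])
		if r >= '\uFF10' && r <= '\uFF19' {
			out = append(out, byte('0'+(r-'\uFF10')))
			pos = append(pos, i)
		} else {
			// Copy every other rune (or invalid byte) through unchanged.
			for j := 0; j < size; j++ {
				out = append(out, src[i+j])
				pos = append(pos, i+j)
			}
		}
		i += size
	}
	return out, pos
}

func main() {
	// Full-width "090", ASCII '-', full-width "1234".
	in := []byte("\uFF10\uFF19\uFF10-\uFF11\uFF12\uFF13\uFF14")
	out, pos := normalizeFullWidthDigits(in)
	fmt.Println(string(out)) // 090-1234
	fmt.Println(pos)         // [0 3 6 9 10 13 16 19]
}
```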

How It Works

The scanner runs a multi-stage filtering pipeline to keep scan cost low.

sequenceDiagram
    participant Caller
    participant Scanner
    participant HintFilter as Hint Filter
    participant Detector
    participant Dedup as Dedup & Sort

    Caller->>Scanner: Scan(data)
    alt input is empty
        Scanner-->>Caller: nil
    end

    loop for each registered Detector
        Scanner->>HintFilter: bytes.Contains(data, hint) (~15 ns, SIMD)
        alt no hint matched
            HintFilter-->>Scanner: skip
        else hint matched
            HintFilter-->>Scanner: pass
            Scanner->>Detector: Scan(data)
            Note right of Detector: domain-specific validation<br/>(BIN, Luhn, MOD 97, etc.)
            Detector-->>Scanner: []Finding
        end
    end

    Scanner->>Dedup: merge all findings
    Note right of Dedup: dedup overlapping (keep highest confidence)<br/>sort by confidence desc
    Dedup-->>Scanner: []Finding
    Scanner-->>Caller: []Finding

Contributing

Contributions are welcome!

If you find a bug or want to request a feature, please contact the developer.

License

MIT LICENSE

Documentation

Overview

Package sensitive provides a high-performance, rule-based sensitive data detection library for Go. It scans text for credit card numbers (PAN), email addresses, phone numbers, Japanese My Number, and other confidential information, returning the position, type, and confidence level of each finding.

Detection is the core focus. Masking is optional and provided as a thin helper in the mask sub-package. Users can implement their own masking logic using the detection results.

Architecture

The library uses a multi-stage filtering pipeline to minimize scan cost:

  1. Hint-based pre-filter: Each Detector provides hint byte sequences. bytes.Contains (SIMD-optimized in Go runtime) quickly eliminates lines that cannot possibly contain a match (~15ns per line).
  2. Detector.Scan: Only called on data that passes the hint filter. Uses dedicated parsers and domain rule validation (BIN check, Luhn, check digits, etc.) rather than regular expressions.
  3. Result merging: Overlapping findings are deduplicated and sorted by confidence.

Usage

scanner := sensitive.NewScanner(sensitive.WithAll())
findings := scanner.ScanString("card is 4532-0151-1283-0366")
for _, f := range findings {
    fmt.Printf("Found %s at [%d:%d] confidence=%.2f\n",
        f.DetectorName, f.Start, f.End, f.Confidence)
}
Example (MaskingPipeline)

This example demonstrates how to use the mask package with the scanner to detect and mask sensitive data in a single pipeline.

package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
	"github.com/nao1215/sensitive/mask"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
	text := "user tanaka@example.com paid with 4532015112830366"
	findings := scanner.ScanString(text)

	masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
		detector.NamePAN:   mask.Last4,
		detector.NameEmail: mask.Partial,
	})

	fmt.Println(masked)
}
Output:

user t*****@example.com paid with ************0366

Types

type Detector

type Detector = detector.Detector

Detector is the interface that each sensitive data detector must implement. This is a type alias for detector.Detector.

type DetectorName

type DetectorName = detector.DetectorName

DetectorName is the type-safe identifier for a detector. This is a type alias for detector.DetectorName.

type Finding

type Finding = detector.Finding

Finding represents a single instance of detected sensitive data. This is a type alias for detector.Finding.

type Option

type Option func(*Scanner)

Option is a function that configures a Scanner. Options are passed to NewScanner to enable specific detectors.

func WithABARouting

func WithABARouting() Option

WithABARouting enables US ABA routing transit number detection. ABA routing numbers are 9-digit identifiers with a checksum used for ACH transfers, wire transfers, and check processing.
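
The checksum mentioned here is the standard ABA weighting: the nine digits are multiplied by the repeating weights 3, 7, 1 and the sum must be a multiple of 10. The sketch below shows only that arithmetic; abaChecksumValid is a hypothetical name, and the prefix-range check that the detector also applies is omitted.

```go
package main

import "fmt"

// abaChecksumValid reports whether s is nine ASCII digits satisfying
// 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) == 0 (mod 10).
func abaChecksumValid(s string) bool {
	if len(s) != 9 {
		return false
	}
	weights := [9]int{3, 7, 1, 3, 7, 1, 3, 7, 1}
	sum := 0
	for i := 0; i < 9; i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return false
		}
		sum += weights[i] * int(c-'0')
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(abaChecksumValid("021000021")) // true
	fmt.Println(abaChecksumValid("021000022")) // false
}
```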

func WithACHTrace

func WithACHTrace() Option

WithACHTrace enables ACH trace number detection. ACH trace numbers are 15-digit identifiers used to track ACH transactions. Detection requires context keywords (e.g., "ACH", "trace") nearby.

func WithAWSKey

func WithAWSKey() Option

WithAWSKey enables AWS Access Key ID detection. AWS keys always start with "AKIA" (long-term) or "ASIA" (temporary STS).

func WithAll

func WithAll() Option

WithAll enables all built-in detectors. This is a convenience option equivalent to enabling each detector individually.

Example
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithAll())
	text := "user tanaka@example.com paid with 4532015112830366 from 192.168.1.1"
	findings := scanner.ScanString(text)

	fmt.Printf("found %d sensitive item(s)\n", len(findings))
}
Output:

found 3 sensitive item(s)

func WithBTC

func WithBTC() Option

WithBTC enables Bitcoin address detection. BTC detection supports P2PKH (prefix '1'), P2SH (prefix '3'), Bech32 SegWit v0 (prefix 'bc1q'), and Bech32m Taproot v1 (prefix 'bc1p'). Addresses are validated using Base58Check (double SHA-256) or Bech32/Bech32m polynomial checksums.
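
For the Base58Check half of this validation, a standard-library sketch looks like the following. It decodes the address as a base-58 integer, restores leading zero bytes, and compares the last four bytes against the first four bytes of a double SHA-256 of the payload. base58CheckValid is a hypothetical helper, it does not inspect the version byte, and Bech32/Bech32m addresses are out of scope.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"math/big"
	"strings"
)

const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58CheckValid reports whether addr decodes to 25 bytes
// (1 version + 20 hash + 4 checksum) with a matching checksum.
func base58CheckValid(addr string) bool {
	n := new(big.Int)
	radix := big.NewInt(58)
	for i := 0; i < len(addr); i++ {
		idx := strings.IndexByte(base58Alphabet, addr[i])
		if idx < 0 {
			return false // character outside the Base58 alphabet
		}
		n.Mul(n, radix)
		n.Add(n, big.NewInt(int64(idx)))
	}
	// Each leading '1' encodes one leading zero byte.
	zeros := 0
	for zeros < len(addr) && addr[zeros] == '1' {
		zeros++
	}
	decoded := append(make([]byte, zeros), n.Bytes()...)
	if len(decoded) != 25 {
		return false
	}
	payload, checksum := decoded[:21], decoded[21:]
	first := sha256.Sum256(payload)
	second := sha256.Sum256(first[:])
	return bytes.Equal(second[:4], checksum)
}

func main() {
	fmt.Println(base58CheckValid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")) // true
	fmt.Println(base58CheckValid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb")) // false
}
```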

func WithBankAccount

func WithBankAccount() Option

WithBankAccount enables bank account number detection (context-based). This is a weak detector that looks for digit sequences near banking keywords (e.g., "口座番号", "bank account"). Confidence is intentionally lower (0.5-0.65) because bank account numbers have no universal format or checksum.

func WithCVV

func WithCVV() Option

WithCVV enables CVV/CVC/CID detection. Card verification values are 3-4 digit security codes on payment cards. Detection requires context keywords (e.g., "CVV", "security code") nearby.

func WithCardExpiry

func WithCardExpiry() Option

WithCardExpiry enables payment card expiration date detection. Expiry dates are formatted as MM/YY or MM/YYYY. Detection requires context keywords (e.g., "exp", "expiry", "有効期限") nearby.

func WithDetector

func WithDetector(d Detector) Option

WithDetector adds a custom Detector to the Scanner. This allows users to extend the Scanner with their own detection logic. If d is nil, the option is a no-op (the nil detector is silently ignored).

customDetector := detector.NewRegex(
    "internal_id",
    regexp.MustCompile(`PROJ-\d{6}`),
    [][]byte{[]byte("PROJ-")},
    0.8,
)
scanner := sensitive.NewScanner(sensitive.WithDetector(customDetector))
Example
package main

import (
	"fmt"
	"regexp"

	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
)

func main() {
	// Register a custom detector for internal project IDs.
	projectID := detector.NewRegex(
		detector.DetectorName("project_id"),
		regexp.MustCompile(`PROJ-\d{6}`),
		[][]byte{[]byte("PROJ-")},
		0.8,
	)

	scanner := sensitive.NewScanner(sensitive.WithDetector(projectID))
	findings := scanner.ScanString("assigned to PROJ-123456")

	for _, f := range findings {
		fmt.Printf("type=%s raw=%s\n", f.DetectorName, f.RawValue)
	}
}
Output:

type=project_id raw=PROJ-123456

func WithETH

func WithETH() Option

WithETH enables Ethereum address detection. ETH detection recognizes 42-character addresses (0x + 40 hex chars). Mixed-case addresses are validated against the EIP-55 checksum using Keccak-256.

func WithEmail

func WithEmail() Option

WithEmail enables email address detection. Email detection uses '@' as a pivot point and scans forward/backward to identify the local part and domain, then validates the structure.

func WithIBAN

func WithIBAN() Option

WithIBAN enables International Bank Account Number detection. IBAN detection validates the country code, length, and MOD 97 check digit.
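
The MOD 97 step can be sketched as follows: move the first four characters to the end, replace letters with numbers (A=10 through Z=35), and check that the resulting number leaves remainder 1 when divided by 97. ibanChecksumValid is a hypothetical helper; it does not validate the country code or the country-specific length that the detector also checks.

```go
package main

import (
	"fmt"
	"strings"
)

// ibanChecksumValid applies the MOD 97 check to an IBAN, ignoring spaces.
func ibanChecksumValid(iban string) bool {
	s := strings.ToUpper(strings.ReplaceAll(iban, " ", ""))
	if len(s) < 5 || len(s) > 34 {
		return false
	}
	rearranged := s[4:] + s[:4]
	rem := 0
	for i := 0; i < len(rearranged); i++ {
		c := rearranged[i]
		switch {
		case c >= '0' && c <= '9':
			rem = (rem*10 + int(c-'0')) % 97
		case c >= 'A' && c <= 'Z':
			// A letter expands to two decimal digits (10..35).
			rem = (rem*100 + int(c-'A') + 10) % 97
		default:
			return false
		}
	}
	return rem == 1
}

func main() {
	fmt.Println(ibanChecksumValid("GB82 WEST 1234 5698 7654 32")) // true
	fmt.Println(ibanChecksumValid("GB82 WEST 1234 5698 7654 31")) // false
}
```

Reducing modulo 97 after each character keeps the running value small, so no big-integer arithmetic is needed.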

func WithIPAddr

func WithIPAddr() Option

WithIPAddr enables IP address (IPv4 and IPv6) detection.

func WithJPPhone

func WithJPPhone() Option

WithJPPhone enables Japanese phone number detection. It recognizes landline (03-xxxx-xxxx), mobile (090-xxxx-xxxx), IP phone (050-xxxx-xxxx), and toll-free (0120-xxx-xxx) formats.

func WithJWT

func WithJWT() Option

WithJWT enables JSON Web Token detection. JWT detection looks for the characteristic "eyJ" prefix (the start of a base64url-encoded JSON object beginning with {") and validates the three-part structure (header.payload.signature).
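
A header check of this kind can be sketched with the standard library: split on dots, base64url-decode the first segment without padding, and confirm the JSON object carries an alg key. jwtAlgorithm is a hypothetical helper and performs no signature verification.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

// jwtAlgorithm returns the "alg" value from the token header, or false when
// the token is not three dot-separated parts with a decodable JSON header.
func jwtAlgorithm(token string) (string, bool) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" {
		return "", false
	}
	raw, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return "", false
	}
	var header struct {
		Alg string `json:"alg"`
	}
	if err := json.Unmarshal(raw, &header); err != nil || header.Alg == "" {
		return "", false
	}
	return header.Alg, true
}

func main() {
	// Header segment decodes to {"alg":"HS256","typ":"JWT"}; the signature is a dummy.
	token := "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.c2ln"
	alg, ok := jwtAlgorithm(token)
	fmt.Println(alg, ok) // HS256 true
}
```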

func WithMerchantID

func WithMerchantID() Option

WithMerchantID enables merchant ID and terminal ID detection. MIDs are typically 15 alphanumeric characters and TIDs are 8 digits. Detection requires context keywords (e.g., "merchant ID", "TID") nearby.

func WithMinConfidence

func WithMinConfidence(threshold float64) Option

WithMinConfidence sets the minimum confidence threshold for reported findings. Findings with confidence below the threshold are filtered out after detection and deduplication. This allows callers to select a strict mode (e.g., 0.8 for high-confidence only) or a loose mode (e.g., 0.4 to include medium-confidence matches).

A value of 0 (the default) disables filtering and returns all findings.

// Strict mode: only high-confidence findings.
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.8))

// Loose mode: include medium-confidence and above.
scanner = sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.4))
Example
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	// WithMinConfidence filters out findings below the threshold.
	// BankAccount detections have lower confidence (0.50-0.65),
	// while Email detections have high confidence (1.00).
	scanner := sensitive.NewScanner(
		sensitive.WithEmail(),
		sensitive.WithBankAccount(),
		sensitive.WithMinConfidence(0.8),
	)
	findings := scanner.ScanString("user tanaka@example.com bank account 12345678")

	for _, f := range findings {
		fmt.Printf("type=%s confidence=%.2f\n", f.DetectorName, f.Confidence)
	}
}
Output:

type=email confidence=1.00

func WithMyNumber

func WithMyNumber() Option

WithMyNumber enables Japanese My Number (individual number) detection. My Number is a 12-digit number with a check digit. The detector validates the check digit algorithm to reduce false positives.
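
The check digit can be sketched as below, assuming the publicly documented MOD 11 formula: number the first 11 digits from the right as n = 1..11, weight them by n+1 for n up to 6 and by n-5 for n from 7 to 11, take the weighted sum modulo 11, and use 11 minus the remainder (or 0 when the remainder is 0 or 1). myNumberValid is a hypothetical helper, not the detector's code.

```go
package main

import "fmt"

// myNumberValid reports whether s is 12 ASCII digits whose last digit equals
// the MOD 11 check digit computed from the first 11.
func myNumberValid(s string) bool {
	if len(s) != 12 {
		return false
	}
	for i := 0; i < 12; i++ {
		if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	sum := 0
	for i := 0; i < 11; i++ {
		n := 11 - i // position counted from the right of the 11-digit body
		q := n + 1
		if n >= 7 {
			q = n - 5
		}
		sum += int(s[i]-'0') * q
	}
	check := 0
	if r := sum % 11; r > 1 {
		check = 11 - r
	}
	return int(s[11]-'0') == check
}

func main() {
	fmt.Println(myNumberValid("123456789018")) // true
	fmt.Println(myNumberValid("123456789012")) // false
}
```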

func WithPAN

func WithPAN() Option

WithPAN enables credit card number (PAN) detection. PAN detection uses BIN prefix matching and the Luhn algorithm to validate detected numbers, providing high-confidence results.
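
The Luhn step on its own is short enough to sketch. Starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9) and the total must be a multiple of 10. luhnValid is a hypothetical helper that skips '-' and ' ' separators; it does not perform the BIN prefix matching described above.

```go
package main

import "fmt"

// luhnValid reports whether the digits in s pass the Luhn check.
// Hyphens and spaces are skipped; any other non-digit fails the check.
func luhnValid(s string) bool {
	sum, digits := 0, 0
	double := false // the rightmost digit is not doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == '-' || c == ' ' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
		digits++
	}
	return digits > 0 && sum%10 == 0
}

func main() {
	fmt.Println(luhnValid("4532015112830366"))    // true
	fmt.Println(luhnValid("4532-0151-1283-0366")) // true
	fmt.Println(luhnValid("4532015112830367"))    // false
}
```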

func WithPaymentToken

func WithPaymentToken() Option

WithPaymentToken enables payment processor token detection. Detects API tokens from Stripe (sk_live_, pk_live_, tok_, etc.), PayPal (PAYID-), and Square (sq0idp-, sq0csp-).

func WithSWIFTBIC

func WithSWIFTBIC() Option

WithSWIFTBIC enables SWIFT/BIC code detection. SWIFT/BIC codes are 8 or 11 character identifiers used for international wire transfers. Detection validates the format and ISO 3166-1 country code.

func WithSortByPosition

func WithSortByPosition() Option

WithSortByPosition configures the Scanner to return findings sorted by their byte offset (Start position, ascending) instead of the default confidence-descending order. This is useful when the caller needs to process findings in the order they appear in the original text.

func WithUKSortCode

func WithUKSortCode() Option

WithUKSortCode enables UK bank sort code detection. Sort codes are 6-digit numbers in XX-XX-XX format that identify bank branches.

func WithoutDedup

func WithoutDedup() Option

WithoutDedup disables the default deduplication of overlapping findings. By default, when two findings overlap in byte position, only the one with the highest confidence is kept. With this option, all findings are returned, which is useful when the caller needs to see every detection from every detector, even if they overlap.

scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithoutDedup())

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

Scanner scans text for sensitive data using registered Detectors. It implements a multi-stage filtering pipeline:

  1. Empty data check: Immediately returns if the input is empty.
  2. Hint-based pre-filter: Uses bytes.Contains with each Detector's hints to quickly skip detectors that cannot match the input. ASCII letters in hints are matched case-insensitively. Hints must be exhaustive for the detector's domain (i.e., every possible match must contain at least one hint byte sequence), or empty/nil to always run the detector. Non-exhaustive hints will cause silent detection misses.
  3. Detector.Scan: Runs only the detectors whose hints matched.
  4. Result merging: By default, deduplicates overlapping findings (keeping the highest confidence) and sorts by confidence (descending). Use WithSortByPosition to sort by byte offset instead, or WithoutDedup to keep all findings including overlapping ones.

Create a Scanner using NewScanner with the desired options.

func NewScanner

func NewScanner(opts ...Option) *Scanner

NewScanner creates a new Scanner with the given options. Each option enables a specific detector or adds a custom one. If the same detector is registered more than once (e.g., by combining WithAll with an individual option like WithPAN), the duplicate is silently removed so that each detector runs at most once.

// Enable all built-in detectors
scanner := sensitive.NewScanner(sensitive.WithAll())

// Enable only PAN and email detection
scanner = sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
Example
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
	findings := scanner.ScanString("user tanaka@example.com paid with 4532015112830366")

	for _, f := range findings {
		fmt.Printf("type=%s raw=%s confidence=%.2f\n", f.DetectorName, f.RawValue, f.Confidence)
	}
}
Output:

type=pan raw=4532015112830366 confidence=1.00
type=email raw=tanaka@example.com confidence=1.00
Example (WithAll)
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithAll())
	findings := scanner.ScanString("key: AKIAIOSFODNN7EXAMPLE")

	for _, f := range findings {
		fmt.Printf("type=%s raw=%s confidence=%.2f\n", f.DetectorName, f.RawValue, f.Confidence)
	}
}
Output:

type=awskey raw=AKIAIOSFODNN7EXAMPLE confidence=0.95

func (*Scanner) Scan

func (s *Scanner) Scan(data []byte) []Finding

Scan examines the given byte slice for sensitive data and returns all findings. The multi-stage filtering pipeline ensures that detectors are only invoked when their hint sequences are found in the data, minimizing scan cost.

By default the returned findings are deduplicated (overlapping findings are merged, keeping the highest confidence) and sorted by confidence in descending order. Findings with the same confidence are ordered by byte offset (ascending), then by detector name for full determinism. Use WithSortByPosition to sort by byte offset (ascending) instead. Use WithoutDedup to receive all findings including overlapping ones from different detectors.
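
One way to picture the default ordering and deduplication is the sketch below. It sorts by confidence (descending), then byte offset, then name, and keeps a finding only if it does not overlap one already kept. The finding struct, the labels, and the greedy strategy are assumptions for illustration; the library's actual merge logic may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// finding is a hypothetical stand-in for sensitive.Finding.
type finding struct {
	Name       string
	Start, End int // byte offsets, End exclusive
	Confidence float64
}

// dedupAndSort orders findings by confidence desc, start asc, name asc,
// then drops any finding that overlaps a higher-ranked one.
func dedupAndSort(in []finding) []finding {
	sorted := append([]finding(nil), in...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Confidence != b.Confidence {
			return a.Confidence > b.Confidence
		}
		if a.Start != b.Start {
			return a.Start < b.Start
		}
		return a.Name < b.Name
	})
	var kept []finding
	for _, f := range sorted {
		overlaps := false
		for _, k := range kept {
			if f.Start < k.End && k.Start < f.End {
				overlaps = true
				break
			}
		}
		if !overlaps {
			kept = append(kept, f)
		}
	}
	return kept
}

// sample returns three findings, two of which overlap.
func sample() []finding {
	return []finding{
		{"pan", 17, 36, 1.0},
		{"bank_account", 20, 28, 0.6}, // inside the PAN span
		{"email", 0, 10, 1.0},
	}
}

func main() {
	for _, f := range dedupAndSort(sample()) {
		fmt.Printf("%s [%d:%d] %.2f\n", f.Name, f.Start, f.End, f.Confidence)
	}
}
```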

Example
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN())
	data := []byte("payment for card 4532-0151-1283-0366 amount $99.99")
	findings := scanner.Scan(data)

	for _, f := range findings {
		fmt.Printf("found %s at position [%d:%d]\n", f.DetectorName, f.Start, f.End)
	}
}
Output:

found pan at position [17:36]

func (*Scanner) ScanLines

func (s *Scanner) ScanLines(r io.Reader, fn func(lineNum int, line []byte, findings []Finding)) error

ScanLines reads from r line by line and calls fn for each line that contains at least one finding. This is the recommended API for scanning log files and other line-oriented text streams, as it processes data incrementally without loading the entire input into memory.

lineNum is 1-based. The line parameter is the raw line bytes (without the trailing newline). findings contains all detections for that line.

fn is only called for lines that contain findings. Lines with no sensitive data are silently skipped.

Returns the first error encountered while reading from r, or nil if the entire input was processed successfully.
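
The callback contract described here (1-based line numbers, callback only for lines with findings, first read error returned) can be sketched with bufio.Scanner. The detect function and the flaggedLines helper are hypothetical stand-ins for real detectors. Note that bufio.Scanner rejects lines longer than its buffer (64 KiB by default) and that the line slice is only valid until the next read; whether the library shares these constraints is not stated in this documentation.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// scanLines calls fn for each line of r for which detect reports true.
// Line numbers are 1-based and the line excludes the trailing newline.
func scanLines(r io.Reader, detect func(line []byte) bool, fn func(lineNum int, line []byte)) error {
	sc := bufio.NewScanner(r)
	for lineNum := 1; sc.Scan(); lineNum++ {
		if line := sc.Bytes(); detect(line) {
			fn(lineNum, line)
		}
	}
	return sc.Err()
}

// flaggedLines returns the numbers of the lines in input that contain '@'.
func flaggedLines(input string) []int {
	var nums []int
	hasAt := func(line []byte) bool { return bytes.IndexByte(line, '@') >= 0 }
	err := scanLines(strings.NewReader(input), hasAt, func(n int, _ []byte) {
		nums = append(nums, n)
	})
	if err != nil {
		return nil
	}
	return nums
}

func main() {
	input := "normal log line\nuser tanaka@example.com logged in\nanother safe line\n"
	fmt.Println(flaggedLines(input)) // [2]
}
```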

f, err := os.Open("access.log")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
err = scanner.ScanLines(f, func(lineNum int, line []byte, findings []Finding) {
    fmt.Printf("line %d: found %d sensitive values\n", lineNum, len(findings))
})
Example
package main

import (
	"fmt"
	"strings"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithEmail())
	input := "normal log line\nuser tanaka@example.com logged in\nanother safe line\n"

	err := scanner.ScanLines(strings.NewReader(input), func(lineNum int, _ []byte, findings []sensitive.Finding) {
		for _, f := range findings {
			fmt.Printf("line %d: %s=%s\n", lineNum, f.DetectorName, f.RawValue)
		}
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
Output:

line 2: email=tanaka@example.com

func (*Scanner) ScanReader

func (s *Scanner) ScanReader(r io.Reader) ([]Finding, error)

ScanReader reads all data from r and scans it for sensitive data. This is a convenience method for cases where the full content fits in memory. For large inputs or streaming use cases, prefer Scanner.ScanLines.

f, err := os.Open("access.log")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
findings, err := scanner.ScanReader(f)
Example
package main

import (
	"fmt"
	"strings"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
	r := strings.NewReader("user tanaka@example.com paid with 4532015112830366")

	findings, err := scanner.ScanReader(r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	for _, f := range findings {
		fmt.Printf("type=%s raw=%s\n", f.DetectorName, f.RawValue)
	}
}
Output:

type=pan raw=4532015112830366
type=email raw=tanaka@example.com

func (*Scanner) ScanString

func (s *Scanner) ScanString(text string) []Finding

ScanString is a convenience method that scans a string for sensitive data. It converts the string to a byte slice and calls Scanner.Scan.

Example
package main

import (
	"fmt"

	"github.com/nao1215/sensitive"
)

func main() {
	scanner := sensitive.NewScanner(sensitive.WithEmail())
	findings := scanner.ScanString("contact admin@example.com for support")

	fmt.Printf("found %d email(s)\n", len(findings))
	if len(findings) > 0 {
		fmt.Printf("email: %s\n", findings[0].RawValue)
	}
}
Output:

found 1 email(s)
email: admin@example.com

type SensitiveKind

type SensitiveKind = detector.SensitiveKind

SensitiveKind categorizes a finding into a broad semantic group (financial, PII, credential). This is a type alias for detector.SensitiveKind.

Directories

Path | Synopsis
detector | Package detector provides individual sensitive data detector implementations for the sensitive library.
mask | Package mask provides optional masking helpers for the sensitive library.
