pdscan
Scan your data stores for unencrypted personal data (PII)
- Last names
- Email addresses
- IP addresses
- Street addresses (US)
- Phone numbers (US)
- Credit card numbers
- Social security numbers
- Dates of birth
- Location data
- OAuth tokens
Detects PII using data sampling and column names, and works with compressed files
💥 Zero runtime dependencies and minimal database load
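To make the two detection signals concrete, here is a minimal Go sketch (Go being the language pdscan is written in). A column is flagged either by its name alone, or when enough of its sampled values match a pattern. The rule patterns, the classify function, and the minFraction threshold are hypothetical illustrations, not pdscan's actual rules or API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a PII label with a pattern (hypothetical rules, not pdscan's).
type rule struct {
	label   string
	pattern *regexp.Regexp
}

// nameRules match against the lowercased column name.
var nameRules = []rule{
	{"last name", regexp.MustCompile(`^(last_?name|surname)$`)},
	{"date of birth", regexp.MustCompile(`^(dob|birth_?date|date_?of_?birth)$`)},
}

// valueRules match against individual sampled values.
var valueRules = []rule{
	{"email", regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.-]+`)},
	{"IP address", regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)},
}

// classify returns the labels that apply to a column, given its name and a
// sample of its values. A value rule applies when at least minFraction of the
// sampled values match it.
func classify(column string, sample []string, minFraction float64) []string {
	var labels []string
	name := strings.ToLower(column)
	for _, r := range nameRules {
		if r.pattern.MatchString(name) {
			labels = append(labels, r.label)
		}
	}
	if len(sample) == 0 {
		return labels
	}
	for _, r := range valueRules {
		hits := 0
		for _, v := range sample {
			if r.pattern.MatchString(v) {
				hits++
			}
		}
		if float64(hits)/float64(len(sample)) >= minFraction {
			labels = append(labels, r.label)
		}
	}
	return labels
}

func main() {
	fmt.Println(classify("last_name", []string{"Smith", "Jones"}, 0.5))
	fmt.Println(classify("contact", []string{"a@example.com", "n/a", "b@example.org"}, 0.5))
}
```

Matching on names catches columns whose values are sparse or oddly formatted, while matching on sampled values catches PII stored under generic names such as contact.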
Installation
Download the latest version
Unzip and follow the instructions for your data store
Data Stores
Files
./pdscan file://path/to/file.txt
You can also specify a directory.
./pdscan file://path/to/directory
For absolute paths, use file:/// (three slashes).
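The reason for the third slash shows up when a file URI is parsed as a standard URL. In the Go sketch below, with two slashes the first path segment is read as the URI host, and with three the host is empty and the path stays absolute. The localPath helper is hypothetical and only illustrates generic URL parsing; it is not pdscan's code.

```go
package main

import (
	"fmt"
	"net/url"
)

// localPath converts a file:// URI into a filesystem path (hypothetical helper).
// With two slashes, the first path segment is parsed as the URI "host", so it
// is joined back on to give a relative path. With three slashes, the host is
// empty and the path is absolute.
func localPath(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "file" {
		return "", fmt.Errorf("not a file URI: %s", raw)
	}
	return u.Host + u.Path, nil
}

func main() {
	for _, raw := range []string{"file://path/to/file.txt", "file:///path/to/file.txt"} {
		p, err := localPath(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", raw, p)
	}
}
```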
MySQL & MariaDB
./pdscan mysql://user:pass@host:3306/dbname
Postgres
./pdscan postgres://user:pass@host:5432/dbname
If your connection doesn’t use SSL, append to the URI:
?sslmode=disable
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;
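The extension matters because it gives Postgres a cheap way to read a bounded, randomly placed sample of rows. The Go sketch below shows the kind of query that becomes possible, with a plain LIMIT as a fallback. Both the sampleQuery function and the fallback are hypothetical; the README does not say which queries pdscan actually issues. Whether the extension is installed can be checked with SELECT 1 FROM pg_extension WHERE extname = 'tsm_system_rows'.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a Postgres identifier, doubling any embedded quotes.
// Note: a schema-qualified name like public.users is treated as one identifier.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// sampleQuery builds a query that reads about n rows from table (hypothetical).
// With tsm_system_rows, TABLESAMPLE SYSTEM_ROWS(n) reads rows from randomly
// chosen blocks and stops once it has n. Without it, this sketch falls back to
// LIMIT n, which is cheap but returns whichever rows the scan reaches first.
func sampleQuery(table string, n int, hasSystemRows bool) string {
	if hasSystemRows {
		return fmt.Sprintf("SELECT * FROM %s TABLESAMPLE SYSTEM_ROWS(%d)", quoteIdent(table), n)
	}
	return fmt.Sprintf("SELECT * FROM %s LIMIT %d", quoteIdent(table), n)
}

func main() {
	fmt.Println(sampleQuery("users", 10000, true))
	fmt.Println(sampleQuery("users", 10000, false))
}
```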
SQLite
./pdscan sqlite:/path/to/dbname.sqlite3
S3
./pdscan s3://bucket/path/to/file.txt
Requires s3:GetObject permission
You can also specify a prefix by ending the URI with a slash (/).
./pdscan s3://bucket/path/to/directory/
Requires s3:ListBucket and s3:GetObject permissions
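The two S3 forms need different permissions because a single object is fetched directly, while a prefix must first be listed and then each object fetched. This Go sketch splits an s3:// URI into bucket and key and maps each form to the IAM actions stated above. The s3Target type and both functions are hypothetical, not part of pdscan.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// s3Target is a parsed s3:// URI (hypothetical type for illustration).
type s3Target struct {
	Bucket   string
	Key      string // object key, or prefix when IsPrefix is true
	IsPrefix bool
}

// parseS3 splits an s3:// URI into bucket and key. A key that is empty or
// ends in "/" is treated as a prefix to list rather than a single object.
func parseS3(raw string) (s3Target, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return s3Target{}, err
	}
	if u.Scheme != "s3" || u.Host == "" {
		return s3Target{}, fmt.Errorf("not an s3:// URI: %s", raw)
	}
	key := strings.TrimPrefix(u.Path, "/")
	return s3Target{
		Bucket:   u.Host,
		Key:      key,
		IsPrefix: key == "" || strings.HasSuffix(key, "/"),
	}, nil
}

// requiredActions lists the IAM actions the README gives for each form.
func requiredActions(t s3Target) []string {
	if t.IsPrefix {
		// Listing the prefix needs s3:ListBucket; reading each object needs s3:GetObject.
		return []string{"s3:ListBucket", "s3:GetObject"}
	}
	return []string{"s3:GetObject"}
}

func main() {
	for _, raw := range []string{"s3://bucket/path/to/file.txt", "s3://bucket/path/to/directory/"} {
		t, err := parseS3(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> bucket=%s key=%s needs %v\n", raw, t.Bucket, t.Key, requiredActions(t))
	}
}
```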
Others
Feel free to submit a PR
Options
Show data found
./pdscan --show-data
Show low confidence matches
./pdscan --show-all
Change sample size
./pdscan --sample-size 50000
Specify the number of processes to use (defaults to 1)
./pdscan --processes 4
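Running several workers lets independent tables or files be scanned concurrently. As a rough sketch of the idea, assuming each table can be scanned on its own, here is a bounded worker pool in Go, with goroutines standing in for processes. The scanAll function and its scan callback are hypothetical; the README does not describe how pdscan schedules its work.

```go
package main

import (
	"fmt"
	"sync"
)

// scanAll runs scan over every table using a fixed pool of workers and
// collects the per-table results (hypothetical; scan stands in for real work).
func scanAll(tables []string, workers int, scan func(table string) int) map[string]int {
	if workers < 1 {
		workers = 1 // mirror the documented default of 1
	}
	type result struct {
		table string
		count int
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range jobs {
				results <- result{t, scan(t)}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, t := range tables {
			jobs <- t
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[string]int, len(tables))
	for r := range results {
		out[r.table] = r.count
	}
	return out
}

func main() {
	tables := []string{"users", "orders", "events"}
	// Stand-in scan: pretend the match count is the length of the table name.
	counts := scanAll(tables, 4, func(table string) int { return len(table) })
	fmt.Println(counts)
}
```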
Roadmap
- Add more data stores (SQL Server, MongoDB, Elasticsearch, Memcached, Redis)
- Improve rules
- Highlight matches
- Add more output formats, like JSON and CSV
History
View the changelog
Contributing
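Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features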
To get started with development and testing:
git clone https://github.com/ankane/pdscan.git
cd pdscan
dep ensure
make test