entropy

package module
v0.0.0-...-83d8433
Warning

This package is not in the latest version of its module.

Published: Feb 19, 2018 License: MIT Imports: 5 Imported by: 0

README

Entropy


Really, a character N-gram entropy modeller

This learns an n-gram model on a set of strings, and can then predict the entropy of other strings.
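
As a rough illustration of the idea (not this package's implementation), the Go sketch below counts character trigrams in a few training strings and scores new strings by their average log probability. The helper names `countNGrams` and `avgLogProb` are hypothetical, probabilities are plain relative frequencies, and giving unseen n-grams a pseudo-count of 1 is an assumption made here to keep the logarithm finite.

```go
package main

import (
	"fmt"
	"math"
)

// countNGrams counts every character n-gram of length n in each training
// string and returns the per-n-gram counts plus the total number of n-grams.
func countNGrams(lines []string, n int) (map[string]uint64, uint64) {
	counts := make(map[string]uint64)
	var total uint64
	for _, line := range lines {
		runes := []rune(line)
		for i := 0; i+n <= len(runes); i++ {
			counts[string(runes[i:i+n])]++
			total++
		}
	}
	return counts, total
}

// avgLogProb returns the mean natural-log relative frequency of the n-grams
// in text. Unseen n-grams get a pseudo-count of 1 (an assumption of this
// sketch). More negative values mean the text looks less like the training
// data, i.e. higher per-n-gram entropy. Text shorter than n scores 0.
func avgLogProb(counts map[string]uint64, total uint64, n int, text string) float64 {
	runes := []rune(text)
	num := len(runes) - n + 1
	if num <= 0 || total == 0 {
		return 0
	}
	sum := 0.0
	for i := 0; i+n <= len(runes); i++ {
		c, ok := counts[string(runes[i:i+n])]
		if !ok {
			c = 1
		}
		sum += math.Log(float64(c) / float64(total))
	}
	return sum / float64(num)
}

func main() {
	training := []string{"for i := 0; i < n; i++ {", "if err != nil {", "return nil, err"}
	counts, total := countNGrams(training, 3)
	fmt.Printf("%.3f\n", avgLogProb(counts, total, 3, "if err != nil {"))
	fmt.Printf("%.3f\n", avgLogProb(counts, total, 3, "PXKXoyThngGrjCgB"))
}
```

The code-like line shares trigrams with the training text and so scores higher (less negative) than the random-looking string.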

For example, it has been noted that (good) passwords have high entropy, and we should be able to use that fact to find (good) passwords in code (where they shouldn't be).

To build the executable (the output directory can be anything you want):

go build -o bin/string_entropy cmd/string_entropy/string_entropy.go

You can build create_google_books_ngram_model.go the same way.

To train:

  • Get some (source) code to train on, and train on it.

The following trains on the Go 1.7.3 distribution source, excluding files whose paths contain "crypto" and test files.

The resulting model can be found in the data directory.

find /usr/local/Cellar/go/1.7.3/libexec/src/ | grep "\.go" | grep -v "crypto" | grep -v "_test" | xargs cat > /tmp/go_text
bin/string_entropy -train -in /tmp/go_text -model data/go-3.tsv -ngram_size 3
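
If you would rather select the training files from Go than with find and grep, the sketch below applies the same filter with `io/fs.WalkDir`. The function names `trainingFiles` and `sampleFS` are hypothetical, the in-memory tree stands in for a real source directory, and the ".go" check is a suffix match, which is slightly stricter than `grep "\.go"` (that pattern also matches ".go" in the middle of a path). Concatenating the selected files, as `xargs cat` does, is left out.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// trainingFiles mirrors the shell filter: keep files whose path ends in
// ".go" and contains neither "crypto" nor "_test".
func trainingFiles(fsys fs.FS, root string) ([]string, error) {
	var paths []string
	err := fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && strings.HasSuffix(path, ".go") &&
			!strings.Contains(path, "crypto") && !strings.Contains(path, "_test") {
			paths = append(paths, path)
		}
		return nil
	})
	return paths, err
}

// sampleFS is a small in-memory tree for illustration; in practice you would
// pass os.DirFS pointing at a real Go source directory.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"fmt/print.go":      {},
		"fmt/print_test.go": {},
		"crypto/aes/aes.go": {},
		"fmt/README":        {},
	}
}

func main() {
	files, err := trainingFiles(sampleFS(), ".")
	if err != nil {
		panic(err)
	}
	fmt.Println(files)
}
```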

To predict:

  • Use the model to predict on some source code. For example, run it on this program's own source, which contains some high-entropy passwords, keeping only lines at least 10 characters long (after compressing spaces):
cat src/cmd/string_entropy/string_entropy.go |  bin/string_entropy -predict -model data/go-3.tsv -min 10  | sort -g | head -5
-16.095489	-997.920341	62	 // magic_password := "PXKXoyThngGrjCgBLuf2ivrpFFNKA9UgBHrxpLaW"
-14.334451	-1576.789572	110	 outf.Write([]byte(fmt.Sprintf("%f\t%f\t%v\t%s\n", p.LogProbAverage, p.LogProbTotal, p.NumberOfNGrams, p.Text)))
-14.186484	-113.491869	8	 modf = f2
-14.186484	-113.491869	8	 modf = f2
-14.107883	-211.618242	15	 model.Dump(modf)

Columns are, for each line: average log probability (take negative for entropy), total log probability, number of ngrams, and the line.

The Sprintf line reminds me that format strings always look like line noise, and now we have the science to prove it!

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Model

type Model struct {
	Size    int
	Counter *NGramCounter
}

Model contains a maximum n-gram size (Size) and the character n-gram counts (Counter).

func New

func New(MaxNGramSize int) (model *Model)

New creates a Model with a maximum n-gram size of `MaxNGramSize`.

func Read

func Read(f io.Reader) (model *Model)

Read reads a model from f.

func (*Model) Dump

func (model *Model) Dump(f io.Writer)

Dump writes the model's n-gram counts to f.

func (*Model) Entropy

func (model *Model) Entropy(text string) (entropy float64)

func (*Model) LogProb

func (model *Model) LogProb(key string) (logProb float64)

LogProb returns the best matching log probability for a key given a set of models

func (*Model) Predict

func (model *Model) Predict(text string) (prediction *Prediction)

Predict returns a prediction for a string

func (*Model) Train

func (model *Model) Train(f io.Reader) (exampleCount int)

Train trains the n-gram model on the lines read from f. The model must already be initialized (for example, with New). It returns the number of example lines used.

func (*Model) TrainWithMultiplier

func (model *Model) TrainWithMultiplier(f io.Reader) (exampleCount int)

TrainWithMultiplier trains the n-gram model on the lines read from f, where each line has the format token<TAB>count. The model must already be initialized. It returns the number of example lines used.

func (*Model) Update

func (model *Model) Update(line string)

Update passes line to each of the model's counters.

func (*Model) UpdateWithMultiplier

func (model *Model) UpdateWithMultiplier(line string, multiplier uint64)

UpdateWithMultiplier passes line to each of the model's counters, weighted by multiplier.

type NGramCounter

type NGramCounter struct {
	Size   int
	Counts map[string]uint64
	Total  uint64
}

NGramCounter contains counts and totals for Ngrams of a particular size

func NewNGramCounter

func NewNGramCounter(maxNGramSize int) (counter *NGramCounter)

NewNGramCounter returns a new ngram counter

func (*NGramCounter) Count

func (counter *NGramCounter) Count(key string, ifNotFound uint64) (count uint64)

Count returns the count for the n-gram key in this counter, or ifNotFound if key has not been seen.

func (*NGramCounter) Update

func (counter *NGramCounter) Update(line string)

Update updates the counter for a newly seen string

func (*NGramCounter) UpdateWithMultiplier

func (counter *NGramCounter) UpdateWithMultiplier(line string, multiplier uint64)

UpdateWithMultiplier updates the counter for a string, using a multiplier
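
For orientation, here is a hypothetical counter with the same documented fields (Size, Counts, Total) and method shapes. The method bodies are guesses at the documented behavior, not the package's code; in particular, counting only n-grams of exactly Size characters and adding the multiplier to Total once per n-gram are assumptions.

```go
package main

import "fmt"

// ngramCounter mirrors the documented fields of NGramCounter.
type ngramCounter struct {
	Size   int
	Counts map[string]uint64
	Total  uint64
}

// newNGramCounter is a hypothetical constructor for the sketch.
func newNGramCounter(size int) *ngramCounter {
	return &ngramCounter{Size: size, Counts: make(map[string]uint64)}
}

// UpdateWithMultiplier adds multiplier to the count of every Size-length
// character n-gram in line, and to Total once per n-gram.
func (c *ngramCounter) UpdateWithMultiplier(line string, multiplier uint64) {
	runes := []rune(line)
	for i := 0; i+c.Size <= len(runes); i++ {
		c.Counts[string(runes[i:i+c.Size])] += multiplier
		c.Total += multiplier
	}
}

// Update is UpdateWithMultiplier with a multiplier of 1.
func (c *ngramCounter) Update(line string) { c.UpdateWithMultiplier(line, 1) }

// Count returns the stored count for key, or ifNotFound if key was never seen.
func (c *ngramCounter) Count(key string, ifNotFound uint64) uint64 {
	if n, ok := c.Counts[key]; ok {
		return n
	}
	return ifNotFound
}

func main() {
	c := newNGramCounter(3)
	c.Update("abcab")
	c.UpdateWithMultiplier("abc", 10)
	fmt.Println(c.Count("abc", 0), c.Count("zzz", 1), c.Total)
}
```

Taking the default as a parameter leaves the smoothing choice (for example, a pseudo-count of 1 for unseen n-grams) to the caller.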

type Prediction

type Prediction struct {
	LogProbAverage float64
	LogProbTotal   float64
	NumberOfNGrams int
	Text           string
}

Prediction is the log probability of a string and other data

Directories

Path Synopsis
cmd
