anomalia

package module
v0.0.0-...-f210c62 Latest Latest
Published: Jul 8, 2019 License: Apache-2.0 Imports: 12 Imported by: 0

README

anomalia

anomalia is a lightweight Go library for Time Series data analysis.

It supports anomaly detection and correlation. The API is simple and configurable in the sense that you can choose which algorithm suits your needs for anomaly detection and correlation.

🚧 The library is currently under development so things might move or change!

Installation

Installation is done using go get:

go get -u github.com/project-anomalia/anomalia

Supported Go Versions

anomalia supports Go >= 1.10.

Documentation

Quick Start Guide

This is a simple example to get you up & running with the library:

package main

import (
    "fmt"
    "github.com/project-anomalia/anomalia"
)

func main() {
    // Load the time series from an external source.
    // It returns an instance of TimeSeries struct which holds the timestamps and their values.
    timeSeries := anomalia.NewTimeSeriesFromCSV("testdata/co2.csv")

    // Instantiate the default detector, which uses a threshold to determine anomalies.
    // Anomalies are data points that have a score above the threshold (2.5 in this case).
    detector := anomalia.NewDetector(timeSeries).Threshold(2.5)

    // Calculate the scores for each data point in the time series
    scores := detector.GetScores()

    // Find anomalies based on the calculated scores
    anomalies := detector.GetAnomalies(scores)

    // Iterate over detected anomalies and print their exact timestamp and value.
    for _, anomaly := range anomalies {
        fmt.Println(anomaly.Timestamp, ",", anomaly.Value)
    }
}

The example above uses some preset algorithms to calculate the scores. It might not be suited for your case but you can use any of the available algorithms.

All algorithms follow the same straightforward design: you compute the scores based on your configuration and understanding of the data, then pass those scores to the Detector.GetAnomalies(*ScoreList) function.

Here is another example that checks whether two time series are related (correlated):

package main

import "github.com/project-anomalia/anomalia"

func main() {
    a := anomalia.NewTimeSeriesFromCSV("testdata/co2.csv")
    b := anomalia.NewTimeSeriesFromCSV("testdata/airline-passengers.csv")

    // If the time series data points do not follow a certain distribution,
    // we use the Spearman correlator.
    coefficient := anomalia.NewCorrelator(a, b).CorrelationMethod(anomalia.SpearmanRank, nil).Run()

    // If the coefficient is above a certain threshold (0.7 for example), we consider
    // the time series correlated.
    if coefficient < 0.7 {
        panic("no relationship between the two time series")
    }
}

If the correlation algorithm accepts any additional parameters (see different implementations), you can pass them as a float64 slice to the CorrelationMethod(method, options) method.

Roadmap

  • CLI tool for rapid experimentation
  • Benchmarks

Resources

TODO

License

Copyright 2019 Faissal Elamraoui

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AbsInt

func AbsInt(x int) int

AbsInt returns the absolute value of an integer.

func Average

func Average(input []float64) float64

Average returns the average of the input

func Cdf

func Cdf(mean, stdev float64) func(float64) float64

Cdf returns the cumulative distribution function
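The signature above returns a closure, so the distribution parameters are fixed once and the resulting function can be evaluated many times. The following is a minimal sketch of that pattern, assuming a normal (Gaussian) distribution; normalCDF is a hypothetical name and this is not the library's own code.

```go
package main

import (
	"fmt"
	"math"
)

// normalCDF returns the cumulative distribution function of a normal
// distribution with the given mean and standard deviation.
// Assumption: the distribution is Gaussian and stdev is positive.
func normalCDF(mean, stdev float64) func(float64) float64 {
	return func(x float64) float64 {
		// Standard identity: CDF(x) = (1 + erf((x - mean) / (stdev * sqrt(2)))) / 2
		return 0.5 * (1 + math.Erf((x-mean)/(stdev*math.Sqrt2)))
	}
}

func main() {
	cdf := normalCDF(0, 1)
	fmt.Println(cdf(0))    // probability of a value at or below the mean
	fmt.Println(cdf(1.96)) // close to the 97.5th percentile
}
```
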

func Ema

func Ema(input []float64, smoothingFactor float64) []float64

Ema returns the exponential moving average of the input
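For readers unfamiliar with the technique, here is a small sketch of an exponential moving average. It assumes the common recurrence in which the first output equals the first input and each later output blends the new value with the previous average; the library may seed or weight the series differently. The name ema is hypothetical.

```go
package main

import "fmt"

// ema returns the exponential moving average of input.
// Assumptions: out[0] = input[0], and alpha (the smoothing factor) lies in (0, 1].
func ema(input []float64, alpha float64) []float64 {
	out := make([]float64, len(input))
	for i, v := range input {
		if i == 0 {
			out[i] = v
			continue
		}
		// Blend the new value with the previous average.
		out[i] = alpha*v + (1-alpha)*out[i-1]
	}
	return out
}

func main() {
	fmt.Println(ema([]float64{1, 2, 3}, 0.5))
}
```

A larger smoothing factor makes the average follow recent values more closely; a smaller one smooths more aggressively.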

func Erf

func Erf(x float64) float64

Erf is the Gaussian error function

func Float64WithPrecision

func Float64WithPrecision(num float64, precision int) float64

Float64WithPrecision rounds float to certain precision
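One common way to round to a fixed number of decimal places is to scale, round, and scale back. The sketch below illustrates that approach with a hypothetical roundTo helper; whether the library rounds exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// roundTo rounds num to the given number of decimal places.
// Assumption: halves are rounded away from zero, as math.Round does.
func roundTo(num float64, precision int) float64 {
	scale := math.Pow(10, float64(precision))
	return math.Round(num*scale) / scale
}

func main() {
	fmt.Println(roundTo(3.14159, 2))
	fmt.Println(roundTo(2.5, 0))
}
```

Because float64 cannot represent every decimal fraction exactly, the result is the nearest representable value rather than an exact decimal.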

func Pdf

func Pdf(mean, stdev float64) func(float64) float64

Pdf returns the probability density function

func RandomSineValue

func RandomSineValue(rand *rand.Rand, limit int) float64

RandomSineValue returns sine of value between [0, limit] using a rand source

func RoundFloat

func RoundFloat(num float64) int

RoundFloat rounds float to closest int

func Stdev

func Stdev(input []float64) float64

Stdev returns the standard deviation of the input

func SumFloat64s

func SumFloat64s(input []float64) float64

SumFloat64s returns the sum of all float64 in the input

func SumInts

func SumInts(input []int) int

SumInts returns the sum of all integers in the input.

func Variance

func Variance(input []float64) (variance float64)

Variance returns the variance of the input
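The three statistics helpers (Average, Variance, Stdev) are closely related. The sketch below shows one way they can be computed, assuming the population variance (dividing by n rather than n-1); the documentation does not say which variant the library uses, and the function names here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// variance returns the population variance of xs (assumption: divide by n).
func variance(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return ss / float64(len(xs))
}

// stdev returns the square root of the population variance.
func stdev(xs []float64) float64 {
	return math.Sqrt(variance(xs))
}

func main() {
	data := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Println(mean(data), variance(data), stdev(data))
}
```
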

Types

type AbsoluteThreshold

type AbsoluteThreshold struct {
	// contains filtered or unexported fields
}

AbsoluteThreshold holds absolute threshold algorithm configuration. It takes the difference of lower and upper thresholds with the current value as anomaly score.
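The scoring rule described above can be read as "how far does the value fall outside the allowed band". The following sketch is one interpretation of that rule, assuming values inside [lower, upper] score 0; absoluteThresholdScores is a hypothetical helper and not the library's implementation.

```go
package main

import "fmt"

// absoluteThresholdScores scores each value by its distance outside
// the [lower, upper] band. Assumption: values inside the band score 0.
func absoluteThresholdScores(values []float64, lower, upper float64) []float64 {
	scores := make([]float64, len(values))
	for i, v := range values {
		switch {
		case v < lower:
			scores[i] = lower - v
		case v > upper:
			scores[i] = v - upper
		}
	}
	return scores
}

func main() {
	fmt.Println(absoluteThresholdScores([]float64{1, 5, 12}, 2, 10))
}
```
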

func NewAbsoluteThreshold

func NewAbsoluteThreshold() *AbsoluteThreshold

NewAbsoluteThreshold returns an AbsoluteThreshold instance.

func (*AbsoluteThreshold) Run

func (at *AbsoluteThreshold) Run(timeSeries *TimeSeries) *ScoreList

Run runs the absolute threshold algorithm over the time series.

func (*AbsoluteThreshold) Thresholds

func (at *AbsoluteThreshold) Thresholds(lower, upper float64) Algorithm

Thresholds sets both lower and upper thresholds.

type Algorithm

type Algorithm interface {
	Run(*TimeSeries) *ScoreList
	// contains filtered or unexported methods
}

Algorithm is the base interface of all algorithms

type Anomaly

type Anomaly struct {
	Timestamp      float64
	StartTimestamp float64
	EndTimestamp   float64
	Score          float64
	Value          float64
	Severity       string
	// contains filtered or unexported fields
}

Anomaly holds information about the detected anomaly/outlier

func (*Anomaly) GetTimeWindow

func (anomaly *Anomaly) GetTimeWindow() (float64, float64)

GetTimeWindow returns anomaly start and end timestamps

func (*Anomaly) GetTimestampedScore

func (anomaly *Anomaly) GetTimestampedScore() (float64, float64)

GetTimestampedScore returns anomaly exact timestamp with calculated score

type Bitmap

type Bitmap struct {
	// contains filtered or unexported fields
}

Bitmap holds bitmap algorithm configuration.

The Bitmap algorithm breaks the time series into chunks and uses the frequency of similar chunks to determine anomaly scores. The scoring happens by sliding both lagging and future windows.

func NewBitmap

func NewBitmap() *Bitmap

NewBitmap returns Bitmap instance.

func (*Bitmap) ChunkSize

func (b *Bitmap) ChunkSize(size int) *Bitmap

ChunkSize sets the chunk size to use (defaults to 2).

func (*Bitmap) FutureWindowSize

func (b *Bitmap) FutureWindowSize(size int) *Bitmap

FutureWindowSize sets the future window size (defaults to 0).

func (*Bitmap) LagWindowSize

func (b *Bitmap) LagWindowSize(size int) *Bitmap

LagWindowSize sets the lag window size (defaults to 0).

func (*Bitmap) Precision

func (b *Bitmap) Precision(p int) *Bitmap

Precision sets the precision.

func (*Bitmap) Run

func (b *Bitmap) Run(timeSeries *TimeSeries) *ScoreList

Run runs the bitmap algorithm over the time series

type BitmapBinary

type BitmapBinary string

BitmapBinary wrapper type around a string with custom behaviour

func (BitmapBinary) At

func (bb BitmapBinary) At(index int) BitmapBinary

At returns character string at the specified index

func (BitmapBinary) Len

func (bb BitmapBinary) Len() int

Len returns the length of the underlying string

func (BitmapBinary) Slice

func (bb BitmapBinary) Slice(lower, upper int) BitmapBinary

Slice slices a string in a Python-like way:

  • When lower == upper, it returns an empty string.
  • When lower < 0 and upper < len(binary), it returns an empty string.
  • When lower < 0 and upper >= len(binary), it returns the first character.
  • When lower >= 0 and upper >= len(binary), it slices the string from lower to the end of the string.
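The slicing rules listed for Slice can be sketched on a plain string as follows. The four documented cases are implemented as stated; the handling of inputs the documentation does not cover (for example lower beyond the end of the string, or lower greater than upper) is an assumption made so the sketch never panics. sliceLike is a hypothetical name.

```go
package main

import "fmt"

// sliceLike applies the documented Slice rules to a plain string.
func sliceLike(s string, lower, upper int) string {
	n := len(s)
	switch {
	case lower == upper:
		return ""
	case lower < 0 && upper < n:
		return ""
	case lower < 0: // here upper >= n
		if n == 0 {
			return ""
		}
		return s[:1]
	case lower >= n || (upper < n && lower > upper):
		// Assumption: undocumented out-of-range inputs yield an empty string.
		return ""
	case upper >= n:
		return s[lower:]
	default:
		// Assumption: in-range bounds behave like a normal Go slice.
		return s[lower:upper]
	}
}

func main() {
	fmt.Println(sliceLike("0101", -1, 4))
	fmt.Println(sliceLike("0101", 2, 10))
}
```
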

func (BitmapBinary) String

func (bb BitmapBinary) String() string

String returns the underlying string

type CorrelationAlgorithm

type CorrelationAlgorithm interface {
	Run() float64
	// contains filtered or unexported methods
}

CorrelationAlgorithm base interface for correlation algorithms.

type CorrelationMethod

type CorrelationMethod int32

CorrelationMethod type checker for correlation method

const (
	// XCorr represents the Cross Correlation algorithm.
	XCorr CorrelationMethod = iota
	// SpearmanRank represents the Spearman Rank Correlation algorithm.
	SpearmanRank
	// Pearson represents the Pearson Correlation algorithm.
	Pearson
)

type CorrelationResult

type CorrelationResult struct {
	Shift              float64
	Coefficient        float64
	ShiftedCoefficient float64
}

CorrelationResult holds detected correlation result.

type Correlator

type Correlator struct {
	// contains filtered or unexported fields
}

Correlator holds the correlator configuration.

func NewCorrelator

func NewCorrelator(current, target *TimeSeries) *Correlator

NewCorrelator returns an instance of the correlation algorithm.

func (*Correlator) CorrelationMethod

func (c *Correlator) CorrelationMethod(method CorrelationMethod, options []float64) *Correlator

CorrelationMethod specifies which correlation method to use (XCorr, SpearmanRank or Pearson).

func (*Correlator) Run

func (c *Correlator) Run() float64

Run runs the correlator.

func (*Correlator) TimePeriod

func (c *Correlator) TimePeriod(start, end float64) *Correlator

TimePeriod crops the current and target time series to specified range.

func (*Correlator) UseAnomalyScore

func (c *Correlator) UseAnomalyScore(use bool) *Correlator

UseAnomalyScore tells the correlator to calculate anomaly scores from both time series.

type CrossCorrelation

type CrossCorrelation struct {
	// contains filtered or unexported fields
}

CrossCorrelation holds Cross Correlation algorithm parameters and settings. It is calculated by multiplying and summing the current and target time series together.

This implementation uses normalized time series which makes scoring easy to understand:

  • The higher the coefficient, the higher the correlation is.
  • The maximum value of the correlation coefficient is 1.
  • The minimum value of the correlation coefficient is -1.
  • Two time series are exactly the same when their correlation coefficient is equal to 1.
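To make the "multiply and sum" description and the bounds listed above concrete, here is a simplified sketch of a normalized cross correlation scanned over a range of shifts. It makes several assumptions that may differ from the library: shifts are integer index offsets rather than seconds, normalization divides by the product of the two series' Euclidean norms, and no shift impact penalty is applied. All names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// crossCorrelation pairs a[i] with b[i+shift], multiplies and sums the pairs,
// and divides by the product of the series' norms so the result is in [-1, 1].
func crossCorrelation(a, b []float64, shift int) float64 {
	var dot, normA, normB float64
	for _, v := range a {
		normA += v * v
	}
	for _, v := range b {
		normB += v * v
	}
	if normA == 0 || normB == 0 {
		return 0 // an all-zero series has no defined correlation
	}
	for i := range a {
		if j := i + shift; j >= 0 && j < len(b) {
			dot += a[i] * b[j]
		}
	}
	return dot / math.Sqrt(normA*normB)
}

// bestShift scans shifts in [-maxShift, maxShift] and returns the shift
// with the highest coefficient together with that coefficient.
func bestShift(a, b []float64, maxShift int) (int, float64) {
	best, bestCoef := 0, math.Inf(-1)
	for s := -maxShift; s <= maxShift; s++ {
		if c := crossCorrelation(a, b, s); c > bestCoef {
			best, bestCoef = s, c
		}
	}
	return best, bestCoef
}

func main() {
	a := []float64{0, 1, 0, 0, 0}
	b := []float64{0, 0, 1, 0, 0} // same pulse, one step later
	fmt.Println(bestShift(a, b, 2))
}
```
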

func NewCrossCorrelation

func NewCrossCorrelation(current *TimeSeries, target *TimeSeries) *CrossCorrelation

NewCrossCorrelation returns an instance of the cross correlation struct.

func (*CrossCorrelation) GetCorrelationResult

func (cc *CrossCorrelation) GetCorrelationResult() CorrelationResult

GetCorrelationResult runs the cross correlation algorithm.

func (*CrossCorrelation) Impact

func (cc *CrossCorrelation) Impact(impact float64) *CrossCorrelation

Impact sets impact of shift on shifted correlation coefficient.

func (*CrossCorrelation) MaxShift

func (cc *CrossCorrelation) MaxShift(shift float64) *CrossCorrelation

MaxShift sets the maximal shift in seconds.

func (*CrossCorrelation) Run

func (cc *CrossCorrelation) Run() float64

Run runs the cross correlation algorithm and returns only the coefficient.

type Derivative

type Derivative struct {
	// contains filtered or unexported fields
}

Derivative holds the derivative algorithm configuration. It uses the derivative of the current value as anomaly score.
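As a rough illustration of derivative-based scoring, the sketch below uses the absolute rate of change between consecutive points as the score. It assumes a backward finite difference, a score of 0 for the first point, and no smoothing; the library also exposes a smoothing factor, which is not modeled here. derivativeScores is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// derivativeScores returns |dv/dt| between each point and its predecessor.
// Assumptions: the first point scores 0, and points with a zero time step score 0.
func derivativeScores(timestamps, values []float64) []float64 {
	scores := make([]float64, len(values))
	for i := 1; i < len(values) && i < len(timestamps); i++ {
		dt := timestamps[i] - timestamps[i-1]
		if dt == 0 {
			continue
		}
		scores[i] = math.Abs((values[i] - values[i-1]) / dt)
	}
	return scores
}

func main() {
	ts := []float64{0, 1, 2, 4}
	vs := []float64{1, 3, 3, 7}
	fmt.Println(derivativeScores(ts, vs))
}
```
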

func NewDerivative

func NewDerivative() *Derivative

NewDerivative returns a Derivative instance

func (*Derivative) Run

func (d *Derivative) Run(timeSeries *TimeSeries) *ScoreList

Run runs the derivative algorithm over the time series

func (*Derivative) SmoothingFactor

func (d *Derivative) SmoothingFactor(factor float64) *Derivative

SmoothingFactor sets the smoothing factor.

type Detector

type Detector struct {
	// contains filtered or unexported fields
}

Detector is the default anomaly detector

func NewDetector

func NewDetector(ts *TimeSeries) *Detector

NewDetector returns an instance of the default detector.

func (*Detector) GetAnomalies

func (d *Detector) GetAnomalies(scoreList *ScoreList) []Anomaly

GetAnomalies detects anomalies using the specified threshold on scores
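Conceptually, thresholding a score list means keeping the timestamps whose score exceeds the threshold. The sketch below shows that idea with a local scoreList type that mirrors the shape of ScoreList; it is a simplified stand-in that returns only timestamps, whereas the library returns Anomaly values with additional fields. Treating "above" as strictly greater than the threshold is an assumption.

```go
package main

import "fmt"

// scoreList is a local stand-in mirroring the shape of the library's ScoreList.
type scoreList struct {
	Timestamps []float64
	Scores     []float64
}

// aboveThreshold returns the timestamps whose score is strictly greater
// than threshold (assumption: the comparison is strict).
func aboveThreshold(sl scoreList, threshold float64) []float64 {
	var hits []float64
	for i, s := range sl.Scores {
		if i < len(sl.Timestamps) && s > threshold {
			hits = append(hits, sl.Timestamps[i])
		}
	}
	return hits
}

func main() {
	sl := scoreList{
		Timestamps: []float64{10, 20, 30, 40},
		Scores:     []float64{0.1, 3.0, 2.5, 4.2},
	}
	fmt.Println(aboveThreshold(sl, 2.5))
}
```
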

func (*Detector) GetScores

func (d *Detector) GetScores() *ScoreList

GetScores runs the detector on the supplied time series. It uses the Bitmap algorithm to calculate the score list and falls back to the normal distribution algorithm in case of not enough data points in the time series.

func (*Detector) Threshold

func (d *Detector) Threshold(threshold float64) *Detector

Threshold sets the threshold used by the detector.

type ExponentialMovingAverage

type ExponentialMovingAverage struct {
	// contains filtered or unexported fields
}

ExponentialMovingAverage holds the algorithm configuration. It uses the value's deviation from the exponential moving average of a lagging window to determine anomaly scores.

func NewEma

func NewEma() *ExponentialMovingAverage

NewEma returns ExponentialMovingAverage instance

func (*ExponentialMovingAverage) LagWindowSize

func (ema *ExponentialMovingAverage) LagWindowSize(size int) *ExponentialMovingAverage

LagWindowSize sets the lagging window size.

func (*ExponentialMovingAverage) Run

func (ema *ExponentialMovingAverage) Run(timeSeries *TimeSeries) *ScoreList

Run runs the exponential moving average algorithm over the time series

func (*ExponentialMovingAverage) SmoothingFactor

func (ema *ExponentialMovingAverage) SmoothingFactor(factor float64) *ExponentialMovingAverage

SmoothingFactor sets the smoothing factor.

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator wraps a slice of float64 values with the current element position

func NewIterator

func NewIterator(data []float64) *Iterator

NewIterator returns an iterator instance

func (*Iterator) Next

func (it *Iterator) Next() *float64

Next returns the next item from the iterator. It panics when the iterator is exhausted.

type NormalDistribution

type NormalDistribution struct {
	// contains filtered or unexported fields
}

NormalDistribution holds the normal distribution algorithm configuration.

func NewNormalDistribution

func NewNormalDistribution() *NormalDistribution

NewNormalDistribution returns normal distribution instance.

func (*NormalDistribution) EpsilonThreshold

func (nd *NormalDistribution) EpsilonThreshold(threshold float64) *NormalDistribution

EpsilonThreshold sets the Gaussian epsilon threshold.

func (*NormalDistribution) Run

func (nd *NormalDistribution) Run(timeSeries *TimeSeries) *ScoreList

Run runs the normal distribution algorithm over the time series.

type PearsonCorrelation

type PearsonCorrelation struct {
	// contains filtered or unexported fields
}

PearsonCorrelation struct which holds the current and target time series.

func NewPearsonCorrelation

func NewPearsonCorrelation(current, target *TimeSeries) *PearsonCorrelation

NewPearsonCorrelation returns an instance of the Pearson correlation struct. It measures the linear correlation between the current and target time series. It should be used when the two time series are normally distributed.

The correlation coefficient always has a value between -1 and +1 where:

  • +1 is total positive linear correlation
  • 0 is no linear correlation
  • −1 is total negative linear correlation

For the used formula, check: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
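For reference, the standard Pearson coefficient is the covariance of the two series divided by the product of their standard deviations. The sketch below computes it on plain slices; it assumes both slices have equal length and returns 0 when either series is constant, which is a choice made for this example rather than documented library behavior. The name pearson is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation coefficient of x and y.
// Assumptions: mismatched or empty inputs and constant series return 0.
func pearson(x, y []float64) float64 {
	n := len(x)
	if n == 0 || n != len(y) {
		return 0
	}
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/float64(n), sumY/float64(n)

	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	if varX == 0 || varY == 0 {
		return 0
	}
	return cov / math.Sqrt(varX*varY)
}

func main() {
	x := []float64{1, 2, 3}
	fmt.Println(pearson(x, []float64{2, 4, 6})) // perfectly linear, increasing
	fmt.Println(pearson(x, []float64{6, 4, 2})) // perfectly linear, decreasing
}
```
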

func (*PearsonCorrelation) Run

func (pc *PearsonCorrelation) Run() float64

Run runs the Pearson correlation on the current and target time series. It returns the correlation coefficient which always has a value between -1 and +1.

type STL

type STL struct {
	// contains filtered or unexported fields
}

STL holds Seasonal-Trend With Loess algorithm configuration.

The STL algorithm decomposes a time series into seasonal, trend and remainder components. The paper describing this algorithm can be found here: https://search.proquest.com/openview/cc5001e8a0978a6c029ae9a41af00f21

func NewSTL

func NewSTL() *STL

NewSTL returns an instance of the STL struct.

func (*STL) Iterations

func (s *STL) Iterations(n int) *STL

func (*STL) LowPassFilterConfig

func (s *STL) LowPassFilterConfig(config *stl.Config) *STL

func (*STL) MethodType

func (s *STL) MethodType(method STLMethod) *STL

func (*STL) Periodicity

func (s *STL) Periodicity(p int) *STL

func (*STL) RobustIterations

func (s *STL) RobustIterations(n int) *STL

func (*STL) Run

func (s *STL) Run(timeSeries *TimeSeries) *ScoreList

Run runs the STL algorithm over the time series.

func (*STL) SeasonalConfig

func (s *STL) SeasonalConfig(config *stl.Config) *STL

func (*STL) TrendConfig

func (s *STL) TrendConfig(config *stl.Config) *STL

func (*STL) Width

func (s *STL) Width(w int) *STL

type STLMethod

type STLMethod int32

const (
	// Additive method suggests that the components are added together (linear model).
	Additive STLMethod = iota

	// Multiplicative method suggests that the components are multiplied together (non-linear model).
	Multiplicative
)

type ScoreList

type ScoreList struct {
	Timestamps []float64
	Scores     []float64
}

ScoreList holds timestamps and their scores

func (*ScoreList) Denoise

func (sl *ScoreList) Denoise() *ScoreList

Denoise sets low (noisy) scores to 0.0

func (*ScoreList) Max

func (sl *ScoreList) Max() float64

Max returns the maximum of the scores

func (*ScoreList) Zip

func (sl *ScoreList) Zip() map[float64]float64

Zip converts the score list to a map (map[Timestamp]Score)

type SpearmanCorrelation

type SpearmanCorrelation struct {
	// contains filtered or unexported fields
}

SpearmanCorrelation holds the Spearman Correlation algorithm configuration. It is the non-parametric version of the Pearson correlation and it should be used when the time series distribution is unknown or not normally distributed.

Spearman’s correlator returns a value from -1 to 1, where:

  • +1 = a perfect positive correlation between ranks
  • -1 = a perfect negative correlation between ranks
  • 0 = no correlation between ranks.
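The usual way to compute Spearman's coefficient is to replace each value by its rank and then apply the Pearson formula to the ranks. The sketch below follows that textbook approach, assigning tied values the average of their ranks; whether the library handles ties the same way is an assumption, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ranks assigns 1-based ranks; tied values share the average of their ranks.
func ranks(xs []float64) []float64 {
	idx := make([]int, len(xs))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return xs[idx[a]] < xs[idx[b]] })
	out := make([]float64, len(xs))
	for i := 0; i < len(idx); {
		j := i
		for j+1 < len(idx) && xs[idx[j+1]] == xs[idx[i]] {
			j++
		}
		avg := float64(i+j)/2 + 1 // mean of ranks i+1 .. j+1
		for k := i; k <= j; k++ {
			out[idx[k]] = avg
		}
		i = j + 1
	}
	return out
}

// pearson returns the Pearson coefficient, or 0 for degenerate input.
func pearson(x, y []float64) float64 {
	n := len(x)
	if n == 0 || n != len(y) {
		return 0
	}
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	mx, my := sumX/float64(n), sumY/float64(n)
	var cov, vx, vy float64
	for i := range x {
		dx, dy := x[i]-mx, y[i]-my
		cov += dx * dy
		vx += dx * dx
		vy += dy * dy
	}
	if vx == 0 || vy == 0 {
		return 0
	}
	return cov / math.Sqrt(vx*vy)
}

// spearman is the Pearson coefficient of the two rank vectors.
func spearman(x, y []float64) float64 {
	return pearson(ranks(x), ranks(y))
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{1, 4, 9, 16} // monotonic but not linear
	fmt.Println(spearman(x, y))
}
```

Because only the ordering matters, a monotonic but non-linear relationship still yields a perfect rank correlation.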

func NewSpearmanCorrelation

func NewSpearmanCorrelation(current, target *TimeSeries) *SpearmanCorrelation

NewSpearmanCorrelation returns an instance of the Spearman correlation struct.

func (*SpearmanCorrelation) Run

func (sc *SpearmanCorrelation) Run() float64

Run runs the Spearman correlation on the current and target time series.

type TimePeriod

type TimePeriod struct {
	Start float64
	End   float64
}

TimePeriod represents a time period marked by start and end timestamps.

type TimeSeries

type TimeSeries struct {
	Timestamps []float64
	Values     []float64
}

TimeSeries wrapper for timestamps and their values

func NewTimeSeries

func NewTimeSeries(timestamps []float64, values []float64) *TimeSeries

NewTimeSeries creates a new time series data structure

func NewTimeSeriesFromCSV

func NewTimeSeriesFromCSV(path string) *TimeSeries

NewTimeSeriesFromCSV creates a new time series from a CSV file.

func (*TimeSeries) AddOffset

func (ts *TimeSeries) AddOffset(offset float64) *TimeSeries

AddOffset increments time series timestamps by some offset

func (*TimeSeries) Align

func (ts *TimeSeries) Align(other *TimeSeries)

Align aligns two time series so that they have the same dimension and same timestamps

func (*TimeSeries) Average

func (ts *TimeSeries) Average() float64

Average calculates average value over the time series

func (*TimeSeries) Crop

func (ts *TimeSeries) Crop(start, end float64) *TimeSeries

Crop crops the time series timestamps into the specified range [start, end]

func (*TimeSeries) EarliestTimestamp

func (ts *TimeSeries) EarliestTimestamp() float64

EarliestTimestamp returns the earliest timestamp in the time series

func (*TimeSeries) LastestTimestamp

func (ts *TimeSeries) LastestTimestamp() float64

LastestTimestamp returns the latest timestamp in the time series

func (*TimeSeries) Median

func (ts *TimeSeries) Median() float64

Median calculates median value over the time series.

func (*TimeSeries) Normalize

func (ts *TimeSeries) Normalize() *TimeSeries

Normalize normalizes the time series values by dividing by the maximum value

func (*TimeSeries) NormalizeWithMinMax

func (ts *TimeSeries) NormalizeWithMinMax() *TimeSeries

NormalizeWithMinMax normalizes time series values using min-max scaling
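The two normalization methods differ in what they scale against. The sketch below contrasts them on plain slices: dividing by the maximum, and min-max scaling into [0, 1]. The guards for empty input, a zero maximum, and a constant series are assumptions added so the example is safe to run; the function names are hypothetical.

```go
package main

import "fmt"

// maxNormalize divides every value by the maximum value.
// Assumption: when the maximum is 0 the output is all zeros.
func maxNormalize(values []float64) []float64 {
	out := make([]float64, len(values))
	if len(values) == 0 {
		return out
	}
	hi := values[0]
	for _, v := range values {
		if v > hi {
			hi = v
		}
	}
	if hi == 0 {
		return out
	}
	for i, v := range values {
		out[i] = v / hi
	}
	return out
}

// minMaxNormalize rescales values into [0, 1] using (v - min) / (max - min).
// Assumption: a constant series maps to all zeros.
func minMaxNormalize(values []float64) []float64 {
	out := make([]float64, len(values))
	if len(values) == 0 {
		return out
	}
	lo, hi := values[0], values[0]
	for _, v := range values {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	if hi == lo {
		return out
	}
	for i, v := range values {
		out[i] = (v - lo) / (hi - lo)
	}
	return out
}

func main() {
	fmt.Println(maxNormalize([]float64{2, 4, 8}))
	fmt.Println(minMaxNormalize([]float64{2, 4, 6}))
}
```
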

func (*TimeSeries) Size

func (ts *TimeSeries) Size() int

Size returns the time series dimension/size.

func (*TimeSeries) Stdev

func (ts *TimeSeries) Stdev() float64

Stdev calculates the standard deviation of the time series

func (*TimeSeries) String

func (ts *TimeSeries) String() string

String returns JSON representation of the time series

func (*TimeSeries) Zip

func (ts *TimeSeries) Zip() map[float64]float64

Zip converts the time series to a map (map[Timestamp]Value)

type WeightedSum

type WeightedSum struct {
	*ExponentialMovingAverage
	*Derivative
	// contains filtered or unexported fields
}

WeightedSum holds the weighted sum algorithm configuration.

The weighted sum algorithm uses a weighted sum to calculate anomaly scores. It should be used ONLY on small data sets.

func NewWeightedSum

func NewWeightedSum() *WeightedSum

NewWeightedSum returns weighted sum instance

func (*WeightedSum) MinEmaScore

func (ws *WeightedSum) MinEmaScore(value float64) *WeightedSum

MinEmaScore sets the minimal Ema score above which the weighted score is used.

func (*WeightedSum) Run

func (ws *WeightedSum) Run(timeSeries *TimeSeries) *ScoreList

Run runs the weighted sum algorithm over the time series

func (*WeightedSum) ScoreWeight

func (ws *WeightedSum) ScoreWeight(weight float64) *WeightedSum

ScoreWeight sets Ema's score weight.

Directories

Path Synopsis
cmd
