rpca

Version: v0.0.0-...-6f7e3e4
Published: Aug 17, 2016 License: Apache-2.0 Imports: 6 Imported by: 1

README

rpca

RPCA is a Go library for running Robust Principal Component Analysis for anomaly detection.

Getting started

API documentation is available via godoc.

License

Copyright 2016 President and Fellows of Harvard College

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Documentation

Overview

Package rpca implements anomaly detection using Robust Principal Component Analysis (http://techblog.netflix.com/2015/02/rad-outlier-detection-on-big-data.html). It is a port of the RPCA implementation provided by Netflix as part of their Surus project (https://github.com/Netflix/Surus), and it draws on Netflix's RAD implementations written in R, C++, Java, and JavaScript.

Constants

const MAX_ITERS int = 1000

The maximum number of iterations before we give up trying to converge.

Variables

This section is empty.

Functions

func AutoDiff

func AutoDiff(active bool) func(*rpcaConfig) error

AutoDiff controls whether the given time series is tested for a significant global trend that should be removed before anomaly detection. Trend detection is done with the Augmented Dickey-Fuller test. Note that auto-differencing changes the nature of the detected anomalies. If the time series is not detrended, a lasting mean shift (for example, a large, sustained increase) causes a number of consecutive points after the shift to be identified as anomalous. If the time series is detrended, only the single point that marks the beginning of the shift is identified as anomalous.
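The effect of differencing on a mean shift can be seen with a minimal sketch. The diff helper below is hypothetical and is not part of this package's API; it only computes a plain first difference, and it does not perform the Augmented Dickey-Fuller test.

```go
package main

import "fmt"

// diff returns the first difference of series: out[i] = series[i+1] - series[i].
// Hypothetical helper for illustration only; not part of the rpca API.
func diff(series []float64) []float64 {
	if len(series) < 2 {
		return nil
	}
	out := make([]float64, len(series)-1)
	for i := range out {
		out[i] = series[i+1] - series[i]
	}
	return out
}

func main() {
	// A flat series with a sustained mean shift starting at index 4.
	shifted := []float64{10, 10, 10, 10, 50, 50, 50, 50}

	// After differencing, the sustained shift collapses into one spike.
	fmt.Println(diff(shifted)) // prints [0 0 0 40 0 0 0]
}
```

In the undifferenced series every point after the shift sits far from the earlier level, while in the differenced series only the transition itself stands out.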

func ForceDiff

func ForceDiff(active bool) func(*rpcaConfig) error

If true, skip the Augmented Dickey-Fuller test and always auto-difference the given time series.

func Frequency

func Frequency(freq int) func(*rpcaConfig) error

Frequency informs the algorithm of the major frequency of the time series to use for analysis. For example, if you have 56 points of daily measurements, the major frequency is likely 7, which would capture the weekly trend. Note that due to the nature of the algorithm, the length of the provided time series must be divisible by the frequency.
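The divisibility requirement follows naturally if the series is folded into a matrix with one period per row. The sketch below shows one plausible layout under that assumption; reshapeByFrequency is a hypothetical helper, and the package's internal matrix layout may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// reshapeByFrequency splits series into len(series)/freq rows of freq points
// each. Hypothetical helper for illustration; not part of the rpca API.
func reshapeByFrequency(series []float64, freq int) ([][]float64, error) {
	if freq <= 0 {
		return nil, errors.New("frequency must be positive")
	}
	if len(series)%freq != 0 {
		return nil, fmt.Errorf("series length %d is not divisible by frequency %d",
			len(series), freq)
	}
	rows := make([][]float64, 0, len(series)/freq)
	for start := 0; start < len(series); start += freq {
		// Each row is one full period (for example, one week of daily data).
		rows = append(rows, series[start:start+freq])
	}
	return rows, nil
}

func main() {
	// 56 daily measurements with a weekly frequency of 7.
	series := make([]float64, 56)
	for i := range series {
		series[i] = float64(i)
	}
	weeks, err := reshapeByFrequency(series, 7)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(weeks), "periods of", len(weeks[0]), "points") // prints 8 periods of 7 points
}
```

A series whose length is not a multiple of the frequency would leave a partial final row, which is why a caller may need to trim or pad the data first.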

func LPenalty

func LPenalty(penalty float64) func(*rpcaConfig) error

A scalar for the amount of thresholding to use when determining the low rank approximation of the given time series. The default values are chosen to correspond to the smart thresholding values described in Zhou's Stable Principal Component Pursuit.

func SPenalty

func SPenalty(penalty float64) func(*rpcaConfig) error

A scalar for the amount of thresholding to use when determining the separation between noise and sparse outliers. The default values are chosen to correspond to the smart thresholding values described in Zhou's Stable Principal Component Pursuit.
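To give a feel for what a thresholding penalty does, here is a minimal sketch of the soft-thresholding (shrinkage) operator commonly used in principal component pursuit. It is an assumption that the penalties act in this way inside the package; softThreshold is a hypothetical name and is not part of the API.

```go
package main

import "fmt"

// softThreshold shrinks x toward zero by penalty and zeroes anything whose
// magnitude is at most penalty. Hypothetical helper for illustration only.
func softThreshold(x, penalty float64) float64 {
	switch {
	case x > penalty:
		return x - penalty
	case x < -penalty:
		return x + penalty
	default:
		return 0
	}
}

func main() {
	// Small residuals are treated as noise and vanish; large ones survive
	// (shrunk by the penalty) as candidate sparse outliers.
	residuals := []float64{0.2, -0.5, 3.0, -4.5}
	for _, r := range residuals {
		fmt.Println(r, "->", softThreshold(r, 1.0))
	}
}
```

Under this reading, a larger penalty zeroes more entries, so fewer points end up flagged, while a smaller penalty lets more of the residual through.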

func Scale

func Scale(active bool) func(*rpcaConfig) error

If false, do not normalize the time series before running anomaly detection. Skipping normalization may prevent the algorithm from converging on a good solution.

func Verbose

func Verbose(active bool) func(*rpcaConfig) error

If true, print lots of information about each iteration of the algorithm.

Types

type Anomalies

type Anomalies struct {
	// A slice of booleans indicating which values in the provided time series
	// were anomalous.
	Positions []bool

	// Values is a slice of floats indicating exactly how anomalous each point in
	// the provided time series was. Points that were not anomalous have a value
	// of zero. Points that were anomalously low have negative values, while
	// points that were anomalously high have positive values.
	Values []float64

	// Part of the RPCA process requires normalizing the given time series by
	// subtracting the mean and dividing by the standard deviation (Z scoring)
	// before detecting anomalies. The anomalousness of each point is computed in
	// this Z-scored space before being transformed back into the domain of the
	// given time series. Sometimes, it's useful to have the normalized values,
	// for example, when comparing anomalies across time series.
	NormedValues []float64
}
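The Z-scoring mentioned in the NormedValues comment can be sketched as follows. The zScore helper is hypothetical and not part of this package's API, and the use of the population standard deviation (dividing by n) is an assumption; the package may use the sample standard deviation instead.

```go
package main

import (
	"fmt"
	"math"
)

// zScore subtracts the mean and divides by the standard deviation.
// Hypothetical helper for illustration; it uses the population standard
// deviation, which is an assumption about how normalization is done.
func zScore(series []float64) (normed []float64, mean, std float64) {
	n := float64(len(series))
	if n == 0 {
		return nil, 0, 0
	}
	for _, v := range series {
		mean += v
	}
	mean /= n
	for _, v := range series {
		std += (v - mean) * (v - mean)
	}
	std = math.Sqrt(std / n)

	normed = make([]float64, len(series))
	if std == 0 {
		return normed, mean, std // constant series: all Z-scores are zero
	}
	for i, v := range series {
		normed[i] = (v - mean) / std
	}
	return normed, mean, std
}

func main() {
	series := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	normed, mean, std := zScore(series)
	fmt.Println("mean:", mean, "std:", std)
	fmt.Println("normalized:", normed)

	// Multiplying by std and adding the mean maps a Z-score back to the
	// original units of the series.
	fmt.Println("first point restored:", normed[0]*std+mean)
}
```

Because Z-scores are unitless, values in this space can be compared across time series that have very different scales.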

func FindAnomalies

func FindAnomalies(series []float64, options ...func(*rpcaConfig) error) Anomalies

FindAnomalies is the primary entry point of this package. It takes a slice of floats and any number of options. Passing options may look a little unusual because this package uses functional options to make the API easier to use (more on functional options here: http://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis). In short, every option has a default value, and to change a value, pass the corresponding option like so:

anoms := rpca.FindAnomalies(series, rpca.Frequency(7), rpca.AutoDiff(true))

The interface is designed to match that of Netflix's anomaly detection R package.
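For readers unfamiliar with the functional options pattern, the following self-contained sketch shows how option functions with the shape func(*config) error can be applied over a set of defaults. Every name here (config, option, withFrequency, withAutoDiff, newConfig) is hypothetical, and the default values are illustrative only; none of this is taken from the package's source.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for the package's unexported rpcaConfig.
type config struct {
	frequency int
	autoDiff  bool
}

// option mutates a config and may reject invalid input.
type option func(*config) error

// withFrequency is a hypothetical analogue of an option like Frequency.
func withFrequency(freq int) option {
	return func(c *config) error {
		if freq <= 0 {
			return errors.New("frequency must be positive")
		}
		c.frequency = freq
		return nil
	}
}

// withAutoDiff is a hypothetical analogue of an option like AutoDiff.
func withAutoDiff(active bool) option {
	return func(c *config) error {
		c.autoDiff = active
		return nil
	}
}

// newConfig starts from defaults and applies each option in order.
// The defaults below are illustrative, not the package's actual defaults.
func newConfig(opts ...option) (config, error) {
	c := config{frequency: 7, autoDiff: true}
	for _, opt := range opts {
		if err := opt(&c); err != nil {
			return config{}, err
		}
	}
	return c, nil
}

func main() {
	c, err := newConfig(withFrequency(24), withAutoDiff(false))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.frequency, c.autoDiff) // prints 24 false
}
```

The appeal of this design is that callers name only the settings they want to change, and new options can be added later without breaking existing call sites.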
