Documentation ¶
Overview ¶
Package models provides outlier detection algorithms. It is inspired by and based on the design of PyOD (Python Outlier Detection).
References for individual detectors:
- ABOD: Angle-Based Outlier Detection. Reference: Kriegel, H.P., Schubert, M. and Zimek, A., 2008, August. Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 444-452).
- COF: Connectivity-Based Outlier Factor. Reference: Tang, J., Chen, Z., Fu, A.W.C. and Cheung, D.W., 2002. Enhancing effectiveness of outlier detections for low density patterns. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 535-548). Springer, Berlin, Heidelberg.
- COPOD: Copula-Based Outlier Detection. Reference: Li, Z., Zhao, Y., Botta, N., Ionescu, C. and Hu, X., 2020. COPOD: copula-based outlier detection. In 2020 IEEE International Conference on Data Mining (ICDM) (pp. 1118-1123). IEEE.
- LODA: Lightweight On-line Detector of Anomalies. Reference: Pevny, T., 2016. Loda: Lightweight on-line detector of anomalies. Machine Learning, 102(2), pp.275-304.
- MAD: Median Absolute Deviation for univariate outlier detection. Reference: Iglewicz, B. and Hoaglin, D.C., 1993. How to detect and handle outliers (Vol. 16). ASQ Press.
- SOD: Subspace Outlier Detection. Reference: Kriegel, H.P., Kröger, P., Schubert, E. and Zimek, A., 2009. Outlier detection in axis-parallel subspaces of high dimensional data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 831-838). Springer, Berlin, Heidelberg.
- SOS: Stochastic Outlier Selection. Reference: Janssens, J.H.M., Huszar, F., Postma, E.O. and van den Herik, H.J., 2012. Stochastic outlier selection. Tilburg centre for Creative Computing, techreport 2012(1).
Index ¶
- Variables
- func GetMatrixShape(X Matrix) (nSamples, nFeatures int)
- func Percentile(data Vector, p float64) float64
- func ValidateMatrix(X Matrix) error
- type ABOD
- type ABODOptions
- type BaseDetector
- func (b *BaseDetector) GetDecisionScores() Vector
- func (b *BaseDetector) GetLabels() []int
- func (b *BaseDetector) GetThreshold() float64
- func (b *BaseDetector) IsFitted() bool
- func (b *BaseDetector) PredictFromScores(scores Vector) []int
- func (b *BaseDetector) PredictProbaFromScores(scores Vector, method string) (Matrix, error)
- func (b *BaseDetector) ProcessDecisionScores()
- type COF
- type COFOptions
- type COPOD
- type COPODOptions
- type Detector
- type ECOD
- type ECODOptions
- type HBOS
- type HBOSOptions
- type IForest
- type IForestOptions
- type KNN
- type KNNOptions
- type LODA
- type LODAOptions
- type LOF
- type LOFOptions
- type MAD
- type MADOptions
- type Matrix
- type PCA
- type PCAOptions
- type SOD
- type SODOptions
- type SOS
- type SOSOptions
- type Vector
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidContamination = errors.New("contamination must be in (0, 0.5]")
ErrInvalidContamination is returned when contamination is outside the valid range (0, 0.5].
var ErrInvalidData = errors.New("invalid input data")
ErrInvalidData is returned when input data is invalid
var ErrNotFitted = errors.New("detector has not been fitted")
ErrNotFitted is returned when trying to use a detector before fitting
var ErrNotUnivariate = errors.New("MAD is only for univariate data")
ErrNotUnivariate is returned when MAD is used with multivariate data.
Functions ¶
func GetMatrixShape ¶
GetMatrixShape returns the dimensions of a matrix
func Percentile ¶
Percentile calculates the p-th percentile of the data
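The documentation does not say which scale p uses or how values between ranks are handled. As an illustration only, the sketch below assumes p is in [0, 100] and uses linear interpolation between the two closest ranks; the name percentile and both of those choices are assumptions, not the package's confirmed behavior.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Vector []float64

// percentile is a hypothetical stand-in for the documented function.
// Assumptions: p is on a 0-100 scale and is clamped to that range;
// values between ranks are linearly interpolated.
func percentile(data Vector, p float64) float64 {
	if len(data) == 0 {
		return math.NaN()
	}
	sorted := append(Vector(nil), data...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	p = math.Max(0, math.Min(100, p))
	rank := p / 100 * float64(len(sorted)-1)
	lo := int(math.Floor(rank))
	hi := int(math.Ceil(rank))
	return sorted[lo] + (rank-float64(lo))*(sorted[hi]-sorted[lo])
}

func main() {
	data := Vector{4, 1, 3, 2}
	fmt.Println(percentile(data, 50)) // prints 2.5
}
```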
func ValidateMatrix ¶
ValidateMatrix checks if the input matrix is valid
Types ¶
type ABOD ¶
type ABOD struct {
BaseDetector
// contains filtered or unexported fields
}
ABOD implements Angle-Based Outlier Detection. For an observation, the variance of its weighted cosine scores to all neighbors is used as the outlying score. Lower variance indicates outliers.
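To make the variance-of-weighted-cosines idea concrete, here is a small sketch that computes it by brute force over all pairs of other points (closer in spirit to the "default" method than to "fast"). The function name angleVariance and the exact weighting, the dot product divided by both squared norms, are assumptions for illustration rather than the package's implementation.

```go
package main

import "fmt"

type Matrix [][]float64

// angleVariance returns the variance of <a-p, b-p> / (|a-p|^2 * |b-p|^2)
// over all pairs (a, b) of rows other than the query row p = X[i].
// A low variance means the other points all lie in a similar direction,
// which is typical for an outlier.
func angleVariance(X Matrix, i int) float64 {
	var vals []float64
	for a := range X {
		if a == i {
			continue
		}
		for b := a + 1; b < len(X); b++ {
			if b == i {
				continue
			}
			var dot, na, nb float64
			for f := range X[i] {
				da := X[a][f] - X[i][f]
				db := X[b][f] - X[i][f]
				dot += da * db
				na += da * da
				nb += db * db
			}
			if na == 0 || nb == 0 {
				continue // duplicate of the query point: angle undefined
			}
			vals = append(vals, dot/(na*nb))
		}
	}
	if len(vals) == 0 {
		return 0
	}
	var mean float64
	for _, v := range vals {
		mean += v
	}
	mean /= float64(len(vals))
	var variance float64
	for _, v := range vals {
		variance += (v - mean) * (v - mean)
	}
	return variance / float64(len(vals))
}

func main() {
	X := Matrix{{0, 0}, {1, 0}, {0, 1}, {1, 1}, {10, 10}}
	for i := range X {
		fmt.Printf("row %d: %.6f\n", i, angleVariance(X, i))
	}
}
```

In this toy data the far point {10, 10} gets a much smaller variance than the four clustered points, matching the "lower variance indicates outliers" rule.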
func NewABOD ¶
func NewABOD(opts *ABODOptions) *ABOD
NewABOD creates a new ABOD detector with the given options.
func (*ABOD) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type ABODOptions ¶
type ABODOptions struct {
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
// NNeighbors is the number of neighbors for fast ABOD (default: 5)
NNeighbors int
// Method specifies the computation method: "fast" or "default" (default: "fast")
Method string
}
ABODOptions contains options for the ABOD detector.
func DefaultABODOptions ¶
func DefaultABODOptions() *ABODOptions
DefaultABODOptions returns default options for ABOD.
type BaseDetector ¶
type BaseDetector struct {
// Contamination is the proportion of outliers in the data set
Contamination float64
// DecisionScores_ are the outlier scores of the training data
DecisionScores_ Vector
// Threshold_ is the threshold for generating binary labels
Threshold_ float64
// Labels_ are the binary labels of the training data
Labels_ []int
// Fitted indicates whether the detector has been fitted
Fitted bool
// contains filtered or unexported fields
}
BaseDetector provides common functionality for all detectors
func NewBaseDetector ¶
func NewBaseDetector(contamination float64) (*BaseDetector, error)
NewBaseDetector creates a new BaseDetector with the given contamination
func (*BaseDetector) GetDecisionScores ¶
func (b *BaseDetector) GetDecisionScores() Vector
GetDecisionScores returns the decision scores of the training data
func (*BaseDetector) GetLabels ¶
func (b *BaseDetector) GetLabels() []int
GetLabels returns the binary labels of the training data
func (*BaseDetector) GetThreshold ¶
func (b *BaseDetector) GetThreshold() float64
GetThreshold returns the threshold for outlier detection
func (*BaseDetector) IsFitted ¶
func (b *BaseDetector) IsFitted() bool
IsFitted returns whether the detector has been fitted
func (*BaseDetector) PredictFromScores ¶
func (b *BaseDetector) PredictFromScores(scores Vector) []int
PredictFromScores generates binary predictions from anomaly scores
func (*BaseDetector) PredictProbaFromScores ¶
func (b *BaseDetector) PredictProbaFromScores(scores Vector, method string) (Matrix, error)
PredictProbaFromScores calculates probability estimates from scores
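The accepted values of method are not listed here. One common conversion, called "linear" in PyOD, rescales scores with the minimum and maximum of the training scores and clips to [0, 1]. The sketch below shows that conversion only; linearProba is a hypothetical name, and whether this package supports or names the method this way is an assumption. The output layout [P(inlier), P(outlier)] follows the Detector interface documentation.

```go
package main

import "fmt"

type Vector []float64
type Matrix [][]float64

// linearProba rescales scores using the range of the training scores and
// returns one row per score: [P(inlier), P(outlier)].
func linearProba(train, scores Vector) Matrix {
	if len(train) == 0 {
		return nil
	}
	lo, hi := train[0], train[0]
	for _, s := range train {
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	out := make(Matrix, len(scores))
	for i, s := range scores {
		p := 0.0
		if hi > lo { // constant training scores carry no information
			p = (s - lo) / (hi - lo)
		}
		if p < 0 {
			p = 0
		}
		if p > 1 {
			p = 1 // new scores beyond the training range are clipped
		}
		out[i] = []float64{1 - p, p}
	}
	return out
}

func main() {
	train := Vector{0, 2, 4, 10}
	fmt.Println(linearProba(train, Vector{0, 5, 20}))
}
```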
func (*BaseDetector) ProcessDecisionScores ¶
func (b *BaseDetector) ProcessDecisionScores()
ProcessDecisionScores calculates the threshold and labels based on decision scores
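How the threshold is derived is not stated. Since the package follows PyOD's design, a reasonable guess is that the threshold is the (1 - contamination) quantile of the training scores and that samples scoring above it are labeled 1. The sketch below implements that guess; thresholdAndLabels is a hypothetical helper, and the interpolation rule is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Vector []float64

// thresholdAndLabels picks the (1 - contamination) quantile of the scores
// as the threshold and labels every score strictly above it as an outlier (1).
func thresholdAndLabels(scores Vector, contamination float64) (float64, []int) {
	if len(scores) == 0 {
		return 0, nil
	}
	sorted := append(Vector(nil), scores...) // keep the original order intact
	sort.Float64s(sorted)
	rank := (1 - contamination) * float64(len(sorted)-1)
	lo, hi := int(math.Floor(rank)), int(math.Ceil(rank))
	threshold := sorted[lo] + (rank-float64(lo))*(sorted[hi]-sorted[lo])

	labels := make([]int, len(scores))
	for i, s := range scores {
		if s > threshold {
			labels[i] = 1
		}
	}
	return threshold, labels
}

func main() {
	scores := Vector{0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 5.0}
	threshold, labels := thresholdAndLabels(scores, 0.1)
	fmt.Println(threshold, labels)
}
```

With ten scores and a contamination of 0.1, only the single extreme score ends up above the threshold.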
type COF ¶
type COF struct {
BaseDetector
// contains filtered or unexported fields
}
COF implements the Connectivity-Based Outlier Factor algorithm. COF scores a data point by the ratio of its average chaining distance to the mean of the average chaining distances of its k nearest neighbors.
func NewCOF ¶
func NewCOF(opts *COFOptions) *COF
NewCOF creates a new COF detector with the given options.
func (*COF) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type COFOptions ¶
type COFOptions struct {
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
// NNeighbors is the number of neighbors to use (default: 20)
NNeighbors int
// Method specifies the computation method: "fast" or "memory" (default: "fast")
Method string
}
COFOptions contains options for the COF detector.
func DefaultCOFOptions ¶
func DefaultCOFOptions() *COFOptions
DefaultCOFOptions returns default options for COF.
type COPOD ¶
type COPOD struct {
BaseDetector
// contains filtered or unexported fields
}
COPOD implements Copula-Based Outlier Detection. COPOD is a parameter-free, highly interpretable outlier detection algorithm based on empirical copula models.
func NewCOPOD ¶
func NewCOPOD(opts *COPODOptions) *COPOD
NewCOPOD creates a new COPOD detector with the given options.
func (*COPOD) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type COPODOptions ¶
type COPODOptions struct {
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
}
COPODOptions contains options for the COPOD detector.
func DefaultCOPODOptions ¶
func DefaultCOPODOptions() *COPODOptions
DefaultCOPODOptions returns default options for COPOD.
type Detector ¶
type Detector interface {
// Fit trains the detector on the input data
// X is a matrix of shape (n_samples, n_features)
// y is optional and ignored in unsupervised methods
Fit(X Matrix, y Vector) error
// Predict returns binary labels for the input data
// 0 for inliers, 1 for outliers
Predict(X Matrix) ([]int, error)
// DecisionFunction returns raw anomaly scores
// Higher scores indicate more abnormal samples
DecisionFunction(X Matrix) (Vector, error)
// PredictProba returns probability estimates
// Returns a matrix of shape (n_samples, 2) with [P(inlier), P(outlier)]
PredictProba(X Matrix, method string) (Matrix, error)
// GetThreshold returns the threshold for outlier detection
GetThreshold() float64
// GetLabels returns the binary labels of the training data
GetLabels() []int
// GetDecisionScores returns the decision scores of the training data
GetDecisionScores() Vector
// IsFitted returns whether the detector has been fitted
IsFitted() bool
}
Detector is the interface that all outlier detection models implement
type ECOD ¶
type ECOD struct {
*BaseDetector
// Training data (stored for prediction)
XTrain_ Matrix
}
ECOD implements Empirical Cumulative Distribution based Outlier Detection. ECOD is a parameter-free, highly interpretable outlier detection algorithm based on empirical CDF functions.
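The core of the empirical-CDF idea can be sketched in a few lines: for each feature, estimate how far a sample sits in the left and right tails, take the negative log of those tail probabilities, and sum across features. The version below is a simplification that omits the skewness-corrected variant described in the ECOD paper and scores only the training rows; ecdfTailScores is a hypothetical name, and none of this is confirmed to match the package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

type Matrix [][]float64

// ecdfTailScores returns, for each row, the larger of its summed left-tail
// and right-tail negative log probabilities under per-feature empirical CDFs.
func ecdfTailScores(X Matrix) []float64 {
	n := len(X)
	scores := make([]float64, n)
	if n == 0 {
		return scores
	}
	left := make([]float64, n)
	right := make([]float64, n)
	for f := range X[0] {
		for i := range X {
			var le, ge float64 // counts of values <= and >= X[i][f]
			for j := range X {
				if X[j][f] <= X[i][f] {
					le++
				}
				if X[j][f] >= X[i][f] {
					ge++
				}
			}
			// Both counts include the sample itself, so the logs are finite.
			left[i] -= math.Log(le / float64(n))
			right[i] -= math.Log(ge / float64(n))
		}
	}
	for i := range scores {
		scores[i] = math.Max(left[i], right[i])
	}
	return scores
}

func main() {
	X := Matrix{{1, 5}, {2, 3}, {3, 1}, {4, 4}, {5, 2}, {9, 9}}
	fmt.Println(ecdfTailScores(X))
}
```

Because the scores depend only on ranks, a sample stands out when it is extreme in many features at once, as the last row is here.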
func (*ECOD) DecisionFunction ¶
DecisionFunction returns the anomaly scores for the input samples
type ECODOptions ¶
type ECODOptions struct {
Contamination float64 // Contamination rate (default: 0.1)
}
ECODOptions holds configuration options for ECOD
func DefaultECODOptions ¶
func DefaultECODOptions() *ECODOptions
DefaultECODOptions returns default options for ECOD
type HBOS ¶
type HBOS struct {
*BaseDetector
// NBins is the number of bins for the histogram
// Can be a fixed number or "auto" for automatic selection
NBins int
// Alpha is the regularizer for preventing overflow
Alpha float64
// Tol is the tolerance for samples falling outside bins
Tol float64
// contains filtered or unexported fields
}
HBOS implements the Histogram-based Outlier Score algorithm. It assumes feature independence and calculates the degree of outlyingness by building histograms. Reference: Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm.
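As a rough illustration of the histogram idea, the sketch below builds one equal-width histogram per feature, normalizes bin heights so the tallest bin is 1, and sums log(1 / (height + alpha)) across features. It scores only the training rows and ignores Tol; the function name hbosScores, the normalization, and the role given to alpha are assumptions, not the package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

type Matrix [][]float64

// hbosScores scores each row by how sparsely populated its histogram bins
// are, treating features independently.
func hbosScores(X Matrix, nBins int, alpha float64) []float64 {
	scores := make([]float64, len(X))
	if len(X) == 0 || nBins < 1 {
		return scores
	}
	for f := range X[0] {
		lo, hi := X[0][f], X[0][f]
		for _, row := range X {
			lo = math.Min(lo, row[f])
			hi = math.Max(hi, row[f])
		}
		width := (hi - lo) / float64(nBins)
		bin := func(v float64) int {
			if width == 0 {
				return 0 // constant feature: everything shares one bin
			}
			b := int((v - lo) / width)
			if b >= nBins {
				b = nBins - 1 // the maximum falls in the last bin
			}
			return b
		}
		counts := make([]float64, nBins)
		maxCount := 0.0
		for _, row := range X {
			b := bin(row[f])
			counts[b]++
			maxCount = math.Max(maxCount, counts[b])
		}
		for i, row := range X {
			height := counts[bin(row[f])] / maxCount
			scores[i] += math.Log(1 / (height + alpha))
		}
	}
	return scores
}

func main() {
	X := Matrix{{1, 1}, {1.1, 0.9}, {0.9, 1.1}, {1, 1.2}, {1.2, 1}, {8, 8}}
	fmt.Println(hbosScores(X, 5, 0.1))
}
```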
func (*HBOS) DecisionFunction ¶
DecisionFunction returns the anomaly scores for the input samples
type HBOSOptions ¶
type HBOSOptions struct {
NBins int // Number of bins (default: 10)
Alpha float64 // Regularizer (default: 0.1)
Tol float64 // Tolerance (default: 0.5)
Contamination float64 // Contamination rate (default: 0.1)
}
HBOSOptions holds configuration options for HBOS
func DefaultHBOSOptions ¶
func DefaultHBOSOptions() *HBOSOptions
DefaultHBOSOptions returns default options for HBOS
type IForest ¶
type IForest struct {
*BaseDetector
// NEstimators is the number of base estimators (trees)
NEstimators int
// MaxSamples is the number of samples to draw for each tree
// If <= 1.0, it's treated as a fraction of the total samples
MaxSamples int
// MaxFeatures is the number of features for each tree
MaxFeatures int
// Bootstrap indicates whether to use bootstrap sampling
Bootstrap bool
// RandomState is the random seed
RandomState int64
// contains filtered or unexported fields
}
IForest implements the Isolation Forest algorithm. It isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of that feature.
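The random-split procedure described above can be sketched compactly. The version below grows each tree on all rows (no MaxSamples subsampling, MaxFeatures, or bootstrapping), caps depth at ceil(log2(n)), and converts the average path length into a score with the usual 2^(-E[h]/c(n)) normalization from the Isolation Forest paper. All names (node, build, isolationScores) are hypothetical, and the package's actual implementation may differ.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

type Matrix [][]float64

// node is one split in an isolation tree; a leaf has left == nil.
type node struct {
	feature     int
	split       float64
	left, right *node
	size        int // number of training rows that reached this node
}

// avgPathLength is c(n): the expected path length of an unsuccessful search
// in a binary search tree with n items, used to normalize depths.
func avgPathLength(n int) float64 {
	if n <= 1 {
		return 0
	}
	if n == 2 {
		return 1
	}
	return 2*(math.Log(float64(n-1))+0.5772156649) - 2*float64(n-1)/float64(n)
}

func build(X Matrix, depth, maxDepth int, rng *rand.Rand) *node {
	if len(X) <= 1 || depth >= maxDepth {
		return &node{size: len(X)}
	}
	f := rng.Intn(len(X[0]))
	lo, hi := X[0][f], X[0][f]
	for _, row := range X {
		lo = math.Min(lo, row[f])
		hi = math.Max(hi, row[f])
	}
	if lo == hi {
		return &node{size: len(X)} // cannot split a constant feature
	}
	split := lo + rng.Float64()*(hi-lo)
	var l, r Matrix
	for _, row := range X {
		if row[f] < split {
			l = append(l, row)
		} else {
			r = append(r, row)
		}
	}
	return &node{
		feature: f, split: split, size: len(X),
		left:  build(l, depth+1, maxDepth, rng),
		right: build(r, depth+1, maxDepth, rng),
	}
}

func pathLength(n *node, x []float64, depth int) float64 {
	if n.left == nil {
		return float64(depth) + avgPathLength(n.size)
	}
	if x[n.feature] < n.split {
		return pathLength(n.left, x, depth+1)
	}
	return pathLength(n.right, x, depth+1)
}

// isolationScores returns one score per row; higher means easier to isolate.
func isolationScores(X Matrix, nTrees int, seed int64) []float64 {
	out := make([]float64, len(X))
	if len(X) < 2 || nTrees < 1 {
		return out
	}
	rng := rand.New(rand.NewSource(seed))
	maxDepth := int(math.Ceil(math.Log2(float64(len(X)))))
	trees := make([]*node, nTrees)
	for t := range trees {
		trees[t] = build(X, 0, maxDepth, rng)
	}
	for i, x := range X {
		var total float64
		for _, t := range trees {
			total += pathLength(t, x, 0)
		}
		out[i] = math.Pow(2, -total/float64(nTrees)/avgPathLength(len(X)))
	}
	return out
}

func main() {
	X := Matrix{{0.1, 0.2}, {0.3, 0.9}, {0.5, 0.4}, {0.7, 0.6}, {100, 100}}
	fmt.Println(isolationScores(X, 100, 42))
}
```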
func NewIForest ¶
func NewIForest(opts *IForestOptions) *IForest
NewIForest creates a new Isolation Forest detector
func (*IForest) DecisionFunction ¶
DecisionFunction returns the anomaly scores for the input samples
type IForestOptions ¶
type IForestOptions struct {
NEstimators int // Number of trees (default: 100)
MaxSamples int // Samples per tree (default: 256)
MaxFeatures int // Features per tree (default: all)
Bootstrap bool // Use bootstrap sampling (default: false)
RandomState int64 // Random seed (default: 0 = random)
Contamination float64 // Contamination rate (default: 0.1)
}
IForestOptions holds configuration options for IForest
func DefaultIForestOptions ¶
func DefaultIForestOptions() *IForestOptions
DefaultIForestOptions returns default options for IForest
type KNN ¶
type KNN struct {
*BaseDetector
// NNeighbors is the number of neighbors to use
NNeighbors int
// Method defines how to calculate the outlier score:
// "largest": use the distance to the kth neighbor
// "mean": use the average of all k neighbors distances
// "median": use the median of the distances to k neighbors
Method string
// Metric defines the distance metric: "euclidean", "manhattan", "minkowski"
Metric string
// P is the parameter for Minkowski distance
P float64
// Training data
X_ Matrix
}
KNN implements k-Nearest Neighbors based outlier detection. With the default "largest" method, an observation's distance to its kth nearest neighbor is used as its outlier score; see Method for the alternatives.
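The three Method values differ only in how the k nearest distances are reduced to one number. The sketch below shows this for a single query point with Euclidean distance; knnScore is a hypothetical helper, and it assumes the query is not itself a row of the training data (otherwise its zero self-distance would be counted as a neighbor).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Matrix [][]float64

// knnScore reduces the distances from q to its k nearest training rows using
// "largest" (distance to the kth neighbor), "mean", or "median".
// It assumes q has the same number of features as the training rows.
func knnScore(train Matrix, q []float64, k int, method string) float64 {
	dists := make([]float64, len(train))
	for i, row := range train {
		var sum float64
		for j := range row {
			d := row[j] - q[j]
			sum += d * d
		}
		dists[i] = math.Sqrt(sum)
	}
	sort.Float64s(dists)
	if k > len(dists) {
		k = len(dists)
	}
	if k < 1 {
		return 0
	}
	nearest := dists[:k]
	switch method {
	case "mean":
		var total float64
		for _, d := range nearest {
			total += d
		}
		return total / float64(k)
	case "median":
		mid := k / 2
		if k%2 == 1 {
			return nearest[mid]
		}
		return (nearest[mid-1] + nearest[mid]) / 2
	default: // "largest"
		return nearest[k-1]
	}
}

func main() {
	train := Matrix{{0}, {1}, {2}, {3}}
	for _, m := range []string{"largest", "mean", "median"} {
		fmt.Println(m, knnScore(train, []float64{10}, 3, m))
	}
}
```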
func (*KNN) DecisionFunction ¶
DecisionFunction returns the anomaly scores for the input samples
type KNNOptions ¶
type KNNOptions struct {
NNeighbors int // Number of neighbors (default: 5)
Method string // "largest", "mean", or "median" (default: "largest")
Metric string // Distance metric (default: "euclidean")
P float64 // Minkowski p parameter (default: 2)
Contamination float64 // Contamination rate (default: 0.1)
}
KNNOptions holds configuration options for KNN
func DefaultKNNOptions ¶
func DefaultKNNOptions() *KNNOptions
DefaultKNNOptions returns default options for KNN
type LODA ¶
type LODA struct {
BaseDetector
// contains filtered or unexported fields
}
LODA implements the Lightweight On-line Detector of Anomalies. LODA is an ensemble method that combines sparse random projections with one-dimensional histograms.
func NewLODA ¶
func NewLODA(opts *LODAOptions) *LODA
NewLODA creates a new LODA detector with the given options.
func (*LODA) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type LODAOptions ¶
type LODAOptions struct {
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
// NBins is the number of histogram bins (default: 10). Set AutoBins for automatic selection.
NBins int
// AutoBins determines whether to use automatic bin selection (default: false)
AutoBins bool
// NRandomCuts is the number of random cuts/projections (default: 100)
NRandomCuts int
// RandomState for reproducibility (default: nil for random)
RandomState *rand.Rand
}
LODAOptions contains options for the LODA detector.
func DefaultLODAOptions ¶
func DefaultLODAOptions() *LODAOptions
DefaultLODAOptions returns default options for LODA.
type LOF ¶
type LOF struct {
*BaseDetector
// NNeighbors is the number of neighbors to use
NNeighbors int
// Metric defines the distance metric: "euclidean", "manhattan", "minkowski"
Metric string
// P is the parameter for Minkowski distance
P float64
// Training data
X_ Matrix
// contains filtered or unexported fields
}
LOF implements the Local Outlier Factor algorithm. It measures the local deviation of density of a given sample with respect to its neighbors.
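The "local deviation of density" can be made concrete with the standard LOF definitions: k-distance, reachability distance, local reachability density (lrd), and the ratio of the neighbors' lrd to the sample's own. The brute-force sketch below scores the training rows with Euclidean distance; lofScores is a hypothetical name, and duplicate points (which give a zero reachability sum) are not handled.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Matrix [][]float64

func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += (a[i] - b[i]) * (a[i] - b[i])
	}
	return math.Sqrt(sum)
}

// lofScores returns the Local Outlier Factor of every row of X.
// Values near 1 indicate inliers; values well above 1 indicate outliers.
func lofScores(X Matrix, k int) []float64 {
	n := len(X)
	scores := make([]float64, n)
	if n < 2 || k < 1 {
		return scores
	}
	if k > n-1 {
		k = n - 1
	}
	neighbors := make([][]int, n) // indices of the k nearest other rows
	kdist := make([]float64, n)   // distance to the kth nearest row
	for i := range X {
		idx := make([]int, 0, n-1)
		for j := range X {
			if j != i {
				idx = append(idx, j)
			}
		}
		sort.Slice(idx, func(a, b int) bool {
			return euclidean(X[i], X[idx[a]]) < euclidean(X[i], X[idx[b]])
		})
		neighbors[i] = idx[:k]
		kdist[i] = euclidean(X[i], X[idx[k-1]])
	}
	lrd := make([]float64, n) // local reachability density
	for i := range X {
		var sum float64
		for _, j := range neighbors[i] {
			// Reachability distance of i from j.
			sum += math.Max(kdist[j], euclidean(X[i], X[j]))
		}
		lrd[i] = float64(k) / sum
	}
	for i := range X {
		var sum float64
		for _, j := range neighbors[i] {
			sum += lrd[j] / lrd[i]
		}
		scores[i] = sum / float64(k)
	}
	return scores
}

func main() {
	X := Matrix{{0, 0}, {0, 1}, {1, 0}, {1, 1}, {0.5, 0.5}, {10, 10}}
	fmt.Println(lofScores(X, 3))
}
```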
func (*LOF) DecisionFunction ¶
DecisionFunction returns the anomaly scores for the input samples
type LOFOptions ¶
type LOFOptions struct {
NNeighbors int // Number of neighbors (default: 20)
Metric string // Distance metric (default: "euclidean")
P float64 // Minkowski p parameter (default: 2)
Contamination float64 // Contamination rate (default: 0.1)
}
LOFOptions holds configuration options for LOF
func DefaultLOFOptions ¶
func DefaultLOFOptions() *LOFOptions
DefaultLOFOptions returns default options for LOF
type MAD ¶
type MAD struct {
BaseDetector
// contains filtered or unexported fields
}
MAD implements Median Absolute Deviation for univariate outlier detection. It measures how far each data point lies from the median, in units of the median absolute deviation, using modified z-scores.
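The modified z-score from Iglewicz and Hoaglin is 0.6745 * |x - median| / MAD, where MAD is the median of the absolute deviations from the median. The sketch below computes it for a univariate sample; modifiedZScores is a hypothetical name, and returning zeros when MAD is 0 is a simplification that a real implementation would need to handle differently.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Vector []float64

// median returns the middle value of v without modifying it.
func median(v Vector) float64 {
	if len(v) == 0 {
		return math.NaN()
	}
	s := append(Vector(nil), v...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// modifiedZScores returns 0.6745 * |x - median| / MAD for every value.
func modifiedZScores(x Vector) Vector {
	scores := make(Vector, len(x))
	if len(x) == 0 {
		return scores
	}
	med := median(x)
	dev := make(Vector, len(x))
	for i, v := range x {
		dev[i] = math.Abs(v - med)
	}
	mad := median(dev)
	if mad == 0 {
		// Degenerate case: at least half the values equal the median.
		return scores
	}
	for i := range x {
		scores[i] = 0.6745 * dev[i] / mad
	}
	return scores
}

func main() {
	x := Vector{1, 2, 3, 4, 100}
	for i, s := range modifiedZScores(x) {
		fmt.Printf("%v -> %.3f outlier=%v\n", x[i], s, s > 3.5)
	}
}
```

The 3.5 cutoff used in main is the default Threshold documented in MADOptions.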
func NewMAD ¶
func NewMAD(opts *MADOptions) *MAD
NewMAD creates a new MAD detector with the given options.
func (*MAD) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type MADOptions ¶
type MADOptions struct {
// Threshold is the modified z-score threshold (default: 3.5)
Threshold float64
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
}
MADOptions contains options for the MAD detector.
func DefaultMADOptions ¶
func DefaultMADOptions() *MADOptions
DefaultMADOptions returns default options for MAD.
type Matrix ¶
type Matrix [][]float64
Matrix represents a 2D slice of float64 values (n_samples x n_features)
type PCA ¶
type PCA struct {
*BaseDetector
// NComponents is the number of principal components to keep
NComponents int
// NSelectedComponents is the number of components used for scoring
// If 0, uses all components
NSelectedComponents int
// Weighted indicates whether to weight components by explained variance
Weighted bool
// Standardization indicates whether to standardize data
Standardization bool
// contains filtered or unexported fields
}
PCA implements Principal Component Analysis based outlier detection. Outlier scores are computed as the sum of weighted distances from samples to the principal component hyperplanes.
func (*PCA) DecisionFunction ¶
DecisionFunction returns the anomaly scores for the input samples
type PCAOptions ¶
type PCAOptions struct {
NComponents int // Number of components (default: 0 = all)
NSelectedComponents int // Components for scoring (default: 0 = all)
Weighted bool // Weight by variance (default: true)
Standardization bool // Standardize data (default: true)
Contamination float64 // Contamination rate (default: 0.1)
}
PCAOptions holds configuration options for PCA
func DefaultPCAOptions ¶
func DefaultPCAOptions() *PCAOptions
DefaultPCAOptions returns default options for PCA
type SOD ¶
type SOD struct {
BaseDetector
// contains filtered or unexported fields
}
SOD implements Subspace Outlier Detection. SOD explores the axis-parallel subspace spanned by the data object's neighbors and determines how much the object deviates from the neighbors.
func NewSOD ¶
func NewSOD(opts *SODOptions) *SOD
NewSOD creates a new SOD detector with the given options.
func (*SOD) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type SODOptions ¶
type SODOptions struct {
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
// NNeighbors is the number of neighbors for kNN (default: 20)
NNeighbors int
// RefSet is the number of shared nearest neighbors for reference set (default: 10)
RefSet int
// Alpha is the lower limit for selecting subspace (default: 0.8)
Alpha float64
}
SODOptions contains options for the SOD detector.
func DefaultSODOptions ¶
func DefaultSODOptions() *SODOptions
DefaultSODOptions returns default options for SOD.
type SOS ¶
type SOS struct {
BaseDetector
// contains filtered or unexported fields
}
SOS implements Stochastic Outlier Selection. SOS uses the concept of affinity to quantify the relationship between data points. A point is an outlier when all other points have insufficient affinity with it.
func NewSOS ¶
func NewSOS(opts *SOSOptions) *SOS
NewSOS creates a new SOS detector with the given options.
func (*SOS) DecisionFunction ¶
DecisionFunction computes the anomaly score for new samples.
type SOSOptions ¶
type SOSOptions struct {
// Contamination is the proportion of outliers in the data set (default: 0.1)
Contamination float64
// Perplexity is a smooth measure of effective number of neighbors (default: 4.5)
Perplexity float64
// Eps is the tolerance threshold for binary search (default: 1e-5)
Eps float64
}
SOSOptions contains options for the SOS detector.
func DefaultSOSOptions ¶
func DefaultSOSOptions() *SOSOptions
DefaultSOSOptions returns default options for SOS.
type Vector ¶
type Vector []float64
Vector represents a 1D slice of float64 values
func InvertOrder ¶
InvertOrder inverts the order of scores (smallest becomes largest). This is useful for combining detectors with different score orderings.
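The signature and the inversion rule are not shown here. One simple way to flip an ordering, sketched below under that caveat, is to negate every score, so that a measure where low means abnormal (such as ABOD's variance) follows the "higher is more abnormal" convention of the Detector interface. The name invert is hypothetical; another common choice is subtracting each score from the maximum.

```go
package main

import "fmt"

type Vector []float64

// invert returns a new vector with every score negated, which reverses
// the ranking without changing the spacing between scores.
func invert(scores Vector) Vector {
	out := make(Vector, len(scores))
	for i, s := range scores {
		out[i] = -s
	}
	return out
}

func main() {
	variances := Vector{0.05, 0.04, 0.00001} // lowest value is the outlier
	fmt.Println(invert(variances))           // now the outlier has the highest score
}
```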