silhouette

v0.0.0-...-9bb9963
Published: Sep 18, 2019 License: MIT Imports: 6 Imported by: 0

README

silhouette

Silhouette cluster analysis implementation in Go

What It Does

Silhouette refers to an algorithm used to interpret and validate the consistency within clusters of data.

The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters. The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.

If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters.
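For reference, the standard definition of the silhouette value of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The following is a minimal, self-contained sketch of that definition using Euclidean distance. The helper names (euclidean, meanDistance, silhouetteValue) and the plain slice types are made up for illustration, and this is not necessarily how the package computes its scores internally.

```go
package main

import "math"

// euclidean returns the Euclidean distance between two points of equal dimension.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// meanDistance returns the mean distance from p to the points in cluster,
// ignoring the point at index skip (pass -1 to ignore nothing).
func meanDistance(p []float64, cluster [][]float64, skip int) float64 {
	var sum float64
	var n int
	for i, q := range cluster {
		if i == skip {
			continue
		}
		sum += euclidean(p, q)
		n++
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

// silhouetteValue computes s = (b - a) / max(a, b) for the point at index pi
// of cluster ci. Hypothetical helper, not part of the silhouette package.
func silhouetteValue(clusters [][][]float64, ci, pi int) float64 {
	own := clusters[ci]
	if len(own) < 2 {
		return 0 // common convention: a singleton cluster scores 0
	}
	p := own[pi]

	// a: mean distance to the other members of the point's own cluster
	a := meanDistance(p, own, pi)

	// b: smallest mean distance to the members of any other cluster
	b := math.Inf(1)
	for j, other := range clusters {
		if j == ci || len(other) == 0 {
			continue
		}
		if d := meanDistance(p, other, -1); d < b {
			b = d
		}
	}
	if math.IsInf(b, 1) {
		return 0 // there is no other non-empty cluster to compare against
	}

	m := math.Max(a, b)
	if m == 0 {
		return 0
	}
	return (b - a) / m
}
```

Calling silhouetteValue for every point and averaging the results gives the kind of per-configuration score described above: values near +1 for tight, well-separated clusters and negative values for points that sit closer to a neighboring cluster than to their own.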

When You Should Use It

  • When you have numeric, multi-dimensional data sets
  • When you want to check whether your data set is clustered
  • When you have only a vague idea of the clustering in your data set
  • When you want to figure out the optimal clustering configuration

Example

package main

import (
	"fmt"
	"log"
	"math/rand"

	"github.com/muesli/clusters"
	"github.com/muesli/kmeans"
	"github.com/muesli/silhouette"
)

func main() {
	// initialize your data set
	// for the example we'll use three distinct clusters of data points
	var d clusters.Observations
	for x := 0; x < 64; x++ {
		d = append(d, clusters.Coordinates{
			rand.Float64() * 0.1,
			rand.Float64() * 0.1,
		})
	}
	for x := 0; x < 64; x++ {
		d = append(d, clusters.Coordinates{
			0.5 + rand.Float64()*0.1,
			0.5 + rand.Float64()*0.1,
		})
	}
	for x := 0; x < 64; x++ {
		d = append(d, clusters.Coordinates{
			0.9 + rand.Float64()*0.1,
			0.9 + rand.Float64()*0.1,
		})
	}

	// silhouette should work with any clustering algorithm that implements
	// the Partitioner interface; it's commonly used with k-means
	km := kmeans.New()

	// compute the average silhouette score (coefficient) for 2 to 8 clusters,
	// using the k-means clustering algorithm
	scores, err := silhouette.Scores(d, 8, km)
	if err != nil {
		log.Fatal(err)
	}
	for _, s := range scores {
		fmt.Printf("k: %d (score: %.2f)\n", s.K, s.Score)
	}

	// estimate the number of clusters in our data set
	// this returns the k with the highest score (where 2 <= k <= 8)
	k, score, err := silhouette.EstimateK(d, 8, km)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("estimated k: %d (score: %.2f)\n", k, score)

	// k is usually 3 for this example, with a score close to 1.0
	// note that k-means doesn't always converge optimally
	// ...
}

Documentation

Overview

Package silhouette implements the silhouette cluster analysis algorithm. See: https://en.wikipedia.org/wiki/Silhouette_(clustering)

Functions

func EstimateK

func EstimateK(data clusters.Observations, kmax int, m Partitioner) (int, float64, error)

EstimateK estimates the number of clusters (k), along with the silhouette score for that value, using the given partitioning algorithm.

func Plot

func Plot(filename string, scores []KScore) error

Plot creates a graph of the silhouette scores and writes it to the given filename.

func Score

func Score(data clusters.Observations, k int, m Partitioner) (float64, error)

Score calculates the silhouette score for a given value of k, using the given partitioning algorithm.

Types

type KScore

type KScore struct {
	K     int
	Score float64
}

KScore holds the silhouette score for a given value of k.

func Scores

func Scores(data clusters.Observations, kmax int, m Partitioner) ([]KScore, error)

Scores calculates the silhouette scores for all values of k from 2 to kmax, using the given partitioning algorithm.
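The README describes EstimateK as returning the k with the highest score. As a rough illustration of that selection step, here is a small sketch that picks the best entry from a slice of scores. The KScore type below is a local copy of the documented struct so the snippet has no external dependencies, and bestK is a hypothetical helper, not a function exported by this package.

```go
package main

import "errors"

// KScore is a local copy of the struct documented above, so this sketch
// compiles without importing the silhouette package.
type KScore struct {
	K     int
	Score float64
}

// bestK returns the entry with the highest silhouette score.
// On ties, the entry that appears first (usually the smaller k) wins.
// Hypothetical helper for illustration only.
func bestK(scores []KScore) (KScore, error) {
	if len(scores) == 0 {
		return KScore{}, errors.New("no scores to choose from")
	}
	best := scores[0]
	for _, s := range scores[1:] {
		if s.Score > best.Score {
			best = s
		}
	}
	return best, nil
}
```

Under these assumptions, passing the result of Scores to a helper like bestK would yield the same kind of (k, score) pair that EstimateK reports.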

type Partitioner

type Partitioner interface {
	Partition(data clusters.Observations, k int) (clusters.Clusters, error)
}

Partitioner is the interface that suitable clustering algorithms should implement.
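To show the general shape of an algorithm that satisfies an interface like this, here is a deliberately naive sketch. It does not import github.com/muesli/clusters: Point, Cluster and the local Partitioner interface are simplified stand-ins invented for this example, and roundRobin is a hypothetical partitioner that is only useful as a baseline. A real implementation would use the clusters.Observations and clusters.Clusters types from the signature above.

```go
package main

import "errors"

// Point and Cluster are simplified stand-ins for the types in
// github.com/muesli/clusters. They are NOT the library's definitions.
type Point []float64

type Cluster struct {
	Center Point
	Points []Point
}

// Partitioner mirrors the shape of the documented interface,
// expressed with the stand-in types above.
type Partitioner interface {
	Partition(data []Point, k int) ([]Cluster, error)
}

// roundRobin is a hypothetical partitioner that assigns point i to
// cluster i % k. It ignores the geometry of the data entirely.
type roundRobin struct{}

// Compile-time check that roundRobin satisfies the stand-in interface.
var _ Partitioner = roundRobin{}

func (roundRobin) Partition(data []Point, k int) ([]Cluster, error) {
	if k < 1 || k > len(data) {
		return nil, errors.New("k must be between 1 and the number of points")
	}

	out := make([]Cluster, k)
	for i, p := range data {
		c := &out[i%k]
		c.Points = append(c.Points, p)
	}

	// Set each center to the mean of its points. Every cluster has at
	// least one point because k <= len(data). All points are assumed to
	// have the same number of dimensions.
	for ci := range out {
		pts := out[ci].Points
		center := make(Point, len(pts[0]))
		for _, p := range pts {
			for d := range center {
				center[d] += p[d]
			}
		}
		for d := range center {
			center[d] /= float64(len(pts))
		}
		out[ci].Center = center
	}
	return out, nil
}
```

Because scoring only depends on the partition an algorithm returns, a trivial partitioner like this one can serve as a sanity check: its silhouette score on clustered data should be clearly worse than that of k-means.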
