Documentation ¶
Overview ¶
Package kodama provides cgo bindings to hierarchical clustering.
The ideas and implementation in the underlying Rust crate are heavily based on the work of Daniel Müllner, and in particular, his 2011 paper, Modern hierarchical, agglomerative clustering algorithms. Parts of the implementation have also been inspired by his C++ library, fastcluster. Müllner's work, in turn, is based on the hierarchical clustering facilities provided by MATLAB and SciPy.
The runtime performance of this library is on par with Müllner's fastcluster implementation.
For more detailed information, see the documentation for the Rust library at https://docs.rs/kodama. Most, if not all, of what is described there should translate straightforwardly to these Go bindings.
Types ¶
type Dendrogram ¶
type Dendrogram struct {
// contains filtered or unexported fields
}
Dendrogram is a stepwise representation of a hierarchical clustering of N observations.
A dendrogram consists of a series of N - 1 steps, where N is the number of observations that were clustered. Each step corresponds to the creation of a new cluster by merging exactly two previous clusters.
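To make the step structure concrete, here is a small hand-written sketch that does not call this package. The `step` type is a hypothetical local mirror of the exported fields of Step, and `checkDendrogram` is an illustrative helper. The labeling rule it relies on (observations are labeled 0..N-1 and the cluster created at step i is labeled N+i) is an assumption carried over from the Rust library's documentation, not something stated on this page.

```go
package main

import "fmt"

// step is a hypothetical local mirror of the package's Step type.
type step struct {
	Cluster1, Cluster2 int
	Dissimilarity      float64
	Size               int
}

// checkDendrogram verifies the properties described above for a clustering
// of n observations: exactly n-1 steps, each merging two existing clusters,
// with the last cluster containing all n observations.
// ASSUMPTION: observations are labeled 0..n-1 and the cluster created at
// step i is labeled n+i.
func checkDendrogram(n int, steps []step) error {
	if n < 1 {
		return fmt.Errorf("need at least one observation, got %d", n)
	}
	if len(steps) != n-1 {
		return fmt.Errorf("want %d steps, got %d", n-1, len(steps))
	}
	// size returns the number of observations under a label.
	size := func(label int) int {
		if label < n {
			return 1 // a single observation
		}
		return steps[label-n].Size
	}
	for i, s := range steps {
		limit := n + i // labels that exist before step i
		if s.Cluster1 < 0 || s.Cluster2 < 0 ||
			s.Cluster1 >= limit || s.Cluster2 >= limit ||
			s.Cluster1 == s.Cluster2 {
			return fmt.Errorf("step %d merges invalid labels %d and %d",
				i, s.Cluster1, s.Cluster2)
		}
		if want := size(s.Cluster1) + size(s.Cluster2); s.Size != want {
			return fmt.Errorf("step %d has size %d, want %d", i, s.Size, want)
		}
	}
	if n > 1 && steps[n-2].Size != n {
		return fmt.Errorf("last cluster has size %d, want %d", steps[n-2].Size, n)
	}
	return nil
}

func main() {
	// Four observations, hence three steps. Values are made up.
	steps := []step{
		{Cluster1: 0, Cluster2: 1, Dissimilarity: 0.5, Size: 2}, // creates label 4
		{Cluster1: 2, Cluster2: 3, Dissimilarity: 1.0, Size: 2}, // creates label 5
		{Cluster1: 4, Cluster2: 5, Dissimilarity: 3.0, Size: 4}, // creates label 6
	}
	for i, s := range steps {
		fmt.Printf("step %d: merge %d and %d at %g (size %d)\n",
			i, s.Cluster1, s.Cluster2, s.Dissimilarity, s.Size)
	}
	fmt.Println("check:", checkDendrogram(4, steps))
}
```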
func Linkage32 ¶
func Linkage32(
	condensedDissimilarityMatrix []float32,
	observations int,
	method Method,
) *Dendrogram
Linkage32 returns a hierarchical clustering of observations given their pairwise dissimilarities as single-precision floating point numbers.
The pairwise dissimilarities must be provided as a *condensed pairwise dissimilarity matrix*, where only the values in the upper triangle are explicitly represented, not including the diagonal. As a result, the given matrix should have length observations-choose-2 (which is (observations * (observations - 1)) / 2) and only have values defined for pairs of (a, b) where a < b.
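As an illustration of the condensed layout, the sketch below builds such a matrix from a handful of points using Euclidean distance. The helper name `condensedFromPoints` is hypothetical, and the row-by-row ordering of the upper triangle is an assumption (it matches the convention used by SciPy's pdist); this page only states that entries exist for pairs with a < b.

```go
package main

import (
	"fmt"
	"math"
)

// condensedFromPoints returns the pairwise Euclidean distances between
// points as a condensed matrix with one entry per pair (a, b), a < b.
// All points are expected to have the same dimension.
// ASSUMPTION: entries are laid out row by row over the upper triangle:
// (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
func condensedFromPoints(points [][]float64) []float32 {
	n := len(points)
	condensed := make([]float32, 0, n*(n-1)/2)
	for a := 0; a < n; a++ {
		for b := a + 1; b < n; b++ {
			var sum float64
			for k := range points[a] {
				d := points[a][k] - points[b][k]
				sum += d * d
			}
			condensed = append(condensed, float32(math.Sqrt(sum)))
		}
	}
	return condensed
}

func main() {
	points := [][]float64{{0, 0}, {3, 4}, {6, 8}}
	m := condensedFromPoints(points)
	fmt.Println(len(m), m)
}
```

With three points the result has 3-choose-2 = 3 entries, here the distances for the pairs (0, 1), (0, 2) and (1, 2): 5, 10 and 5.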
The observations parameter is the total number of observations that are being clustered. Every pair of observations must have a finite non-NaN dissimilarity.
The return value is a dendrogram. The dendrogram encodes a hierarchical clustering as a sequence of observations - 1 steps, where each step corresponds to the creation of a cluster by merging exactly two previous clusters. The very last cluster created contains all observations.
If the length of the given matrix is not consistent with the number of observations, then this function will panic.
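Since a length mismatch panics, callers may prefer to validate up front and return an error instead. The following is a minimal sketch of such a check; `condensedLen` and `checkCondensedLen` are hypothetical helpers, not part of this package.

```go
package main

import "fmt"

// condensedLen returns observations-choose-2, the length a condensed
// matrix must have for the given number of observations.
func condensedLen(observations int) int {
	return observations * (observations - 1) / 2
}

// checkCondensedLen reports a mismatch as an error so that the caller can
// handle it before the data reaches a clustering call.
func checkCondensedLen(matrixLen, observations int) error {
	if observations < 0 {
		return fmt.Errorf("negative number of observations: %d", observations)
	}
	if want := condensedLen(observations); matrixLen != want {
		return fmt.Errorf("condensed matrix has length %d, want %d for %d observations",
			matrixLen, want, observations)
	}
	return nil
}

func main() {
	fmt.Println(condensedLen(4))         // 4 observations need 6 entries
	fmt.Println(checkCondensedLen(6, 4)) // consistent
	fmt.Println(checkCondensedLen(5, 4)) // inconsistent
}
```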
The given matrix is never copied, but its values may be mutated during clustering.
func Linkage64 ¶
func Linkage64(
	condensedDissimilarityMatrix []float64,
	observations int,
	method Method,
) *Dendrogram
Linkage64 returns a hierarchical clustering of observations given their pairwise dissimilarities as double-precision floating point numbers.
The pairwise dissimilarities must be provided as a *condensed pairwise dissimilarity matrix*, where only the values in the upper triangle are explicitly represented, not including the diagonal. As a result, the given matrix should have length observations-choose-2 (which is (observations * (observations - 1)) / 2) and only have values defined for pairs of (a, b) where a < b.
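When working with a condensed matrix it is often handy to look up the entry for a given pair. The sketch below maps a pair (a, b) to its position; `condensedIndex` is a hypothetical helper, and the formula assumes the row-by-row ordering of the upper triangle, which this page does not spell out.

```go
package main

import "fmt"

// condensedIndex returns the position of the pair (a, b) in a condensed
// matrix over n observations. The pair may be given in either order, but
// a == b has no entry because the diagonal is not represented.
// ASSUMPTION: the upper triangle is stored row by row.
func condensedIndex(n, a, b int) (int, error) {
	if a == b || a < 0 || b < 0 || a >= n || b >= n {
		return 0, fmt.Errorf("no condensed entry for pair (%d, %d) with %d observations", a, b, n)
	}
	if a > b {
		a, b = b, a
	}
	// Rows 0..a-1 hold (n-1) + (n-2) + ... + (n-a) = a*(2n-a-1)/2 entries;
	// within row a, the entry for column b sits at offset b-a-1.
	return a*(2*n-a-1)/2 + (b - a - 1), nil
}

func main() {
	// Condensed matrix for 4 observations, in the order
	// (0,1) (0,2) (0,3) (1,2) (1,3) (2,3).
	m := []float64{1, 2, 3, 4, 5, 6}
	i, err := condensedIndex(4, 3, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(i, m[i])
}
```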
The observations parameter is the total number of observations that are being clustered. Every pair of observations must have a finite non-NaN dissimilarity.
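The finiteness requirement can be checked before clustering. This is a minimal sketch using the standard library's math.IsNaN and math.IsInf; the helper name `firstNonFinite` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// firstNonFinite returns the index of the first NaN or infinite value in
// the condensed matrix, or -1 if every dissimilarity is finite.
func firstNonFinite(condensed []float64) int {
	for i, v := range condensed {
		if math.IsNaN(v) || math.IsInf(v, 0) {
			return i
		}
	}
	return -1
}

func main() {
	ok := []float64{0.5, 1.25, 3}
	bad := []float64{0.5, math.NaN(), 3}
	fmt.Println(firstNonFinite(ok), firstNonFinite(bad))
}
```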
The return value is a dendrogram. The dendrogram encodes a hierarchical clustering as a sequence of observations - 1 steps, where each step corresponds to the creation of a cluster by merging exactly two previous clusters. The very last cluster created contains all observations.
If the length of the given matrix is not consistent with the number of observations, then this function will panic.
The given matrix is never copied, but its values may be mutated during clustering.
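Because the input may be mutated, a caller who still needs the original dissimilarities afterwards can pass a copy instead. A minimal sketch follows; `cloneCondensed` is a hypothetical helper (written generically so it covers both the float32 and float64 variants), and the loop in `main` is only a stand-in for a clustering call that overwrites its input.

```go
package main

import "fmt"

// cloneCondensed returns an independent copy of a condensed matrix, so that
// the copy can be handed to a function that may overwrite its input.
func cloneCondensed[T float32 | float64](m []T) []T {
	out := make([]T, len(m))
	copy(out, m)
	return out
}

func main() {
	original := []float64{1, 2, 3}
	scratch := cloneCondensed(original)

	// Stand-in for a clustering call that mutates the matrix it is given.
	for i := range scratch {
		scratch[i] = 0
	}
	fmt.Println(original, scratch)
}
```

The copy costs one extra allocation of the same size as the matrix, so it is only worth making when the original values are needed again.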
func (*Dendrogram) Len ¶
func (dend *Dendrogram) Len() int
Len returns the number of steps in this dendrogram.
func (*Dendrogram) Observations ¶
func (dend *Dendrogram) Observations() int
Observations returns the number of observations in the data that is clustered by this dendrogram.
func (*Dendrogram) Steps ¶
func (dend *Dendrogram) Steps() []Step
Steps returns a slice of steps that make up the given dendrogram.
type Method ¶
type Method int
Method indicates the update formula for computing dissimilarities between clusters.
The method dictates how the dissimilarities are computed whenever a new cluster is formed. In particular, when clusters a and b are merged into a new cluster ab, then the pairwise dissimilarity between ab and every other cluster is computed using one of the variants of this type.
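To illustrate what an update formula does, the sketch below implements three textbook rules (single, complete and average linkage) for computing the dissimilarity between a merged cluster ab and another cluster x. The `updateRule` type and the function names are hypothetical; whether and under what names this package exposes these rules as Method values is not shown on this page.

```go
package main

import (
	"fmt"
	"math"
)

// updateRule is a hypothetical stand-in for the idea behind Method: given
// the dissimilarities d(a, x) and d(b, x) and the sizes of a and b, it
// returns d(ab, x) for the merged cluster ab.
type updateRule func(dax, dbx float64, sizeA, sizeB int) float64

// single linkage: the closest pair of members decides.
func single(dax, dbx float64, _, _ int) float64 { return math.Min(dax, dbx) }

// complete linkage: the farthest pair of members decides.
func complete(dax, dbx float64, _, _ int) float64 { return math.Max(dax, dbx) }

// average linkage: the size-weighted mean of the two dissimilarities.
func average(dax, dbx float64, sizeA, sizeB int) float64 {
	return (float64(sizeA)*dax + float64(sizeB)*dbx) / float64(sizeA+sizeB)
}

func main() {
	rules := []struct {
		name string
		rule updateRule
	}{
		{"single", single},
		{"complete", complete},
		{"average", average},
	}
	// Cluster a has 1 member with d(a, x) = 2; b has 3 members with d(b, x) = 6.
	for _, r := range rules {
		fmt.Printf("%-8s d(ab, x) = %g\n", r.name, r.rule(2, 6, 1, 3))
	}
}
```

For these inputs the three rules give 2, 6 and 5 respectively, which shows how the choice of rule changes which clusters look closest at the next step.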
type Step ¶
type Step struct {
	// The label corresponding to the first cluster.
	Cluster1 int
	// The label corresponding to the second cluster.
	Cluster2 int
	// The dissimilarity between Cluster1 and Cluster2.
	Dissimilarity float64
	// The total number of observations in this merged cluster.
	Size int
}
Step is a single merge step in a dendrogram.
Each step corresponds to the creation of a new cluster by merging two previous clusters.
By convention, the smaller cluster label is always assigned to the `Cluster1` field.
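A common use of the steps is to cut the hierarchy at a dissimilarity threshold and obtain flat clusters. The following sketch does this with a small union-find over hand-written steps; it does not call this package. The `step` type and the `cut` function are hypothetical, and the code assumes that observations are labeled 0..N-1 and that the cluster created at step i is labeled N+i (taken from the Rust library's documentation, not from this page). It also assumes dissimilarities do not decrease from one step to the next.

```go
package main

import "fmt"

// step is a hypothetical local mirror of the package's Step type.
type step struct {
	Cluster1, Cluster2 int
	Dissimilarity      float64
	Size               int
}

// cut assigns a flat cluster id to each of n observations by applying only
// the merges whose dissimilarity is at most threshold.
// ASSUMPTION: observations are labeled 0..n-1 and the cluster created at
// step i is labeled n+i.
func cut(n int, steps []step, threshold float64) []int {
	// One slot per observation and per cluster created by a step.
	parent := make([]int, n+len(steps))
	for i := range parent {
		parent[i] = i
	}
	find := func(x int) int {
		for parent[x] != x {
			parent[x] = parent[parent[x]] // path halving
			x = parent[x]
		}
		return x
	}
	for i, s := range steps {
		if s.Dissimilarity > threshold {
			continue
		}
		root := n + i
		parent[find(s.Cluster1)] = root
		parent[find(s.Cluster2)] = root
	}
	// Relabel the surviving roots as consecutive ids in order of appearance.
	ids := make(map[int]int)
	out := make([]int, n)
	for obs := 0; obs < n; obs++ {
		r := find(obs)
		if _, ok := ids[r]; !ok {
			ids[r] = len(ids)
		}
		out[obs] = ids[r]
	}
	return out
}

func main() {
	// Four observations; values are made up.
	steps := []step{
		{Cluster1: 0, Cluster2: 1, Dissimilarity: 0.5, Size: 2},
		{Cluster1: 2, Cluster2: 3, Dissimilarity: 1.0, Size: 2},
		{Cluster1: 4, Cluster2: 5, Dissimilarity: 3.0, Size: 4},
	}
	fmt.Println(cut(4, steps, 1.5)) // the last merge is above the threshold
}
```

With the threshold at 1.5 the first two merges are applied and the third is not, so observations 0 and 1 share one id and observations 2 and 3 share another.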