kodama

package
v0.0.0-...-2b5da0b
Published: Apr 9, 2025 License: MIT Imports: 6 Imported by: 0

README

go-kodama

This package provides cgo bindings to the kodama hierarchical clustering library.

This package is released under the MIT license.

Documentation

godoc.org/github.com/diffeo/kodama/go-kodama

The primary documentation for the Rust library can be found here: https://docs.rs/kodama

Install + Example

You'll need Rust 1.19 or newer and a Go compiler. To run the tests for go-kodama, first compile the Rust kodama library and then tell the Go compiler where to find it. These commands should do it:

$ mkdir -p $GOPATH/src/github.com/diffeo
$ cd $GOPATH/src/github.com/diffeo
$ git clone https://github.com/diffeo/kodama
$ cd kodama/go-kodama
$ cargo build --release --manifest-path ../kodama-capi/Cargo.toml
$ export CGO_LDFLAGS="-L../kodama-capi/target/release"
$ export LD_LIBRARY_PATH="../kodama-capi/target/release"

Now you can run tests:

$ go test -v
=== RUN   TestLinkage64
--- PASS: TestLinkage64 (0.00s)
=== RUN   TestLinkage32
--- PASS: TestLinkage32 (0.00s)
PASS
ok      github.com/diffeo/kodama/go-kodama      0.003s

Or try compiling an example program:

$ go install github.com/diffeo/kodama/go-kodama/go-kodama-example
$ $GOPATH/bin/go-kodama-example
kodama.Step{Cluster1:2, Cluster2:4, Dissimilarity:3.1237967760688776, Size:2}
kodama.Step{Cluster1:5, Cluster2:6, Dissimilarity:5.757158112027513, Size:3}
kodama.Step{Cluster1:1, Cluster2:7, Dissimilarity:8.1392602685723, Size:4}
kodama.Step{Cluster1:3, Cluster2:8, Dissimilarity:12.483148228609206, Size:5}
kodama.Step{Cluster1:0, Cluster2:9, Dissimilarity:25.589444117482433, Size:6}

Note that, at least on Linux, the above setup will dynamically link the Rust kodama library into the Go executable:

$ ldd $GOPATH/bin/go-kodama-example
...
        libkodama.so => ../kodama-capi/target/release/libkodama.so (0x00007f464cde6000)
...

It is possible to statically link kodama completely as well. For this, we need to re-compile the kodama library into a static archive with musl, which will statically link libc. This is easy to do if you have rustup, which permits adding new targets. The new target can then be used with Cargo with the --target flag. So to accomplish the above, we'll run a similar set of steps for the setup:

$ mkdir -p $GOPATH/src/github.com/diffeo
$ cd $GOPATH/src/github.com/diffeo
$ git clone https://github.com/diffeo/kodama
$ cd kodama/go-kodama

And now we add the musl target and compile (note the different value of CGO_LDFLAGS!):

$ rustup target add x86_64-unknown-linux-musl
$ cargo build --release --manifest-path ../kodama-capi/Cargo.toml --target x86_64-unknown-linux-musl
$ export CGO_LDFLAGS="-L../kodama-capi/target/x86_64-unknown-linux-musl/release"

Now go test will work, and a re-compiled go-kodama-example will also work, but it will no longer link dynamically against the kodama library:

$ go install github.com/diffeo/kodama/go-kodama/go-kodama-example
$ $GOPATH/bin/go-kodama-example
kodama.Step{Cluster1:2, Cluster2:4, Dissimilarity:3.1237967760688776, Size:2}
kodama.Step{Cluster1:5, Cluster2:6, Dissimilarity:5.757158112027513, Size:3}
kodama.Step{Cluster1:1, Cluster2:7, Dissimilarity:8.1392602685723, Size:4}
kodama.Step{Cluster1:3, Cluster2:8, Dissimilarity:12.483148228609206, Size:5}
kodama.Step{Cluster1:0, Cluster2:9, Dissimilarity:25.589444117482433, Size:6}
$ ldd $GOPATH/bin/go-kodama-example
        linux-vdso.so.1 (0x00007fff0cce3000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f2237ee6000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f2237b40000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2238104000)

In fact, you can even go a step further and create a fully static executable:

$ go install -ldflags "-linkmode external -extldflags -static" github.com/diffeo/kodama/go-kodama/go-kodama-example
$ ldd $GOPATH/bin/go-kodama-example
        not a dynamic executable

Documentation

Overview

Package kodama provides cgo bindings to hierarchical clustering.

The ideas and implementation in this library are heavily based on the work of Daniel Müllner, and in particular, his 2011 paper, Modern hierarchical, agglomerative clustering algorithms. Parts of the implementation have also been inspired by his C++ library, fastcluster. Müllner's work, in turn, is based on the hierarchical clustering facilities provided by MATLAB and SciPy.

The runtime performance of this library is on par with Müllner's fastcluster implementation.

For more detailed information, see the documentation for the Rust library at https://docs.rs/kodama. Most of it should translate straightforwardly to these Go bindings.

Types

type Dendrogram

type Dendrogram struct {
	// contains filtered or unexported fields
}

Dendrogram is a stepwise representation of a hierarchical clustering of N observations.

A dendrogram consists of a series of N - 1 steps, where N is the number of observations that were clustered. Each step corresponds to the creation of a new cluster by merging exactly two previous clusters.

func Linkage32

func Linkage32(
	condensedDissimilarityMatrix []float32,
	observations int,
	method Method,
) *Dendrogram

Linkage32 returns a hierarchical clustering of observations given their pairwise dissimilarities as single-precision floating point numbers.

The pairwise dissimilarities must be provided as a *condensed pairwise dissimilarity matrix*, where only the values in the upper triangle are explicitly represented, not including the diagonal. As a result, the given matrix should have length observations-choose-2 (which is (observations * (observations - 1)) / 2) and only have values defined for pairs of (a, b) where a < b.
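To make the condensed layout concrete, here is a minimal sketch that builds such a matrix from a few points and looks up one pair. It assumes the upper triangle is stored row by row, which matches the example in the Rust documentation but is not spelled out above. The helper names condensedIndex and condensedEuclidean are hypothetical and not part of go-kodama.

```go
package main

import (
	"fmt"
	"math"
)

// condensedIndex returns the position of the pair (i, j), with i < j, in a
// condensed matrix over n observations. It assumes the upper triangle is
// stored row by row (an assumption of this sketch).
func condensedIndex(n, i, j int) int {
	return n*i - i*(i+1)/2 + (j - i - 1)
}

// condensedEuclidean builds the condensed dissimilarity matrix for the given
// points using Euclidean distance. Only pairs (i, j) with i < j are stored.
func condensedEuclidean(points [][]float32) []float32 {
	n := len(points)
	condensed := make([]float32, 0, n*(n-1)/2)
	for i := 0; i < n-1; i++ {
		for j := i + 1; j < n; j++ {
			var sum float64
			for k := range points[i] {
				d := float64(points[i][k] - points[j][k])
				sum += d * d
			}
			condensed = append(condensed, float32(math.Sqrt(sum)))
		}
	}
	return condensed
}

func main() {
	points := [][]float32{{0, 0}, {3, 4}, {6, 8}, {0, 8}}
	condensed := condensedEuclidean(points)
	fmt.Println(len(condensed)) // 4 observations give 4*3/2 = 6 entries
	// Dissimilarity between observations 2 and 3.
	fmt.Println(condensed[condensedIndex(len(points), 2, 3)])
}
```

A slice built this way has exactly the length that Linkage32 expects for len(points) observations.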

The observations parameter is the total number of observations that are being clustered. Every pair of observations must have a finite non-NaN dissimilarity.

The return value is a dendrogram. The dendrogram encodes a hierarchical clustering as a sequence of observations - 1 steps, where each step corresponds to the creation of a cluster by merging exactly two previous clusters. The very last cluster created contains all observations.

If the length of the given matrix is not consistent with the number of observations, then this function will panic.

The given matrix is never copied, but its values may be mutated during clustering.

func Linkage64

func Linkage64(
	condensedDissimilarityMatrix []float64,
	observations int,
	method Method,
) *Dendrogram

Linkage64 returns a hierarchical clustering of observations given their pairwise dissimilarities as double-precision floating point numbers.

The pairwise dissimilarities must be provided as a *condensed pairwise dissimilarity matrix*, where only the values in the upper triangle are explicitly represented, not including the diagonal. As a result, the given matrix should have length observations-choose-2 (which is (observations * (observations - 1)) / 2) and only have values defined for pairs of (a, b) where a < b.

The observations parameter is the total number of observations that are being clustered. Every pair of observations must have a finite non-NaN dissimilarity.

The return value is a dendrogram. The dendrogram encodes a hierarchical clustering as a sequence of observations - 1 steps, where each step corresponds to the creation of a cluster by merging exactly two previous clusters. The very last cluster created contains all observations.

If the length of the given matrix is not consistent with the number of observations, then this function will panic.

The given matrix is never copied, but its values may be mutated during clustering.
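Because a length mismatch panics and the input may be overwritten, callers may want to check and copy their data first. The following is one possible sketch of such a guard. prepareCondensed is a hypothetical helper, not part of go-kodama, and it only enforces the preconditions stated above.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// prepareCondensed (hypothetical helper) checks the documented preconditions
// and returns a scratch copy, so the caller's slice survives clustering.
func prepareCondensed(matrix []float64, observations int) ([]float64, error) {
	if observations < 0 {
		return nil, errors.New("observations must be non-negative")
	}
	// The condensed matrix must hold observations-choose-2 values.
	want := observations * (observations - 1) / 2
	if len(matrix) != want {
		return nil, fmt.Errorf("matrix has length %d, want %d for %d observations",
			len(matrix), want, observations)
	}
	// Every dissimilarity must be finite and not NaN.
	for i, v := range matrix {
		if math.IsNaN(v) || math.IsInf(v, 0) {
			return nil, fmt.Errorf("entry %d is not finite: %v", i, v)
		}
	}
	scratch := make([]float64, len(matrix))
	copy(scratch, matrix)
	return scratch, nil
}

func main() {
	matrix := []float64{5, 10, 8, 5, 5, 6}
	scratch, err := prepareCondensed(matrix, 4)
	if err != nil {
		fmt.Println("invalid input:", err)
		return
	}
	// scratch can be handed to the clustering call while matrix stays intact.
	fmt.Println(len(scratch))
}
```

Copying doubles the memory used for the matrix, so it is only worth doing when the original dissimilarities are needed after clustering.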

func (*Dendrogram) Len

func (dend *Dendrogram) Len() int

Len returns the number of steps in this dendrogram.

func (*Dendrogram) Observations

func (dend *Dendrogram) Observations() int

Observations returns the number of observations in the data that is clustered by this dendrogram.

func (*Dendrogram) Steps

func (dend *Dendrogram) Steps() []Step

Steps returns a slice of steps that make up the given dendrogram.
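A common use of the steps is cutting the dendrogram into flat clusters at a dissimilarity threshold. The sketch below shows one way to do that. It makes three assumptions that are not stated above: the cluster created by step i gets the label observations + i (consistent with the example output in the README), dissimilarities do not decrease from step to step, and a local Step type stands in for kodama.Step so the example compiles without the cgo bindings. cutAt is a hypothetical helper.

```go
package main

import "fmt"

// Step mirrors the fields of kodama.Step for this standalone sketch.
type Step struct {
	Cluster1      int
	Cluster2      int
	Dissimilarity float64
	Size          int
}

// cutAt (hypothetical helper) assigns a flat cluster id to each observation,
// applying only the merges whose dissimilarity is at most threshold.
// Assumption: step i creates the cluster labeled observations + i.
func cutAt(steps []Step, observations int, threshold float64) []int {
	parent := make([]int, observations+len(steps))
	for i := range parent {
		parent[i] = i
	}
	for i, s := range steps {
		if s.Dissimilarity <= threshold {
			merged := observations + i
			parent[s.Cluster1] = merged
			parent[s.Cluster2] = merged
		}
	}
	flat := make([]int, observations)
	ids := make(map[int]int) // root label -> flat cluster id
	for obs := range flat {
		root := obs
		for parent[root] != root {
			root = parent[root]
		}
		id, ok := ids[root]
		if !ok {
			id = len(ids)
			ids[root] = id
		}
		flat[obs] = id
	}
	return flat
}

func main() {
	// Steps taken from the go-kodama-example output in the README.
	steps := []Step{
		{2, 4, 3.1237967760688776, 2},
		{5, 6, 5.757158112027513, 3},
		{1, 7, 8.1392602685723, 4},
		{3, 8, 12.483148228609206, 5},
		{0, 9, 25.589444117482433, 6},
	}
	fmt.Println(cutAt(steps, 6, 6.0)) // [0 1 2 3 2 2]
}
```

With a threshold of 6.0 only the first two merges apply, so observations 2, 4 and 5 share a cluster and the rest stay alone.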

type Method

type Method int

Method indicates the update formula for computing dissimilarities between clusters.

The method dictates how the dissimilarities are computed whenever a new cluster is formed. In particular, when clusters a and b are merged into a new cluster ab, then the pairwise dissimilarity between ab and every other cluster is computed using one of the variants of this type.

const (
	MethodSingle Method = iota
	MethodComplete
	MethodAverage
	MethodWeighted
	MethodWard
	MethodCentroid
	MethodMedian
)

The available methods for computing linkage.
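For intuition, the update rules behind four of these methods can be written in a few lines. The sketch below uses the textbook definitions of single, complete, average and weighted linkage. It is not the code kodama runs, and the function names are hypothetical. Here dax and dbx are the dissimilarities from clusters a and b to some other cluster x, and the result is the dissimilarity from the merged cluster ab to x.

```go
package main

import (
	"fmt"
	"math"
)

// updateSingle: the merged cluster is as close to x as its nearest part.
func updateSingle(dax, dbx float64) float64 { return math.Min(dax, dbx) }

// updateComplete: the merged cluster is as far from x as its farthest part.
func updateComplete(dax, dbx float64) float64 { return math.Max(dax, dbx) }

// updateAverage: mean of the two dissimilarities, weighted by cluster size.
func updateAverage(dax, dbx float64, sizeA, sizeB int) float64 {
	na, nb := float64(sizeA), float64(sizeB)
	return (na*dax + nb*dbx) / (na + nb)
}

// updateWeighted: plain mean of the two dissimilarities, ignoring sizes.
func updateWeighted(dax, dbx float64) float64 { return (dax + dbx) / 2 }

func main() {
	dax, dbx := 2.0, 6.0
	fmt.Println(updateSingle(dax, dbx))        // 2
	fmt.Println(updateComplete(dax, dbx))      // 6
	fmt.Println(updateAverage(dax, dbx, 3, 1)) // 3
	fmt.Println(updateWeighted(dax, dbx))      // 4
}
```

The Ward, centroid and median methods follow the same pattern but also depend on the dissimilarity between a and b, so they are left out of this sketch.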

type Step

type Step struct {
	// The label corresponding to the first cluster.
	Cluster1 int
	// The label corresponding to the second cluster.
	Cluster2 int
	// The dissimilarity between cluster1 and cluster2.
	Dissimilarity float64
	// The total number of observations in this merged cluster.
	Size int
}

Step is a single merge step in a dendrogram.

Each step corresponds to the creation of a new cluster by merging two previous clusters.

By convention, the smaller cluster label is always assigned to the Cluster1 field.
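The cluster labels in a step can be expanded back into the observations they contain. The sketch below does this under the assumption that the cluster created by step i is labeled observations + i, which agrees with the example output in the README but is not stated above. The local Step type mirrors kodama.Step so the example compiles without the cgo bindings, and members is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// Step mirrors the fields of kodama.Step for this standalone sketch.
type Step struct {
	Cluster1      int
	Cluster2      int
	Dissimilarity float64
	Size          int
}

// members (hypothetical helper) returns, for each step, the sorted list of
// observations in the cluster that the step creates.
// Assumption: step i creates the cluster labeled observations + i.
func members(steps []Step, observations int) [][]int {
	clusters := make([][]int, observations+len(steps))
	for i := 0; i < observations; i++ {
		clusters[i] = []int{i}
	}
	for i, s := range steps {
		merged := make([]int, 0, s.Size)
		merged = append(merged, clusters[s.Cluster1]...)
		merged = append(merged, clusters[s.Cluster2]...)
		sort.Ints(merged)
		clusters[observations+i] = merged
	}
	return clusters[observations:]
}

func main() {
	// Steps taken from the go-kodama-example output in the README.
	steps := []Step{
		{2, 4, 3.1237967760688776, 2},
		{5, 6, 5.757158112027513, 3},
		{1, 7, 8.1392602685723, 4},
		{3, 8, 12.483148228609206, 5},
		{0, 9, 25.589444117482433, 6},
	}
	for i, m := range members(steps, 6) {
		fmt.Printf("step %d: size %d, members %v\n", i, steps[i].Size, m)
	}
}
```

Each member list should have exactly Size entries, which is a quick sanity check on the labeling assumption.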
