cca

package module
v0.0.0-...-4747f09

This package is not in the latest version of its module.

Published: Jul 8, 2022 License: BSD-3-Clause Imports: 3 Imported by: 0

README

Working with matrices

Neither Go nor R has matrix types as first-class citizens, and R's approach to marking a vector as a matrix (the dim attribute) is not currently understood by rgo (this may change). So working with matrices requires some extra effort.

To show the work involved, this example reprises the Gonum stat example at https://pkg.go.dev/gonum.org/v1/gonum/stat?tab=doc#example-CC, which performs a canonical correlation analysis on the MASS::Boston data.

This needs a Go wrapper around the CC.CanonicalCorrelations method from the Gonum stat package.

func CCA(x, y blas64.GeneralCols, weights []float64) (ccors []float64, pVecs, qVecs, phiVs, psiVs blas64.GeneralCols, err error) {
	var xdata, ydata mat.Dense
	xdata.SetRawMatrix(rowMajor(x))
	ydata.SetRawMatrix(rowMajor(y))

	var cc stat.CC
	err = cc.CanonicalCorrelations(&xdata, &ydata, weights)
	if err != nil {
		return nil, pVecs, qVecs, phiVs, psiVs, err
	}
	ccors = cc.CorrsTo(nil)

	var _pVecs, _qVecs, _phiVs, _psiVs mat.Dense
	cc.LeftTo(&_pVecs, true)
	cc.RightTo(&_qVecs, true)
	cc.LeftTo(&_phiVs, false)
	cc.RightTo(&_psiVs, false)

	return ccors,
		colMajor(_pVecs.RawMatrix()),
		colMajor(_qVecs.RawMatrix()),
		colMajor(_phiVs.RawMatrix()),
		colMajor(_psiVs.RawMatrix()),
		err
}

Note that we need some helpers to convert the column-major R matrix layout to the row-major Gonum matrix values used by CanonicalCorrelations,

func rowMajor(a blas64.GeneralCols) blas64.General {
	t := blas64.General{
		Rows:   a.Rows,
		Cols:   a.Cols,
		Data:   make([]float64, len(a.Data)),
		Stride: a.Cols,
	}
	t.From(a)
	return t
}

and back again.

func colMajor(a blas64.General) blas64.GeneralCols {
	t := blas64.GeneralCols{
		Rows:   a.Rows,
		Cols:   a.Cols,
		Data:   make([]float64, len(a.Data)),
		Stride: a.Rows,
	}
	t.From(a)
	return t
}
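
To make the layout change concrete, here is a minimal standard-library sketch of the same conversion written with explicit loops. The types rowMat and colMat and the functions toRowMajor and toColMajor are hypothetical stand-ins for blas64.General, blas64.GeneralCols and their From methods; they are not part of Gonum or rgo.

```go
package main

import "fmt"

// rowMat and colMat are hypothetical stand-ins for blas64.General and
// blas64.GeneralCols. They hold the same four fields.
type rowMat struct {
	Rows, Cols int
	Data       []float64
	Stride     int // distance between the starts of consecutive rows
}

type colMat struct {
	Rows, Cols int
	Data       []float64
	Stride     int // distance between the starts of consecutive columns
}

// toRowMajor copies a column-major matrix into a new row-major one.
func toRowMajor(a colMat) rowMat {
	t := rowMat{Rows: a.Rows, Cols: a.Cols, Data: make([]float64, a.Rows*a.Cols), Stride: a.Cols}
	for i := 0; i < a.Rows; i++ {
		for j := 0; j < a.Cols; j++ {
			t.Data[i*t.Stride+j] = a.Data[i+j*a.Stride]
		}
	}
	return t
}

// toColMajor copies a row-major matrix into a new column-major one.
func toColMajor(a rowMat) colMat {
	t := colMat{Rows: a.Rows, Cols: a.Cols, Data: make([]float64, a.Rows*a.Cols), Stride: a.Rows}
	for i := 0; i < a.Rows; i++ {
		for j := 0; j < a.Cols; j++ {
			t.Data[i+j*t.Stride] = a.Data[i*a.Stride+j]
		}
	}
	return t
}

func main() {
	// The 2x3 matrix [1 2 3; 4 5 6] as R stores it, column by column.
	c := colMat{Rows: 2, Cols: 3, Data: []float64{1, 4, 2, 5, 3, 6}, Stride: 2}
	r := toRowMajor(c)
	fmt.Println(r.Data)             // [1 2 3 4 5 6]
	fmt.Println(toColMajor(r).Data) // [1 4 2 5 3 6]
}
```

The round trip returns the original data, which is the property the rowMajor and colMajor helpers rely on.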

With this done, we can perform the usual steps to build an rgo package, starting with defining the package's go.mod file with

$ go mod init github.com/rgonomic/rgo/examples/cca

and then running rgo init with an argument pointing to the current directory, since we are at the root of github.com/rgonomic/rgo/examples/cca.

$ rgo init .

Since there is only one function and it has an all upper-case name, we don't need to make any change to the rgo.json file.

The wrapper code is then generated by running the build subcommand.

$ rgo build

This will generate the Go, C and R wrapper code for the R package, and collate all the licenses in the source package into the LicenseDir directory. At this stage the DESCRIPTION file should be edited and any irrelevant licenses should be removed.

The package can now be installed.

$ R CMD INSTALL .

To replicate the Gonum example, we need the Boston data in the same format described in the example.

> library(MASS)
> x <- cbind(Boston$crim, Boston$indus, Boston$nox, Boston$dis, Boston$rad, Boston$ptratio, Boston$black)
> y <- cbind(Boston$rm, Boston$age, Boston$tax, Boston$medv)
> boston <- cbind(x, y)

We also need a couple of helpers to convert the attribute-based R matrix representation to a list with the values needed to populate the struct values (blas64.GeneralCols) that the Go CCA function accepts, and then convert the results back again.

> mat_list <- function(a) {
+ 	return(list(Rows = nrow(a), Cols = ncol(a), Data = as.vector(a), Stride = nrow(a)))
+ }
> list_mat <- function(a) {
+ 	return(matrix(data = a$Data, nrow = a$Rows, ncol = a$Cols))
+ }

Note that we could leave the result in row-major order and expect the user to have R do that work by passing byrow = TRUE to matrix, but we don't want to be that person.
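
As a rough illustration of why the layout matters, the standard-library sketch below reads the same flat slice under both conventions. The helper names atRowMajor and atColMajor are made up for this example and do not exist in Gonum or rgo.

```go
package main

import "fmt"

// atRowMajor returns element (i, j) of a matrix with the given number of
// columns whose data is stored row by row.
func atRowMajor(data []float64, cols, i, j int) float64 {
	return data[i*cols+j]
}

// atColMajor returns element (i, j) of a matrix with the given number of
// rows whose data is stored column by column, as R does.
func atColMajor(data []float64, rows, i, j int) float64 {
	return data[i+j*rows]
}

func main() {
	// The 2x3 matrix [1 2 3; 4 5 6] laid out row by row.
	data := []float64{1, 2, 3, 4, 5, 6}
	const rows, cols = 2, 3

	fmt.Println(atRowMajor(data, cols, 0, 1)) // 2, the intended element
	fmt.Println(atColMajor(data, rows, 0, 1)) // 3, what a column-major reader sees
}
```

Handing row-major data to R without byrow = TRUE would scramble the matrix in this way, which is why the Go wrapper converts back to column-major before returning.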

With all that done, invoking cca with our Boston data gives us the results that we expect.

> library(cca)
> r <- cca::cca(mat_list(x), mat_list(y), NULL)
> r$err
NULL
> r$ccors
[1] 0.9451239 0.6786623 0.5714338 0.2009740
> list_mat(r$pVecs)
            [,1]        [,2]       [,3]        [,4]
[1,] -0.25743919  0.01584775  0.2122170 -0.09457338
[2,] -0.48365944  0.38371019  0.1474448  0.65973249
[3,] -0.08007764  0.34935567  0.3287336 -0.28620404
[4,]  0.12775864 -0.73374277  0.4851135  0.22479649
[5,] -0.69694320 -0.43417488 -0.3602873  0.02906616
[6,] -0.09909033  0.05034112  0.6384331  0.10223671
[7,]  0.42604600  0.03233344 -0.2289528  0.64192329
> list_mat(r$qVecs)
            [,1]       [,2]         [,3]        [,4]
[1,]  0.01816605 -0.1583489 -0.006672358 -0.98719354
[2,] -0.23476990  0.9483315 -0.146242051 -0.15544708
[3,] -0.97007040 -0.2406072 -0.025183898  0.02091341
[4,]  0.05930007 -0.1330460 -0.988905715  0.02911615
> list_mat(r$phiVs)
              [,1]          [,2]         [,3]         [,4]
[1,] -0.0027462234  0.0093444514  0.048964393 -0.015496719
[2,] -0.0428564455 -0.0241708702  0.036072347  0.183898323
[3,] -1.2248435649  5.6030921365  5.809414458 -4.792681219
[4,] -0.0043684825 -0.3424101165  0.446996122  0.115016181
[5,] -0.0741534070 -0.1193135795 -0.111551831  0.002163876
[6,] -0.0233270323  0.1046330818  0.385304598 -0.016092787
[7,]  0.0001293051  0.0004540747 -0.003029632  0.008189548
> list_mat(r$psiVs)
             [,1]        [,2]         [,3]          [,4]
[1,]  0.030159336 -0.30022193  0.087821738 -1.9583226532
[2,] -0.006548310  0.03922121 -0.011757078 -0.0061113064
[3,] -0.005207552 -0.00457702 -0.002276231  0.0008441873
[4,]  0.002011174  0.00373528 -0.129257807  0.1037709056

Documentation

Functions

func CCA

func CCA(x, y blas64.GeneralCols, weights []float64) (ccors []float64, pVecs, qVecs, phiVs, psiVs blas64.GeneralCols, err error)

CCA performs a canonical correlation analysis of the input data x and y, columns of which should be interpretable as two sets of measurements on the same observations (rows). These observations are optionally weighted by weights.

CCA will return an error if the inputs x and y do not have the same number of rows.

The vector weights is used to weight the observations. If weights is NULL, each weight is considered to have a value of one, otherwise the length of weights must match the number of observations (rows of both x and y) or CanonicalCorrelations will return an error.
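
The input rules described above can be sketched as a small standard-library check. The function checkInputs is hypothetical and only mirrors the documented contract; it is not Gonum's implementation, and how a violation is actually surfaced is up to CanonicalCorrelations and the generated wrapper.

```go
package main

import (
	"errors"
	"fmt"
)

// checkInputs is a hypothetical helper that mirrors the documented rules:
// x and y must have the same number of rows, and weights is either nil
// (every observation has weight one) or has one entry per row.
func checkInputs(xRows, yRows int, weights []float64) ([]float64, error) {
	if xRows != yRows {
		return nil, errors.New("x and y have different numbers of rows")
	}
	if weights == nil {
		// A nil weights vector is treated as all ones.
		w := make([]float64, xRows)
		for i := range w {
			w[i] = 1
		}
		return w, nil
	}
	if len(weights) != xRows {
		return nil, errors.New("weights length does not match number of rows")
	}
	return weights, nil
}

func main() {
	w, err := checkInputs(3, 3, nil)
	fmt.Println(w, err) // [1 1 1] <nil>

	_, err = checkInputs(3, 3, []float64{1, 2})
	fmt.Println(err) // weights length does not match number of rows
}
```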

Directories

Path Synopsis
src
rgo
