indep

package
v0.1.0
Warning

This package is not in the latest version of its module.

Published: Dec 14, 2018 License: BSD-3-Clause Imports: 4 Imported by: 0

Documentation

Overview

Package indep contains independence test algorithms (e.g. the G-test and Pearson's chi-square test).

Functions

func ChiSquare

func ChiSquare(chi float64, df int) float64

ChiSquare returns the value of the cumulative distribution function at point chi, that is:

Pr(X^2 <= chi)

where X^2 follows the chi-square distribution X^2(df) with df degrees of freedom.

func ChiSquareTest

func ChiSquareTest(p, q int, data [][]int, sigval float64) bool

ChiSquareTest reports whether variables x and y are statistically independent, using Pearson's chi-square test to detect an association between the two variables. Argument data is a contingency table with the count for each combination of categories, where the first axis is indexed by the categories of x and the second by those of y. The last element of each row and column holds the total count. For example:

+------------------------+
|      X_1 X_2 X_3 total |
| Y_1  100 200 100  400  |
| Y_2   50 300  25  375  |
|total 150 500 125  775  |
+------------------------+

Argument p is the number of categories (or levels) in x.

Argument q is the number of categories (or levels) in y.

Argument sigval is the significance level of the test. Returns true if x and y are judged independent at that level and false otherwise.
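As an illustration of what such a test computes, the sketch below derives Pearson's statistic from a table in the layout shown above (counts, with marginal totals in the last row and column). Each expected count is E_ij = (row total * column total) / grand total, the statistic is the sum of (O_ij - E_ij)^2 / E_ij, and the degrees of freedom are (rows - 1)(cols - 1), i.e. (p-1)(q-1). The name pearsonStatistic is hypothetical, and this is not necessarily how ChiSquareTest is implemented. Because the statistic is unchanged when the table is transposed, it does not matter which axis holds x.

```go
package main

import "fmt"

// pearsonStatistic computes Pearson's chi-square statistic and its degrees of
// freedom for a contingency table whose last row and last column hold the
// marginal totals, with the grand total in the bottom-right cell.
func pearsonStatistic(data [][]int) (stat float64, df int) {
	rows := len(data) - 1    // category rows, excluding the totals row
	cols := len(data[0]) - 1 // category columns, excluding the totals column
	total := float64(data[rows][cols])
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			// Expected count under independence: row total * column total / N.
			expected := float64(data[i][cols]) * float64(data[rows][j]) / total
			diff := float64(data[i][j]) - expected
			stat += diff * diff / expected
		}
	}
	return stat, (rows - 1) * (cols - 1)
}

func main() {
	// The example table above: rows Y_1, Y_2, total; columns X_1..X_3, total.
	data := [][]int{
		{100, 200, 100, 400},
		{50, 300, 25, 375},
		{150, 500, 125, 775},
	}
	stat, df := pearsonStatistic(data)
	fmt.Printf("X^2 = %.2f with %d degrees of freedom\n", stat, df)
}
```

For the example table the statistic is roughly 81 with 2 degrees of freedom, far above the 5% critical value of about 5.99, so x and y would be judged dependent.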

func Chisqr

func Chisqr(df int, cv float64) float64

Chisqr evaluates the chi-square function for df degrees of freedom at the value cv.

func Chisquare

func Chisquare(df int, cv float64) float64

Chisquare returns the p-value Pr(X^2 > cv), where X^2 follows the chi-square distribution with df degrees of freedom. Compare this value to the chosen significance level: if the p-value is below sigval, the null hypothesis of independence is rejected and the two variables are considered dependent.

Thanks to Jacob F. W. for a tutorial on chi-square distributions. Source: http://www.codeproject.com/Articles/432194/How-to-Calculate-the-Chi-Squared-P-Value

func GTest

func GTest(p, q int, data [][]int, n int, sigval float64) bool

GTest performs the G-test, a log-likelihood ratio test of independence.
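The G statistic is G = 2 * sum_ij O_ij ln(O_ij / E_ij), using the same expected counts E_ij and degrees of freedom (p-1)(q-1) as Pearson's test, and it is likewise compared against the chi-square distribution. The sketch below computes it for a table laid out like the ChiSquareTest example (marginal totals in the last row and column). The name gStatistic is hypothetical, and the sketch does not model GTest's n argument.

```go
package main

import (
	"fmt"
	"math"
)

// gStatistic computes the G (log-likelihood ratio) statistic and its degrees
// of freedom for a contingency table whose last row and column hold totals.
func gStatistic(data [][]int) (g float64, df int) {
	rows := len(data) - 1
	cols := len(data[0]) - 1
	total := float64(data[rows][cols])
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			observed := float64(data[i][j])
			if observed == 0 {
				continue // O ln(O/E) tends to 0 as O -> 0
			}
			expected := float64(data[i][cols]) * float64(data[rows][j]) / total
			g += observed * math.Log(observed/expected)
		}
	}
	return 2 * g, (rows - 1) * (cols - 1)
}

func main() {
	data := [][]int{
		{100, 200, 100, 400},
		{50, 300, 25, 375},
		{150, 500, 125, 775},
	}
	g, df := gStatistic(data)
	fmt.Printf("G = %.2f with %d degrees of freedom\n", g, df)
}
```

For the example table G is about 84.5, close to Pearson's statistic for the same data, as is typical when expected counts are large.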

Types

type Graph

type Graph struct {

	// This k-set contains the connected subgraphs that are completely separated from each other.
	Kset [][]int
	// contains filtered or unexported fields
}

Graph represents an independence graph.

An independence graph is an undirected graph that maps the (in)dependencies of a set of variables. Let X={X_1,...,X_n} be the set of variables. We define an independence graph as an undirected graph G=(X, E) where there exists an edge between a pair of vertices u,v in X iff variables u and v are dependent. That is, if two variables are dependent, then there exists an edge between them; otherwise there is no such edge.

The resulting graph is a collection of connected components. Let H_1 and H_2 be two distinct connected components of G. Then no edge joins a vertex in H_1 to a vertex in H_2. This constitutes an independence relation between these subgraphs: we say that the variables in H_1 are independent of the variables in H_2. We now show why this holds. Consider the following example (it extends readily to the general case):

Let X, Y and Z be variables. We use the symbol ~ to denote a dependency relation; that is, X ~ Y means that X is dependent on Y. Consider the case where X ~ Y. Then there exists an edge between X and Y. If Z is independent of both, then Z is disconnected from X-Y. The converse also holds, since if there is no edge between two variables, they are independent. Now consider X ~ Y and Y ~ Z. Then the graph contains the edges X-Y and Y-Z and is therefore connected. The last case is when all variables are pairwise independent, in which case there are no edges and all variables are disconnected. For the general case, X, Y and Z can be taken as sets of variables.

To construct the graph, we test each distinct pair of variables (u,v) in X for a dependency. If one exists, we add an edge u-v; otherwise we skip the pair. Constructing the graph therefore takes O(n^2) independence tests, one for each of the n(n-1)/2 pairs.
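A minimal sketch of this construction, assuming the pairwise dependence test is available as a callback (dependent and buildGraph are hypothetical names, not this package's API). It visits each of the n(n-1)/2 unordered pairs exactly once and records an undirected edge whenever the test reports a dependency.

```go
package main

import "fmt"

// buildGraph returns the adjacency lists of an independence graph over n
// variables: an edge u-v is added whenever dependent(u, v) reports that
// variables u and v are dependent. It performs n(n-1)/2 tests.
func buildGraph(n int, dependent func(u, v int) bool) [][]int {
	adj := make([][]int, n)
	for u := 0; u < n; u++ {
		for v := u + 1; v < n; v++ {
			if dependent(u, v) {
				adj[u] = append(adj[u], v)
				adj[v] = append(adj[v], u)
			}
		}
	}
	return adj
}

func main() {
	// Toy oracle: variables 0 and 1 are dependent; variable 2 depends on nothing.
	tests := 0
	dependent := func(u, v int) bool {
		tests++
		return u+v == 1
	}
	adj := buildGraph(3, dependent)
	fmt.Println(adj, "tests:", tests)
}
```

In practice the callback would wrap one of the tests above (e.g. a chi-square or G-test at a chosen significance level) applied to the data of variables u and v.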

Once the independence graph is constructed, we must identify each of its connected components. We can do this with Union-Find (utils.Union and utils.Find):

Initially each vertex has its own set.
For each vertex v:
	For each edge v-u:
		If u is not in the same set as v then
			utils.Union(u, v)
		EndIf
	EndFor
EndFor

After processing every vertex, the sets correspond to the k connected components of the graph. These k subgraphs are independent of each other, and they are returned as the k-sets.
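The Union-Find step can be sketched in Go with a small disjoint-set forest (path halving and union by size). This stands in for the package's utils.Union and utils.Find helpers, whose exact signatures are not shown here; the disjointSet type and the components function are hypothetical. Applied to the X, Y, Z example with X ~ Y and Z independent of both, it yields the two sets {X, Y} and {Z}.

```go
package main

import "fmt"

// disjointSet is a union-find forest over vertices 0..n-1.
type disjointSet struct {
	parent, size []int
}

func newDisjointSet(n int) *disjointSet {
	ds := &disjointSet{parent: make([]int, n), size: make([]int, n)}
	for i := range ds.parent {
		ds.parent[i] = i // initially each vertex is its own set
		ds.size[i] = 1
	}
	return ds
}

// find returns the representative of v's set, halving the path as it goes.
func (ds *disjointSet) find(v int) int {
	for ds.parent[v] != v {
		ds.parent[v] = ds.parent[ds.parent[v]]
		v = ds.parent[v]
	}
	return v
}

// union merges the sets containing u and v, attaching the smaller tree
// under the larger one.
func (ds *disjointSet) union(u, v int) {
	ru, rv := ds.find(u), ds.find(v)
	if ru == rv {
		return
	}
	if ds.size[ru] < ds.size[rv] {
		ru, rv = rv, ru
	}
	ds.parent[rv] = ru
	ds.size[ru] += ds.size[rv]
}

// components returns the vertex sets (the k-sets) of the connected
// components of the graph given by its adjacency lists.
func components(adj [][]int) [][]int {
	ds := newDisjointSet(len(adj))
	for v, neighbours := range adj {
		for _, u := range neighbours {
			if ds.find(u) != ds.find(v) {
				ds.union(u, v)
			}
		}
	}
	index := map[int]int{} // representative -> position in kset
	var kset [][]int
	for v := range adj {
		r := ds.find(v)
		pos, ok := index[r]
		if !ok {
			pos = len(kset)
			index[r] = pos
			kset = append(kset, nil)
		}
		kset[pos] = append(kset[pos], v)
	}
	return kset
}

func main() {
	// Vertices 0, 1, 2 stand for X, Y, Z; the only edge is X-Y.
	adj := [][]int{{1}, {0}, {}}
	fmt.Println(components(adj)) // [[0 1] [2]]
}
```

Union by size with path halving keeps each find nearly constant time, so this step costs close to O(n + |E|), which is small next to the O(n^2) tests needed to build the graph.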

func NewIndepGraph

func NewIndepGraph(data []*utils.VarData, pval float64) *Graph

NewIndepGraph constructs a new Graph from the variables in data, using pval as the threshold for the pairwise independence tests.

func NewUFIndepGraph

func NewUFIndepGraph(data []*utils.VarData, pval float64) *Graph

NewUFIndepGraph creates a new Graph using the Union-Find heuristic.
