indep

package

v0.1.0 | Published: Dec 14, 2018 | License: BSD-3-Clause | Imports: 4 | Imported by: 0

Repository

github.com/RenatoGeh/gospn

Documentation ¶

Overview ¶

Package indep contains independence test algorithms (e.g. G-Test and Pearson's).

Index ¶

func ChiSquare(chi float64, df int) float64
func ChiSquareTest(p, q int, data [][]int, sigval float64) bool
func Chisqr(df int, cv float64) float64
func Chisquare(df int, cv float64) float64
func GTest(p, q int, data [][]int, n int, sigval float64) bool
type Graph
- func NewIndepGraph(data []*utils.VarData, pval float64) *Graph
- func NewUFIndepGraph(data []*utils.VarData, pval float64) *Graph

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ChiSquare ¶

func ChiSquare(chi float64, df int) float64

ChiSquare returns the cumulative distribution function at point chi, that is:

Pr(X^2 <= chi)

where X^2 follows the chi-square distribution X^2(df), with df being the number of degrees of freedom.
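
As an illustration of what such a cumulative distribution function computes, here is a minimal sketch. It is not the package's implementation: chiSquareCDF is a hypothetical name, and the sketch assumes the standard identity Pr(X^2 <= chi) = P(df/2, chi/2), where P is the regularized lower incomplete gamma function, evaluated with its power series. It is only meant for moderate values of chi.

```go
package main

import (
	"fmt"
	"math"
)

// chiSquareCDF is a hypothetical stand-in for ChiSquare(chi, df). It
// approximates Pr(X^2 <= chi) for a chi-square distribution with df degrees
// of freedom using the power series of the regularized lower incomplete
// gamma function P(a, x) with a = df/2 and x = chi/2.
func chiSquareCDF(chi float64, df int) float64 {
	if chi <= 0 || df <= 0 {
		return 0
	}
	a, x := float64(df)/2, chi/2
	// Series: sum over n >= 0 of x^n / (a * (a+1) * ... * (a+n)).
	term := 1 / a
	sum := term
	for n := 1; n < 10000; n++ {
		term *= x / (a + float64(n))
		sum += term
		if term < sum*1e-15 {
			break
		}
	}
	lg, _ := math.Lgamma(a)
	// P(a, x) = sum * exp(-x) * x^a / Gamma(a); clamp rounding overshoot.
	return math.Min(1, sum*math.Exp(-x+a*math.Log(x)-lg))
}

func main() {
	// 5.991 is the usual 5% critical value for 2 degrees of freedom,
	// so the CDF there should be close to 0.95.
	fmt.Printf("%.3f\n", chiSquareCDF(5.991, 2))
}
```

The series loses accuracy in the far upper tail, where the result is rounded to 1; a production implementation would typically switch to a continued fraction there.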

func ChiSquareTest ¶

func ChiSquareTest(p, q int, data [][]int, sigval float64) bool

ChiSquareTest returns whether variables x and y are statistically independent. It uses the Chi-Square test to detect a dependency between the two variables. Argument data is a table of counts for each combination of categories, where the first axis indexes the categories of variable x and the second axis those of variable y. The last element of each row and column is the corresponding total count. E.g.:

+------------------------+
|      X_1 X_2 X_3 total |
| Y_1  100 200 100  400  |
| Y_2   50 300  25  375  |
|total 150 500 125  775  |
+------------------------+

Argument p is the number of categories (or levels) in x.

Argument q is the number of categories (or levels) in y.

Returns true if independent and false otherwise.
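
To make the table layout concrete, the following sketch computes Pearson's chi-square statistic and the degrees of freedom from a table that carries its totals in the last row and last column, as in the example above. chiSquareStat is a hypothetical helper written for this illustration; how ChiSquareTest itself computes the statistic is not shown on this page.

```go
package main

import "fmt"

// chiSquareStat is a hypothetical helper. The table is assumed to hold the
// observed counts plus a trailing totals row and totals column. It returns
// the sum of (observed-expected)^2/expected over all cells and the degrees
// of freedom (rows-1)*(cols-1).
func chiSquareStat(table [][]int) (stat float64, df int) {
	r, c := len(table)-1, len(table[0])-1 // number of data rows and columns
	total := float64(table[r][c])
	for i := 0; i < r; i++ {
		for j := 0; j < c; j++ {
			// Expected count under independence: rowTotal*colTotal/total.
			expected := float64(table[i][c]) * float64(table[r][j]) / total
			if expected == 0 {
				continue // empty margin: the cell contributes nothing
			}
			d := float64(table[i][j]) - expected
			stat += d * d / expected
		}
	}
	return stat, (r - 1) * (c - 1)
}

func main() {
	table := [][]int{
		{100, 200, 100, 400},
		{50, 300, 25, 375},
		{150, 500, 125, 775},
	}
	stat, df := chiSquareStat(table)
	fmt.Printf("statistic=%.2f df=%d\n", stat, df)
}
```

For the example table the statistic is roughly 81 with 2 degrees of freedom, far above the 5% critical value of 5.991, so a test at that level would report the variables as dependent.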

func Chisqr ¶

func Chisqr(df int, cv float64) float64

Chisqr gives the function Chi-Square.

func Chisquare ¶

func Chisquare(df int, cv float64) float64

Chisquare returns the p-value Pr(X^2 > cv). Compare this value to the chosen significance level. If the returned p-value is less than sigval, we reject the null hypothesis of independence and treat the two variables as dependent.

Thanks to Jacob F. W. for a tutorial on chi-square distributions. Source: http://www.codeproject.com/Articles/432194/How-to-Calculate-the-Chi-Squared-P-Value
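
The decision rule can be sketched as follows. This is not the package's Chisquare: pValueEvenDF is a hypothetical helper that only covers an even number of degrees of freedom, where the upper tail has the closed form Pr(X^2 > cv) = exp(-cv/2) * sum_{k=0}^{df/2-1} (cv/2)^k / k!. The value 0.05 for sigval is likewise just an example.

```go
package main

import (
	"fmt"
	"math"
)

// pValueEvenDF is a hypothetical helper, not the package's Chisquare.
// It returns Pr(X^2 > cv) using the closed form that holds when df is even.
// ok is false for odd or non-positive df, which this sketch does not handle.
func pValueEvenDF(df int, cv float64) (p float64, ok bool) {
	if df <= 0 || df%2 != 0 {
		return 0, false
	}
	if cv <= 0 {
		return 1, true
	}
	x := cv / 2
	term, sum := 1.0, 1.0 // k = 0 term of the sum
	for k := 1; k < df/2; k++ {
		term *= x / float64(k) // now holds x^k / k!
		sum += term
	}
	return math.Exp(-x) * sum, true
}

func main() {
	const sigval = 0.05 // example significance level
	p, _ := pValueEvenDF(2, 80.94)
	// A p-value below sigval rejects independence.
	fmt.Println("dependent:", p < sigval)
}
```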

func GTest ¶

func GTest(p, q int, data [][]int, n int, sigval float64) bool

GTest is the G-Test log-likelihood independence test.
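
For reference, the G statistic is G = 2 * sum O * ln(O/E), where O is an observed count and E the count expected under independence. The sketch below computes it for a plain table of counts. gStat is a hypothetical helper, and the layout (no totals row or column) is an assumption of the sketch that may differ from what GTest expects; the meaning of GTest's n argument is not documented here and is not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// gStat is a hypothetical helper. counts is assumed to be a plain r x c
// table of observed counts without totals. It returns the G statistic and
// the degrees of freedom (r-1)*(c-1).
func gStat(counts [][]int) (g float64, df int) {
	r, c := len(counts), len(counts[0])
	rowSum := make([]float64, r)
	colSum := make([]float64, c)
	var total float64
	for i := 0; i < r; i++ {
		for j := 0; j < c; j++ {
			v := float64(counts[i][j])
			rowSum[i] += v
			colSum[j] += v
			total += v
		}
	}
	for i := 0; i < r; i++ {
		for j := 0; j < c; j++ {
			o := float64(counts[i][j])
			if o == 0 {
				continue // 0 * ln(0) is taken as 0
			}
			e := rowSum[i] * colSum[j] / total
			g += o * math.Log(o/e)
		}
	}
	return 2 * g, (r - 1) * (c - 1)
}

func main() {
	g, df := gStat([][]int{
		{100, 200, 100},
		{50, 300, 25},
	})
	fmt.Printf("G=%.2f df=%d\n", g, df)
}
```

Like Pearson's statistic, G is compared against a chi-square distribution with the same degrees of freedom.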

Types ¶

type Graph ¶

type Graph struct {

	// This k-set contains the connected subgraphs that are completely separated from each other.
	Kset [][]int
	// contains filtered or unexported fields
}

Graph represents an independence graph.

An independence graph is an undirected graph that maps the (in)dependencies of a set of variables. Let X={X_1,...,X_n} be the set of variables. We define an independence graph as an undirected graph G=(X, E) where there exists an edge between a pair of vertices u,v in X iff there exists a dependency between variables u and v. That is, if two variables are dependent then there exists an edge between them. Otherwise there is no such edge.

The resulting graph is a collection of connected subgraphs (connected components). Let H_1 and H_2 be two distinct connected components of G. Then there exists no edge between any vertex in H_1 and any vertex in H_2. This constitutes an independence relation between these subgraphs: the set of variables in H_1 is independent of the set of variables in H_2. We now show why this is correct. Consider the following example (it can be extended to the general case easily):

Let X, Y and Z be variables. We use the symbol ~ to denote a dependency relation. That is, X ~ Y means that X is dependent on Y. Consider the case where X ~ Y. Then there exists an edge between X and Y. If Z is independent of both, then Z is disconnected from X-Y. The converse holds, since if there exists no edge between them they are independent. Now consider X ~ Y and Y ~ Z. Then there are edges X-Y and Y-Z, and therefore the graph is connected. The last case is when every variable is independent of every other, in which case there are no edges and all variables are disconnected. For the general case, we can take X, Y and Z to be sets of variables.

To construct the graph, we check for a dependency on each distinct pair of variables (u,v) of set X. If there exists a dependency, add an edge u-v; otherwise, skip the pair. Constructing such a graph takes O(n^2) independence tests, since we must check each possible pairwise combination.
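
The pairwise construction can be sketched as below. buildIndepGraph and the dependent predicate are hypothetical names for this illustration; the predicate stands in for whichever independence test is used, and the adjacency-list representation is an assumption, not necessarily how Graph stores its edges.

```go
package main

import "fmt"

// buildIndepGraph is a hypothetical helper. It tests every distinct pair
// (u, v) of the n variables once and adds an undirected edge u-v whenever
// dependent(u, v) reports a dependency. The result is an adjacency list.
func buildIndepGraph(n int, dependent func(u, v int) bool) [][]int {
	adj := make([][]int, n)
	for u := 0; u < n; u++ {
		for v := u + 1; v < n; v++ { // n*(n-1)/2 pairs in total
			if dependent(u, v) {
				adj[u] = append(adj[u], v)
				adj[v] = append(adj[v], u)
			}
		}
	}
	return adj
}

func main() {
	// Toy rule standing in for a statistical test: variables 0 and 1 are
	// dependent, and so are variables 2 and 3.
	dep := func(u, v int) bool { return u/2 == v/2 }
	fmt.Println(buildIndepGraph(4, dep)) // prints [[1] [0] [3] [2]]
}
```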

Once the independence graph is constructed, we must identify each connected subgraph in it. We can do this with Union-Find (utils.Union and utils.Find).

Initially each vertex has its own set.
For each vertex v:
	For each edge v-u:
		If u is not in the same set as v then
			utils.Union(u, v)
		EndIf
	EndFor
EndFor

After passing through every vertex, we have k connected subgraphs. These k subgraphs are independent of each other. Return these k-sets.
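
The procedure above can be sketched in a self-contained form. The find and kSets functions below are hypothetical and written only for this illustration; they use a simple slice-based Union-Find with path halving rather than the utils package, whose exact types are not shown on this page.

```go
package main

import "fmt"

// find returns the representative of v's set, halving the path as it goes.
func find(parent []int, v int) int {
	for parent[v] != v {
		parent[v] = parent[parent[v]]
		v = parent[v]
	}
	return v
}

// kSets is a hypothetical helper. Given an adjacency list, it merges the
// endpoints of every edge and returns the resulting connected subgraphs
// as slices of vertex indices.
func kSets(adj [][]int) [][]int {
	parent := make([]int, len(adj))
	for v := range parent {
		parent[v] = v // initially each vertex has its own set
	}
	for v, neighbours := range adj {
		for _, u := range neighbours {
			if ru, rv := find(parent, u), find(parent, v); ru != rv {
				parent[ru] = rv // union of the two sets
			}
		}
	}
	index := map[int]int{} // representative -> position in sets
	var sets [][]int
	for v := range adj {
		r := find(parent, v)
		i, seen := index[r]
		if !seen {
			i = len(sets)
			index[r] = i
			sets = append(sets, nil)
		}
		sets[i] = append(sets[i], v)
	}
	return sets
}

func main() {
	// Vertices 0, 1, 2 form the chain X ~ Y, Y ~ Z; vertex 3 is isolated.
	adj := [][]int{{1}, {0, 2}, {1}, {}}
	fmt.Println(kSets(adj)) // prints [[0 1 2] [3]]
}
```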

func NewIndepGraph ¶

func NewIndepGraph(data []*utils.VarData, pval float64) *Graph

NewIndepGraph constructs a new Graph given a DataGroup.

func NewUFIndepGraph ¶

func NewUFIndepGraph(data []*utils.VarData, pval float64) *Graph

NewUFIndepGraph creates a new Graph using the Union-Find heuristic.
