smeargol

command
v0.0.0-...-4cb3568
This package is not in the latest version of its module.
Published: Aug 13, 2021 License: BSD-3-Clause Imports: 32 Imported by: 0

Documentation

Overview

smeargol distributes count data across the provided Gene Ontology DAG and prints the GO terms, their roots, their depths and the distributed counts as a TSV table to stdout. Gene identifiers that have no GO term annotations are logged to stderr. The graph analysis assumes Ensembl gene identifiers and the Gene Ontology graph structure.
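The documentation does not say how roots and depths are derived. The following is a minimal sketch that assumes each term's direct parents are already known (for example, is_a edges extracted from go.owl) and that "depth" means the shortest distance from a root. The depths function and the parent-map representation are hypothetical and are not taken from smeargol.

```go
package main

import "fmt"

// depths returns, for each root of a DAG, the shortest distance from that
// root to every term reachable from it. parents maps each term to its
// direct parents (hypothetical representation); terms listed with no
// parents are treated as roots.
func depths(parents map[string][]string) map[string]map[string]int {
	// Invert the parent edges so the DAG can be walked from roots downwards.
	children := make(map[string][]string)
	var roots []string
	for term, ps := range parents {
		if len(ps) == 0 {
			roots = append(roots, term)
		}
		for _, p := range ps {
			children[p] = append(children[p], term)
		}
	}

	out := make(map[string]map[string]int)
	for _, root := range roots {
		// Breadth-first search: the first time a term is reached is along
		// a shortest path, so its depth is fixed at that point.
		d := map[string]int{root: 0}
		queue := []string{root}
		for len(queue) > 0 {
			t := queue[0]
			queue = queue[1:]
			for _, c := range children[t] {
				if _, seen := d[c]; !seen {
					d[c] = d[t] + 1
					queue = append(queue, c)
				}
			}
		}
		out[root] = d
	}
	return out
}

func main() {
	// Toy DAG: GO_B is a child of GO_A both directly and via GO_C.
	parents := map[string][]string{
		"GO_A": nil,
		"GO_C": {"GO_A"},
		"GO_B": {"GO_A", "GO_C"},
		"GO_D": {"GO_B"},
	}
	fmt.Println(depths(parents)["GO_A"]) // map[GO_A:0 GO_B:1 GO_C:1 GO_D:2]
}
```

A term reachable from more than one root appears under each of those roots. A parent that is not itself a key in the map is never treated as a root.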

The input counts file is tab-delimited. The first column holds the Ensembl gene ID (for example ENSG00000000000) and the remaining columns hold count data. The first row is a header: its first column is labelled Geneid and its remaining columns hold the sample names.

The Gene Ontology must be provided in OWL format. The file can be obtained from http://current.geneontology.org/ontology/go.owl.

The ENSG to GO mapping is expected to be in RDF N-Triples or N-Quads in the form:

<obo:GO_0000000> <local:annotates> <ensembl:ENSG00000000000> .

with one such statement for each GO term to Ensembl gene annotation.

All input files are expected to be gzip-compressed. Output is written uncompressed to a matrix directory and a plot directory. A summary document is written in JSON format to the specified out file; its structure corresponds to the following Go structs.

type SummaryDoc struct {
	// Roots is the set of roots in the Gene Ontology.
	Roots []string

	// Summaries contains the summaries of a smeargol
	// analysis.
	Summaries [][]*Summary
}

type Summary struct {
	// Name is the name of the sample.
	Name string

	// Root is the root GO term for the summary.
	Root string

	// Depth is the distance from the root.
	Depth int

	// Rows and Cols are the dimensions of the matrix
	// describing the GO level. Rows corresponds to the
	// number of genes and Cols corresponds to the number
	// of GO terms in the level.
	Rows, Cols int

	// OptimalRank and FractionalRank are the calculated
	// ranks of the summary matrix. OptimalRank is
	// calculated according to the method of Matan Gavish
	// and David L. Donoho https://arxiv.org/abs/1305.5870.
	// FractionalRank is the rank calculated using the
	// user-provided fraction parameters.
	OptimalRank, FractionalRank int

	// Sigma is the complete set of singular values.
	Sigma []float64
}
