pkg

package
v0.0.3
Published: Oct 7, 2020 License: MIT Imports: 10 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Build2dSlice

func Build2dSlice(rows int, cols int) [][]string

Build2dSlice builds a 2D slice of strings with the given number of rows and columns.
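As an illustration of what allocating such a table can look like, here is a minimal sketch. The name build2dSlice is hypothetical and the code is not taken from the package; it only assumes that every row gets its own backing slice of cols empty strings.

```go
package main

import "fmt"

// build2dSlice is a hypothetical stand-in, not the package's implementation.
// It allocates `rows` independent rows, each holding `cols` empty strings.
func build2dSlice(rows, cols int) [][]string {
	table := make([][]string, rows)
	for i := range table {
		table[i] = make([]string, cols)
	}
	return table
}

func main() {
	table := build2dSlice(2, 3)
	table[0][1] = "0.42" // cells hold formatted values, e.g. a statistic
	fmt.Println(len(table), len(table[0]), table[0][1])
}
```

Because each row is allocated separately, writing to one row does not affect the others.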

func ChunkGenome

func ChunkGenome(records <-chan fastx.Record, winSize int, winStride int, chunkSize int) <-chan Chunk

ChunkGenome receives fastx.Record values from a channel and produces a channel of record chunks. Each chunk contains multiple windows. The chunkSize is given in number of windows; the window size and stride are given in base pairs.
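The relationship between window size, stride, and chunk size can be sketched with plain integers. This is not the package's code: windowStarts and chunkStarts are hypothetical helpers, and the sketch assumes 0-based coordinates and that only windows fitting entirely inside the sequence are kept.

```go
package main

import "fmt"

// windowStarts returns the start offsets (in bp) of all windows of winSize bp
// placed every winStride bp that fit entirely in a sequence of seqLen bp.
// Assumption: partial trailing windows are dropped.
func windowStarts(seqLen, winSize, winStride int) []int {
	if winSize <= 0 || winStride <= 0 {
		return nil
	}
	var starts []int
	for s := 0; s+winSize <= seqLen; s += winStride {
		starts = append(starts, s)
	}
	return starts
}

// chunkStarts groups window starts into chunks of at most chunkSize windows.
func chunkStarts(starts []int, chunkSize int) [][]int {
	if chunkSize <= 0 {
		return nil
	}
	var chunks [][]int
	for i := 0; i < len(starts); i += chunkSize {
		end := i + chunkSize
		if end > len(starts) {
			end = len(starts)
		}
		chunks = append(chunks, starts[i:end])
	}
	return chunks
}

func main() {
	starts := windowStarts(10, 4, 2)
	fmt.Println(chunkStarts(starts, 3)) // prints [[0 2 4] [6]]
}
```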

func ConsumeChunks

func ConsumeChunks(chunks <-chan Chunk, metrics []string, refProfile map[int]KmerProfile) chan ChunkResult

ConsumeChunks computes window-based statistics in chunks and stores them in a ChunkResult struct.

func MakeRange

func MakeRange(start, end, step int) []int

MakeRange generates a slice of ints from start to end, with consecutive values spaced by step.
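A minimal sketch of such a range generator follows. The documentation does not say whether end is included, so this hypothetical makeRange assumes a half-open interval (end excluded) and a positive step; the package may behave differently.

```go
package main

import "fmt"

// makeRange is a hypothetical sketch, not the package's implementation.
// Assumptions: end is exclusive and step must be positive.
func makeRange(start, end, step int) []int {
	if step <= 0 {
		return nil
	}
	var out []int
	for v := start; v < end; v += step {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(makeRange(0, 10, 3)) // prints [0 3 6 9]
}
```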

func MinInt

func MinInt(x, y int) int

MinInt returns the smaller of two integers. If both are equal, the second input is returned.

func SeqATSkew

func SeqATSkew(seq *seq.Seq) float64

SeqATSkew computes the AT skew of a DNA sequence.
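AT skew is commonly defined as (A - T) / (A + T), and GC skew analogously as (G - C) / (G + C). The sketch below works on a plain string rather than a *seq.Seq; baseSkew is a hypothetical helper, and returning 0 when neither base occurs is an assumption, not documented behavior of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// baseSkew computes (count(x) - count(y)) / (count(x) + count(y)) over seq,
// ignoring case. x and y are expected to be uppercase base letters.
// Assumption: the skew is 0 when neither base is present.
func baseSkew(seq string, x, y byte) float64 {
	up := strings.ToUpper(seq)
	var nx, ny int
	for i := 0; i < len(up); i++ {
		if up[i] == x {
			nx++
		} else if up[i] == y {
			ny++
		}
	}
	if nx+ny == 0 {
		return 0
	}
	return float64(nx-ny) / float64(nx+ny)
}

func main() {
	fmt.Println(baseSkew("AAAT", 'A', 'T')) // AT skew: (3-1)/(3+1) = 0.5
	fmt.Println(baseSkew("ggcc", 'G', 'C')) // GC skew: (2-2)/(2+2) = 0
}
```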

func SeqEntropy

func SeqEntropy(seq *seq.Seq) float64

SeqEntropy computes the Shannon entropy (information entropy) of a DNA sequence.
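Shannon entropy of a sequence is H = -sum(p_b * log(p_b)) over the observed symbols b, where p_b is the fraction of positions holding b. The sketch below is a hypothetical string-based version; using log base 2 (bits) and folding case are assumptions, since the documentation does not state the base or how ambiguous bases are handled.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the entropy, in bits, of the symbol distribution
// in seq. Assumptions: case is ignored and every symbol (including N) counts.
func shannonEntropy(seq string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, b := range strings.ToUpper(seq) {
		counts[b]++
		n++
	}
	if n == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Println(shannonEntropy("ACGT")) // four equally frequent bases: 2 bits
	fmt.Println(shannonEntropy("AAAA")) // a single repeated base: 0 bits
}
```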

func SeqGC

func SeqGC(seq *seq.Seq) float64

SeqGC computes the fraction of G or C bases in a sequence (GC content).
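A minimal string-based sketch of GC content is shown below. gcFraction is a hypothetical helper; dividing by the full sequence length (so that N and other symbols count in the denominator) is an assumption, and the package may treat ambiguous bases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// gcFraction returns the fraction of positions in seq that are G or C,
// ignoring case. Assumption: the denominator is the full sequence length.
func gcFraction(seq string) float64 {
	if len(seq) == 0 {
		return 0
	}
	gc := 0
	for _, b := range strings.ToUpper(seq) {
		if b == 'G' || b == 'C' {
			gc++
		}
	}
	return float64(gc) / float64(len(seq))
}

func main() {
	fmt.Println(gcFraction("GGCCAATT")) // prints 0.5
}
```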

func SeqGCSkew

func SeqGCSkew(seq *seq.Seq) float64

SeqGCSkew computes the GC skew of a DNA sequence.

func SeqKmerDiv

func SeqKmerDiv(seq *seq.Seq, ref KmerProfile) float64

SeqKmerDiv computes the k-mer profile of the input sequence and its distance to a reference k-mer profile.

func StreamGenome

func StreamGenome(fasta string, bufSize int) <-chan fastx.Record

StreamGenome reads records one by one from an input FASTA file and sends them to a channel for downstream processing.
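The producer side of such a pipeline can be sketched with the standard library alone. This is not how the package is implemented (it relies on fastx for parsing): record, streamFasta, and drainFasta are hypothetical names, the parser handles only simple FASTA text, and bufio.Scanner's default line-length limit applies.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// record is a hypothetical, simplified stand-in for fastx.Record.
type record struct {
	ID  string
	Seq string
}

// streamFasta parses FASTA text from r in a goroutine and sends each record
// on a channel buffered to bufSize (assumed >= 0). The channel is closed
// once the input is exhausted.
func streamFasta(r io.Reader, bufSize int) <-chan record {
	out := make(chan record, bufSize)
	go func() {
		defer close(out)
		var cur *record
		var seq strings.Builder
		flush := func() { // emit the record collected so far, if any
			if cur != nil {
				cur.Seq = seq.String()
				out <- *cur
			}
			seq.Reset()
		}
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			switch {
			case line == "":
				// skip blank lines
			case strings.HasPrefix(line, ">"):
				flush()
				cur = &record{ID: strings.TrimPrefix(line, ">")}
			default:
				seq.WriteString(line)
			}
		}
		flush()
	}()
	return out
}

// drainFasta is a small convenience wrapper that collects all records.
func drainFasta(text string, bufSize int) []record {
	var recs []record
	for rec := range streamFasta(strings.NewReader(text), bufSize) {
		recs = append(recs, rec)
	}
	return recs
}

func main() {
	for _, rec := range drainFasta(">chr1\nACGT\nAC\n>chr2\nGG\n", 2) {
		fmt.Println(rec.ID, len(rec.Seq))
	}
}
```

A buffered channel lets the reader run ahead of slower downstream consumers by up to bufSize records.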

Types

type Chunk

type Chunk struct {
	ID      []byte
	BpStart int
	BpEnd   int

	Seq    *seq.Seq
	Starts []int
	// contains filtered or unexported fields
}

Chunk is a piece of a fastx.Record sequence, together with its associated ID and genomic coordinates. It also contains the start indices of the sliding windows in which statistics will be computed.

type ChunkResult

type ChunkResult struct {
	Header []string
	Data   [][]string
}

ChunkResult stores the computed statistics of windows from a chunk in the form of a table. Each row is a window, each column is a feature.

type KmerProfile

type KmerProfile struct {
	K       int
	Profile map[string]float64
}

KmerProfile stores k-mers and their frequencies for a given k-mer length.

func FastaToKmers

func FastaToKmers(fasta string, k int) KmerProfile

FastaToKmers reads all records in a FASTA file and computes the file's k-mer profile.

func NewKmerProfile

func NewKmerProfile(k int) KmerProfile

NewKmerProfile is a helper function that generates an empty k-mer profile.

func (*KmerProfile) CountsToFreqs

func (p *KmerProfile) CountsToFreqs()

CountsToFreqs transforms counts in a KmerProfile into frequencies.

func (*KmerProfile) GetSeqKmers

func (p *KmerProfile) GetSeqKmers(seq *seq.Seq)

GetSeqKmers computes the k-mer profile of a sequence and increments the counts in the KmerProfile accordingly.
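The two steps described for GetSeqKmers and CountsToFreqs, counting k-mers and then normalizing counts to frequencies, can be sketched on a plain map. countKmers and countsToFreqs are hypothetical helpers operating on a map[string]float64 like the Profile field; counting every overlapping k-mer on the forward strand only is an assumption.

```go
package main

import "fmt"

// countKmers increments counts[kmer] for every overlapping k-mer in seq.
// Assumption: only the forward strand is counted, with no canonicalization.
func countKmers(counts map[string]float64, seq string, k int) {
	if k <= 0 {
		return
	}
	for i := 0; i+k <= len(seq); i++ {
		counts[seq[i:i+k]]++
	}
}

// countsToFreqs divides every count by the total, in place, so that the
// values sum to 1. An empty profile is left unchanged.
func countsToFreqs(counts map[string]float64) {
	total := 0.0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return
	}
	for kmer, c := range counts {
		counts[kmer] = c / total
	}
}

func main() {
	profile := make(map[string]float64)
	countKmers(profile, "ACGAC", 2) // AC, CG, GA, AC
	countsToFreqs(profile)
	fmt.Println(profile["AC"], profile["CG"], profile["GA"]) // prints 0.5 0.25 0.25
}
```

Counts from several sequences can be accumulated in the same map before normalizing once at the end.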

func (*KmerProfile) KmerDist

func (p *KmerProfile) KmerDist(ref KmerProfile) float64

KmerDist computes the Euclidean distance between a reference k-mer profile and another profile. The reference is assumed to include all k-mers present in the profile.
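Under the stated assumption that the reference contains every k-mer of the profile, the distance can be computed by iterating over the reference keys only. The sketch below uses plain maps and a hypothetical kmerDist function; treating k-mers absent from the profile as frequency 0 is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// kmerDist returns the Euclidean distance between profile and ref.
// It iterates over ref only, so any k-mer present in profile but missing
// from ref is ignored (the reference is assumed to be a superset).
// K-mers missing from profile contribute with frequency 0.
func kmerDist(profile, ref map[string]float64) float64 {
	sum := 0.0
	for kmer, refFreq := range ref {
		d := profile[kmer] - refFreq
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	profile := map[string]float64{"AA": 1.0}
	ref := map[string]float64{"AA": 0.5, "AC": 0.5}
	fmt.Printf("%.4f\n", kmerDist(profile, ref)) // sqrt(0.25 + 0.25), prints 0.7071
}
```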
