rosalind

Version: v0.0.0-...-a170c34

Warning: This package is not in the latest version of its module.
Published: Apr 26, 2019 License: MIT Imports: 11 Imported by: 0

README

rosalind go package

This directory contains the rosalind Go package.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Binomial

func Binomial(n, k int) int

Binomial returns the value of the binomial coefficient Binom(n, k).

func Bitmasks2DNA

func Bitmasks2DNA(bitmasks map[string][]bool) (string, error)

Convert four bitmasks (one each for ATGC) into a DNA string.

func CheckIsDNA

func CheckIsDNA(input string) bool

Given an alleged DNA input string, iterate through it character by character to ensure that it only contains ATGC. Returns true if this is DNA (ATGC only), false otherwise.

func Complement

func Complement(input string) (string, error)

Given a DNA input string, find the complement. The complement swaps Gs and Cs, and As and Ts.

func ConstructDeBruijnGraphKmers

func ConstructDeBruijnGraphKmers(kmers []string) (map[string][]string, error)

func ConstructDeBruijnGraphString

func ConstructDeBruijnGraphString(text string, k int) (map[string][]string, error)

func CountHammingNeighbors

func CountHammingNeighbors(n, d, c int) (int, error)

Given the length n of a DNA string, a maximum Hamming distance d, and an alphabet size c (4 for DNA), determine the number of Hamming neighbors at distance less than or equal to d using a combinatorics formula.
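The combinatorics formula is presumably the standard count: the sum over i = 0..d of Binom(n, i) * (c-1)^i, i.e. choose which i positions mismatch, then pick one of the c-1 other letters at each. The following is a minimal Go sketch under that assumption; `binomial` and `countHammingNeighbors` are hypothetical stand-ins for Binomial and CountHammingNeighbors, whose source is not shown here.

```go
package main

import "fmt"

// binomial computes C(n, k) with the multiplicative formula; after step i
// the running value equals C(n-k+i, i), so each integer division is exact.
func binomial(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	result := 1
	for i := 1; i <= k; i++ {
		result = result * (n - k + i) / i
	}
	return result
}

// countHammingNeighbors (hypothetical) counts strings of length n over a
// c-letter alphabet within Hamming distance d of a fixed string.
// Assumption: c is the alphabet size, as in the formula above.
func countHammingNeighbors(n, d, c int) int {
	total := 0
	for i := 0; i <= d && i <= n; i++ {
		ways := binomial(n, i) // which i positions mismatch
		for j := 0; j < i; j++ {
			ways *= c - 1 // any of the other c-1 letters at each mismatch
		}
		total += ways
	}
	return total
}

func main() {
	// 1 exact match + C(4,1)*3 single mismatches = 13
	fmt.Println(countHammingNeighbors(4, 1, 4))
}
```

With d equal to n, the sum covers every string of length n, so the result is c^n (64 for n = 3, c = 4).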

func CountKmersMismatches

func CountKmersMismatches(input string, k, d int) (int, error)

Count the number of times a given kmer and any Hamming neighbors (distance d or less) occur in the input string.

func CountNucleotides

func CountNucleotides(dna string) (map[string]int, error)

Count the number of each type of nucleotide ACGT.

func CountNucleotidesArray

func CountNucleotidesArray(dna string) ([]int, error)

Count the number of each type of nucleotide ACGT and return the counts as a slice in the order A, C, G, T.

func DNA2Bitmasks

func DNA2Bitmasks(input string) (map[string][]bool, error)

Convert a DNA string into four bitmasks: one each for ATGC. For example, the DNA string AATCCGCT becomes:

bitmask[A] = 11000000
bitmask[T] = 00100001
bitmask[C] = 00011010
bitmask[G] = 00000100
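A minimal sketch of the conversion, assuming the map keys are the one-letter strings "A", "C", "G", "T" (the key format is not documented above). `dnaToBitmasks` and `render` are hypothetical helpers, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// dnaToBitmasks (hypothetical) builds one []bool per nucleotide; entry i of
// the mask for nucleotide x is true exactly when dna[i] == x.
// Assumption: map keys are "A", "C", "G", "T".
func dnaToBitmasks(dna string) (map[string][]bool, error) {
	masks := map[string][]bool{}
	for _, nt := range []string{"A", "C", "G", "T"} {
		masks[nt] = make([]bool, len(dna))
	}
	for i, r := range dna {
		mask, ok := masks[string(r)]
		if !ok {
			return nil, fmt.Errorf("invalid nucleotide %q at position %d", r, i)
		}
		mask[i] = true // the slice shares its backing array with the map entry
	}
	return masks, nil
}

// render prints a mask as a string of 0s and 1s.
func render(mask []bool) string {
	var b strings.Builder
	for _, bit := range mask {
		if bit {
			b.WriteByte('1')
		} else {
			b.WriteByte('0')
		}
	}
	return b.String()
}

func main() {
	masks, err := dnaToBitmasks("AATCCGCT")
	if err != nil {
		panic(err)
	}
	for _, nt := range []string{"A", "T", "C", "G"} {
		fmt.Println(nt, render(masks[nt]))
	}
}
```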

func EqualBoolSlices

func EqualBoolSlices(a, b []bool) bool

Utility function: check if two boolean arrays/array slices are equal. This is necessary because Go's == operator can compare arrays (such as [1]bool) but not slices (such as []bool), which can only be compared to nil.

func EqualIntSlices

func EqualIntSlices(a, b []int) bool

Check if two int arrays/array slices are equal.

func EqualStringSlices

func EqualStringSlices(a, b []string) bool

Utility function: check if two string arrays/array slices are equal. This is necessary because Go's == operator can compare arrays (such as [1]string) but not slices (such as []string), which can only be compared to nil.

func Factorial

func Factorial(n int) int

Compute the factorial of an integer.

func FindApproximateOccurrences

func FindApproximateOccurrences(pattern, text string, d int) ([]int, error)

Given a large string (text) and a string (pattern), find the zero-based indices where we have an occurrence of pattern or a string with Hamming distance d or less from pattern.
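A minimal sketch of the sliding-window approach, with the Hamming distance computed inline; exact matching (FindOccurrences) is the d = 0 case. `hammingDistance` and `findApproximateOccurrences` are hypothetical stand-ins, not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// hammingDistance (hypothetical) counts positions where p and q differ;
// the strings must have equal length.
func hammingDistance(p, q string) (int, error) {
	if len(p) != len(q) {
		return 0, errors.New("strings must have equal length")
	}
	dist := 0
	for i := 0; i < len(p); i++ {
		if p[i] != q[i] {
			dist++
		}
	}
	return dist, nil
}

// findApproximateOccurrences (hypothetical) slides a window of
// len(pattern) across text and records each zero-based start index whose
// window is within Hamming distance d of pattern.
func findApproximateOccurrences(pattern, text string, d int) []int {
	var hits []int
	for i := 0; i+len(pattern) <= len(text); i++ {
		// Lengths are equal by construction, so the error is always nil.
		dist, _ := hammingDistance(pattern, text[i:i+len(pattern)])
		if dist <= d {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	fmt.Println(findApproximateOccurrences("AAA", "AAAT", 1)) // [0 1]
}
```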

func FindClumps

func FindClumps(genome string, k, L, t int) ([]string, error)

Find k-mers (patterns) of length k occurring at least t times within some interval of length L in a genome.
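A naive sketch of clump finding: recount the k-mers in every window of length L and collect any that reach t. This is quadratic and only illustrates the definition; the package may use a faster sliding update. `findClumps` is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"sort"
)

// findClumps (hypothetical) returns, sorted, every k-mer that appears at
// least t times inside some window of length L. Each window is recounted
// from scratch, which is simple but slow for large genomes.
func findClumps(genome string, k, L, t int) []string {
	found := map[string]bool{}
	for start := 0; start+L <= len(genome); start++ {
		window := genome[start : start+L]
		counts := map[string]int{}
		for i := 0; i+k <= len(window); i++ {
			kmer := window[i : i+k]
			counts[kmer]++
			if counts[kmer] >= t {
				found[kmer] = true
			}
		}
	}
	result := make([]string, 0, len(found))
	for kmer := range found {
		result = append(result, kmer)
	}
	sort.Strings(result)
	return result
}

func main() {
	fmt.Println(findClumps("ACGTACGT", 2, 8, 2)) // [AC CG GT]
}
```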

func FindMotifs

func FindMotifs(dna []string, k, d int) ([]string, error)

Given a collection of strings Dna and integers k and d, return all (k,d)-motifs, where a k-mer is a (k,d)-motif if it appears in every string from Dna with at most d mismatches.

func FindOccurrences

func FindOccurrences(pattern, genome string) ([]int, error)

Given a large string (genome) and a string (pattern), find the zero-based indices where pattern occurs in genome.

func FrequencyArray

func FrequencyArray(input string, k int) ([]int, error)

Generate and return the frequency array for an input string for all kmers of a given length k.

To do this, we assemble the kmer histogram map, then convert that into the frequency array.

func GetSortedKeys

func GetSortedKeys(m map[string][]string) ([]string, error)

Utility function: given a map of string to []string, extract all string keys, sort them, and return the sorted list.

func GibbsSampler

func GibbsSampler(dna []string, k, t, n int) ([]string, int, error)

Implement a Gibbs sampler with pseudocounts. The Gibbs sampler starts with random kmers, and samples kmers randomly generated from a Profile matrix. Better sampling makes the algorithm faster.

func GreedyMotifSearch

func GreedyMotifSearch(dna []string, k, t int, pseudocounts bool) ([]string, error)

Given an integer k (kmer size) and t (len(dna)), return a collection of kmer strings that have the lowest score (highest similarity). If at any step you find more than one Profile-most probable k-mer in a given DNA string, use the one occurring first. Boolean pseudocounts turns on/off pseudocounts.

func GreedyMotifSearchNoPseudocounts

func GreedyMotifSearchNoPseudocounts(dna []string, k, t int) ([]string, error)

Run a greedy motif search using regular counts.

func GreedyMotifSearchPseudocounts

func GreedyMotifSearchPseudocounts(dna []string, k, t int) ([]string, error)

Run a greedy motif search using pseudocounts.

func HammingDistance

func HammingDistance(p, q string) (int, error)

Compute the Hamming distance between two strings, defined as the number of positions at which the corresponding characters differ.

func KeySetIntersection

func KeySetIntersection(input []map[string]int) ([]string, error)

Find the intersection of the key sets for a slice of string to integer maps.

func KmerComposition

func KmerComposition(input string, k int) ([]string, error)

Given an input DNA string, generate a set of all k-mers of length k in the input string.

func KmerHistogram

func KmerHistogram(input string, k int) (map[string]int, error)

Return the histogram of kmers of length k found in the given input.

func KmerHistogramMismatches

func KmerHistogramMismatches(input string, k, d int) (map[string]int, error)

Return the histogram of all kmers of length k that are in the input, or whose Hamming neighbors within distance d are in the input.

func KmerInOrderList

func KmerInOrderList(dna string, k int) ([]string, error)

Return a list of kmers of length k that occur in a DNA string. This list preserves order in which the kmers appear in DNA. This list does not include duplicates.

func ManyGibbsSamplers

func ManyGibbsSamplers(dna []string, k, t, n, n_starts int) ([]string, error)

Driver function to run the Gibbs sampler multiple times and keep the best motifs from all runs. n is the number of inner-loop iterations in one run of the Gibbs sampler; n_starts is the number of times the Gibbs sampler is run.

func ManyRandomMotifSearches

func ManyRandomMotifSearches(dna []string, k, t, n int) ([]string, error)

Driver function to run multiple random motif searches and keep the best of all runs.

func MedianString

func MedianString(dna []string, k int) ([]string, error)

func MinKmerDistance

func MinKmerDistance(pattern, text string) (int, error)

Given a k-mer pattern and a longer string text, find the minimum Hamming distance from k-mer pattern to any k-mer in text.

func MinKmerDistances

func MinKmerDistances(pattern string, inputs []string) (int, error)

Given a k-mer pattern and a set of strings, find the sum (L1 norm) of the shortest distances from k-mer pattern to each input string.
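A minimal sketch covering both MinKmerDistance and MinKmerDistances: take the best window per string, then add the per-string minima. The lowercase names are hypothetical stand-ins, and the sample strings are illustrative data worked by hand.

```go
package main

import (
	"errors"
	"fmt"
)

// hamming assumes p and q have equal length.
func hamming(p, q string) int {
	d := 0
	for i := 0; i < len(p); i++ {
		if p[i] != q[i] {
			d++
		}
	}
	return d
}

// minKmerDistance (hypothetical) returns the smallest Hamming distance
// between pattern and any same-length window of text.
func minKmerDistance(pattern, text string) (int, error) {
	k := len(pattern)
	if k == 0 || k > len(text) {
		return 0, errors.New("pattern must be non-empty and no longer than text")
	}
	best := k // no window can differ in more than k positions
	for i := 0; i+k <= len(text); i++ {
		if d := hamming(pattern, text[i:i+k]); d < best {
			best = d
		}
	}
	return best, nil
}

// minKmerDistances (hypothetical) sums the per-string minima (an L1 norm).
func minKmerDistances(pattern string, inputs []string) (int, error) {
	total := 0
	for _, text := range inputs {
		d, err := minKmerDistance(pattern, text)
		if err != nil {
			return 0, err
		}
		total += d
	}
	return total, nil
}

func main() {
	dna := []string{"TTACCTTAAC", "GATATCTGTC", "ACGGCGTTCG", "CCCTAAAGAG", "CGTCAGAGGT"}
	total, err := minKmerDistances("AAA", dna)
	if err != nil {
		panic(err)
	}
	fmt.Println(total) // 1 + 1 + 2 + 0 + 1 = 5
}
```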

func MinSkewPositions

func MinSkewPositions(genome string) ([]int, error)

The skew of a genome is the difference between the number of G and C nucleotides (#G - #C) that have occurred cumulatively in a given strand of DNA. This function computes the positions in the genome at which the cumulative skew is minimized.
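A one-pass sketch of the skew computation, assuming the common convention that position i refers to the skew after the first i nucleotides (so position 0 is the empty prefix, with skew 0). The package's indexing convention is not stated above; `minSkewPositions` is a hypothetical stand-in.

```go
package main

import "fmt"

// minSkewPositions (hypothetical) returns every prefix length i
// (0..len(genome)) at which #G - #C over genome[:i] is minimal.
// Assumption: position i means "after the first i nucleotides".
func minSkewPositions(genome string) []int {
	skew, minSkew := 0, 0
	positions := []int{0} // the empty prefix has skew 0
	for i := 0; i < len(genome); i++ {
		switch genome[i] {
		case 'G':
			skew++
		case 'C':
			skew--
		}
		switch {
		case skew < minSkew:
			minSkew = skew
			positions = []int{i + 1} // new minimum: restart the list
		case skew == minSkew:
			positions = append(positions, i+1)
		}
	}
	return positions
}

func main() {
	fmt.Println(minSkewPositions("CCGG")) // [2]
	fmt.Println(minSkewPositions("GGCC")) // [0 4]
}
```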

func MoreFrequentThanNKmers

func MoreFrequentThanNKmers(input string, k, N int) ([]string, error)

Find the kmer(s) in the kmer histogram whose count exceeds N, and return them as a string slice.

func MostFrequentKmers

func MostFrequentKmers(input string, k int) ([]string, error)

Find the most frequent kmer(s) in the kmer histogram, and return them as a string slice.

func MostFrequentKmersMismatches

func MostFrequentKmersMismatches(input string, k, d int) ([]string, error)

Find the most frequent kmer(s) of length k in the given input string. Include mismatches of Hamming distance <= d.

func MostFrequentKmersMismatchesRevComp

func MostFrequentKmersMismatchesRevComp(input string, k, d int) ([]string, error)

Find the most frequent kmer(s) of length k in the given input string and its reverse complement. Include mismatches of Hamming distance <= d.

func NumberToPattern

func NumberToPattern(n, k int) (string, error)

NumberToPattern converts an integer n and a kmer length k into the corresponding kmer string.

NOTE: We should be a little more careful about integer overflow, as that can easily happen for large k.

func OverlapGraph

func OverlapGraph(patterns []string) (map[string][]string, error)

Construct the overlap graph of a collection of kmers.

Given: an arbitrary collection of kmers.
Create: a graph having one node for each kmer in Patterns.
Connect: kmers Pattern and Pattern' by a directed edge if Suffix(Pattern) is equal to Prefix(Pattern').

The resulting graph is called the overlap graph on these k-mers, denoted Overlap(Patterns).

Return the overlap graph Overlap(Patterns), in the form of an adjacency list.
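A minimal sketch of the construction as an adjacency list, assuming Suffix and Prefix mean the last and first k-1 characters, all kmers are non-empty and of equal length, a kmer entry is never linked to itself, and adjacency lists are sorted for deterministic output. `overlapGraph` is a hypothetical stand-in for OverlapGraph.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overlapGraph (hypothetical) adds an edge p -> q when the last k-1
// characters of p equal the first k-1 characters of q. Assumes non-empty
// kmers of equal length; entries are compared by index, so no entry is
// linked to itself.
func overlapGraph(patterns []string) map[string][]string {
	graph := map[string][]string{}
	for i, p := range patterns {
		suffix := p[1:]
		for j, q := range patterns {
			if i != j && suffix == q[:len(q)-1] {
				graph[p] = append(graph[p], q)
			}
		}
	}
	// Sort each adjacency list in place so output is deterministic.
	for _, adj := range graph {
		sort.Strings(adj)
	}
	return graph
}

func main() {
	g := overlapGraph([]string{"ATGCG", "GCATG", "CATGC", "AGGCA", "GGCAT"})
	sources := make([]string, 0, len(g))
	for src := range g {
		sources = append(sources, src)
	}
	sort.Strings(sources)
	for _, src := range sources {
		fmt.Printf("%s -> %s\n", src, strings.Join(g[src], ","))
	}
}
```

Printing the sources in sorted order mirrors the "SRC -> DEST" edge format described under SPrintOverlapGraph.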

func PatternCount

func PatternCount(input string, pattern string) int

Count occurrences of a substring pattern in a string input.

func PatternToNumber

func PatternToNumber(input string) (int, error)

PatternToNumber transforms a kmer of a given length into a corresponding integer indicating its lexicographic ordering among all kmers of length k.

A = 0, C = 1, G = 2, T = 3

Example for k = 3:

    C G T
    | | |
    | | +--> 3 * 4^{k-3}
    | +----> 2 * 4^{k-2}
    +------> 1 * 4^{k-1}

This basically boils down to transforming a number between base 10 (integer) and base 4 (DNA)
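The base-4 view can be sketched directly, with NumberToPattern as the inverse conversion. This is a minimal sketch using the A = 0, C = 1, G = 2, T = 3 digits above; `patternToNumber` and `numberToPattern` are hypothetical stand-ins, and this version reports an error when n does not fit in k digits instead of overflowing.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "ACGT" // digit values: A=0, C=1, G=2, T=3

// patternToNumber (hypothetical) reads a kmer as a base-4 numeral,
// most significant digit first.
func patternToNumber(pattern string) (int, error) {
	n := 0
	for i := 0; i < len(pattern); i++ {
		digit := strings.IndexByte(alphabet, pattern[i])
		if digit < 0 {
			return 0, fmt.Errorf("invalid nucleotide %q", pattern[i])
		}
		n = n*4 + digit
	}
	return n, nil
}

// numberToPattern (hypothetical) writes n as a k-digit base-4 numeral,
// left-padded with A (digit 0). Assumes k >= 0.
func numberToPattern(n, k int) (string, error) {
	if n < 0 {
		return "", fmt.Errorf("n must be non-negative, got %d", n)
	}
	b := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		b[i] = alphabet[n%4]
		n /= 4
	}
	if n != 0 {
		return "", fmt.Errorf("number too large for a %d-mer", k)
	}
	return string(b), nil
}

func main() {
	n, err := patternToNumber("CGT")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 27 = 1*16 + 2*4 + 3*1
	p, err := numberToPattern(n, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // CGT
}
```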

func ProfileMostProbableKmer

func ProfileMostProbableKmer(dna string, k int, profile [][]float32) (string, error)

Like ProfileMostProbableKmers, but returns only a single most probable kmer.

func ProfileMostProbableKmers

func ProfileMostProbableKmers(dna string, k int, profile [][]float32) ([]string, error)

Given a profile matrix and a DNA input string, evaluate the probability of every kmer in the DNA string and find the most probable kmer(s) in the text: the kmer(s) most likely to have been generated by profile among all kmers in text.

This particular method does not pay attention to order of occurrence of kmers.

func ProfileMostProbableKmersGreedy

func ProfileMostProbableKmersGreedy(dna string, k int, profile [][]float32) (string, error)

This uses a profile (probability) matrix and evaluates all possible kmers in a DNA string to determine which kmers in the DNA string match the profile most closely.

The greedy version maintains the order in which kmers occur in the original DNA string, and stops as soon as the first match is found.
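A minimal sketch of the greedy scan. It assumes the profile is stored as profile[row][position] with rows in A, C, G, T order (the layout of the package's [][]float32 is not documented here), and it reads "first match" as "ties go to the earliest kmer". `kmerProbability` and `mostProbableKmer` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// kmerProbability (hypothetical) multiplies the profile entry for each
// nucleotide at each position. Assumed layout: profile[row][j] with rows
// A, C, G, T. Non-ACGT characters give probability 0.
func kmerProbability(kmer string, profile [][]float32) float32 {
	p := float32(1)
	for j := 0; j < len(kmer); j++ {
		row := strings.IndexByte("ACGT", kmer[j])
		if row < 0 {
			return 0
		}
		p *= profile[row][j]
	}
	return p
}

// mostProbableKmer (hypothetical) scans dna left to right and keeps the
// first kmer that attains the highest probability; the strict > means a
// later kmer with an equal score never replaces an earlier one.
func mostProbableKmer(dna string, k int, profile [][]float32) string {
	best, bestP := "", float32(-1)
	for i := 0; i+k <= len(dna); i++ {
		kmer := dna[i : i+k]
		if p := kmerProbability(kmer, profile); p > bestP {
			best, bestP = kmer, p
		}
	}
	return best
}

func main() {
	// Rows are A, C, G, T; columns are kmer positions (an assumed layout).
	profile := [][]float32{
		{0.1, 0.1},
		{0.7, 0.1},
		{0.1, 0.8},
		{0.1, 0.0},
	}
	fmt.Println(mostProbableKmer("AAACGT", 2, profile)) // CG
}
```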

func RandomMotifSearchPseudocounts

func RandomMotifSearchPseudocounts(dna []string, k, t int) ([]string, int, error)

Run a random motif search with pseudocounts.

func ReadLines

func ReadLines(path string) ([]string, error)

ReadLines reads a whole file into memory and returns a slice of its lines.

func ReadMatrix32

func ReadMatrix32(lines []string, k int) ([][]float32, error)

ReadMatrix32 takes a set of lines containing a multidimensional array of floating point values, k elements per line, n lines, and returns a slice of slices with size slice[k][n] and type float32.

func ReconstructGenomeFromPath

func ReconstructGenomeFromPath(contigs []string) (string, error)

Given a sequence of n kmers that overlap such that the last k-1 symbols of pattern i equal the first k-1 symbols of pattern i+1 for all i = 1 to n-1, return a string of length k + n - 1 whose ith kmer is equal to pattern i.
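For the k-1 overlap case, the reconstruction reduces to taking the first kmer and appending the last character of each subsequent one. A minimal sketch, assuming all kmers share the same length k; `reconstructGenomeFromPath` is a hypothetical stand-in that also verifies each overlap.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// reconstructGenomeFromPath (hypothetical) joins kmers that each overlap
// the next by k-1 characters, producing a string of length k + n - 1.
func reconstructGenomeFromPath(kmers []string) (string, error) {
	if len(kmers) == 0 {
		return "", errors.New("empty path")
	}
	var b strings.Builder
	b.WriteString(kmers[0])
	for i := 1; i < len(kmers); i++ {
		prev, cur := kmers[i-1], kmers[i]
		// Check lengths first so the slicing below cannot panic.
		if len(cur) != len(prev) || len(cur) == 0 || prev[1:] != cur[:len(cur)-1] {
			return "", fmt.Errorf("kmers %d and %d do not overlap by k-1", i-1, i)
		}
		b.WriteByte(cur[len(cur)-1]) // only the last symbol is new
	}
	return b.String(), nil
}

func main() {
	genome, err := reconstructGenomeFromPath([]string{"ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"})
	if err != nil {
		panic(err)
	}
	fmt.Println(genome) // ACCGAAGCT (k + n - 1 = 9 characters)
}
```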

func ReconstructGenomeFromPath_old

func ReconstructGenomeFromPath_old(contigs []string) (string, error)

Given a genome path, i.e., a set of k-mers that overlap by some unknown number (up to k-1) of characters each, assemble the paths into a single string containing the genome.

Note: This solves a problem that is slightly more general than the one actually given: here we assume the number of overlapping characters is unknown, whereas the problem on Rosalind.info says consecutive k-mers are always offset by exactly 1 character (an overlap of k-1).

func ReverseComplement

func ReverseComplement(input string) (string, error)

Given a DNA input string, find the reverse complement. The complement swaps Gs and Cs, and As and Ts; the reverse complement is that complement read in reverse order.
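Both steps can be done in one pass by writing each complemented base into the mirrored position. A minimal sketch; `reverseComplement` is a hypothetical stand-in that rejects anything outside ACGT.

```go
package main

import "fmt"

// reverseComplement (hypothetical) complements each base (A<->T, C<->G)
// and writes it to the mirrored index, reversing and complementing at once.
func reverseComplement(dna string) (string, error) {
	pair := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	out := make([]byte, len(dna))
	for i := 0; i < len(dna); i++ {
		c, ok := pair[dna[i]]
		if !ok {
			return "", fmt.Errorf("invalid nucleotide %q at position %d", dna[i], i)
		}
		out[len(dna)-1-i] = c
	}
	return string(out), nil
}

func main() {
	rc, err := reverseComplement("AAAACCCGGT")
	if err != nil {
		panic(err)
	}
	fmt.Println(rc) // ACCGGGTTTT
}
```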

func ReverseString

func ReverseString(s string) string

ReverseString returns its argument string reversed rune-wise left to right. https://github.com/golang/example/blob/master/stringutil/reverse.go

func SPrintOverlapGraph

func SPrintOverlapGraph(overlap_graph map[string][]string, one_edge_per_line bool) (string, error)

Print string representation of an overlap graph (map of string to []string) with the form: "SRC -> DEST" (no double quotes, one edge per line) and return the resulting string. The edges are ordered.

func TheseFloatsAreEqual

func TheseFloatsAreEqual(a, b float32) bool

Check if two floats are equal, to within some small tolerance.

func VisitHammingNeighbors

func VisitHammingNeighbors(input string, d int) ([]string, error)

Given an input string of DNA, generate variations of said string that are a Hamming distance of less than or equal to d.
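One common way to generate the d-neighborhood is recursion on the suffix. This is a sketch of that swapped-in technique, not necessarily what VisitHammingNeighbors does internally; `neighbors` and `hamming` are hypothetical, and d is assumed to be non-negative.

```go
package main

import "fmt"

// hamming assumes p and q have equal length.
func hamming(p, q string) int {
	d := 0
	for i := 0; i < len(p); i++ {
		if p[i] != q[i] {
			d++
		}
	}
	return d
}

// neighbors (hypothetical) returns every string over ACGT within Hamming
// distance d of pattern, pattern included. A suffix neighbor with
// mismatches to spare may take any first letter; one that has used all d
// must keep pattern's first letter. Assumes d >= 0.
func neighbors(pattern string, d int) []string {
	if d == 0 || len(pattern) == 0 {
		return []string{pattern}
	}
	if len(pattern) == 1 {
		return []string{"A", "C", "G", "T"}
	}
	suffix := pattern[1:]
	var out []string
	for _, s := range neighbors(suffix, d) {
		if hamming(suffix, s) < d {
			for _, x := range "ACGT" {
				out = append(out, string(x)+s)
			}
		} else {
			out = append(out, pattern[:1]+s)
		}
	}
	return out
}

func main() {
	fmt.Println(len(neighbors("ACG", 1))) // 10 = 1 + 3*3
}
```

Each neighbor is produced exactly once, so the count matches the combinatorial formula given for CountHammingNeighbors.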

func WriteLines

func WriteLines(lines []string, path string) error

WriteLines writes the lines to the given file.

Types

type DirGraph

type DirGraph struct {
	// contains filtered or unexported fields
}

Directed graph type

func (*DirGraph) AddEdge

func (g *DirGraph) AddEdge(n1, n2 *Node)

Add a directed edge

func (*DirGraph) AddNode

func (g *DirGraph) AddNode(n *Node)

Add a node to the directed graph

func (*DirGraph) EdgeCount

func (g *DirGraph) EdgeCount() int

Get a total count of edges in the graph

func (*DirGraph) GetNode

func (g *DirGraph) GetNode(label string) *Node

Get a node, given a label

func (*DirGraph) String

func (g *DirGraph) String() string

Return a sorted edge list representation of the graph

type Node

type Node struct {
	// contains filtered or unexported fields
}

Graph node

func (*Node) String

func (n *Node) String() string

Convert a node to a string

type ScoredMotifMatrix

type ScoredMotifMatrix struct {
	// contains filtered or unexported fields
}

ScoredMotifMatrix holds a set of motifs (kmers) and their associated score. We continually assemble many of these candidate sets of motifs, looking for the set with the minimum score. The score is not updated dynamically; see UpdateScore().

func NewScoredMotifMatrix

func NewScoredMotifMatrix() ScoredMotifMatrix

Constructor

func (*ScoredMotifMatrix) AddMotif

func (s *ScoredMotifMatrix) AddMotif(motif string) error

Add a motif to the motif matrix

func (*ScoredMotifMatrix) MakeProfile

func (s *ScoredMotifMatrix) MakeProfile(pseudocounts bool) ([][]float32, error)

func (*ScoredMotifMatrix) UpdateScore

func (s *ScoredMotifMatrix) UpdateScore() error

Update the value of the score of a ScoredMotifMatrix. This assembles a kmer composed of the most common nucleotide per position, then computes the sum of the Hamming distances from that kmer for all motifs.
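Summing each motif's Hamming distance to the consensus is the same as counting, column by column, how many motifs disagree with that column's most common nucleotide, so the score can be computed in a single pass. A minimal sketch, assuming equal-length motifs and ties broken in A, C, G, T order (the package's tie-breaking is not documented); `score` is hypothetical.

```go
package main

import "fmt"

// score (hypothetical) builds the consensus kmer and returns it with the
// total Hamming distance of all motifs to it. Assumes equal-length motifs;
// ties between nucleotides go to the earliest in A, C, G, T order.
func score(motifs []string) (string, int) {
	if len(motifs) == 0 {
		return "", 0
	}
	k := len(motifs[0])
	consensus := make([]byte, k)
	total := 0
	for j := 0; j < k; j++ {
		counts := map[byte]int{}
		for _, m := range motifs {
			counts[m[j]]++
		}
		best := byte('A')
		for _, nt := range []byte("ACGT") {
			if counts[nt] > counts[best] {
				best = nt
			}
		}
		consensus[j] = best
		// Every motif not matching the consensus here adds 1 to its distance.
		total += len(motifs) - counts[best]
	}
	return string(consensus), total
}

func main() {
	c, s := score([]string{"ACGT", "ACGA", "TCGT"})
	fmt.Println(c, s) // ACGT 2
}
```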
