rosalind

package

v0.0.0-...-a170c34 Published: Apr 26, 2019 License: MIT Imports: 11 Imported by: 0

Repository

github.com/charlesreid1/go-rosalind

README ¶

rosalind go package

This directory contains the rosalind Go package.

Documentation ¶

Index ¶

func Binomial(n, k int) int
func Bitmasks2DNA(bitmasks map[string][]bool) (string, error)
func CheckIsDNA(input string) bool
func Complement(input string) (string, error)
func ConstructDeBruijnGraphKmers(kmers []string) (map[string][]string, error)
func ConstructDeBruijnGraphString(text string, k int) (map[string][]string, error)
func CountHammingNeighbors(n, d, c int) (int, error)
func CountKmersMismatches(input string, k, d int) (int, error)
func CountNucleotides(dna string) (map[string]int, error)
func CountNucleotidesArray(dna string) ([]int, error)
func DNA2Bitmasks(input string) (map[string][]bool, error)
func EqualBoolSlices(a, b []bool) bool
func EqualIntSlices(a, b []int) bool
func EqualStringSlices(a, b []string) bool
func Factorial(n int) int
func FindApproximateOccurrences(pattern, text string, d int) ([]int, error)
func FindClumps(genome string, k, L, t int) ([]string, error)
func FindMotifs(dna []string, k, d int) ([]string, error)
func FindOccurrences(pattern, genome string) ([]int, error)
func FrequencyArray(input string, k int) ([]int, error)
func GetSortedKeys(m map[string][]string) ([]string, error)
func GibbsSampler(dna []string, k, t, n int) ([]string, int, error)
func GreedyMotifSearch(dna []string, k, t int, pseudocounts bool) ([]string, error)
func GreedyMotifSearchNoPseudocounts(dna []string, k, t int) ([]string, error)
func GreedyMotifSearchPseudocounts(dna []string, k, t int) ([]string, error)
func HammingDistance(p, q string) (int, error)
func KeySetIntersection(input []map[string]int) ([]string, error)
func KmerComposition(input string, k int) ([]string, error)
func KmerHistogram(input string, k int) (map[string]int, error)
func KmerHistogramMismatches(input string, k, d int) (map[string]int, error)
func KmerInOrderList(dna string, k int) ([]string, error)
func ManyGibbsSamplers(dna []string, k, t, n, n_starts int) ([]string, error)
func ManyRandomMotifSearches(dna []string, k, t, n int) ([]string, error)
func MedianString(dna []string, k int) ([]string, error)
func MinKmerDistance(pattern, text string) (int, error)
func MinKmerDistances(pattern string, inputs []string) (int, error)
func MinSkewPositions(genome string) ([]int, error)
func MoreFrequentThanNKmers(input string, k, N int) ([]string, error)
func MostFrequentKmers(input string, k int) ([]string, error)
func MostFrequentKmersMismatches(input string, k, d int) ([]string, error)
func MostFrequentKmersMismatchesRevComp(input string, k, d int) ([]string, error)
func NumberToPattern(n, k int) (string, error)
func OverlapGraph(patterns []string) (map[string][]string, error)
func PatternCount(input string, pattern string) int
func PatternToNumber(input string) (int, error)
func ProfileMostProbableKmer(dna string, k int, profile [][]float32) (string, error)
func ProfileMostProbableKmers(dna string, k int, profile [][]float32) ([]string, error)
func ProfileMostProbableKmersGreedy(dna string, k int, profile [][]float32) (string, error)
func RandomMotifSearchPseudocounts(dna []string, k, t int) ([]string, int, error)
func ReadLines(path string) ([]string, error)
func ReadMatrix32(lines []string, k int) ([][]float32, error)
func ReconstructGenomeFromPath(contigs []string) (string, error)
func ReconstructGenomeFromPath_old(contigs []string) (string, error)
func ReverseComplement(input string) (string, error)
func ReverseString(s string) string
func SPrintOverlapGraph(overlap_graph map[string][]string, one_edge_per_line bool) (string, error)
func TheseFloatsAreEqual(a, b float32) bool
func VisitHammingNeighbors(input string, d int) ([]string, error)
func WriteLines(lines []string, path string) error
type DirGraph
- func (g *DirGraph) AddEdge(n1, n2 *Node)
- func (g *DirGraph) AddNode(n *Node)
- func (g *DirGraph) EdgeCount() int
- func (g *DirGraph) GetNode(label string) *Node
- func (g *DirGraph) String() string
type Node
- func (n *Node) String() string
type ScoredMotifMatrix
- func NewScoredMotifMatrix() ScoredMotifMatrix
- func (s *ScoredMotifMatrix) AddMotif(motif string) error
- func (s *ScoredMotifMatrix) MakeProfile(pseudocounts bool) ([][]float32, error)
- func (s *ScoredMotifMatrix) UpdateScore() error

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Binomial ¶

func Binomial(n, k int) int

Binomial returns the value of the binomial coefficient Binom(n, k), the number of ways to choose k items from n.

func Bitmasks2DNA ¶

func Bitmasks2DNA(bitmasks map[string][]bool) (string, error)

Convert four bitmasks (one each for ATGC) into a DNA string.

func CheckIsDNA ¶

func CheckIsDNA(input string) bool

Given an alleged DNA input string, iterate through it character by character to ensure that it only contains ATGC. Returns true if this is DNA (ATGC only), false otherwise.

func Complement ¶

func Complement(input string) (string, error)

Given a DNA input string, find the complement. The complement swaps Gs and Cs, and As and Ts.

func ConstructDeBruijnGraphKmers ¶

func ConstructDeBruijnGraphKmers(kmers []string) (map[string][]string, error)

func ConstructDeBruijnGraphString ¶

func ConstructDeBruijnGraphString(text string, k int) (map[string][]string, error)

func CountHammingNeighbors ¶

func CountHammingNeighbors(n, d, c int) (int, error)

Given an input string of DNA of length n, a maximum Hamming distance of d, and a number of codons c, determine the number of Hamming neighbors of distance less than or equal to d using a combinatorics formula.
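One common closed form for this count is the sum over i = 0..d of Binom(n, i) * (c-1)^i: choose which i positions differ, then pick one of the c-1 other symbols at each. The sketch below assumes that formula and assumes c is the alphabet size (4 for DNA); the helper names are hypothetical and this is not necessarily how the package computes it.

```go
package main

import "fmt"

// binomialSketch is a hypothetical helper that computes Binom(n, k)
// multiplicatively, so no large factorials are needed.
func binomialSketch(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	if k > n-k {
		k = n - k
	}
	result := 1
	for i := 1; i <= k; i++ {
		// After this step result equals Binom(n-k+i, i), so the division is exact.
		result = result * (n - k + i) / i
	}
	return result
}

// countNeighborsSketch sums Binom(n, i) * (c-1)^i for i = 0..d.
// Assumption: c is the alphabet size (4 for DNA).
func countNeighborsSketch(n, d, c int) int {
	total := 0
	power := 1 // holds (c-1)^i
	for i := 0; i <= d && i <= n; i++ {
		total += binomialSketch(n, i) * power
		power *= c - 1
	}
	return total
}

func main() {
	// A 3-mer over ACGT with at most 1 mismatch: 1 + 3*3 = 10 strings.
	fmt.Println(countNeighborsSketch(3, 1, 4))
}
```

With d equal to n the sum covers every string of length n, so the result is c^n.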

func CountKmersMismatches ¶

func CountKmersMismatches(input string, k, d int) (int, error)

Count the number of times a given kmer and any Hamming neighbors (distance d or less) occur in the input string.

func CountNucleotides ¶

func CountNucleotides(dna string) (map[string]int, error)

Count the number of each type of nucleotide ACGT.

func CountNucleotidesArray ¶

func CountNucleotidesArray(dna string) ([]int, error)

Count the number of each type of nucleotide ACGT and return as an array in order A, C, G, T.

func DNA2Bitmasks ¶

func DNA2Bitmasks(input string) (map[string][]bool, error)

Convert a DNA string into four bitmasks: one each for ATGC. For example, the DNA string AATCCGCT would become:

    bitmask[A] = 11000000
    bitmask[T] = 00100001
    bitmask[C] = 00011010
    bitmask[G] = 00000100
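The conversion can be sketched in a few lines. The helpers `bitmasksSketch` and `maskString` below are hypothetical names used only for illustration; unlike DNA2Bitmasks, the sketch returns no error and simply skips characters outside ATGC.

```go
package main

import "fmt"

// bitmasksSketch is a hypothetical helper: it builds one []bool per
// nucleotide, with true wherever that nucleotide occurs in dna.
// Characters other than A, T, G, C are ignored (an assumption of this sketch).
func bitmasksSketch(dna string) map[string][]bool {
	masks := map[string][]bool{}
	for _, nt := range []string{"A", "T", "G", "C"} {
		masks[nt] = make([]bool, len(dna))
	}
	for i := 0; i < len(dna); i++ {
		if m, ok := masks[string(dna[i])]; ok {
			m[i] = true
		}
	}
	return masks
}

// maskString renders a mask as a string of 1s and 0s for display.
func maskString(mask []bool) string {
	out := make([]byte, len(mask))
	for i, set := range mask {
		if set {
			out[i] = '1'
		} else {
			out[i] = '0'
		}
	}
	return string(out)
}

func main() {
	masks := bitmasksSketch("AATCCGCT")
	for _, nt := range []string{"A", "T", "C", "G"} {
		fmt.Printf("bitmask[%s] = %s\n", nt, maskString(masks[nt]))
	}
}
```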

func EqualBoolSlices ¶

func EqualBoolSlices(a, b []bool) bool

Utility function: check if two boolean slices are equal. This is necessary because Go does not allow slices to be compared with ==, and arrays (of type [1]bool) and slices (of type []bool) are distinct types, so an element-by-element comparison is needed.

func EqualIntSlices ¶

func EqualIntSlices(a, b []int) bool

Check if two int arrays/array slices are equal.

func EqualStringSlices ¶

func EqualStringSlices(a, b []string) bool

Utility function: check if two string slices are equal. This is necessary because Go does not allow slices to be compared with ==, and arrays (of type [1]string) and slices (of type []string) are distinct types, so an element-by-element comparison is needed.

func Factorial ¶

func Factorial(n int) int

Compute the factorial of an integer.

func FindApproximateOccurrences ¶

func FindApproximateOccurrences(pattern, text string, d int) ([]int, error)

Given a large string (text) and a string (pattern), find the zero-based indices where we have an occurrence of pattern or a string with Hamming distance d or less from pattern.
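A straightforward way to do this is to slide a window of len(pattern) over text and count mismatches at each offset. The sketch below illustrates that approach under the hypothetical name `approxOccurrencesSketch`; it is not taken from the package source and omits the error return.

```go
package main

import "fmt"

// approxOccurrencesSketch is a hypothetical helper: it returns the
// zero-based start indices of every window of text whose Hamming
// distance from pattern is d or less.
func approxOccurrencesSketch(pattern, text string, d int) []int {
	var hits []int
	k := len(pattern)
	for i := 0; i+k <= len(text); i++ {
		mismatches := 0
		// Stop comparing early once the window exceeds the budget d.
		for j := 0; j < k && mismatches <= d; j++ {
			if text[i+j] != pattern[j] {
				mismatches++
			}
		}
		if mismatches <= d {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	fmt.Println(approxOccurrencesSketch("ACG", "ACGTTCGAAGG", 1)) // [0 4 7 8]
}
```

With d = 0 the same loop reports exact occurrences only.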

func FindClumps ¶

func FindClumps(genome string, k, L, t int) ([]string, error)

Find k-mers (patterns) of length k occurring at least t times over an interval of length L in a genome.

func FindMotifs ¶

func FindMotifs(dna []string, k, d int) ([]string, error)

Given a collection of strings Dna and an integer d, a k-mer is a (k,d)-motif if it appears in every string from Dna with at most d mismatches. FindMotifs returns the (k,d)-motifs found in dna.

func FindOccurrences ¶

func FindOccurrences(pattern, genome string) ([]int, error)

Given a large string (genome) and a string (pattern), find the zero-based indices where pattern occurs in genome.

func FrequencyArray ¶

func FrequencyArray(input string, k int) ([]int, error)

Generate and return the frequency array for an input string for all kmers of a given length k.

To do this, we assemble the kmer histogram map, then convert that into the frequency array.

func GetSortedKeys ¶

func GetSortedKeys(m map[string][]string) ([]string, error)

Utility method: given a map of string to []string, extract a list of all string keys, sort them, and return the sorted list.

func GibbsSampler ¶

func GibbsSampler(dna []string, k, t, n int) ([]string, int, error)

Implement a Gibbs sampler with pseudocounts. The Gibbs sampler starts with random kmers, and samples kmers randomly generated from a Profile matrix. Better sampling makes the algorithm faster.

func GreedyMotifSearch ¶

func GreedyMotifSearch(dna []string, k, t int, pseudocounts bool) ([]string, error)

Given an integer k (kmer size) and t (len(dna)), return a collection of kmer strings that have the lowest score (highest similarity). If at any step you find more than one Profile-most probable k-mer in a given DNA string, use the one occurring first. Boolean pseudocounts turns on/off pseudocounts.

func GreedyMotifSearchNoPseudocounts ¶

func GreedyMotifSearchNoPseudocounts(dna []string, k, t int) ([]string, error)

Run a greedy motif search using regular counts.

func GreedyMotifSearchPseudocounts ¶

func GreedyMotifSearchPseudocounts(dna []string, k, t int) ([]string, error)

Run a greedy motif search using pseudocounts.

func HammingDistance ¶

func HammingDistance(p, q string) (int, error)

Compute the Hamming distance between two strings. The Hamming distance is defined as the number of characters different between two strings.
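As an illustration of the definition, here is a minimal sketch. The name `hammingSketch` is hypothetical, and treating unequal lengths as an error is an assumption of the sketch rather than documented behavior of HammingDistance.

```go
package main

import (
	"errors"
	"fmt"
)

// hammingSketch is a hypothetical helper: it counts the positions at
// which p and q differ. Rejecting unequal lengths is an assumption.
func hammingSketch(p, q string) (int, error) {
	if len(p) != len(q) {
		return 0, errors.New("strings must have equal length")
	}
	dist := 0
	for i := 0; i < len(p); i++ {
		if p[i] != q[i] {
			dist++
		}
	}
	return dist, nil
}

func main() {
	d, err := hammingSketch("GGGCCGTTGGT", "GGACCGTTGAC")
	fmt.Println(d, err) // 3 <nil>
}
```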

func KeySetIntersection ¶

func KeySetIntersection(input []map[string]int) ([]string, error)

Find the intersection of the key sets for a slice of string to integer maps.

func KmerComposition ¶

func KmerComposition(input string, k int) ([]string, error)

Given an input DNA string, generate a set of all k-mers of length k in the input string.

func KmerHistogram ¶

func KmerHistogram(input string, k int) (map[string]int, error)

Return the histogram of kmers of length k found in the given input string.

func KmerHistogramMismatches ¶

func KmerHistogramMismatches(input string, k, d int) (map[string]int, error)

Return the histogram of all kmers of length k that are in the input, or whose Hamming neighbors within distance d are in the input.

func KmerInOrderList ¶

func KmerInOrderList(dna string, k int) ([]string, error)

Return a list of kmers of length k that occur in a DNA string. This list preserves order in which the kmers appear in DNA. This list does not include duplicates.

func ManyGibbsSamplers ¶

func ManyGibbsSamplers(dna []string, k, t, n, n_starts int) ([]string, error)

Driver function to run the Gibbs Sampler multiple times and keep the best of all runs. n is the number of inner loops in one run of the Gibbs Sampler. n_starts is the number of times the Gibbs Sampler is run.

func ManyRandomMotifSearches ¶

func ManyRandomMotifSearches(dna []string, k, t, n int) ([]string, error)

Driver function to run multiple random motif searches and keep the best of all runs.

func MedianString ¶

func MedianString(dna []string, k int) ([]string, error)

func MinKmerDistance ¶

func MinKmerDistance(pattern, text string) (int, error)

Given a k-mer pattern and a longer string text, find the minimum distance from k-mer pattern to any possible k-mer in text.

func MinKmerDistances ¶

func MinKmerDistances(pattern string, inputs []string) (int, error)

Given a k-mer pattern and a set of strings, find the sum (L1 norm) of the shortest distances from k-mer pattern to each input string.
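The two distance functions compose naturally: the distance to one string is the minimum Hamming distance over its windows, and the distance to a set is the sum of those minima. The sketch below uses hypothetical helper names and omits error handling; returning len(pattern) when a text is shorter than the pattern is an assumption of the sketch.

```go
package main

import "fmt"

// minKmerDistanceSketch is a hypothetical helper: the smallest Hamming
// distance between pattern and any window of text of the same length.
// If text is shorter than pattern it returns len(pattern) (an assumption).
func minKmerDistanceSketch(pattern, text string) int {
	k := len(pattern)
	best := k
	for i := 0; i+k <= len(text); i++ {
		dist := 0
		for j := 0; j < k; j++ {
			if text[i+j] != pattern[j] {
				dist++
			}
		}
		if dist < best {
			best = dist
		}
	}
	return best
}

// sumMinDistancesSketch adds up the per-string minimum distances.
func sumMinDistancesSketch(pattern string, inputs []string) int {
	total := 0
	for _, text := range inputs {
		total += minKmerDistanceSketch(pattern, text)
	}
	return total
}

func main() {
	dna := []string{"TTACCTTAAC", "GATATCTGTC", "ACGGCGTTCG", "CCCTAAAGAG", "CGTCAGAGGT"}
	fmt.Println(sumMinDistancesSketch("AAA", dna)) // 5
}
```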

func MinSkewPositions ¶

func MinSkewPositions(genome string) ([]int, error)

The skew of a genome is the difference between the number of G and C nucleotides that have occurred cumulatively in a given strand of DNA. This function computes the positions in the genome at which the cumulative skew is minimized.
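A single pass over the genome is enough to track the running skew and remember where it is lowest. The sketch below assumes skew is (count of G) minus (count of C) and that position i refers to the prefix of length i, so position 0 is the empty prefix; both conventions, and the name `minSkewSketch`, are assumptions rather than documented behavior.

```go
package main

import "fmt"

// minSkewSketch is a hypothetical helper. It assumes skew = #G - #C over
// the first i characters and reports every prefix length i (0..len) at
// which that running skew reaches its minimum.
func minSkewSketch(genome string) []int {
	skew, minSkew := 0, 0
	positions := []int{0} // the empty prefix has skew 0
	for i := 0; i < len(genome); i++ {
		switch genome[i] {
		case 'G':
			skew++
		case 'C':
			skew--
		}
		pos := i + 1
		if skew < minSkew {
			// New minimum: discard earlier positions.
			minSkew = skew
			positions = []int{pos}
		} else if skew == minSkew {
			positions = append(positions, pos)
		}
	}
	return positions
}

func main() {
	fmt.Println(minSkewSketch("GCCATCGC")) // [6 8]
}
```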

func MoreFrequentThanNKmers ¶

func MoreFrequentThanNKmers(input string, k, N int) ([]string, error)

Find the kmer(s) in the kmer histogram with a count exceeding N, and return them as a slice of strings.

func MostFrequentKmers ¶

func MostFrequentKmers(input string, k int) ([]string, error)

Find the most frequent kmer(s) in the kmer histogram, and return them as a slice of strings.
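The histogram-then-maximum approach can be sketched as follows. The helper names are hypothetical, the result is sorted only to make the output deterministic, and the sketch omits the error return of the real functions.

```go
package main

import (
	"fmt"
	"sort"
)

// histogramSketch is a hypothetical helper that counts every window of
// length k in input. k is assumed to be positive.
func histogramSketch(input string, k int) map[string]int {
	counts := map[string]int{}
	if k <= 0 {
		return counts
	}
	for i := 0; i+k <= len(input); i++ {
		counts[input[i:i+k]]++
	}
	return counts
}

// mostFrequentSketch returns the kmers that share the highest count.
func mostFrequentSketch(input string, k int) []string {
	counts := histogramSketch(input, k)
	best := 0
	for _, c := range counts {
		if c > best {
			best = c
		}
	}
	var result []string
	for kmer, c := range counts {
		if c == best {
			result = append(result, kmer)
		}
	}
	sort.Strings(result) // map iteration order is random, so sort for stable output
	return result
}

func main() {
	fmt.Println(mostFrequentSketch("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)) // [CATG GCAT]
}
```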

func MostFrequentKmersMismatches ¶

func MostFrequentKmersMismatches(input string, k, d int) ([]string, error)

Find the most frequent kmer(s) of length k in the given input string. Include mismatches of Hamming distance <= d.

func MostFrequentKmersMismatchesRevComp ¶

func MostFrequentKmersMismatchesRevComp(input string, k, d int) ([]string, error)

Find the most frequent kmer(s) of length k in the given input string and its reverse complement. Include mismatches of Hamming distance <= d.

func NumberToPattern ¶

func NumberToPattern(n, k int) (string, error)

NumberToPattern converts an integer n and a kmer length k into the corresponding kmer string.

NOTE: We should be a little more careful about integer overflow, as that can easily happen for large k.

func OverlapGraph ¶

func OverlapGraph(patterns []string) (map[string][]string, error)

Construct the overlap graph of a collection of kmers.

    Given: arbitrary collection of kmers.
    Create: graph having 1 node for each kmer in kmer patterns
    Connect: kmers Pattern and Pattern' by directed edge if Suffix(Pattern) is equal to Prefix(Pattern')

The resulting graph is called the overlap graph on these k-mers, denoted Overlap(Patterns).

Return the overlap graph Overlap(Patterns), in the form of an adjacency list.
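One way to build such an adjacency list is to index the kmers by their (k-1)-prefix and then look up each kmer's (k-1)-suffix in that index. The sketch below follows that idea; `overlapSketch` is a hypothetical name, and the indexing strategy is an assumption rather than the package's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// overlapSketch is a hypothetical helper. It connects p -> q whenever the
// last len(p)-1 characters of p equal the first len(q)-1 characters of q.
// Kmers shorter than 2 characters are skipped; a kmer such as "AAA" would
// link to itself, which the sketch does not special-case.
func overlapSketch(patterns []string) map[string][]string {
	byPrefix := map[string][]string{}
	for _, p := range patterns {
		if len(p) < 2 {
			continue
		}
		prefix := p[:len(p)-1]
		byPrefix[prefix] = append(byPrefix[prefix], p)
	}
	graph := map[string][]string{}
	for _, p := range patterns {
		if len(p) < 2 {
			continue
		}
		if next := byPrefix[p[1:]]; len(next) > 0 {
			graph[p] = append(graph[p], next...)
		}
	}
	return graph
}

func main() {
	graph := overlapSketch([]string{"ATGCG", "GCATG", "CATGC", "AGGCA", "GGCAT"})
	keys := make([]string, 0, len(graph))
	for src := range graph {
		keys = append(keys, src)
	}
	sort.Strings(keys)
	for _, src := range keys {
		for _, dst := range graph[src] {
			fmt.Printf("%s -> %s\n", src, dst)
		}
	}
}
```

Kmers with no outgoing edge (ATGCG in this input) do not appear as keys in the sketch's result.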

func PatternCount ¶

func PatternCount(input string, pattern string) int

Count occurrences of a substring pattern in a string input.

func PatternToNumber ¶

func PatternToNumber(input string) (int, error)

PatternToNumber transforms a kmer of a given length into a corresponding integer indicating its lexicographic ordering among all kmers of length k.

    A = 0
    C = 1
    G = 2
    T = 3

Example for k = 3:

    C G T
    | | |
    | | T - - > 3 * 4^{k-3}
    | G - - - > 2 * 4^{k-2}
    C - - - - > 1 * 4^{k-1}

This boils down to converting between an integer and its base-4 representation, with each nucleotide acting as a base-4 digit.
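The base-4 conversion in both directions can be sketched as below. The helper names are hypothetical and the code is only an illustration of the digit mapping A=0, C=1, G=2, T=3; it makes no attempt to guard against integer overflow for large k, and it assumes n is non-negative.

```go
package main

import (
	"fmt"
	"strings"
)

// patternToNumberSketch is a hypothetical helper that reads pattern as a
// base-4 numeral with digits A=0, C=1, G=2, T=3.
func patternToNumberSketch(pattern string) (int, error) {
	n := 0
	for i := 0; i < len(pattern); i++ {
		digit := strings.IndexByte("ACGT", pattern[i])
		if digit < 0 {
			return 0, fmt.Errorf("invalid nucleotide %q", pattern[i])
		}
		n = n*4 + digit
	}
	return n, nil
}

// numberToPatternSketch is the inverse: it writes the k least significant
// base-4 digits of n (assumed non-negative) as nucleotides.
func numberToPatternSketch(n, k int) string {
	out := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		out[i] = "ACGT"[n%4]
		n /= 4
	}
	return string(out)
}

func main() {
	n, err := patternToNumberSketch("CGT")
	fmt.Println(n, err)                       // 27 <nil>  (1*16 + 2*4 + 3*1)
	fmt.Println(numberToPatternSketch(27, 3)) // CGT
}
```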

func ProfileMostProbableKmer ¶

func ProfileMostProbableKmer(dna string, k int, profile [][]float32) (string, error)

Only return the _most_ probable kmer.

func ProfileMostProbableKmers ¶

func ProfileMostProbableKmers(dna string, k int, profile [][]float32) ([]string, error)

Given a profile matrix, and given a DNA input string, evaluate the probability of every kmer in the DNA string and find the most probable kmer in the text - the kmer that was most likely to have been generated by profile among all kmers in text.
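The probability of a kmer under a profile is the product of the per-position probabilities of its nucleotides. The sketch below assumes the profile is laid out as four rows in A, C, G, T order with k columns (profile[row][position]); that layout and the name `mostProbableSketch` are assumptions and may not match the package. Ties go to the earliest kmer.

```go
package main

import (
	"fmt"
	"strings"
)

// mostProbableSketch is a hypothetical helper. Assumed layout:
// profile[r][j] is the probability of nucleotide "ACGT"[r] at position j.
// It returns the first kmer of dna with the highest probability.
func mostProbableSketch(dna string, k int, profile [][]float32) string {
	best, bestProb := "", float32(-1)
	for i := 0; i+k <= len(dna); i++ {
		prob := float32(1)
		for j := 0; j < k; j++ {
			row := strings.IndexByte("ACGT", dna[i+j])
			if row < 0 {
				prob = 0 // non-DNA character: treat the kmer as impossible
				break
			}
			prob *= profile[row][j]
		}
		if prob > bestProb { // strict comparison keeps the earliest kmer on ties
			best, bestProb = dna[i:i+k], prob
		}
	}
	return best
}

func main() {
	profile := [][]float32{
		{0.5, 0.1}, // A
		{0.2, 0.1}, // C
		{0.2, 0.7}, // G
		{0.1, 0.1}, // T
	}
	// AC = 0.05, CG = 0.14, GT = 0.02, TA = 0.01
	fmt.Println(mostProbableSketch("ACGTA", 2, profile)) // CG
}
```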

This particular method does not pay attention to order of occurrence of kmers.

func ProfileMostProbableKmersGreedy ¶

func ProfileMostProbableKmersGreedy(dna string, k int, profile [][]float32) (string, error)

This uses a probability matrix and evaluates all possible kmers in a DNA string to determine which kmers in the DNA string match the profile most closely.

The greedy version maintains the order in which kmers occur in the original DNA string, and stops as soon as the first match is found.

func RandomMotifSearchPseudocounts ¶

func RandomMotifSearchPseudocounts(dna []string, k, t int) ([]string, int, error)

Run a random motif search with pseudocounts.

func ReadLines ¶

func ReadLines(path string) ([]string, error)

ReadLines reads a whole file into memory and returns a slice of its lines.

func ReadMatrix32 ¶

func ReadMatrix32(lines []string, k int) ([][]float32, error)

ReadMatrix32 takes a set of lines containing a multidimensional array of floating point values, k elements per line, n lines, and returns a slice of slices with size slice[k][n] and with type float32.

func ReconstructGenomeFromPath ¶

func ReconstructGenomeFromPath(contigs []string) (string, error)

Given a set of kmers that overlap such that the last k-1 symbols of pattern i equal the first k-1 symbols of pattern i+1 for all i = 1 to n-1, return a string of length k + n - 1 where the ith kmer is equal to pattern i.
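Because consecutive kmers overlap by k-1 symbols, the genome is the first kmer followed by the last symbol of each later kmer. The sketch below shows that construction under the hypothetical name `reconstructSketch`; it trusts the overlap condition instead of verifying it, and it returns no error.

```go
package main

import (
	"fmt"
	"strings"
)

// reconstructSketch is a hypothetical helper. It assumes each kmer overlaps
// the next by k-1 symbols (not checked) and appends only the final symbol
// of every kmer after the first.
func reconstructSketch(kmers []string) string {
	if len(kmers) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString(kmers[0])
	for _, kmer := range kmers[1:] {
		if len(kmer) == 0 {
			continue // skip empty entries rather than panic
		}
		b.WriteByte(kmer[len(kmer)-1])
	}
	return b.String()
}

func main() {
	path := []string{"ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"}
	fmt.Println(reconstructSketch(path)) // ACCGAAGCT (length k + n - 1 = 9)
}
```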

func ReconstructGenomeFromPath_old ¶

func ReconstructGenomeFromPath_old(contigs []string) (string, error)

Given a genome path, i.e., a set of k-mers that overlap by some unknown number (up to k-1) of characters each, assemble the paths into a single string containing the genome.

Note: This solved a problem that is slightly more general than the problem actually given - here we assume the number of characters overlapping is unknown, but the problem on Rosalind.info says it's always 1.

func ReverseComplement ¶

func ReverseComplement(input string) (string, error)

Given a DNA input string, find the reverse complement. The complement swaps Gs and Cs, and As and Ts. The reverse complement reverses that.
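Both steps can be done in one pass by writing each complemented nucleotide at the mirrored index. The sketch below is a minimal illustration with a hypothetical name and error message, not the package's implementation.

```go
package main

import "fmt"

// reverseComplementSketch is a hypothetical helper: it swaps A<->T and
// C<->G and writes each result at the mirrored position, which reverses
// the string at the same time.
func reverseComplementSketch(dna string) (string, error) {
	out := make([]byte, len(dna))
	for i := 0; i < len(dna); i++ {
		var c byte
		switch dna[i] {
		case 'A':
			c = 'T'
		case 'T':
			c = 'A'
		case 'C':
			c = 'G'
		case 'G':
			c = 'C'
		default:
			return "", fmt.Errorf("not a DNA character: %q", dna[i])
		}
		out[len(dna)-1-i] = c
	}
	return string(out), nil
}

func main() {
	rc, err := reverseComplementSketch("AAAACCCGGT")
	fmt.Println(rc, err) // ACCGGGTTTT <nil>
}
```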

func ReverseString ¶

func ReverseString(s string) string

ReverseString returns its argument string reversed rune-wise left to right. Source: https://github.com/golang/example/blob/master/stringutil/reverse.go

func SPrintOverlapGraph ¶

func SPrintOverlapGraph(overlap_graph map[string][]string, one_edge_per_line bool) (string, error)

Print string representation of an overlap graph (map of string to []string) with the form: "SRC -> DEST" (no double quotes, one edge per line) and return the resulting string. The edges are ordered.

func TheseFloatsAreEqual ¶

func TheseFloatsAreEqual(a, b float32) bool

Check if two floats are equal, to within some small tolerance.

func VisitHammingNeighbors ¶

func VisitHammingNeighbors(input string, d int) ([]string, error)

Given an input string of DNA, generate the variations of that string that lie within a Hamming distance of d or less.
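One way to enumerate such neighbors is a depth-first walk over the positions with a mismatch budget: at each position try every nucleotide, spending one unit of budget when it differs from the original. The sketch below uses that technique under the hypothetical name `neighborsSketch`; the package may generate neighbors differently, and whether its result includes the input itself is not stated in the documentation (this sketch does include it).

```go
package main

import "fmt"

// neighborsSketch is a hypothetical helper. It returns every string over
// ACGT within Hamming distance d of pattern, including pattern itself,
// in lexicographic order.
func neighborsSketch(pattern string, d int) []string {
	var out []string
	buf := []byte(pattern)
	var visit func(pos, budget int)
	visit = func(pos, budget int) {
		if pos == len(buf) {
			out = append(out, string(buf))
			return
		}
		original := buf[pos]
		for _, nt := range []byte("ACGT") {
			cost := 0
			if nt != original {
				cost = 1
			}
			if cost > budget {
				continue // no mismatches left to spend at this position
			}
			buf[pos] = nt
			visit(pos+1, budget-cost)
		}
		buf[pos] = original // restore before returning to the caller
	}
	visit(0, d)
	return out
}

func main() {
	n := neighborsSketch("ACG", 1)
	fmt.Println(len(n), n)
}
```

For a 3-mer with d = 1 this yields 10 strings, matching the count 1 + 3*3 from the combinatorial formula.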

func WriteLines ¶

func WriteLines(lines []string, path string) error

WriteLines writes the lines to the given file.

Types ¶

type DirGraph ¶

type DirGraph struct {
	// contains filtered or unexported fields
}

Directed graph type

func (*DirGraph) AddEdge ¶

func (g *DirGraph) AddEdge(n1, n2 *Node)

Add a directed edge

func (*DirGraph) AddNode ¶

func (g *DirGraph) AddNode(n *Node)

Add a node to the directed graph

func (*DirGraph) EdgeCount ¶

func (g *DirGraph) EdgeCount() int

Get a total count of edges in the graph

func (*DirGraph) GetNode ¶

func (g *DirGraph) GetNode(label string) *Node

Get a node, given a label

func (*DirGraph) String ¶

func (g *DirGraph) String() string

Return a sorted edge list representation of the graph

type Node ¶

type Node struct {
	// contains filtered or unexported fields
}

Graph node

func (*Node) String ¶

func (n *Node) String() string

Convert a node to a string

type ScoredMotifMatrix ¶

type ScoredMotifMatrix struct {
	// contains filtered or unexported fields
}

A struct that holds a set of motifs (kmers) and their associated score. We continually assemble many of these possible sets of motifs, checking to find a set of motifs with a minimum score. The score is not updated dynamically; see UpdateScore().

func NewScoredMotifMatrix ¶

func NewScoredMotifMatrix() ScoredMotifMatrix

Constructor

func (*ScoredMotifMatrix) AddMotif ¶

func (s *ScoredMotifMatrix) AddMotif(motif string) error

Add a motif to the motif matrix

func (*ScoredMotifMatrix) MakeProfile ¶

func (s *ScoredMotifMatrix) MakeProfile(pseudocounts bool) ([][]float32, error)

func (*ScoredMotifMatrix) UpdateScore ¶

func (s *ScoredMotifMatrix) UpdateScore() error

Update the value of the score of a ScoredMotifMatrix. This assembles a kmer composed of the most common nucleotide per position, then computes the sum of the Hamming distances from that kmer for all motifs.
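The scoring rule described above can be sketched on a plain slice of motifs. The helpers below are hypothetical and independent of the ScoredMotifMatrix type, whose fields are unexported; breaking ties in A, C, G, T order is an assumption, and all motifs are assumed to be non-empty DNA strings of equal length.

```go
package main

import "fmt"

// consensusSketch is a hypothetical helper: for each column it picks the
// most common nucleotide, breaking ties in A, C, G, T order (an assumption).
func consensusSketch(motifs []string) string {
	if len(motifs) == 0 {
		return ""
	}
	k := len(motifs[0])
	consensus := make([]byte, k)
	for col := 0; col < k; col++ {
		counts := map[byte]int{}
		for _, m := range motifs {
			counts[m[col]]++
		}
		bestCount := 0
		for _, nt := range []byte("ACGT") {
			if counts[nt] > bestCount {
				bestCount = counts[nt]
				consensus[col] = nt
			}
		}
	}
	return string(consensus)
}

// scoreSketch sums the Hamming distances from the consensus to each motif.
func scoreSketch(motifs []string) int {
	consensus := consensusSketch(motifs)
	score := 0
	for _, m := range motifs {
		for i := 0; i < len(consensus); i++ {
			if m[i] != consensus[i] {
				score++
			}
		}
	}
	return score
}

func main() {
	motifs := []string{"ACG", "ACT", "TCG"}
	fmt.Println(consensusSketch(motifs), scoreSketch(motifs)) // ACG 2
}
```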
