align

package
v1.0.1
Warning

This package is not in the latest version of its module.

Published: May 1, 2024 | License: BSD-3-Clause | Imports: 13 | Imported by: 1

Documentation

Overview

Package align provides basic structs, gap scoring matrices, and dynamic programming approaches to aligning sequences.

Index

Constants

This section is empty.

Variables

var DefaultScoreMatrix = [][]int64{
	{91, -114, -31, -123, -44},
	{-114, 100, -125, -31, -43},
	{-31, -125, 100, -114, -43},
	{-123, -31, -114, 91, -44},
	{-44, -43, -43, -44, -43},
}

DefaultScoreMatrix is a DNA-DNA scoring matrix that pairs well with gap opening and extension penalties of O=400 and E=30. It may be good for distances similar to human-mouse.
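To illustrate how such a matrix is read, the following minimal sketch sums substitution scores over an ungapped pairing of two sequences. It uses a local copy of the values listed above and does not call the package. The row and column order A, C, G, T, N is an assumption inferred from the pattern of the values (transitions A/G and C/T carry the mildest penalty); `baseIndex` and `ungappedScore` are hypothetical helper names.

```go
package main

import "fmt"

// defaultScoreMatrix is a local copy of the values listed for DefaultScoreMatrix.
var defaultScoreMatrix = [][]int64{
	{91, -114, -31, -123, -44},
	{-114, 100, -125, -31, -43},
	{-31, -125, 100, -114, -43},
	{-123, -31, -114, 91, -44},
	{-44, -43, -43, -44, -43},
}

// baseIndex maps a nucleotide letter to a row/column index.
// ASSUMPTION: the order A, C, G, T, N is inferred, not stated in the docs.
func baseIndex(b byte) int {
	switch b {
	case 'A', 'a':
		return 0
	case 'C', 'c':
		return 1
	case 'G', 'g':
		return 2
	case 'T', 't':
		return 3
	default:
		return 4 // anything else is treated as N
	}
}

// ungappedScore sums the substitution scores of a and b, position by
// position, over the length of the shorter sequence. No gaps are considered.
func ungappedScore(a, b string, m [][]int64) int64 {
	var total int64
	for i := 0; i < len(a) && i < len(b); i++ {
		total += m[baseIndex(a[i])][baseIndex(b[i])]
	}
	return total
}

func main() {
	fmt.Println(ungappedScore("ACGT", "ACGT", defaultScoreMatrix)) // 91+100+100+91 = 382
	fmt.Println(ungappedScore("ACGT", "GCGT", defaultScoreMatrix)) // -31+100+100+91 = 260
}
```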

var HoxD55ScoreMatrix = [][]int64{
	{91, -114, -31, -123, 0},
	{-114, 100, -125, -31, 0},
	{-31, -125, 100, -114, 0},
	{-123, -31, -114, 91, 0},
	{0, 0, 0, 0, 0},
}

HoxD55ScoreMatrix is a DNA-DNA scoring matrix that pairs well with gap opening and extension penalties of O=400 and E=30. It may be good for distances similar to human-fish.

var HumanChimpTwoScoreMatrix = [][]int64{
	{90, -330, -236, -356, -208},
	{-330, 100, -318, -236, -196},
	{-236, -318, 100, -330, -196},
	{-356, -236, -330, 90, -208},
	{-208, -196, -196, -208, -202},
}

HumanChimpTwoScoreMatrix is a DNA-DNA scoring matrix that pairs well with gap opening and extension penalties of O=600 and E=150. It may be good for distances similar to human-chimp.

var MouseRatScoreMatrix = [][]int64{
	{91, -114, -31, -123, 0},
	{-114, 100, -125, -31, 0},
	{-31, -125, 100, -114, 0},
	{-123, -31, -114, 91, 0},
	{0, 0, 0, 0, 0},
}

MouseRatScoreMatrix is a DNA-DNA scoring matrix that pairs well with gap opening and extension penalties of O=600 and E=55. It may be good for distances similar to mouse-rat.

Functions

func AllSeqAffine

func AllSeqAffine(records []fasta.Fasta, scoreMatrix [][]int64, gapOpen int64, gapExtend int64) []fasta.Fasta

AllSeqAffine performs a multiple alignment of all fasta sequences according to the score matrix and gap penalties. The alignment is returned as a multi-fasta.

func AllSeqAffineChunk

func AllSeqAffineChunk(records []fasta.Fasta, scoreMatrix [][]int64, gapOpen int64, gapExtend int64, chunkSize int) []fasta.Fasta

AllSeqAffineChunk is similar to AllSeqAffine, but aligns the sequences in chunks of chunkSize bases. This was used when aligning tandem repeats with repeat units of length chunkSize.

func DrawAlignedChunks

func DrawAlignedChunks(aln []fasta.Fasta, chunkSize int, chunkPixelWidth int, chunkPixelHeight int) (*image.RGBA, error)

DrawAlignedChunks takes a multiple alignment in fasta format, a chunkSize, a width of pixels to use for coloring each alignment chunk, a height of each chunk, and then returns an image describing the alignment where each unique chunk gets a unique color. This was used to visualize the alignment of tandem repeats.

func GoAffineGapLocalEngine

func GoAffineGapLocalEngine(scores [][]int64, gapOpen int64, gapExtend int64) (inputs chan<- TargetQueryPair, outputs <-chan TargetQueryPair)

GoAffineGapLocalEngine returns input and output channels for TargetQueryPairs and launches a goroutine that locally aligns each pair received on the input channel and sends the result to the output channel. The alignment is performed according to the score matrix and the gap opening and extension penalties.
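The channel pattern described here can be sketched without the package itself. In the example below, `pair`, `startEngine`, and `countMatches` are hypothetical stand-ins: `pair` loosely mirrors TargetQueryPair with string sequences, and `countMatches` is a placeholder scorer rather than a real local aligner. Whether the real engine closes its output channel when the input channel is closed is not stated in the docs; the sketch assumes it does.

```go
package main

import "fmt"

// pair is a simplified stand-in for TargetQueryPair (hypothetical).
type pair struct {
	Target string
	Query  string
	Score  int64
}

// startEngine launches one goroutine that scores every pair received on the
// returned input channel and forwards it on the returned output channel.
// ASSUMPTION: the output channel is closed once the input channel is closed.
func startEngine(align func(t, q string) int64) (chan<- pair, <-chan pair) {
	in := make(chan pair)
	out := make(chan pair)
	go func() {
		defer close(out)
		for p := range in {
			p.Score = align(p.Target, p.Query)
			out <- p
		}
	}()
	return in, out
}

// countMatches is a placeholder scorer: it counts identical positions.
func countMatches(t, q string) int64 {
	var n int64
	for i := 0; i < len(t) && i < len(q); i++ {
		if t[i] == q[i] {
			n++
		}
	}
	return n
}

func main() {
	in, out := startEngine(countMatches)
	// Feed the engine from a separate goroutine so main can drain results.
	go func() {
		in <- pair{Target: "ACGT", Query: "ACGA"}
		in <- pair{Target: "GGGG", Query: "GGGG"}
		close(in)
	}()
	for p := range out {
		fmt.Println(p.Target, p.Query, p.Score)
	}
}
```

Because the channels are unbuffered in this sketch, the sender and the receiver must run concurrently; sending every pair before reading any result would block.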

func LocalView

func LocalView(alpha []dna.Base, beta []dna.Base, operations []Cigar, maxI int64) string

LocalView returns a human-readable local alignment of two DNA sequences (alpha, beta) given the cigar of that alignment and the last aligning base position in alpha.

func PrintCigar

func PrintCigar(operations []Cigar) string

PrintCigar returns the slice of cigar operations as a human-readable string.

func View

func View(alpha []dna.Base, beta []dna.Base, operations []Cigar) string

View takes two sequences and a cigar describing their alignment and returns a human-readable alignment of the two sequences.
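As an illustration of how a cigar can be walked to produce a two-line view, here is a minimal sketch on plain strings. The types `colType` and `cigar` are local mirrors of the package's ColType and Cigar, and `view` is a hypothetical helper; the exact output format of the real View is not documented here, so the layout below is an assumption. Following the ColType description, an insertion consumes only the second sequence and a deletion consumes only the first.

```go
package main

import (
	"fmt"
	"strings"
)

// Local mirrors of the package's ColType and Cigar (hypothetical copies).
type colType uint8

const (
	colM colType = 0 // match/mismatch column
	colI colType = 1 // insertion in the second sequence relative to the first
	colD colType = 2 // deletion in the second sequence relative to the first
)

type cigar struct {
	runLength int64
	op        colType
}

// view renders alpha over beta, writing '-' wherever a gap is implied.
// It panics if the cigar consumes more bases than the sequences contain.
func view(alpha, beta string, ops []cigar) string {
	var top, bottom strings.Builder
	i, j := 0, 0
	for _, o := range ops {
		n := int(o.runLength)
		switch o.op {
		case colM:
			top.WriteString(alpha[i : i+n])
			bottom.WriteString(beta[j : j+n])
			i += n
			j += n
		case colI:
			top.WriteString(strings.Repeat("-", n))
			bottom.WriteString(beta[j : j+n])
			j += n
		case colD:
			top.WriteString(alpha[i : i+n])
			bottom.WriteString(strings.Repeat("-", n))
			i += n
		}
	}
	return top.String() + "\n" + bottom.String()
}

func main() {
	ops := []cigar{{2, colM}, {1, colI}, {2, colM}}
	fmt.Println(view("ACGT", "ACTGT", ops))
	// Prints:
	// AC-GT
	// ACTGT
}
```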

Types

type Cigar

type Cigar struct {
	RunLength int64
	Op        ColType
}

Cigar is a run-length encoding of how two sequences align to each other.
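To make the run-length idea concrete, the sketch below collapses a per-column list of operations into runs and renders them as text. The types are local mirrors of Cigar and ColType, and `compress` and `format` are hypothetical helpers; the "3M1I2M" style string is an assumed format and may differ from what PrintCigar produces.

```go
package main

import (
	"fmt"
	"strings"
)

// Local mirrors of the package's ColType and Cigar (hypothetical copies).
type colType uint8

const (
	colM colType = 0
	colI colType = 1
	colD colType = 2
)

type cigar struct {
	runLength int64
	op        colType
}

// compress collapses consecutive identical column operations into runs.
func compress(cols []colType) []cigar {
	var out []cigar
	for _, c := range cols {
		if n := len(out); n > 0 && out[n-1].op == c {
			out[n-1].runLength++
			continue
		}
		out = append(out, cigar{runLength: 1, op: c})
	}
	return out
}

// format renders runs as length followed by a letter, e.g. "3M1I2M".
// ASSUMPTION: this text format is illustrative only.
func format(ops []cigar) string {
	letters := []byte{'M', 'I', 'D'}
	var sb strings.Builder
	for _, o := range ops {
		fmt.Fprintf(&sb, "%d%c", o.runLength, letters[o.op])
	}
	return sb.String()
}

func main() {
	cols := []colType{colM, colM, colM, colI, colM, colM}
	fmt.Println(format(compress(cols))) // 3M1I2M
}
```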

func AffineGap

func AffineGap(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapOpen int64, gapExtend int64) (int64, []Cigar)

AffineGap aligns two DNA sequences (alpha and beta) using a score matrix (scores) along with gap opening and gap extension penalties. The return values are the alignment score and a cigar describing the alignment. This version of AffineGap has a fixed checkersize of 10000*10000.
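For readers unfamiliar with affine gap scoring, the following is a minimal, score-only sketch of the general three-matrix technique on plain strings. It is not the package's implementation and omits the traceback that would produce a cigar. Two assumptions are made loudly: penalties are passed as positive costs and subtracted, and a gap of length k costs `open + (k-1)*extend`. The sign convention and gap-length formula used by the real AffineGap are not specified in the text above. `affineScore`, `grid`, `max3`, and `sub` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// negInf marks unreachable states; dividing leaves headroom against overflow.
const negInf = math.MinInt64 / 4

func max3(a, b, c int64) int64 {
	if b > a {
		a = b
	}
	if c > a {
		a = c
	}
	return a
}

// grid allocates a rows x cols matrix filled with the given value.
func grid(rows, cols int, fill int64) [][]int64 {
	g := make([][]int64, rows)
	for i := range g {
		g[i] = make([]int64, cols)
		for j := range g[i] {
			g[i][j] = fill
		}
	}
	return g
}

// sub is a toy substitution score: +1 for a match, -1 for a mismatch.
func sub(p, q byte) int64 {
	if p == q {
		return 1
	}
	return -1
}

// affineScore returns the best global alignment score of a and b.
// ASSUMPTION: open and extend are positive costs; a gap of length k
// costs open + (k-1)*extend.
func affineScore(a, b string, open, extend int64) int64 {
	n, k := len(a), len(b)
	m := grid(n+1, k+1, negInf) // last column aligns a[i-1] with b[j-1]
	x := grid(n+1, k+1, negInf) // last column aligns a[i-1] with a gap
	y := grid(n+1, k+1, negInf) // last column aligns b[j-1] with a gap
	m[0][0] = 0
	for i := 1; i <= n; i++ {
		x[i][0] = -open - int64(i-1)*extend
	}
	for j := 1; j <= k; j++ {
		y[0][j] = -open - int64(j-1)*extend
	}
	for i := 1; i <= n; i++ {
		for j := 1; j <= k; j++ {
			m[i][j] = max3(m[i-1][j-1], x[i-1][j-1], y[i-1][j-1]) + sub(a[i-1], b[j-1])
			x[i][j] = max3(m[i-1][j]-open, x[i-1][j]-extend, y[i-1][j]-open)
			y[i][j] = max3(m[i][j-1]-open, y[i][j-1]-extend, x[i][j-1]-open)
		}
	}
	return max3(m[n][k], x[n][k], y[n][k])
}

func main() {
	fmt.Println(affineScore("ACGT", "ACGT", 3, 1)) // 4: four matches, no gaps
	fmt.Println(affineScore("ACGGT", "ACT", 3, 1)) // -1: three matches, one gap of length 2
}
```

The point of the affine model is visible in the second call: one gap of length two costs 3+1=4, whereas two separate single-base gaps would cost 6.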

func AffineGapChunk

func AffineGapChunk(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapOpen int64, gapExtend int64, chunkSize int64) (int64, []Cigar)

AffineGapChunk is similar to AffineGap, but rather than aligning individual bases, it aligns them in chunks of chunkSize bases. This was used to align tandem repeats against each other, where a repeating unit is of chunkSize.

func AffineGapLocal

func AffineGapLocal(target []dna.Base, query []dna.Base, scores [][]int64, gapOpen int64, gapExtend int64) (int64, []Cigar)

AffineGapLocal functions identically to AffineGap_highMem, except that it does not penalize gaps placed at the beginning or end of the alignment. This property enables AffineGapLocal to be used for local alignment, such as aligning a 150 bp sequencing read to a 1 kb reference sequence.

func AffineGap_customizeCheckersize

func AffineGap_customizeCheckersize(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapOpen int64, gapExtend int64, checkersize_i int, checkersize_j int) (int64, []Cigar)

AffineGap_customizeCheckersize aligns two DNA sequences (alpha and beta) using a score matrix (scores) along with gap opening and gap extension penalties. The return values are the alignment score and a cigar describing the alignment. This version of AffineGap takes additional inputs that allow customization of checkersize_i and checkersize_j.

func AffineGap_highMem

func AffineGap_highMem(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapOpen int64, gapExtend int64) (int64, []Cigar)

AffineGap_highMem aligns two DNA sequences (alpha and beta) using a score matrix, a gap opening penalty, and a gap extension penalty. The return values are the alignment score and a cigar describing the alignment.

func ConstGap

func ConstGap(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapPen int64) (int64, []Cigar)

ConstGap aligns two sequences (alpha, beta) using a score matrix (scores) and a constant gap penalty of gapPen. The return values are the alignment score and the cigar representing the alignment. This version of ConstGap has a fixed checkersize of 10000*10000.

func ConstGap_customizeCheckersize

func ConstGap_customizeCheckersize(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapPen int64, checkersize_i int, checkersize_j int) (int64, []Cigar)

ConstGap_customizeCheckersize aligns two sequences (alpha, beta) using a score matrix (scores) and a constant gap penalty of gapPen. The return values are the alignment score and the cigar representing the alignment. This version of ConstGap takes additional inputs that allow customization of checkersize_i and checkersize_j.

func ConstGap_highMem

func ConstGap_highMem(alpha []dna.Base, beta []dna.Base, scores [][]int64, gapPen int64) (int64, []Cigar)

ConstGap_highMem aligns two sequences (alpha, beta) using a score matrix (scores) and a constant gap penalty of gapPen. The return values are the alignment score and the cigar representing the alignment.
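A constant gap penalty means every gapped column costs the same amount regardless of gap length. The sketch below shows a score-only, full-matrix version of that idea on plain strings; it is not the package's code and omits the traceback. The toy 4x4 matrix, the helpers `idx` and `constGapScore`, and the choice to pass the penalty as a positive cost that is subtracted are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// toy is a hypothetical score matrix: +1 on the diagonal, -1 elsewhere.
var toy = [][]int64{
	{1, -1, -1, -1},
	{-1, 1, -1, -1},
	{-1, -1, 1, -1},
	{-1, -1, -1, 1},
}

// idx maps an uppercase A, C, G, or T to its matrix index.
// Other letters are not supported in this sketch.
func idx(b byte) int { return strings.IndexByte("ACGT", b) }

// constGapScore returns the best global alignment score of a and b.
// ASSUMPTION: gapCost is positive and subtracted once per gapped column.
func constGapScore(a, b string, m [][]int64, gapCost int64) int64 {
	rows, cols := len(a)+1, len(b)+1
	dp := make([][]int64, rows)
	for i := range dp {
		dp[i] = make([]int64, cols)
	}
	// Leading gaps: the first row and column accumulate the gap cost.
	for i := 1; i < rows; i++ {
		dp[i][0] = dp[i-1][0] - gapCost
	}
	for j := 1; j < cols; j++ {
		dp[0][j] = dp[0][j-1] - gapCost
	}
	for i := 1; i < rows; i++ {
		for j := 1; j < cols; j++ {
			best := dp[i-1][j-1] + m[idx(a[i-1])][idx(b[j-1])] // match/mismatch
			if v := dp[i-1][j] - gapCost; v > best {           // gap in b
				best = v
			}
			if v := dp[i][j-1] - gapCost; v > best { // gap in a
				best = v
			}
			dp[i][j] = best
		}
	}
	return dp[rows-1][cols-1]
}

func main() {
	fmt.Println(constGapScore("ACGT", "ACGT", toy, 2)) // 4
	fmt.Println(constGapScore("ACGT", "ACT", toy, 2))  // 1: three matches, one gapped column
}
```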

type ColType

type ColType uint8

These are relative to the first sequence, e.g. ColI is an insertion in the second sequence relative to the first.

const (
	ColM ColType = 0
	ColI ColType = 1
	ColD ColType = 2
)

type TargetQueryPair

type TargetQueryPair struct {
	Target []dna.Base
	Query  []dna.Base
	Score  int64
	Cigar  []Cigar
}

TargetQueryPair holds two sequences, their alignment score, and the cigar describing their alignment.
