dnaThreeBit

package
v1.0.1 Latest
Published: May 1, 2024 License: BSD-3-Clause Imports: 5 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BasesToUint64

func BasesToUint64(seq []dna.Base, start int, end int, padding ThreeBitBase) uint64

BasesToUint64 takes the section of seq from start to end (left-closed, right-open), which must be no more than 21 bases, and returns the uint64 encoding of it that would be used in a ThreeBit. padding can be either PaddingOne or PaddingTwo. If sequences will be compared to each other, they should have different padding values.
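To make the encoding concrete, here is a minimal, self-contained sketch of how up to 21 bases could be packed into one uint64. It is not the package's implementation: `base3` and `packWord` are hypothetical names, and the sketch assumes the layout described for ThreeBit (first base in the three most significant bits, least significant bit unused) and that unused slots are filled with the padding value.

```go
package main

import "fmt"

// base3 is a hypothetical stand-in for ThreeBitBase; the values mirror the
// constants documented for this package.
type base3 uint64

const (
	A          base3 = 0
	C          base3 = 1
	G          base3 = 2
	T          base3 = 3
	N          base3 = 4
	PaddingOne base3 = 5
	PaddingTwo base3 = 6
)

const basesPerWord = 21 // 21 * 3 = 63 bits; the lowest bit stays unused

// packWord is a hypothetical helper, not the package's BasesToUint64.
// It packs seq[start:end] (at most 21 bases) into one uint64, placing the
// first base in the three most significant bits and filling every unused
// slot with padding (an assumption about how padding is applied).
func packWord(seq []base3, start, end int, padding base3) uint64 {
	if end-start > basesPerWord {
		panic("packWord: more than 21 bases requested")
	}
	var word uint64
	for slot := 0; slot < basesPerWord; slot++ {
		b := padding
		if start+slot < end {
			b = seq[start+slot]
		}
		shift := uint(61 - 3*slot) // slot 0 -> bits 63..61, slot 20 -> bits 3..1
		word |= uint64(b) << shift
	}
	return word
}

func main() {
	seq := []base3{A, C, G, T}
	fmt.Printf("%064b\n", packWord(seq, 0, 4, PaddingOne))
}
```

In this sketch, slot k of a word occupies the three bits starting at bit 61 - 3k, so the lowest bit of the word is always zero.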

func Cat

func Cat(a *ThreeBit, b *ThreeBit)

Cat appends "b" to "a". "a" is changed to hold both sequences, and "b" is unchanged. It is quickest to have "a" be the longer sequence.

func GetBase

func GetBase(fragment *ThreeBit, pos int) dna.Base

GetBase returns the dna.Base at position pos within the ThreeBit.

func RangeToDnaBases

func RangeToDnaBases(fragment *ThreeBit, start int, end int) []dna.Base

RangeToDnaBases returns a slice of dna.Base that represents the bases from start to end (left-closed, right-open) of fragment.

func ThreeBitBaseToRune

func ThreeBitBaseToRune(base ThreeBitBase) rune

ThreeBitBaseToRune returns a rune that corresponds to the single base in ThreeBitBase format.

func ThreeBitBaseToString

func ThreeBitBaseToString(b ThreeBitBase) string

ThreeBitBaseToString returns a string that corresponds to the single base given as a ThreeBitBase.

func ToDnaBases

func ToDnaBases(fragment *ThreeBit) []dna.Base

ToDnaBases returns a slice of dna.Base that represents the same sequence of bases present in fragment.

func ToString

func ToString(fragment *ThreeBit) string

ToString returns a string representation of the ThreeBit passed in.

Types

type ThreeBit

type ThreeBit struct {
	Seq []uint64
	Len int
}

ThreeBit is a struct to represent long DNA sequences in a memory-efficient format. Seq holds the encoded DNA sequence. The left-most (most significant) three bits of Seq[0] hold the first base; the first base is *not* encoded in the three least significant bits. Because 64 is not divisible by three, each uint64 holds 21 bases and the right-most (least significant) bit is not used in the encoding scheme. Len is the length of the DNA sequence, not the length of the Seq slice.
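The layout implies a simple rule for locating a base: position pos lives in word pos/21, in the three bits starting at bit 61 - 3*(pos%21). The following is a minimal sketch of that arithmetic, not the package's code; `threeBit` and `baseAt` are hypothetical names that mirror the documented struct.

```go
package main

import "fmt"

// threeBit mirrors the documented ThreeBit struct; it is a local stand-in.
type threeBit struct {
	Seq []uint64
	Len int
}

const basesPerWord = 21

// baseAt is a hypothetical helper that returns the 3-bit code stored at pos.
// It assumes the layout described above: 21 bases per word, first base in
// the most significant bits, lowest bit unused.
func baseAt(frag *threeBit, pos int) uint64 {
	if pos < 0 || pos >= frag.Len {
		panic("baseAt: position out of range")
	}
	word := frag.Seq[pos/basesPerWord]
	shift := uint(61 - 3*(pos%basesPerWord))
	return (word >> shift) & 7
}

func main() {
	// One word holding A(0), C(1), G(2), T(3) in its four left-most slots.
	frag := &threeBit{Seq: []uint64{0<<61 | 1<<58 | 2<<55 | 3<<52}, Len: 4}
	for i := 0; i < frag.Len; i++ {
		fmt.Println(i, baseAt(frag, i))
	}
}
```

Because Len is tracked separately, a reader of the structure never needs to interpret the slots past the last base.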

func Append

func Append(fragment *ThreeBit, b ThreeBitBase) *ThreeBit

Append adds "b" to the end of "fragment". "fragment" can be nil.
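As an illustration of what appending involves under the 21-bases-per-word layout, here is a minimal sketch. It is not the package's Append: `threeBit` and `appendBase` are hypothetical names, bases are passed as raw 3-bit codes, and the sketch does not write any padding into the remaining slots.

```go
package main

import "fmt"

// threeBit mirrors the documented ThreeBit struct; it is a local stand-in.
type threeBit struct {
	Seq []uint64
	Len int
}

const basesPerWord = 21

// appendBase is a hypothetical helper that adds one 3-bit code to the end of
// frag. A nil frag is treated as an empty sequence. A new word is allocated
// whenever the current length is a multiple of 21.
func appendBase(frag *threeBit, code uint64) *threeBit {
	if frag == nil {
		frag = &threeBit{}
	}
	slot := frag.Len % basesPerWord
	if slot == 0 {
		frag.Seq = append(frag.Seq, 0)
	}
	last := len(frag.Seq) - 1
	shift := uint(61 - 3*slot)
	frag.Seq[last] &^= uint64(7) << shift // clear whatever the slot held before
	frag.Seq[last] |= (code & 7) << shift // store the new base
	frag.Len++
	return frag
}

func main() {
	var frag *threeBit
	for _, code := range []uint64{2, 0, 3, 3, 0, 1, 0} { // G A T T A C A
		frag = appendBase(frag, code)
	}
	fmt.Println(frag.Len, len(frag.Seq)) // prints "7 1"
}
```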

func Copy

func Copy(a *ThreeBit) *ThreeBit

Copy returns a duplicate of the ThreeBit passed in.

func FromString

func FromString(s string) *ThreeBit

FromString creates a new ThreeBit from a string of DNA characters {A,C,G,T,N}.

func NewThreeBit

func NewThreeBit(inSeq []dna.Base, padding ThreeBitBase) *ThreeBit

NewThreeBit creates a ThreeBit encoding of inSeq with padding on the end.

func NewThreeBitRainbow

func NewThreeBitRainbow(inSeq []dna.Base, padding ThreeBitBase) []*ThreeBit

NewThreeBitRainbow builds a "rainbow table" of a sequence in ThreeBit format with every possible offset, so that there is always a version of the sequence that can be compared to another ThreeBit sequence using xor.
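One plausible reading of "every possible offset" is that the table holds 21 encodings, where version k places the first base in slot k of the first word. The sketch below follows that reading; it is an assumption, not a description of the package's code. `pack` and `rainbow` are hypothetical names, bases are raw 3-bit codes, and leading and trailing unused slots are both filled with the padding value.

```go
package main

import "fmt"

const basesPerWord = 21

// pack is a hypothetical helper that encodes 3-bit codes into words, 21 per
// word, first code in the most significant bits. Unused slots in the last
// word are filled with pad.
func pack(codes []uint64, pad uint64) []uint64 {
	nWords := (len(codes) + basesPerWord - 1) / basesPerWord
	words := make([]uint64, nWords)
	for i := 0; i < nWords*basesPerWord; i++ {
		c := pad
		if i < len(codes) {
			c = codes[i]
		}
		words[i/basesPerWord] |= c << uint(61-3*(i%basesPerWord))
	}
	return words
}

// rainbow is a hypothetical helper that returns 21 packings of codes.
// Version k is preceded by k padding slots, so the sequence starts at every
// possible slot within a word (an assumed interpretation of "offset").
func rainbow(codes []uint64, pad uint64) [][]uint64 {
	table := make([][]uint64, basesPerWord)
	for k := range table {
		shifted := make([]uint64, 0, k+len(codes))
		for j := 0; j < k; j++ {
			shifted = append(shifted, pad)
		}
		shifted = append(shifted, codes...)
		table[k] = pack(shifted, pad)
	}
	return table
}

func main() {
	table := rainbow([]uint64{0, 1, 2, 3}, 5) // A C G T with padding code 5
	fmt.Println(len(table))
	fmt.Printf("%064b\n", table[0][0])
	fmt.Printf("%064b\n", table[1][0])
}
```

With such a table, a query aligned to any slot of a target word has a precomputed version at the same alignment, so whole words can be xored without shifting at comparison time.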

type ThreeBitBase

type ThreeBitBase uint64

This could have been uint8, but with that I found lots of casting between uint64 and uint8. I don't think many ThreeBitBases will be sitting around for a long time by themselves, so I don't think the extra memory will be noticed. Even though it is encoded as 64 bits, only the three least significant bits are used (values zero to seven).

const (
	A          ThreeBitBase = 0
	C          ThreeBitBase = 1
	G          ThreeBitBase = 2
	T          ThreeBitBase = 3
	N          ThreeBitBase = 4
	PaddingOne ThreeBitBase = 5
	PaddingTwo ThreeBitBase = 6
)

The four bases {A,C,G,T} and N are encoded with the numbers 0 to 4. If the length of the DNA sequence stored in the ThreeBit is not divisible by 21, then there will be "left over" space in Seq[len(Seq)-1]. Currently, if that "left over" space is filled with PaddingOne for one ThreeBit and PaddingTwo for another ThreeBit, the two can be quickly compared for perfect matches. The padding values differ so that the bits in those padded positions will not match.

TODO: the padding implementation is a bit messy and should probably be changed in the future. This could be done by having the perfect-match functions compare the number of matches they would return to the theoretical maximum based on the sequence lengths or padding, or a mask could be applied in the comparison function.
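The effect of the two padding values can be seen in a small sketch of an xor-based comparison. This is not the package's comparison code: `packWord` and `countMatchingSlots` are hypothetical helpers, and bases are raw 3-bit codes (A=0, C=1, G=2, T=3, PaddingOne=5, PaddingTwo=6).

```go
package main

import "fmt"

const basesPerWord = 21

// packWord is a hypothetical helper that packs up to 21 codes into one word,
// first code in the most significant bits, unused slots filled with pad.
func packWord(codes []uint64, pad uint64) uint64 {
	var w uint64
	for slot := 0; slot < basesPerWord; slot++ {
		c := pad
		if slot < len(codes) {
			c = codes[slot]
		}
		w |= c << uint(61-3*slot)
	}
	return w
}

// countMatchingSlots xors two encoded words and counts the 3-bit slots whose
// xor is zero, that is, the slots holding identical codes.
func countMatchingSlots(a, b uint64) int {
	diff := a ^ b
	matches := 0
	for slot := 0; slot < basesPerWord; slot++ {
		if (diff>>uint(61-3*slot))&7 == 0 {
			matches++
		}
	}
	return matches
}

func main() {
	acgt := []uint64{0, 1, 2, 3}
	acga := []uint64{0, 1, 2, 0}

	// Same padding on both sides: the 17 padded slots count as matches.
	same := countMatchingSlots(packWord(acgt, 5), packWord(acga, 5))
	// Different padding: only the 3 real matching bases are counted.
	diff := countMatchingSlots(packWord(acgt, 5), packWord(acga, 6))

	fmt.Println(same, diff) // prints "20 3"
}
```

With identical padding, the unused slots inflate the match count; with PaddingOne against PaddingTwo, every padded slot xors to a non-zero value and drops out.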

func GetThreeBitBase

func GetThreeBitBase(fragment *ThreeBit, pos int) ThreeBitBase

GetThreeBitBase returns the ThreeBitBase at position pos within the ThreeBit.

func RuneToThreeBitBase

func RuneToThreeBitBase(r rune) ThreeBitBase

RuneToThreeBitBase returns a single base in ThreeBitBase format that corresponds to the given rune.
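The rune conversions in both directions amount to a small lookup. The sketch below is a self-contained illustration with hypothetical names (`base3`, `runeToBase3`, `base3ToRune`); unlike the documented functions, these return an error, because how the package handles padding values, lowercase letters, or other runes is not stated here. The sketch accepts only the uppercase characters {A,C,G,T,N}.

```go
package main

import "fmt"

// base3 is a hypothetical stand-in for ThreeBitBase.
type base3 uint64

const (
	A          base3 = 0
	C          base3 = 1
	G          base3 = 2
	T          base3 = 3
	N          base3 = 4
	PaddingOne base3 = 5
	PaddingTwo base3 = 6
)

// runeToBase3 maps an uppercase DNA character to its 3-bit code.
func runeToBase3(r rune) (base3, error) {
	switch r {
	case 'A':
		return A, nil
	case 'C':
		return C, nil
	case 'G':
		return G, nil
	case 'T':
		return T, nil
	case 'N':
		return N, nil
	}
	return 0, fmt.Errorf("unrecognized base %q", r)
}

// base3ToRune maps a 3-bit code in the range 0..4 back to its character.
// Padding codes have no character in this sketch and produce an error.
func base3ToRune(b base3) (rune, error) {
	const letters = "ACGTN"
	if b > N {
		return 0, fmt.Errorf("code %d has no DNA character", b)
	}
	return rune(letters[b]), nil
}

func main() {
	for _, r := range "GATTACA" {
		b, err := runeToBase3(r)
		if err != nil {
			panic(err)
		}
		back, err := base3ToRune(b)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%c=%d ", back, b)
	}
	fmt.Println()
}
```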
