Documentation
Index ¶
- func BasesToUint64(seq []dna.Base, start int, end int, padding ThreeBitBase) uint64
- func Cat(a *ThreeBit, b *ThreeBit)
- func GetBase(fragment *ThreeBit, pos int) dna.Base
- func RangeToDnaBases(fragment *ThreeBit, start int, end int) []dna.Base
- func ThreeBitBaseToRune(base ThreeBitBase) rune
- func ThreeBitBaseToString(b ThreeBitBase) string
- func ToDnaBases(fragment *ThreeBit) []dna.Base
- func ToString(fragment *ThreeBit) string
- type ThreeBit
- type ThreeBitBase
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BasesToUint64 ¶
func BasesToUint64(seq []dna.Base, start int, end int, padding ThreeBitBase) uint64
BasesToUint64 takes the section of seq from start to end (left-closed, right-open), which must be no more than 21 bases long, and returns the uint64 encoding of it that would be used in a ThreeBit. padding can be either PaddingOne or PaddingTwo. Sequences that will be compared to each other should use different padding values.
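To make the encoding concrete, here is a minimal sketch of packing up to 21 bases into one uint64. It is not the package's implementation: packWord and basesPerWord are hypothetical names, the input is a slice of three-bit codes rather than []dna.Base, and the bit layout (first base in the top three bits, lowest bit unused, unused slots filled with padding) is taken from the ThreeBit and ThreeBitBase descriptions on this page.

```go
package main

import "fmt"

// ThreeBitBase and its constants copy the values documented on this page.
// Everything else in this file is illustrative.
type ThreeBitBase uint64

const (
	A          ThreeBitBase = 0
	C          ThreeBitBase = 1
	G          ThreeBitBase = 2
	T          ThreeBitBase = 3
	N          ThreeBitBase = 4
	PaddingOne ThreeBitBase = 5
	PaddingTwo ThreeBitBase = 6
)

// basesPerWord is a hypothetical name: 21 bases * 3 bits = 63 bits,
// which leaves the least significant bit of the uint64 unused.
const basesPerWord = 21

// packWord is a hypothetical helper, not the package's BasesToUint64.
// It places the first base in the three most significant bits and fills
// any slots beyond len(bases) with the padding code.
func packWord(bases []ThreeBitBase, padding ThreeBitBase) uint64 {
	if len(bases) > basesPerWord {
		panic("packWord: more than 21 bases")
	}
	var word uint64
	for i := 0; i < basesPerWord; i++ {
		b := padding
		if i < len(bases) {
			b = bases[i]
		}
		// Slot 0 occupies bits 63..61, slot 20 occupies bits 3..1.
		shift := uint(64 - 3*(i+1))
		word |= uint64(b) << shift
	}
	return word
}

func main() {
	word := packWord([]ThreeBitBase{A, C, G, T}, PaddingOne)
	fmt.Printf("%064b\n", word)
}
```

Reading the printed bits in groups of three from the left gives the base codes in order, followed by the padding code in every remaining slot.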
func Cat ¶
func Cat(a *ThreeBit, b *ThreeBit)
Cat appends "b" to "a". Afterwards "a" holds the sequence of "a" followed by the sequence of "b", and "b" is unchanged. It is quickest to have "a" be the longer sequence.
func GetBase ¶
func GetBase(fragment *ThreeBit, pos int) dna.Base
GetBase returns the dna.Base equivalent of the single base at position pos within fragment.
func RangeToDnaBases ¶
func RangeToDnaBases(fragment *ThreeBit, start int, end int) []dna.Base
RangeToDnaBases returns a slice of dna.Base that represents the bases from start to end (left-closed, right-open) of fragment.
func ThreeBitBaseToRune ¶
func ThreeBitBaseToRune(base ThreeBitBase) rune
ThreeBitBaseToRune returns a rune that corresponds to the single base in ThreeBitBase format.
func ThreeBitBaseToString ¶
func ThreeBitBaseToString(b ThreeBitBase) string
ThreeBitBaseToString returns a string that corresponds to the single base given as a ThreeBitBase.
func ToDnaBases ¶
func ToDnaBases(fragment *ThreeBit) []dna.Base
ToDnaBases returns a slice of dna.Base that represents the same sequence of bases present in fragment.
Types ¶
type ThreeBit ¶
ThreeBit is a struct that represents long DNA sequences in a memory-efficient format. Seq holds the encoded DNA sequence. The left-most (most significant) three bits of Seq[0] hold the first base; the first base is *not* encoded in the three least significant bits. Because 64 is not divisible by three, the right-most (least significant) bit is not used in the encoding scheme. Len is the length of the DNA sequence, not the length of the Seq slice.
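The layout described above can be illustrated by reading a single base back out of Seq. This is a sketch under stated assumptions, not the package's GetThreeBitBase: the field types (Seq as []uint64, Len as int) are inferred from the description, and baseAt and basesPerWord are hypothetical names.

```go
package main

import "fmt"

// ThreeBit uses the field names from the description above. The field
// types are an assumption of this sketch.
type ThreeBit struct {
	Seq []uint64
	Len int
}

// basesPerWord is a hypothetical name: each uint64 holds 21 three-bit
// bases and leaves its least significant bit unused.
const basesPerWord = 21

// baseAt is a hypothetical helper that returns the three-bit code stored
// at position pos. It panics when pos is outside [0, Len).
func baseAt(fragment *ThreeBit, pos int) uint64 {
	if pos < 0 || pos >= fragment.Len {
		panic("baseAt: position out of range")
	}
	word := fragment.Seq[pos/basesPerWord] // which uint64 holds the base
	slot := pos % basesPerWord             // which three-bit slot inside it
	shift := uint(64 - 3*(slot+1))         // slot 0 sits in bits 63..61
	return (word >> shift) & 7
}

func main() {
	// Hand-built fragment: G (2) and T (3) in the first two slots of
	// Seq[0], and C (1) in the first slot of Seq[1]. All other slots are
	// zero, which reads back as A.
	fragment := &ThreeBit{
		Seq: []uint64{uint64(2)<<61 | uint64(3)<<58, uint64(1) << 61},
		Len: 22,
	}
	for _, pos := range []int{0, 1, 20, 21} {
		fmt.Printf("pos %d -> code %d\n", pos, baseAt(fragment, pos))
	}
}
```

Position 21 is the first base of the second word, which is why Len counts bases rather than elements of Seq.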
func Append ¶
func Append(fragment *ThreeBit, b ThreeBitBase) *ThreeBit
Append adds "b" to the end of "fragment." "fragment" can be nil.
func FromString ¶
FromString creates a new ThreeBit from a string of DNA characters {A,C,G,T,N}.
func NewThreeBit ¶
func NewThreeBit(inSeq []dna.Base, padding ThreeBitBase) *ThreeBit
NewThreeBit creates a ThreeBit encoding of inSeq with padding on the end.
func NewThreeBitRainbow ¶
func NewThreeBitRainbow(inSeq []dna.Base, padding ThreeBitBase) []*ThreeBit
NewThreeBitRainbow builds a "rainbow table" of a sequence in ThreeBit format with every possible offset, so that there is always a version of the sequence that can be compared to another ThreeBit sequence using xor.
type ThreeBitBase ¶
type ThreeBitBase uint64
This could have been uint8, but with that I found lots of casting between uint64 and uint8. I don't think many ThreeBitBases will be sitting around for a long time by themselves so I don't think the extra memory will be noticed. Even though it is encoded as 64 bits, only the last three can be used (zero to seven).
const (
	A          ThreeBitBase = 0
	C          ThreeBitBase = 1
	G          ThreeBitBase = 2
	T          ThreeBitBase = 3
	N          ThreeBitBase = 4
	PaddingOne ThreeBitBase = 5
	PaddingTwo ThreeBitBase = 6
)
The four bases {A,C,G,T} and N are encoded with the numbers 0 to 4. If the length of the DNA sequence stored in the ThreeBit is not divisible by 21, then there will be "left over" space in Seq[len(Seq)-1]. Currently, if that "left over" space is filled with PaddingOne for one ThreeBit and PaddingTwo for another ThreeBit, the two can be quickly compared for perfect matches: the differing padding ensures that the bits in those unused positions will not match. TODO: the padding implementation is a bit messy and should probably be changed in the future. This could be done by having the perfect match functions compare the number of matches they will return to the theoretical maximum based on the sequence lengths or padding, or a mask could be applied in the comparison function.
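The reason for using two different padding values can be seen in a small xor comparison. The sketch below is not the package's perfect match code: packWord and countMatches are hypothetical helpers, and the bit layout is the one described for ThreeBit on this page. Two short sequences padded with the same value appear to match in every unused slot, while different padding values limit the count to real bases.

```go
package main

import "fmt"

// ThreeBitBase and its constants copy the values documented on this page.
type ThreeBitBase uint64

const (
	A          ThreeBitBase = 0
	C          ThreeBitBase = 1
	G          ThreeBitBase = 2
	T          ThreeBitBase = 3
	N          ThreeBitBase = 4
	PaddingOne ThreeBitBase = 5
	PaddingTwo ThreeBitBase = 6
)

const basesPerWord = 21 // hypothetical name: 21 three-bit slots per uint64

// packWord is a hypothetical helper that packs up to 21 bases into one
// uint64, first base in the top three bits, unused slots set to padding.
func packWord(bases []ThreeBitBase, padding ThreeBitBase) uint64 {
	if len(bases) > basesPerWord {
		panic("packWord: more than 21 bases")
	}
	var word uint64
	for i := 0; i < basesPerWord; i++ {
		b := padding
		if i < len(bases) {
			b = bases[i]
		}
		word |= uint64(b) << uint(64-3*(i+1))
	}
	return word
}

// countMatches is a hypothetical helper that xors two words and counts
// the three-bit slots whose xor is zero, i.e. slots that hold equal codes.
func countMatches(x, y uint64) int {
	diff := x ^ y
	matches := 0
	for slot := 0; slot < basesPerWord; slot++ {
		if (diff>>uint(64-3*(slot+1)))&7 == 0 {
			matches++
		}
	}
	return matches
}

func main() {
	seq := []ThreeBitBase{A, C}
	samePadding := countMatches(packWord(seq, PaddingOne), packWord(seq, PaddingOne))
	diffPadding := countMatches(packWord(seq, PaddingOne), packWord(seq, PaddingTwo))
	fmt.Println("same padding:", samePadding)
	fmt.Println("different padding:", diffPadding)
}
```

With identical padding the unused slots inflate the match count, which is the problem the TODO above proposes to solve with a length check or a mask instead.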
func GetThreeBitBase ¶
func GetThreeBitBase(fragment *ThreeBit, pos int) ThreeBitBase
GetThreeBitBase returns the ThreeBitBase equivalent of the single base at position pos within fragment.
func RuneToThreeBitBase ¶
func RuneToThreeBitBase(r rune) ThreeBitBase
RuneToThreeBitBase returns a single base in ThreeBitBase format that corresponds to the given rune.
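As an illustration of the rune conversions, the sketch below maps the characters {A,C,G,T,N} to the documented codes and back. runeToCode and codeToRune are hypothetical stand-ins for RuneToThreeBitBase and ThreeBitBaseToRune; accepting only upper-case letters and reporting unknown input through a boolean are assumptions of this sketch, since this page does not say how the package handles other input.

```go
package main

import "fmt"

// ThreeBitBase and its constants copy the values documented on this page.
type ThreeBitBase uint64

const (
	A          ThreeBitBase = 0
	C          ThreeBitBase = 1
	G          ThreeBitBase = 2
	T          ThreeBitBase = 3
	N          ThreeBitBase = 4
	PaddingOne ThreeBitBase = 5
	PaddingTwo ThreeBitBase = 6
)

// runeToCode is a hypothetical helper. The second return value is false
// for any rune outside {A,C,G,T,N} (an assumption of this sketch).
func runeToCode(r rune) (ThreeBitBase, bool) {
	switch r {
	case 'A':
		return A, true
	case 'C':
		return C, true
	case 'G':
		return G, true
	case 'T':
		return T, true
	case 'N':
		return N, true
	}
	return 0, false
}

// codeToRune is a hypothetical helper. Padding codes and any value above
// N have no letter, so the second return value is false for them.
func codeToRune(b ThreeBitBase) (rune, bool) {
	const letters = "ACGTN"
	if b > N {
		return 0, false
	}
	return rune(letters[b]), true
}

func main() {
	for _, r := range "GATTACAN" {
		code, ok := runeToCode(r)
		if !ok {
			panic("unexpected character")
		}
		back, _ := codeToRune(code)
		fmt.Printf("%c -> %03b -> %c\n", r, uint64(code), back)
	}
}
```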