Documentation
Index
- Variables
- func Hash64(key uint64) uint64
- func IsLowComplexity(code uint64, k int) bool
- func MustDecode(code uint64, k uint8) []byte
- func MustDecoder() func(code uint64, k uint8) []byte
- type LexicHash
- func New(k int, nMasks int, p int) (*LexicHash, error)
- func NewFromFile(file string) (*LexicHash, error)
- func NewFromTextFile(file string) (*LexicHash, error)
- func NewWithMasks(k int, masks []uint64) (*LexicHash, error)
- func NewWithSeed(k int, nMasks int, randSeed int64, p int) (*LexicHash, error)
- func Read(r io.Reader) (*LexicHash, error)
- func (lh *LexicHash) Mask(s []byte, skipRegions [][2]int) (*[]uint64, *[][]int, error)
- func (lh *LexicHash) MaskLongSeqs(s []byte, skipRegions [][2]int) (*[]uint64, *[][]int, error)
- func (lh *LexicHash) RecycleMaskResult(kmers *[]uint64, locses *[][]int)
- func (lh *LexicHash) Write(w io.Writer) (int, error)
- func (lh *LexicHash) WriteToFile(file string) (int, error)
Constants
This section is empty.
Variables
var ErrBrokenFile = errors.New("lexichash: broken file")
ErrBrokenFile means the file is not complete.
var ErrInsufficientMasks = errors.New("lexichash: insufficient masks (should be >=64)")
ErrInsufficientMasks means the number of masks is too small.
var ErrInvalidFileFormat = errors.New("lexichash: invalid binary format")
ErrInvalidFileFormat means the file is not in the expected binary format.
var ErrKOverflow = errors.New("lexichash: k-mer size overflow, valid range is [5-32]")
ErrKOverflow means K > 32.
var ErrVersionMismatch = errors.New("lexichash: version mismatch")
ErrVersionMismatch means the version of the file does not match the version of the program.
var Magic = [8]byte{'k', 'm', 'e', 'r', 'm', 'a', 's', 'k'}
var MainVersion uint8 = 0
var MinorVersion uint8 = 1
var Strands = [2]byte{'+', '-'}
Strands can be used to output the strand symbol for a reverse-complement flag.
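The Mask entry describes each location as a 0-based start position whose last bit is the strand flag. Assuming that means a location is stored as pos<<1 | flag, a minimal sketch of turning a location back into a position and a Strands symbol could look like this; decodeLoc is a hypothetical helper, and Strands is redeclared so the block runs on its own.

```go
package main

import "fmt"

// Strands mirrors the package variable so this sketch is self-contained.
var Strands = [2]byte{'+', '-'}

// decodeLoc is a hypothetical helper. It assumes a location is encoded as
// pos<<1 | flag, where flag is 1 for the negative strand.
func decodeLoc(loc int) (pos int, strand byte) {
	return loc >> 1, Strands[loc&1]
}

func main() {
	// Position 100 on the negative strand.
	loc := 100<<1 | 1
	pos, strand := decodeLoc(loc)
	fmt.Printf("%d%c\n", pos, strand) // prints "100-"
}
```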
Functions
func Hash64(key uint64) uint64
Added in v0.2.0.
Hash64 returns a 64-bit hash of key. Source: https://gist.github.com/badboy/6267743 . Version with mask: https://gist.github.com/lh3/974ced188be2f90422cc .
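The first gist collects Thomas Wang's integer hash functions. Assuming Hash64 follows his 64-bit shift-and-add variant (this page does not confirm the exact steps), the sketch below reproduces that well-known function under the hypothetical name wangHash64. Every step is invertible modulo 2^64, so distinct keys always map to distinct hashes.

```go
package main

import "fmt"

// wangHash64 is a sketch of Thomas Wang's 64-bit integer hash.
// It is an assumption that Hash64 uses exactly these steps.
func wangHash64(key uint64) uint64 {
	key = (^key) + (key << 21) // key = (key << 21) - key - 1
	key ^= key >> 24
	key = (key + (key << 3)) + (key << 8) // key * 265
	key ^= key >> 14
	key = (key + (key << 2)) + (key << 4) // key * 21
	key ^= key >> 28
	key += key << 31
	return key
}

func main() {
	for _, k := range []uint64{1, 2, 3} {
		fmt.Printf("%d -> %016x\n", k, wangHash64(k))
	}
}
```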
func IsLowComplexity(code uint64, k int) bool
IsLowComplexity checks whether a k-mer is of low complexity.
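The exact criterion used by IsLowComplexity is not documented here. As a hypothetical illustration only, the sketch assumes k-mers are 2-bit encoded (A=0, C=1, G=2, T=3, first base in the highest bits) and treats homopolymers and dinucleotide repeats as low complexity; encode, baseAt and isLowComplexitySketch are invented names.

```go
package main

import "fmt"

// encode packs a DNA string into a 2-bit code (A=0, C=1, G=2, T=3), first
// base in the highest bits. Any other byte, including 'A', maps to 0.
func encode(s string) uint64 {
	var code uint64
	for i := 0; i < len(s); i++ {
		var b uint64
		switch s[i] {
		case 'C':
			b = 1
		case 'G':
			b = 2
		case 'T':
			b = 3
		}
		code = code<<2 | b
	}
	return code
}

// baseAt returns the 2-bit base at position i (0-based from the left).
func baseAt(code uint64, k, i int) uint64 {
	return (code >> (2 * uint(k-1-i))) & 3
}

// isLowComplexitySketch is hypothetical: it reports whether the k-mer is a
// tandem repeat with period 1 (AAAA...) or period 2 (ACAC...).
func isLowComplexitySketch(code uint64, k int) bool {
	for period := 1; period <= 2 && period < k; period++ {
		repeat := true
		for i := period; i < k; i++ {
			if baseAt(code, k, i) != baseAt(code, k, i-period) {
				repeat = false
				break
			}
		}
		if repeat {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{"AAAAAA", "ACACAC", "ACGTTG"} {
		fmt.Println(s, isLowComplexitySketch(encode(s), len(s)))
	}
}
```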
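func MustDecoder() func(code uint64, k uint8) []byte
MustDecoder returns a Decode function that reuses the same byte slice across calls, so each call overwrites the result of the previous one.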
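A decoder that reuses its buffer is typically a closure over one preallocated slice. The sketch below shows that pattern and why a result must be copied if it has to outlive the next call; mustDecoderSketch and its 2-bit encoding (A=0, C=1, G=2, T=3) are assumptions, not this package's code.

```go
package main

import "fmt"

// mustDecoderSketch is hypothetical. It assumes the 2-bit encoding
// A=0, C=1, G=2, T=3 with the first base in the highest bits.
// The returned function writes into one shared buffer, so every call
// overwrites the bytes returned by the previous call.
func mustDecoderSketch() func(code uint64, k uint8) []byte {
	const bases = "ACGT"
	buf := make([]byte, 32) // a uint64 holds at most 32 bases
	return func(code uint64, k uint8) []byte {
		if k == 0 || k > 32 {
			panic("k out of range") // "Must" style: panic on bad input
		}
		for i := int(k) - 1; i >= 0; i-- {
			buf[i] = bases[code&3]
			code >>= 2
		}
		return buf[:k]
	}
}

func main() {
	decode := mustDecoderSketch()
	a := decode(0x1B, 4)   // 00 01 10 11 -> "ACGT"
	fmt.Println(string(a)) // prints "ACGT"
	b := decode(0xE4, 4)   // 11 10 01 00 -> "TGCA"
	// a shares b's buffer, so this prints "TGCA TGCA".
	fmt.Println(string(a), string(b))
	kept := append([]byte(nil), b...) // copy a result that must survive later calls
	decode(0x1B, 4)
	fmt.Println(string(kept), string(b)) // prints "TGCA ACGT"
}
```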
Types
type LexicHash
type LexicHash struct {
K int // max length of shared substrings, should be in range of [4, 31]
Seed int64 // seed for generating masks
Masks []uint64 // masks/k-mers
// contains filtered or unexported fields
}
LexicHash is for finding shared substrings between nucleotide sequences.
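Going by the Mask entry, each mask picks out the most similar k-mer in a sequence. One simple way to define and compute that (an assumption here, not taken from this package's source) is to XOR each 2-bit-encoded k-mer with the mask and keep the smallest result: a longer common prefix with the mask leaves more leading zero bits. The sketch scans only the forward strand and ignores skipRegions; maskSketch and baseCode are hypothetical names.

```go
package main

import "fmt"

// baseCode maps a base to 2 bits (A=0, C=1, G=2, T=3); any other byte,
// including N, is treated as A in this simplified sketch.
func baseCode(b byte) uint64 {
	switch b {
	case 'C', 'c':
		return 1
	case 'G', 'g':
		return 2
	case 'T', 't':
		return 3
	}
	return 0
}

// maskSketch is hypothetical and scans the forward strand only. For each
// mask it keeps the k-mer (k <= 32) that minimizes kmer^mask, i.e. the one
// sharing the longest prefix with the mask, plus every start position
// where that k-mer occurs.
func maskSketch(s []byte, k int, masks []uint64) ([]uint64, [][]int) {
	kmers := make([]uint64, len(masks))
	locs := make([][]int, len(masks))
	best := make([]uint64, len(masks))
	lowBits := uint64(1)<<(2*uint(k)) - 1 // keeps only the last k bases
	var code uint64
	for i := 0; i < len(s); i++ {
		code = (code<<2 | baseCode(s[i])) & lowBits
		if i < k-1 {
			continue // the first k-mer is not complete yet
		}
		start := i - k + 1
		for j, m := range masks {
			h := code ^ m
			if locs[j] == nil || h < best[j] {
				best[j], kmers[j], locs[j] = h, code, []int{start}
			} else if h == best[j] {
				locs[j] = append(locs[j], start)
			}
		}
	}
	return kmers, locs
}

func main() {
	// Under the assumed encoding, 0x18 is "ACGA" and 0xFF is "TTTT".
	masks := []uint64{0x18, 0xFF}
	kmers, locs := maskSketch([]byte("ACGTACGTTT"), 4, masks)
	for j, m := range masks {
		fmt.Printf("mask %02x -> k-mer %02x at %v\n", m, kmers[j], locs[j])
	}
	// prints:
	// mask 18 -> k-mer 1b at [0 4]
	// mask ff -> k-mer c6 at [3]
}
```

Here "ACGT" (0x1b) wins for the "ACGA" mask because it shares the prefix "ACG", and "TACG" (0xc6) wins for "TTTT" because it is the only k-mer starting with T.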
func New(k int, nMasks int, p int) (*LexicHash, error)
New returns a new LexicHash object. nMasks should be >= 64; ideally it is a power of 4 (e.g., 64, 256, 1024, 4096, ...) and at least 1024. p is the length of the mask k-mer prefixes that are checked for low complexity; set p to 0 to disable the check.
func NewFromFile(file string) (*LexicHash, error)
NewFromFile creates a LexicHash from a binary file.
func NewFromTextFile(file string) (*LexicHash, error)
Added in v0.3.0.
NewFromTextFile creates a new LexicHash object with custom k-mers read from a text file.
func NewWithMasks(k int, masks []uint64) (*LexicHash, error)
Added in v0.3.0.
NewWithMasks creates a new LexicHash object with custom k-mers as masks. The number of masks should be >= 64; ideally it is a power of 4 (e.g., 64, 256, 1024, 4096, ...) and at least 1024.
func NewWithSeed(k int, nMasks int, randSeed int64, p int) (*LexicHash, error)
NewWithSeed creates a new LexicHash object with the given random seed. nMasks should be >= 64; ideally it is a power of 4 (e.g., 64, 256, 1024, 4096, ...) and at least 1024. p is the length of the mask k-mer prefixes that are checked for low complexity; set p to 0 to disable the check.
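How NewWithSeed draws its masks is not described beyond using the seed. A plausible sketch, assuming masks are distinct random 2-bit-encoded k-mers drawn from math/rand with a fixed source, shows why the same seed reproduces the same masks; randomMasks is hypothetical and skips the low-complexity prefix check controlled by p.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomMasks is hypothetical: it draws nMasks distinct random k-mers
// (2-bit encoded, 1 <= k <= 32) from a seeded generator, so the same seed
// always yields the same masks.
func randomMasks(k, nMasks int, seed int64) []uint64 {
	lowBits := uint64(1)<<(2*uint(k)) - 1 // keeps the lowest 2k bits
	if k < 32 && uint64(nMasks) > lowBits+1 {
		panic("nMasks exceeds 4^k") // otherwise the loop below never ends
	}
	r := rand.New(rand.NewSource(seed))
	seen := make(map[uint64]bool, nMasks)
	masks := make([]uint64, 0, nMasks)
	for len(masks) < nMasks {
		m := r.Uint64() & lowBits
		if seen[m] {
			continue // reject duplicates
		}
		seen[m] = true
		masks = append(masks, m)
	}
	return masks
}

func main() {
	a := randomMasks(21, 4, 1)
	b := randomMasks(21, 4, 1)
	fmt.Println(a)
	fmt.Println(fmt.Sprint(a) == fmt.Sprint(b)) // prints "true": same seed, same masks
}
```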
func (lh *LexicHash) Mask(s []byte, skipRegions [][2]int) (*[]uint64, *[][]int, error)
Mask computes, for each mask, the most similar substrings in sequence s. It returns
- the list of the most similar k-mers for each mask;
- the 0-based start positions of all these k-mers, with the last (least significant) bit used as the strand flag (1 for the negative strand).
skipRegions is optional and is used to skip masked regions. E.g., in the reference indexing step, the contigs of a genome can be concatenated with k-1 N's, which need to be omitted.
The regions should be 0-based and sorted in ascending order, e.g., [100, 130], [200, 230], ...
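The contig-concatenation example for skipRegions can be sketched as follows. concatContigs is a hypothetical helper that builds the joined sequence together with matching 0-based, ascending regions; it assumes each region is a closed interval [start, end], which the page does not state explicitly.

```go
package main

import (
	"bytes"
	"fmt"
)

// concatContigs is hypothetical: it joins contigs with k-1 'N's and returns
// the 0-based, ascending regions covering each spacer. Regions are treated
// as closed intervals [start, end] (an assumption); k >= 2 is assumed.
func concatContigs(contigs [][]byte, k int) ([]byte, [][2]int) {
	spacer := bytes.Repeat([]byte{'N'}, k-1)
	var seq []byte
	var skip [][2]int
	for i, c := range contigs {
		if i > 0 {
			start := len(seq)
			seq = append(seq, spacer...)
			skip = append(skip, [2]int{start, len(seq) - 1})
		}
		seq = append(seq, c...)
	}
	return seq, skip
}

func main() {
	seq, skip := concatContigs([][]byte{[]byte("ACGTAC"), []byte("GGTTA")}, 4)
	fmt.Println(string(seq), skip) // prints "ACGTACNNNGGTTA [[6 8]]"
}
```

Because the spacers are appended in order, the regions come out already sorted, matching the ascending-order requirement.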
func (lh *LexicHash) MaskLongSeqs(s []byte, skipRegions [][2]int) (*[]uint64, *[][]int, error)
Added in v0.2.0.
MaskLongSeqs is faster than Mask() for longer sequences and requires nMasks >= 1024.
func (lh *LexicHash) RecycleMaskResult(kmers *[]uint64, locses *[][]int)
RecycleMaskResult recycles the results of Mask(). Always call this method once you have finished using the mask results.
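Mask returns pointers to slices and RecycleMaskResult takes them back, which suggests (but does not confirm) that results are pooled. The get/use/recycle pattern can be sketched with the standard library's sync.Pool; kmerPool, getKmers and recycleKmers are hypothetical names, and only the k-mer slice is pooled here to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// kmerPool is a hypothetical pool of result slices, illustrating the
// get/use/recycle pattern that RecycleMaskResult suggests.
var kmerPool = sync.Pool{
	New: func() interface{} {
		s := make([]uint64, 0, 1024)
		return &s
	},
}

// getKmers fetches a slice from the pool and resets its length to 0.
func getKmers() *[]uint64 {
	p := kmerPool.Get().(*[]uint64)
	*p = (*p)[:0]
	return p
}

// recycleKmers returns the slice to the pool; the caller must not use it afterwards.
func recycleKmers(p *[]uint64) {
	kmerPool.Put(p)
}

func main() {
	for round := 0; round < 3; round++ {
		kmers := getKmers()
		for i := 0; i < 5; i++ {
			*kmers = append(*kmers, uint64(round*10+i))
		}
		fmt.Println(len(*kmers), (*kmers)[0]) // prints "5 0", then "5 10", then "5 20"
		recycleKmers(kmers)                   // analogous to calling RecycleMaskResult after use
	}
}
```

A recycled slice must not be read afterwards, since a later Get may hand the same backing array to another caller.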