bio

Version: v0.0.0-...-b8182e9 (this package is not in the latest version of its module)
Published: Mar 4, 2021 License: MIT

bio

A lightweight, high-performance bioinformatics package (see the seqkit benchmark).

FASTA/Q parsing

This package's FASTA/Q parsing performance is close to that of the well-known C library kseq.h.
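The following is a minimal standard-library sketch of what FASTA/Q parsing involves. It is not the fastx package's implementation (which is heavily optimized for speed). The `Record` type and `parseFastx` function are hypothetical, and the sketch assumes FASTQ records are unwrapped four-line entries while FASTA sequences may be wrapped over several lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Record is a hypothetical minimal record type for this sketch only;
// it is not the Record type defined by the fastx package.
type Record struct {
	Name string
	Seq  string
	Qual string // empty for FASTA records
}

// parseFastx parses FASTA (possibly line-wrapped) and FASTQ records.
// Each record's format is chosen by its header character: '>' or '@'.
// Assumption: FASTQ records are unwrapped (exactly four lines each).
func parseFastx(text string) ([]Record, error) {
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Buffer(make([]byte, 0, 64*1024), 1<<30) // allow very long lines

	var recs []Record
	var cur *Record // FASTA record currently being accumulated
	var seq strings.Builder

	// flush appends the pending FASTA record, if any.
	flush := func() {
		if cur != nil {
			cur.Seq = seq.String()
			recs = append(recs, *cur)
			cur = nil
			seq.Reset()
		}
	}

	for sc.Scan() {
		line := strings.TrimRight(sc.Text(), "\r")
		if line == "" {
			continue
		}
		switch line[0] {
		case '>': // FASTA header: sequence lines follow until the next header
			flush()
			cur = &Record{Name: line[1:]}
		case '@': // FASTQ header: read sequence, '+' separator and quality lines
			flush()
			name := line[1:]
			var body [3]string
			for i := range body {
				if !sc.Scan() {
					return nil, fmt.Errorf("truncated FASTQ record %q", name)
				}
				body[i] = strings.TrimRight(sc.Text(), "\r")
			}
			if !strings.HasPrefix(body[1], "+") || len(body[0]) != len(body[2]) {
				return nil, fmt.Errorf("malformed FASTQ record %q", name)
			}
			recs = append(recs, Record{Name: name, Seq: body[0], Qual: body[2]})
		default: // FASTA sequence line
			if cur == nil {
				return nil, fmt.Errorf("sequence line before any header: %q", line)
			}
			seq.WriteString(line)
		}
	}
	flush()
	return recs, sc.Err()
}

func main() {
	input := ">chr1 demo\nACGT\nAC\n@read1\nGGTT\n+\nIIII\n"
	recs, err := parseFastx(input)
	if err != nil {
		panic(err)
	}
	for _, r := range recs {
		fmt.Printf("%s\t%s\t%q\n", r.Name, r.Seq, r.Qual)
	}
}
```

Reading the whole FASTQ record at once after an `@` header keeps a quality line that happens to start with `@` from being mistaken for a new header.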

To test the performance, three datasets are used:

  • dataset_A: bacterial genomes, 2.7 GB
  • dataset_B: human genome, 2.9 GB
  • dataset_C: Illumina reads, 2.2 GB

Summary by seqkit:

file           seq_format   seq_type   num_seqs   min_len        avg_len       max_len
dataset_A.fa   FASTA        DNA          67,748        56       41,442.5     5,976,145
dataset_B.fa   FASTA        DNA             194       970   15,978,096.5   248,956,422
dataset_C.fq   FASTQ        DNA       9,186,045       100            100           100
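The num_seqs, min_len, avg_len and max_len columns are simple aggregates over per-record sequence lengths. A sketch of how they could be computed, assuming the lengths have already been collected while parsing; the `Stats` type and `summarize` function are illustrative and are not seqkit's code.

```go
package main

import "fmt"

// Stats mirrors the num_seqs/min_len/avg_len/max_len summary columns.
// Illustrative type for this sketch; it is not part of seqkit or bio.
type Stats struct {
	NumSeqs int
	MinLen  int
	AvgLen  float64
	MaxLen  int
}

// summarize aggregates a list of sequence lengths (one per record).
func summarize(lengths []int) Stats {
	if len(lengths) == 0 {
		return Stats{}
	}
	s := Stats{NumSeqs: len(lengths), MinLen: lengths[0], MaxLen: lengths[0]}
	var total int64 // int64 so multi-gigabase totals cannot overflow on 32-bit builds
	for _, n := range lengths {
		if n < s.MinLen {
			s.MinLen = n
		}
		if n > s.MaxLen {
			s.MaxLen = n
		}
		total += int64(n)
	}
	s.AvgLen = float64(total) / float64(len(lengths))
	return s
}

func main() {
	// Hypothetical lengths, e.g. collected while parsing a FASTA/Q file.
	st := summarize([]int{56, 1200, 5976145})
	fmt.Printf("num_seqs=%d min_len=%d avg_len=%.1f max_len=%d\n",
		st.NumSeqs, st.MinLen, st.AvgLen, st.MaxLen)
}
```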

seqtk (version 1.1-r92-dirty, using kseq.h) and seqkit (version v0.3.1.1, using this package) were used for testing. Note that seqtk does not support wrapped (fixed line width) output, so seqkit was run with -w 0 to disable output wrapping. The script memusg was used to measure running time and peak memory usage.
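Wrapping only changes how a sequence is laid out across output lines, which is why disabling it makes the two tools' outputs comparable. The hypothetical helper `wrapSeq` below sketches the idea; as with seqkit's -w 0, a width of 0 here disables wrapping. It is an illustration, not seqkit's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapSeq inserts a newline after every width bases.
// A width <= 0 keeps the whole sequence on one line (like seqkit -w 0).
func wrapSeq(seq string, width int) string {
	if width <= 0 || len(seq) <= width {
		return seq
	}
	var b strings.Builder
	b.Grow(len(seq) + len(seq)/width)
	for i := 0; i < len(seq); i += width {
		end := i + width
		if end > len(seq) {
			end = len(seq)
		}
		if i > 0 {
			b.WriteByte('\n')
		}
		b.WriteString(seq[i:end])
	}
	return b.String()
}

func main() {
	fmt.Printf(">seq1\n%s\n", wrapSeq("ACGTACGTAC", 4)) // wrapped at 4 bases
	fmt.Printf(">seq1\n%s\n", wrapSeq("ACGTACGTAC", 0)) // wrapping disabled
}
```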

Commands

Each test was repeated 5 times, and the average running time and peak memory usage were computed.

Results:

Figure: benchmark results (benchmark.tsv.png)

Install

This package is "go-gettable"; just run:

go get -u github.com/shenwei356/bio

More

See the README of each subpackage.

Documentation

See the documentation on GoDoc for more details.

Copyright (c) 2013-2016, Wei Shen (shenwei356@gmail.com)

MIT License

Directories

Path Synopsis
featio
_bed
Package bed is used to read bed features.
gtf
Package gtf is used to read gtf features.
seq
Package seq defines a Seq type and provides basic sequence operations, such as validating DNA/RNA/protein sequences and computing reverse complements.
seqio
fai
Package fai implements FASTA sequence file index handling, including index creation, reading, and random access.
fastx
Package fastx seamlessly parses both FASTA and FASTQ format files.
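The sequence validation and reverse-complement operations listed for package seq can be sketched with the standard library alone. The names `isValidDNA` and `reverseComplement`, and the restriction to A, C, G, T and N, are assumptions for illustration; this is not the seq package's API.

```go
package main

import "fmt"

// complementOf maps each accepted DNA base to its complement, preserving case.
// Assumption for this sketch: only A, C, G, T and N are accepted.
var complementOf = map[byte]byte{
	'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N',
	'a': 't', 'c': 'g', 'g': 'c', 't': 'a', 'n': 'n',
}

// isValidDNA reports whether every byte of s is an accepted DNA base.
func isValidDNA(s string) bool {
	for i := 0; i < len(s); i++ {
		if _, ok := complementOf[s[i]]; !ok {
			return false
		}
	}
	return true
}

// reverseComplement complements each base and reverses the order,
// returning an error at the first invalid base.
func reverseComplement(s string) (string, error) {
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		c, ok := complementOf[s[i]]
		if !ok {
			return "", fmt.Errorf("invalid DNA base %q at position %d", s[i], i)
		}
		out[len(s)-1-i] = c
	}
	return string(out), nil
}

func main() {
	rc, err := reverseComplement("ACGTTn")
	if err != nil {
		panic(err)
	}
	fmt.Println(isValidDNA("ACGTTn"), rc) // prints "true nAACGT"
}
```

Writing each complement directly into its mirrored position does the complement and the reversal in a single pass.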
