package fasta

v0.17.1

Warning: This package is not in the latest version of its module.
Published: Jan 7, 2022 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package fasta contains fasta parsers and writers.

Fasta is a flat text file format developed in 1985 to store nucleotide and amino acid sequences. It is extremely simple and well supported across many languages. However, this simplicity means that annotation of genetic objects is not supported.

This package provides a parser and writer for working with Fasta formatted genetic sequences.

Example (Basic)

This example shows how to open a file with the fasta parser. The sequences within that file can then be analyzed further with different software.

package main

import (
	"fmt"
	"github.com/TimothyStiles/poly/io/fasta"
)

func main() {
	fastas := fasta.Read("data/base.fasta")
	fmt.Println(fastas[1].Sequence)
}
Output:

ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*

Constants

This section is empty.

Variables

This section is empty.

Functions

func Build

func Build(fastas []Fasta) []byte

Build converts a slice of Fasta structs into a byte slice in Fasta format.

Example

ExampleBuild shows basic usage for Build.

fastas := Read("data/base.fasta") // get example data
fasta := Build(fastas)            // build a Fasta-formatted byte slice
firstLine := string(bytes.Split(fasta, []byte("\n"))[0])

fmt.Println(firstLine)
Output:

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
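
To make the output format concrete, here is a minimal sketch of how a slice of records could be serialized. It is not the package's implementation: `buildSketch` and its `lineWidth` parameter are hypothetical names introduced for illustration, and the local `Fasta` type simply mirrors the struct documented under Types. Each record becomes a `>` header line followed by the sequence, wrapped at a fixed width.

```go
package main

import (
	"bytes"
	"fmt"
)

// Fasta is a local copy of the struct documented by this package.
type Fasta struct {
	Name     string `json:"name"`
	Sequence string `json:"sequence"`
}

// buildSketch is a hypothetical stand-in for Build. It writes each record as
// a ">" header line followed by the sequence wrapped at lineWidth characters.
// lineWidth is an assumption of this sketch; the documented Build takes no
// such argument.
func buildSketch(fastas []Fasta, lineWidth int) []byte {
	if lineWidth < 1 {
		lineWidth = 80 // assumed default, chosen only for this sketch
	}
	var buf bytes.Buffer
	for _, f := range fastas {
		buf.WriteString(">" + f.Name + "\n")
		seq := f.Sequence
		// Emit full-width lines while more than one line remains.
		for len(seq) > lineWidth {
			buf.WriteString(seq[:lineWidth] + "\n")
			seq = seq[lineWidth:]
		}
		if len(seq) > 0 {
			buf.WriteString(seq + "\n")
		}
	}
	return buf.Bytes()
}

func main() {
	records := []Fasta{{Name: "seq1 example", Sequence: "ATGCATGCAT"}}
	fmt.Print(string(buildSketch(records, 4)))
}
```

With a width of 4, the ten-character sequence is written as three lines (`ATGC`, `ATGC`, `AT`) under the header `>seq1 example`.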

func ParseConcurrent

func ParseConcurrent(r io.Reader, sequences chan<- Fasta)

ParseConcurrent parses Fasta data from an io.Reader and sends each record to the sequences channel as a Fasta struct.

func ReadConcurrent

func ReadConcurrent(path string, sequences chan<- Fasta)

ReadConcurrent concurrently reads a flat Fasta file into a Fasta channel.

Example

ExampleReadConcurrent shows how to use the concurrent parser for uncompressed Fasta files.

fastas := make(chan Fasta, 100)
go ReadConcurrent("data/base.fasta", fastas)
var name string
for fasta := range fastas {
	name = fasta.Name
}

fmt.Println(name)
Output:

MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken

func ReadGzConcurrent

func ReadGzConcurrent(path string, sequences chan<- Fasta)

ReadGzConcurrent concurrently reads a gzipped Fasta file into a Fasta channel.

Example

ExampleReadGzConcurrent shows how to use the concurrent parser for larger files.

fastas := make(chan Fasta, 1000)
go ReadGzConcurrent("data/uniprot_1mb_test.fasta.gz", fastas)
var name string
for fasta := range fastas {
	name = fasta.Name
}

fmt.Println(name)
Output:

sp|P86857|AGP_MYTCA Alanine and glycine-rich protein (Fragment) OS=Mytilus californianus OX=6549 PE=1 SV=1

func Write

func Write(fastas []Fasta, path string)

Write writes a slice of Fasta structs to the file at the given path.

Example

ExampleWrite shows basic usage of the writer.

fastas := Read("data/base.fasta")       // get example data
Write(fastas, "data/test.fasta")        // write it out again
testSequence := Read("data/test.fasta") // read it in again

os.Remove("data/test.fasta") // getting rid of test file

fmt.Println(testSequence[0].Name)
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

Types

type Fasta

type Fasta struct {
	Name     string `json:"name"`
	Sequence string `json:"sequence"`
}

Fasta is a struct representing a single record in a Fasta file, with a Name and its corresponding Sequence.

func Parse

func Parse(r io.Reader) []Fasta

Parse parses Fasta data from an io.Reader into a slice of Fasta structs. Internally, it uses ParseConcurrent.

Example

ExampleParse shows basic usage for Parse.

file, err := os.Open("data/base.fasta")
if err != nil {
	log.Fatal(err) // stop early if the example data is missing
}
defer file.Close()
fastas := Parse(file)

fmt.Println(fastas[0].Name)
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

func Read

func Read(path string) []Fasta

Read reads a file into a slice of Fasta structs.

Example

ExampleRead shows basic usage for Read.

fastas := Read("data/base.fasta")
fmt.Println(fastas[0].Name)
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

func ReadGz

func ReadGz(path string) []Fasta

ReadGz reads a gzipped file into a slice of Fasta structs.

Example

ExampleReadGz shows basic usage for ReadGz on a gzipped file.

fastas := ReadGz("data/uniprot_1mb_test.fasta.gz")
var name string
for _, fasta := range fastas {
	name = fasta.Name
}

fmt.Println(name)
Output:

sp|P86857|AGP_MYTCA Alanine and glycine-rich protein (Fragment) OS=Mytilus californianus OX=6549 PE=1 SV=1
