fasta

package
v0.31.1
Published: Jan 31, 2024 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package fasta contains fasta parsers and writers.

Fasta is a flat text file format developed in 1985 to store nucleotide and amino acid sequences. It is extremely simple and well-supported across many languages. However, this simplicity means that annotation of genetic objects is not supported.

This package provides a parser and writer for working with Fasta formatted genetic sequences.

Example (Basic)

This example shows how to open a file with the fasta parser. The sequences within that file can then be analyzed further with different software.

package main

import (
	"fmt"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas, _ := fasta.Read("data/base.fasta")
	fmt.Println(fastas[1].Sequence)
}
Output:

ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*

Constants

This section is empty.

Variables

This section is empty.

Functions

func Build

func Build(fastas []Fasta) ([]byte, error)

Build converts a slice of Fasta structs into a byte slice that can be written to a file.
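To make the output format concrete, here is a minimal standard-library sketch of what such a conversion might look like. It is not this package's implementation: `record`, `buildFasta`, and the configurable line width are assumptions made for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
)

// record is a hypothetical stand-in for the package's Fasta struct.
type record struct {
	Name     string
	Sequence string
}

// buildFasta renders records as FASTA text: a ">name" header line followed
// by the sequence wrapped at width characters. The wrapping width is an
// assumption of this sketch, not documented behavior of Build.
func buildFasta(records []record, width int) []byte {
	if width <= 0 {
		width = 80 // fall back to a conventional line width
	}
	var buf bytes.Buffer
	for _, r := range records {
		buf.WriteString(">" + r.Name + "\n")
		for start := 0; start < len(r.Sequence); start += width {
			end := start + width
			if end > len(r.Sequence) {
				end = len(r.Sequence)
			}
			buf.WriteString(r.Sequence[start:end] + "\n")
		}
	}
	return buf.Bytes()
}

func main() {
	out := buildFasta([]record{{Name: "seq1", Sequence: "ACGTACGTAC"}}, 4)
	fmt.Print(string(out))
}
```

The returned bytes could then be passed to something like os.WriteFile, which is roughly the step that Write performs for the caller.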

Example

ExampleBuild shows basic usage for Build.

package main

import (
	"bytes"
	"fmt"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas, _ := fasta.Read("data/base.fasta") // get example data
	built, _ := fasta.Build(fastas)            // build a fasta byte array (avoid shadowing the package name)
	firstLine := string(bytes.Split(built, []byte("\n"))[0])

	fmt.Println(firstLine)
}
Output:

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

func ParseConcurrent

func ParseConcurrent(r io.Reader, sequences chan<- Fasta)

ParseConcurrent concurrently parses Fasta data from an io.Reader and sends the resulting Fasta structs on the sequences channel.
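The producer and consumer pattern behind a channel-based parser can be sketched with the standard library alone. This is an illustration, not the package's code: `record`, `streamRecords`, and `collectNames` are hypothetical names, and closing the channel when input is exhausted is an assumption (suggested by the package examples, which range over the channel).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// record is a hypothetical stand-in for the package's Fasta struct.
type record struct {
	Name     string
	Sequence string
}

// streamRecords sends each record on out as soon as it is complete and
// closes out once the input is exhausted. Scanner errors are ignored to
// keep the sketch short.
func streamRecords(r io.Reader, out chan<- record) {
	defer close(out)
	scanner := bufio.NewScanner(r)
	var current *record
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, ">") {
			if current != nil {
				out <- *current // previous record is finished
			}
			current = &record{Name: line[1:]}
		} else if current != nil {
			current.Sequence += line
		}
	}
	if current != nil {
		out <- *current // flush the final record
	}
}

// collectNames runs the producer in a goroutine and drains the channel.
func collectNames(text string) []string {
	records := make(chan record, 10)
	go streamRecords(strings.NewReader(text), records)
	var names []string
	for rec := range records {
		names = append(names, rec.Name)
	}
	return names
}

func main() {
	fmt.Println(collectNames(">seq1\nACGT\n>seq2\nTTAA\n"))
}
```

The buffered channel lets the producer run ahead of the consumer, so parsing and downstream work can overlap.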

func ReadConcurrent

func ReadConcurrent(path string, sequences chan<- Fasta)

ReadConcurrent concurrently reads a flat Fasta file into a Fasta channel.

Example

ExampleReadConcurrent shows how to use the concurrent parser for uncompressed fasta files.

package main

import (
	"fmt"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas := make(chan fasta.Fasta, 100)
	go fasta.ReadConcurrent("data/base.fasta", fastas)
	var name string
	for fasta := range fastas {
		name = fasta.Name
	}

	fmt.Println(name)
}
Output:

MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken

func ReadGzConcurrent

func ReadGzConcurrent(path string, sequences chan<- Fasta)

ReadGzConcurrent concurrently reads a gzipped Fasta file into a Fasta channel.

Deprecated: Use Parser.ParseNext() instead.
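When moving away from the gzip-specific helpers, decompression has to happen before parsing. The sketch below shows only the standard-library part: gzip.NewReader wraps compressed input in an io.Reader, which is the type NewParser accepts. The helper names `gzipBytes` and `gunzip` are hypothetical, and the in-memory data stands in for a real .fasta.gz file.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data in memory, standing in for a gzipped file.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip decompresses the input and returns the plain text.
func gunzip(compressed []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	// zr is an io.Reader; in real use it could be handed to a parser
	// instead of being read in full here.
	plain, err := io.ReadAll(zr)
	return string(plain), err
}

func main() {
	compressed, err := gzipBytes([]byte(">seq1\nACGT\n"))
	if err != nil {
		fmt.Println("compress error:", err)
		return
	}
	plain, err := gunzip(compressed)
	if err != nil {
		fmt.Println("decompress error:", err)
		return
	}
	fmt.Print(plain)
}
```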

Example

ExampleReadGzConcurrent shows how to use the concurrent parser for larger files.

package main

import (
	"fmt"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas := make(chan fasta.Fasta, 1000)
	go fasta.ReadGzConcurrent("data/uniprot_1mb_test.fasta.gz", fastas)
	var name string
	for fasta := range fastas {
		name = fasta.Name
	}

	fmt.Println(name)
}
Output:

sp|P86857|AGP_MYTCA Alanine and glycine-rich protein (Fragment) OS=Mytilus californianus OX=6549 PE=1 SV=1

func Write

func Write(fastas []Fasta, path string) error

Write writes a slice of Fasta structs to the file at path.

Example

ExampleWrite shows basic usage of the writer.

package main

import (
	"fmt"
	"os"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas, _ := fasta.Read("data/base.fasta")       // get example data
	_ = fasta.Write(fastas, "data/test.fasta")       // write it out again
	testSequence, _ := fasta.Read("data/test.fasta") // read it in again

	os.Remove("data/test.fasta") // getting rid of test file

	fmt.Println(testSequence[0].Name)
}
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

Types

type Fasta

type Fasta struct {
	Name     string `json:"name"`
	Sequence string `json:"sequence"`
}

Fasta is a struct representing a single Fasta file element with a Name and its corresponding Sequence.
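The struct tags mean a record encodes to JSON with lowercase keys. The sketch below uses a local copy of the struct (the hypothetical `record` type and `toJSON` helper) so that it runs with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record mirrors the fields and JSON tags of the Fasta struct shown above.
type record struct {
	Name     string `json:"name"`
	Sequence string `json:"sequence"`
}

// toJSON encodes one record using its struct tags.
func toJSON(r record) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	out, err := toJSON(record{Name: "seq1", Sequence: "ACGT"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // {"name":"seq1","sequence":"ACGT"}
}
```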

func Parse

func Parse(r io.Reader) ([]Fasta, error)

Parse parses a given Fasta file into a slice of Fasta structs. Internally, it uses ParseConcurrent.

Example

ExampleParse shows basic usage for Parse.

package main

import (
	"fmt"
	"os"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	file, _ := os.Open("data/base.fasta")
	fastas, _ := fasta.Parse(file)

	fmt.Println(fastas[0].Name)
}
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

func Read

func Read(path string) ([]Fasta, error)

Read reads a file into a slice of Fasta structs.

Example

ExampleRead shows basic usage for Read.

package main

import (
	"fmt"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas, _ := fasta.Read("data/base.fasta")
	fmt.Println(fastas[0].Name)
}
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

func ReadGz

func ReadGz(path string) ([]Fasta, error)

ReadGz reads a gzipped file into a slice of Fasta structs.

Example

ExampleReadGz shows basic usage for ReadGz on a gzip'd file.

package main

import (
	"fmt"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

func main() {
	fastas, _ := fasta.ReadGz("data/uniprot_1mb_test.fasta.gz")
	var name string
	for _, fasta := range fastas {
		name = fasta.Name
	}

	fmt.Println(name)
}
Output:

sp|P86857|AGP_MYTCA Alanine and glycine-rich protein (Fragment) OS=Mytilus californianus OX=6549 PE=1 SV=1

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is a flexible parser that provides ample control over reading fasta-formatted sequences. It is initialized with NewParser.

Example

package main

import (
	"fmt"
	"strings"

	_ "embed"
	"github.com/bebop/poly/io/fasta"
)

//go:embed data/base.fasta
var baseFasta string

func main() {
	parser := fasta.NewParser(strings.NewReader(baseFasta), 256)
	for {
		fasta, _, err := parser.ParseNext()
		if err != nil {
			fmt.Println(err)
			break
		}
		fmt.Println(fasta.Name)
	}
}
Output:

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
EOF

func NewParser

func NewParser(r io.Reader, maxLineSize int) *Parser

NewParser returns a Parser that uses r as the source from which to parse fasta formatted sequences.

func (*Parser) ParseAll

func (parser *Parser) ParseAll() ([]Fasta, error)

ParseAll parses all sequences in the underlying reader and returns only non-EOF errors. If an error is encountered, it returns all valid fasta sequences parsed up to that point.

func (*Parser) ParseByteLimited

func (parser *Parser) ParseByteLimited(byteLimit int64) (fastas []Fasta, bytesRead int64, err error)

ParseByteLimited parses fastas until the byte limit is reached. This is NOT a hard limit. To set a hard limit on bytes read, use an io.LimitReader to wrap the reader passed to the Parser.
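A hard limit of the kind described can be sketched with the standard library. Only io.LimitReader is the real API here; `readAtMost` is a hypothetical helper, and in real use the limited reader would be passed to NewParser rather than read in full.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readAtMost returns at most limit bytes of text. io.LimitReader reports
// EOF once the limit is reached, regardless of record boundaries.
func readAtMost(text string, limit int64) (string, error) {
	limited := io.LimitReader(strings.NewReader(text), limit)
	b, err := io.ReadAll(limited)
	return string(b), err
}

func main() {
	// The first record below is exactly 11 bytes long.
	got, err := readAtMost(">seq1\nACGT\n>seq2\nTTAA\n", 11)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", got)
}
```

Because the cut-off ignores record boundaries, a limit that falls in the middle of a record leaves a truncated final sequence for the parser to deal with.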

func (*Parser) ParseN

func (parser *Parser) ParseN(maxSequences int) (fastas []Fasta, err error)

ParseN parses up to maxSequences fasta sequences from the Parser's underlying reader. ParseN does not return EOF if encountered. If a non-EOF error is encountered, it returns that error along with all sequences parsed correctly up to that point.

func (*Parser) ParseNext

func (parser *Parser) ParseNext() (Fasta, int64, error)

ParseNext reads the next fasta sequence from the underlying reader and returns it along with the number of bytes read during the call. ParseNext returns an error only in the following cases:

  • It attempts to read and fails to find a valid fasta sequence.
  • It is called after the reader has been exhausted, in which case it returns the reader's EOF.
  • An EOF is encountered immediately after a sequence that does not end in a newline. In this case the Fasta parsed up to that point is returned together with an EOF error.

Note that the number of bytes read always runs up to just before the start of the next fasta, so this function can be used to index where each fasta starts in a file or string.
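The indexing idea reduces to a running sum of the per-call byte counts. The sketch below assumes the first record starts at offset 0 and uses made-up counts in place of real ParseNext results; `startOffsets` is a hypothetical helper.

```go
package main

import "fmt"

// startOffsets converts per-call byte counts into the offset at which each
// record starts, assuming the first record begins at offset 0.
func startOffsets(bytesRead []int64) []int64 {
	offsets := make([]int64, len(bytesRead))
	var total int64
	for i, n := range bytesRead {
		offsets[i] = total
		total += n
	}
	return offsets
}

func main() {
	data := ">seq1\nACGT\n>seq2\nTTAA\n"
	// Hypothetical counts, as if each call had consumed one 11-byte record.
	for _, off := range startOffsets([]int64{11, 11}) {
		fmt.Printf("%d %q\n", off, data[off:off+5])
	}
}
```

With such an index, a later pass could seek directly to a record of interest instead of re-parsing everything before it.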

func (*Parser) Reset

func (parser *Parser) Reset(r io.Reader)

Reset discards all data in buffer and resets state.
