gff

package
v0.0.0-...-6fbfe6d
Published: Sep 4, 2015 License: BSD-3-Clause Imports: 17 Imported by: 0

Documentation

Overview

Package gff provides types to read and write version 2 General Feature Format files according to the Sanger Institute specification.

The specification can be found at http://www.sanger.ac.uk/resources/software/gff/spec.html.

Constants

const Astronomical = "2006-1-02"

Astronomical is the time format specified in the GFF specification.

const Version = 2

Version is the GFF version that is read and written.

Variables

var (
	ErrBadFeature     = Error{"gff: feature start not less than feature end"}
	ErrBadStrandField = Error{"gff: bad strand field"}
	ErrBadStrand      = Error{"gff: invalid strand"}
	ErrClosed         = Error{"gff: writer closed"}
	ErrBadTag         = Error{"gff: invalid tag"}
	ErrCannotHeader   = Error{"gff: cannot write header: data written"}
	ErrNotHandled     = Error{"gff: type not handled"}
	ErrFieldMissing   = Error{"gff: missing fields"}
	ErrBadMoltype     = Error{"gff: invalid moltype"}
	ErrEmptyMetaLine  = Error{"gff: empty comment metaline"}
	ErrBadMetaLine    = Error{"gff: incomplete metaline"}
	ErrBadSequence    = Error{"gff: corrupt metasequence"}
)

Functions

This section is empty.

Types

type Attribute

type Attribute struct {
	Tag, Value string
}

An Attribute represents a GFF2 attribute field record. Attribute field records must have a tag-value structure following the syntax used within objects in a .ace file, flattened onto one line by semicolon separators. Tags must be standard identifiers ([A-Za-z][A-Za-z0-9_]*). Free text values must be quoted with double quotes.

Note: all non-printing characters in free text value strings (e.g. newlines, tabs, control characters, etc) must be explicitly represented by their C (UNIX) style backslash-escaped representation.
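
The tag and free text rules above can be sketched with the standard library alone. This is an illustration, not the package's implementation: validTag and formatAttribute are hypothetical helpers, and strconv.Quote is assumed to be an acceptable stand-in for C-style escaping (it produces \n, \t and similar escapes, and \x or \u forms for other non-printing characters).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tagPattern is the identifier syntax quoted in the Attribute documentation.
var tagPattern = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_]*$`)

// validTag reports whether tag is a standard identifier.
// Hypothetical helper, not part of package gff.
func validTag(tag string) bool {
	return tagPattern.MatchString(tag)
}

// formatAttribute renders a tag and a free text value as `tag "value"`.
// Hypothetical helper: strconv.Quote adds the double quotes and
// backslash-escapes non-printing characters.
func formatAttribute(tag, value string) (string, error) {
	if !validTag(tag) {
		return "", fmt.Errorf("invalid tag %q", tag)
	}
	return tag + " " + strconv.Quote(value), nil
}

func main() {
	s, err := formatAttribute("Note", "first line\nsecond\tcolumn")
	fmt.Println(s, err)

	_, err = formatAttribute("9lives", "x")
	fmt.Println(err)
}
```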

type Attributes

type Attributes []Attribute

func (Attributes) Format

func (a Attributes) Format(fs fmt.State, c rune)

func (Attributes) Get

func (a Attributes) Get(tag string) string

type Error

type Error struct {
	// contains filtered or unexported fields
}

func (Error) Error

func (e Error) Error() string

type Feature

type Feature struct {
	// The name of the sequence. Having an explicit sequence name allows
	// a feature file to be prepared for a data set of multiple sequences.
	// Normally the seqname will be the identifier of the sequence in an
	// accompanying fasta format file. An alternative is that SeqName is
	// the identifier for a sequence in a public database, such as an
	// EMBL/Genbank/DDBJ accession number. Which is the case, and which
	// file or database to use, should be explained in accompanying
	// information.
	SeqName string

	// The source of this feature. This field will normally be used to
	// indicate the program making the prediction, or if it comes from
	// public database annotation, or is experimentally verified, etc.
	Source string

	// The feature type name.
	Feature string

	// FeatStart must be less than FeatEnd and non-negative - GFF indexing
	// is one-based and GFF features cannot have a zero length or a negative
	// position. To be consistent with the rest of the library, gff.Feature
	// indexing is zero-based half-open. Translation between zero- and
	// one-based indexing is handled by the gff package.
	FeatStart, FeatEnd int

	// A floating point value representing the score for the feature. A nil
	// value indicates the score is not available.
	FeatScore *float64

	// The strand of the feature - one of seq.Plus, seq.Minus or seq.None.
	// seq.None should be used when strand is not relevant, e.g. for
	// dinucleotide repeats. This field should be set to seq.None for RNA
	// and protein features.
	FeatStrand seq.Strand

	// FeatFrame indicates the frame of the feature and takes the values
	// Frame0, Frame1, Frame2 or NoFrame. Frame0 indicates that the
	// specified region is in frame. Frame1 indicates that there is one
	// extra base, and Frame2 means that the third base of the region
	// is the first base of a codon. If the FeatStrand is seq.Minus, then
	// the first base of the region is the value of FeatEnd, because the
	// corresponding coding region will run from FeatEnd to FeatStart on
	// the reverse strand. As with FeatStrand, if the frame is not relevant
	// then set FeatFrame to NoFrame. This field should be set to NoFrame
	// for RNA and protein features.
	FeatFrame Frame

	// FeatAttributes represents a collection of GFF2 attributes.
	FeatAttributes Attributes

	// Free comments.
	Comments string
}

A Feature represents a standard GFF2 feature.
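
The FeatStart and FeatEnd comment describes a translation between the zero-based half-open coordinates held in a Feature and the one-based closed coordinates stored in a GFF file. A minimal sketch of that arithmetic follows. The names toGFF and fromGFF are hypothetical and do not exist in the package, which performs the translation internally.

```go
package main

import "fmt"

// toGFF converts a zero-based half-open interval [start, end) to the
// one-based closed interval written in a GFF file.
// Hypothetical helper, not part of package gff.
func toGFF(start, end int) (gffStart, gffEnd int) {
	return start + 1, end
}

// fromGFF converts a one-based closed GFF interval to the zero-based
// half-open form. Hypothetical helper, not part of package gff.
func fromGFF(gffStart, gffEnd int) (start, end int) {
	return gffStart - 1, gffEnd
}

func main() {
	// The first ten bases of a sequence.
	s, e := toGFF(0, 10)
	fmt.Println(s, e)

	// GFF columns 103 and 172 cover 70 bases.
	start, end := fromGFF(103, 172)
	fmt.Println(start, end, end-start)
}
```

In the half-open form the length is simply end minus start, which is why a zero-length feature (start equal to end) cannot be expressed as a valid GFF line.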

func (*Feature) Description

func (g *Feature) Description() string

func (*Feature) End

func (g *Feature) End() int

func (*Feature) Len

func (g *Feature) Len() int

func (*Feature) Location

func (g *Feature) Location() feat.Feature

func (*Feature) Name

func (g *Feature) Name() string

func (*Feature) Start

func (g *Feature) Start() int

type Frame

type Frame int8

Frame holds feature frame information.

const (
	NoFrame Frame = iota - 1
	Frame0
	Frame1
	Frame2
)

func (Frame) String

func (f Frame) String() string
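
The constant block uses iota - 1, so NoFrame is -1 and each FrameN constant equals N. The sketch below reproduces that layout with a local stand-in type. The String method shown here is an assumption about how a frame might be rendered in the GFF frame column ("." when absent), not a copy of the package's method.

```go
package main

import (
	"fmt"
	"strconv"
)

// frame is a local stand-in for gff.Frame.
type frame int8

const (
	noFrame frame = iota - 1 // -1
	frame0                   // 0
	frame1                   // 1
	frame2                   // 2
)

// String renders the frame as a GFF column value. Assumption: an absent
// or out-of-range frame is written as ".".
func (f frame) String() string {
	if f < frame0 || f > frame2 {
		return "."
	}
	return strconv.Itoa(int(f))
}

func main() {
	for _, f := range []frame{noFrame, frame0, frame1, frame2} {
		fmt.Printf("%d %s\n", int8(f), f)
	}
}
```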

type Metadata

type Metadata struct {
	Name          string
	Date          time.Time
	Version       int
	SourceVersion string
	Type          feat.Moltype
}

type Reader

type Reader struct {
	TimeFormat string // Required for parsing date fields. Defaults to astronomical format.

	Metadata
	// contains filtered or unexported fields
}

A Reader parses GFFv2 formatted data from an io.Reader and returns feat.Features.

func NewReader

func NewReader(r io.Reader) *Reader

NewReader returns a new GFFv2 format reader that reads from r.

func (*Reader) Read

func (r *Reader) Read() (f feat.Feature, err error)

Read reads a single feature or part and returns it or an error. A call to Read may have side effects on the Reader's Metadata field.
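
To make the shape of a feature line concrete, here is a standard-library sketch that splits one tab-separated GFF2 line into its mandatory columns and applies the one-based to zero-based shift. It is an assumption-laden illustration, not the Reader's implementation: record and parseLine are hypothetical names, attributes and comments are ignored, and metadata lines are not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// record is a hypothetical, reduced stand-in for gff.Feature.
type record struct {
	SeqName, Source, Feature string
	Start, End               int      // zero-based, half-open
	Score                    *float64 // nil when the score column is "."
	Strand, Frame            string
}

// parseLine splits one GFF2 feature line into its eight mandatory columns.
func parseLine(line string) (*record, error) {
	f := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(f) < 8 {
		return nil, errors.New("missing fields")
	}
	start, err := strconv.Atoi(f[3])
	if err != nil {
		return nil, err
	}
	end, err := strconv.Atoi(f[4])
	if err != nil {
		return nil, err
	}
	r := &record{
		SeqName: f[0], Source: f[1], Feature: f[2],
		Start: start - 1, End: end, // one-based closed to zero-based half-open
		Strand: f[6], Frame: f[7],
	}
	if f[5] != "." {
		score, err := strconv.ParseFloat(f[5], 64)
		if err != nil {
			return nil, err
		}
		r.Score = &score
	}
	return r, nil
}

func main() {
	r, err := parseLine("SEQ1\tEMBL\texon\t103\t172\t.\t+\t0")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(r.SeqName, r.Feature, r.Start, r.End, r.Score == nil)
}
```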

type Region

type Region struct {
	Sequence
	RegionStart int
	RegionEnd   int
}

A Region is a feat.Feature.

func (*Region) Description

func (r *Region) Description() string

func (*Region) End

func (r *Region) End() int

func (*Region) Len

func (r *Region) Len() int

func (*Region) Location

func (r *Region) Location() feat.Feature

func (*Region) Start

func (r *Region) Start() int

type Sequence

type Sequence struct {
	SeqName string
	Type    feat.Moltype
}

A Sequence is a feat.Feature.

func (Sequence) Description

func (s Sequence) Description() string

func (Sequence) End

func (s Sequence) End() int

func (Sequence) Len

func (s Sequence) Len() int

func (Sequence) Location

func (s Sequence) Location() feat.Feature

func (Sequence) MolType

func (s Sequence) MolType() feat.Moltype

func (Sequence) Name

func (s Sequence) Name() string

func (Sequence) Start

func (s Sequence) Start() int

type Writer

type Writer struct {
	TimeFormat string
	Precision  int
	Width      int
	// contains filtered or unexported fields
}

A Writer outputs features and sequences into GFFv2 format.

func NewWriter

func NewWriter(w io.Writer, width int, header bool) *Writer

NewWriter returns a new GFF format writer using w. When header is true, a version header is written to the output.

func (*Writer) Write

func (w *Writer) Write(f feat.Feature) (n int, err error)

Write writes a single feature and returns the number of bytes written and any error. gff.Features are written as a canonical GFF line. seq.Sequences are written as inline sequence in GFF format (note that only sequences of feat.Moltype DNA, RNA and Protein are supported). gff.Sequences are not handled as they have a zero length. All other feat.Feature values are written as sequence region metadata lines.
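
Inline sequence in GFF2 is bracketed by ##DNA and ##end-DNA lines, with each sequence line prefixed by ##. The sketch below shows one way such output could be produced with the standard library. It rests on assumptions: writeDNA and dnaString are hypothetical helpers, and the width argument is taken to mean the number of residues per line, which this documentation does not state for the Writer's Width field.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// writeDNA writes seq as a GFF2 inline DNA block, wrapping at width
// residues per line. Hypothetical helper, not part of package gff.
func writeDNA(w io.Writer, name, seq string, width int) (n int, err error) {
	if width <= 0 {
		width = len(seq) // no wrapping
	}
	m, err := fmt.Fprintf(w, "##DNA %s\n", name)
	n += m
	if err != nil {
		return n, err
	}
	for i := 0; i < len(seq); i += width {
		end := i + width
		if end > len(seq) {
			end = len(seq)
		}
		m, err = fmt.Fprintf(w, "##%s\n", seq[i:end])
		n += m
		if err != nil {
			return n, err
		}
	}
	m, err = fmt.Fprintln(w, "##end-DNA")
	n += m
	return n, err
}

// dnaString returns the block as a string for inspection.
func dnaString(name, seq string, width int) string {
	var b strings.Builder
	writeDNA(&b, name, seq, width) // strings.Builder writes do not fail
	return b.String()
}

func main() {
	fmt.Print(dnaString("SEQ1", "acggctcgga", 4))
}
```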

func (*Writer) WriteComment

func (w *Writer) WriteComment(c string) (n int, err error)

WriteComment writes a comment line to a GFF file.

func (*Writer) WriteMetaData

func (w *Writer) WriteMetaData(d interface{}) (n int, err error)

WriteMetaData writes a meta data line to a GFF file. The type of metadata line depends on the type of d: strings and byte slices are written verbatim, an int is interpreted as a version number and can only be written before any other data, feat.Moltype and gff.Sequence types are written as sequence type lines, gff.Features and gff.Regions are written as sequence regions, sequences are written in GFF format and time.Time values are written as a date line. All other types return ErrNotHandled.
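
The dispatch on the type of d can be pictured as a type switch. The following sketch covers only the standard-library types from the list above and is not the package's code: metaLine and errNotHandled are hypothetical names, the ## prefix on string and byte slice values is an assumption, and the date layout is the value of the Astronomical constant.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotHandled is a local stand-in for gff.ErrNotHandled.
var errNotHandled = errors.New("type not handled")

// metaLine builds a GFF2 metadata line from d. Hypothetical helper
// covering a subset of the types that WriteMetaData accepts.
func metaLine(d interface{}) (string, error) {
	switch d := d.(type) {
	case string:
		// Assumption: the text is emitted as given behind the ## marker.
		return "##" + d, nil
	case []byte:
		return "##" + string(d), nil
	case int:
		return fmt.Sprintf("##gff-version %d", d), nil
	case time.Time:
		// "2006-1-02" is the layout held by the Astronomical constant.
		return "##date " + d.Format("2006-1-02"), nil
	default:
		return "", errNotHandled
	}
}

func main() {
	inputs := []interface{}{
		2,
		"source-version prog 1.0",
		time.Date(1997, time.November, 8, 0, 0, 0, 0, time.UTC),
		3.5,
	}
	for _, d := range inputs {
		line, err := metaLine(d)
		fmt.Println(line, err)
	}
}
```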
