vcf

package
v5.1.3
Published: Apr 27, 2022
License: AGPL-3.0, AGPL-3.0-or-later
Imports: 15
Imported by: 0

Documentation

Overview

Package vcf is a library for parsing, representing, and writing VCF files. See http://samtools.github.io/hts-specs/VCFv4.3.pdf

Constants

const (
	VcfExt = ".vcf"
	GzExt  = ".gz"
)

The possible file extensions for VCF or gz-compressed VCF files

const (
	FileFormatVersion     = "VCFv4.3"
	FileFormatVersionLine = "##fileformat=VCFv4.3"
)

The supported VCF file format version.

const (
	NumberA int32 = -1 * (1 + iota)
	NumberR
	NumberG
	NumberDot
	InvalidNumber
)

Constants for format information Number entries.
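
The iota expression gives each symbolic Number a distinct negative value, so it cannot collide with an explicit non-negative count. The following standalone sketch copies the constant block into a local package to show the resulting values. The helper describeNumber is hypothetical and not part of package vcf; the meanings in the comments follow the VCF specification.

```go
package main

import "fmt"

// Local copy of the constant block, for illustration only.
const (
	NumberA       int32 = -1 * (1 + iota) // one value per alternate allele
	NumberR                               // one value per allele, including the reference
	NumberG                               // one value per possible genotype
	NumberDot                             // unknown or unbounded number of values
	InvalidNumber                         // sentinel below every valid value
)

// describeNumber is a hypothetical helper that turns a Number value
// back into the text used in a VCF header.
func describeNumber(n int32) string {
	switch {
	case n >= 0:
		return fmt.Sprintf("%d", n)
	case n == NumberA:
		return "A"
	case n == NumberR:
		return "R"
	case n == NumberG:
		return "G"
	case n == NumberDot:
		return "."
	default:
		return "invalid"
	}
}

func main() {
	fmt.Println(NumberA, NumberR, NumberG, NumberDot, InvalidNumber) // -1 -2 -3 -4 -5
	fmt.Println(describeNumber(2), describeNumber(NumberG))          // 2 G
}
```

This also explains the "> InvalidNumber" comment on FormatInformation.Number: every valid value, symbolic or explicit, is greater than -5.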

Variables

var (
	END  = utils.Intern("END")
	GT   = utils.Intern("GT")
	PASS = utils.Intern("PASS")
)

Commonly used VCF entries.

var DefaultHeaderColumns = []string{"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}

DefaultHeaderColumns for VCF files.

Functions

func FormatFormatInformation

func FormatFormatInformation(out *bufio.Writer, format *FormatInformation, infoNotFormat bool)

FormatFormatInformation outputs VCF info or format information

func FormatMetaInformation

func FormatMetaInformation(out *bufio.Writer, meta interface{})

FormatMetaInformation outputs VCF meta information, which can be just a string or *MetaInformation

func FormatString

func FormatString(out io.ByteWriter, str string)

FormatString outputs a string to a VCF file, adding necessary double quotes and escapes

func FormatVariants

func FormatVariants(out *bufio.Writer, variants []Variant)

func SkipHeader

func SkipHeader(reader *bufio.Reader) (lines int)

SkipHeader skips a VCF header. This is more efficient than calling ParseHeader and ignoring its result.

Types

type FieldParser

type FieldParser func(*StringScanner) interface{}

FieldParser is an abstraction for parsing VCF fields

func CreateFormatParser

func CreateFormatParser(format *FormatInformation) FieldParser

CreateFormatParser creates a specific VCF format section parser for the given format information

func CreateInfoParser

func CreateInfoParser(format *FormatInformation) FieldParser

CreateInfoParser creates a specific VCF info section parser for the given format information

type FormatInformation

type FormatInformation struct {
	ID          utils.Symbol
	Description string // "" if not present
	Number      int32  // > InvalidNumber
	Type        Type
	Fields      utils.StringMap
}

FormatInformation in VCF files.

func NewFormatInformation

func NewFormatInformation() *FormatInformation

NewFormatInformation creates an empty instance.

type Genotype

type Genotype struct {
	Phased bool
	GT     []int32        // < 0 for unknown entries
	Data   utils.SmallMap // values are nil (for missing entry), int, float64, rune, string, or []interface{}
}

Genotype is a structured representation of the GT entry in a VCF file.
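
To illustrate how a GT string could map onto the Phased and GT fields, here is a minimal sketch. The type genotype and the function parseGT are hypothetical and are not the parser used by package vcf. It assumes that a call is phased when it contains '|', and that a "." allele is stored as -1.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// genotype mirrors the Phased and GT fields of Genotype, for illustration only.
type genotype struct {
	Phased bool
	GT     []int32
}

// parseGT is a hypothetical helper. Alleles are separated by '|'
// (phased) or '/' (unphased); anything that is not a number, such as
// ".", is stored as -1.
func parseGT(s string) genotype {
	g := genotype{Phased: strings.Contains(s, "|")}
	alleles := strings.FieldsFunc(s, func(r rune) bool { return r == '|' || r == '/' })
	for _, a := range alleles {
		n, err := strconv.ParseInt(a, 10, 32)
		if err != nil {
			n = -1
		}
		g.GT = append(g.GT, int32(n))
	}
	return g
}

func main() {
	fmt.Printf("%+v\n", parseGT("0|1")) // {Phased:true GT:[0 1]}
	fmt.Printf("%+v\n", parseGT("./."))  // {Phased:false GT:[-1 -1]}
}
```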

type Header

type Header struct {
	FileFormat string
	Infos      []*FormatInformation
	Formats    []*FormatInformation
	Meta       map[string][]interface{} // string or *MetaInformation
	Columns    []string
}

Header section of a VCF file.

func NewHeader

func NewHeader() *Header

NewHeader creates an empty instance.

func ParseHeader

func ParseHeader(reader *bufio.Reader) (hdr *Header, lines int)

ParseHeader parses a VCF header

func (*Header) Format

func (header *Header) Format(out *bufio.Writer)

Format outputs a VCF header

func (*Header) NewVariantParser

func (header *Header) NewVariantParser() *VariantParser

NewVariantParser creates a VariantParser for the given VCF header.

func (*Header) ParseVariants

func (header *Header) ParseVariants(input *InputFile) []Variant

ParseVariants parses VCF variant lines based on the given VCF header.

type InputFile

type InputFile struct {
	*bufio.Reader
	// contains filtered or unexported fields
}

InputFile represents a VCF or BCF file for input.

func Open

func Open(name string) *InputFile

Open a VCF file for input.

Whether the format is gzipped or not is determined from the content of the input, not from any file extensions.

If the name is "/dev/stdin", then the input is read from os.Stdin
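
Content-based detection can be sketched with the standard library: a gzip stream starts with the two bytes 0x1f 0x8b, so peeking at them is enough to decide whether to decompress. The helpers openSketch, gzipBytes, and firstLine are hypothetical; how Open performs the check internally is not stated here.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// openSketch is a hypothetical helper, not the implementation of Open.
// It peeks at the first two bytes and wraps the stream in a gzip reader
// when they match the gzip magic number 0x1f 0x8b.
func openSketch(r io.Reader) (*bufio.Reader, error) {
	br := bufio.NewReader(r)
	magic, err := br.Peek(2)
	if err == nil && magic[0] == 0x1f && magic[1] == 0x8b {
		zr, err := gzip.NewReader(br)
		if err != nil {
			return nil, err
		}
		return bufio.NewReader(zr), nil
	}
	return br, nil // plain text, or too short to be gzip
}

// gzipBytes compresses data in memory; writes to a bytes.Buffer do not fail.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

// firstLine returns the first line of the input, decompressed if needed.
func firstLine(data []byte) string {
	r, err := openSketch(bytes.NewReader(data))
	if err != nil {
		return ""
	}
	line, _ := r.ReadString('\n')
	return line
}

func main() {
	plain := []byte("##fileformat=VCFv4.3\n")
	fmt.Printf("%q\n", firstLine(plain))
	fmt.Printf("%q\n", firstLine(gzipBytes(plain)))
}
```

Both calls print the same line, which is why a file extension is not needed to choose the decoder.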

func OpenIfExists

func OpenIfExists(name string) (*InputFile, bool)

Open a VCF file for input, returning false if it doesn't exist.

Whether the format is gzipped or not is determined from the content of the input, not from any file extensions.

If the name is "/dev/stdin", then the input is read from os.Stdin

func (*InputFile) Close

func (input *InputFile) Close()

Close the VCF input file.

func (*InputFile) Parse

func (input *InputFile) Parse() *Vcf

Parse parses a full VCF file.

type MetaInformation

type MetaInformation struct {
	ID          utils.Symbol
	Description string // "" if not present
	Fields      utils.StringMap
}

MetaInformation in VCF files.

func NewMetaInformation

func NewMetaInformation() *MetaInformation

NewMetaInformation creates an empty instance.

type OutputFile

type OutputFile struct {
	*bufio.Writer
	// contains filtered or unexported fields
}

OutputFile represents a VCF or BCF file for output.

func Create

func Create(name string, format string, level int) *OutputFile

Create a VCF file for output.

The format string can be "vcf" or "gz". If the format string is empty, the output format is determined by looking at the filename extension. If the filename extension is not .gz, then .vcf is always assumed.

The format string will not become part of the resulting filename.
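
The selection rule described above can be written down as a small function. This is a sketch only: resolveFormat is a hypothetical helper, not part of package vcf, and it assumes that an explicit format string is taken at face value and that the extension comparison is case-sensitive.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Local copies of the extension constants, for illustration only.
const (
	vcfExt = ".vcf"
	gzExt  = ".gz"
)

// resolveFormat is a hypothetical helper that returns "vcf" or "gz".
// An explicit format wins; otherwise only a .gz extension selects gz,
// and every other filename is treated as plain VCF.
func resolveFormat(name, format string) string {
	if format != "" {
		return format
	}
	if filepath.Ext(name) == gzExt {
		return "gz"
	}
	return "vcf"
}

func main() {
	fmt.Println(resolveFormat("calls.vcf.gz", ""))    // gz
	fmt.Println(resolveFormat("calls.txt", ""))       // vcf
	fmt.Println(resolveFormat("calls.vcf.gz", "vcf")) // vcf
}
```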

Following zlib, levels range from 1 (BestSpeed) to 9 (BestCompression); higher levels typically run slower but compress more. Level 0 (NoCompression) does not attempt any compression; it only adds the necessary DEFLATE framing. Level -1 (DefaultCompression) uses the default compression level. Level -2 (HuffmanOnly) will use Huffman compression only, giving a very fast compression for all types of input, but sacrificing considerable compression efficiency.

If the name is "/dev/stdout", then the output is written to os.Stdout.

func (*OutputFile) Close

func (output *OutputFile) Close()

Close the VCF output file.

func (*OutputFile) Format

func (output *OutputFile) Format(vcf *Vcf)

Format outputs a full VCF struct.

type StringScanner

type StringScanner struct {
	// contains filtered or unexported fields
}

A StringScanner can be used to scan/parse strings representing lines in VCF files.

The zero StringScanner is valid and empty.

func (*StringScanner) Len

func (sc *StringScanner) Len() int

Len returns the number of ASCII characters that still need to be scanned/parsed.

func (*StringScanner) ParseFormatCharacter

func (sc *StringScanner) ParseFormatCharacter() interface{}

ParseFormatCharacter parses a rune in a VCF format section

func (*StringScanner) ParseFormatFloat

func (sc *StringScanner) ParseFormatFloat() interface{}

ParseFormatFloat parses a floating point number in a VCF format section

func (*StringScanner) ParseFormatInformation

func (sc *StringScanner) ParseFormatInformation() *FormatInformation

ParseFormatInformation parses VCF format information

func (*StringScanner) ParseFormatInteger

func (sc *StringScanner) ParseFormatInteger() interface{}

ParseFormatInteger parses an integer in a VCF format section

func (*StringScanner) ParseFormatString

func (sc *StringScanner) ParseFormatString() interface{}

ParseFormatString parses a string in a VCF format section

func (*StringScanner) ParseGenericInfo

func (sc *StringScanner) ParseGenericInfo() interface{}

ParseGenericInfo parses a VCF info section without specific format information

func (*StringScanner) ParseInfoCharacter

func (sc *StringScanner) ParseInfoCharacter() interface{}

ParseInfoCharacter parses a rune in a VCF info section

func (*StringScanner) ParseInfoFlag

func (sc *StringScanner) ParseInfoFlag() interface{}

ParseInfoFlag parses a boolean flag in a VCF info section (always returns true)

func (*StringScanner) ParseInfoFloat

func (sc *StringScanner) ParseInfoFloat() interface{}

ParseInfoFloat parses a floating point number in a VCF info section

func (*StringScanner) ParseInfoInteger

func (sc *StringScanner) ParseInfoInteger() interface{}

ParseInfoInteger parses an integer in a VCF info section

func (*StringScanner) ParseInfoString

func (sc *StringScanner) ParseInfoString() interface{}

ParseInfoString parses a string in a VCF info section

func (*StringScanner) ParseMetaField

func (sc *StringScanner) ParseMetaField() (key, value string)

ParseMetaField parses a VCF meta field

func (*StringScanner) ParseMetaInformation

func (sc *StringScanner) ParseMetaInformation() interface{}

ParseMetaInformation parses VCF meta information

func (*StringScanner) ParseVariant

func (sc *StringScanner) ParseVariant(vp *VariantParser) Variant

ParseVariant parses a VCF variant line

func (*StringScanner) Reset

func (sc *StringScanner) Reset(s string)

Reset resets the scanner, and initializes it with the given string.

func (*StringScanner) SkipSpace

func (sc *StringScanner) SkipSpace()

SkipSpace skips ' ' runes

type Type

type Type uint

Type is an enumeration type for different VCF field types

const (
	InvalidType Type = iota
	Integer          // represented as int (not int32, since that's the same as rune in Go)
	Float            // represented as float64 (parsing as float32 seems problematic in some cases in Go)
	Flag             // represented as bool with fixed value true
	Character        // represented as rune
	String           // represented as string
)

The different VCF field types
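
Because parsed values are stored as interface{}, callers typically recover the concrete type with a type switch. The sketch below shows one way to do that for the representations listed in the comments above. The function describe is a hypothetical helper, not part of package vcf.

```go
package main

import "fmt"

// describe is a hypothetical helper showing how a caller might inspect
// a value taken from Variant.Info or Genotype.Data.
func describe(v interface{}) string {
	switch x := v.(type) {
	case nil:
		return "missing"
	case int:
		return fmt.Sprintf("Integer %d", x)
	case float64:
		return fmt.Sprintf("Float %g", x)
	case bool:
		return "Flag"
	case rune: // rune is an alias for int32, which is why Integer uses int
		return fmt.Sprintf("Character %c", x)
	case string:
		return fmt.Sprintf("String %s", x)
	case []interface{}:
		return fmt.Sprintf("list of %d values", len(x))
	default:
		return "unexpected type"
	}
}

func main() {
	fmt.Println(describe(42))                    // Integer 42
	fmt.Println(describe('A'))                   // Character A
	fmt.Println(describe([]interface{}{1, 2.5})) // list of 2 values
}
```

Passing int32(65) yields "Character A" as well, which shows why an int32 representation for Integer would be indistinguishable from Character.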

type Variant

type Variant struct {
	Source         string // this is not part of the VCF spec, but is needed in HaplotypeCaller
	Chrom          string
	Pos            int32    // < 0 if unknown
	ID             []string // nil/empty if missing
	Ref            string
	Alt            []string       // nil/empty if missing
	Qual           interface{}    // float64, or nil if missing
	Filter         []utils.Symbol // nil/empty if missing
	Info           utils.SmallMap // values are int, float64, bool, rune, string, or []interface{}
	GenotypeFormat []utils.Symbol
	GenotypeData   []Genotype
}

Variant line in a VCF file.

func (*Variant) End

func (v *Variant) End() int32

End returns the end position of a VCF line in the reference, determined either by the END field or len(v.Ref)

func (Variant) Format

func (variant Variant) Format(out []byte) []byte

Format outputs a VCF variant line

func (Variant) Pass

func (v Variant) Pass() bool

Pass determines whether the variant passed all filters.

func (*Variant) SetEnd

func (v *Variant) SetEnd(value int32)

SetEnd sets the end position of a VCF line in the reference by setting the END field. If the end position can be calculated from the start position and the length of Ref, delete the END field.
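
The relationship between End and SetEnd can be sketched as follows. The type variant and its methods are hypothetical: Info is a plain map rather than utils.SmallMap, and the implied end is assumed to be Pos + len(Ref) - 1, following the 1-based inclusive coordinates of the VCF specification.

```go
package main

import "fmt"

// variant keeps only the fields needed here, for illustration only.
type variant struct {
	Pos  int32
	Ref  string
	Info map[string]interface{}
}

// defaultEnd is the end position implied by Pos and Ref alone
// (assumption: 1-based, inclusive coordinates).
func (v *variant) defaultEnd() int32 {
	return v.Pos + int32(len(v.Ref)) - 1
}

// end prefers an explicit END entry and falls back to the implied end.
func (v *variant) end() int32 {
	if e, ok := v.Info["END"].(int); ok {
		return int32(e)
	}
	return v.defaultEnd()
}

// setEnd stores END only when it differs from the implied end;
// otherwise it removes a redundant END entry.
func (v *variant) setEnd(value int32) {
	if value == v.defaultEnd() {
		delete(v.Info, "END")
		return
	}
	if v.Info == nil {
		v.Info = make(map[string]interface{})
	}
	v.Info["END"] = int(value)
}

func main() {
	v := &variant{Pos: 100, Ref: "ACGT"}
	fmt.Println(v.end()) // 103
	v.setEnd(200)
	fmt.Println(v.end(), len(v.Info)) // 200 1
	v.setEnd(103)
	fmt.Println(v.end(), len(v.Info)) // 103 0
}
```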

func (Variant) Start

func (v Variant) Start() int32

Start returns the start position of a VCF line in the reference.

type VariantParser

type VariantParser struct {
	InfoParsers, FormatParsers utils.SmallMap
	NSamples                   int
}

VariantParser is an optimized parser for VCF variant lines.

NSamples can be decreased as necessary to parse fewer samples, including down to zero.

type Vcf

type Vcf struct {
	Header   *Header
	Variants []Variant
}

Vcf represents the full contents of a VCF file.

func (*Vcf) Format

func (vcf *Vcf) Format(out *bufio.Writer)

Format outputs a full VCF struct
