sam

package

v1.0.1 Latest Latest Go to latest Published: May 1, 2024 License: BSD-3-Clause Imports: 21 Imported by: 14

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/vertgenlab/gonomics

Links

Open Source Insights

Documentation ¶

Overview ¶

Package sam provides functions for reading, writing, and manipulating sam files. The Sam struct is built off the v1.6 specification defined at https://github.com/samtools/hts-specs

Index ¶

Constants
Variables
func DecodeBam(r *BamReader, s *Sam) (binId uint32, err error)
func DiploidBaseString(base DiploidBase) string
func DiploidBaseToBases(base DiploidBase) []dna.Base
func DiploidInsertionToSeqs(i DiploidInsertion) [][]dna.Base
func Equal(a Sam, b Sam) bool
func GoPileup(reads <-chan Sam, header Header, includeNoData bool, ...) <-chan Pile
func GoSyncPileups(samples ...<-chan Pile) <-chan []Pile
func IsDuplicate(sam Sam) bool
func IsForwardRead(sam Sam) bool
func IsNotPrimaryAlign(sam Sam) bool
func IsPaired(sam Sam) bool
func IsPosStrand(sam Sam) bool
func IsReverseRead(sam Sam) bool
func IsSupplementaryAlign(sam Sam) bool
func IsUnmapped(sam Sam) bool
func MakeDiploidBaseEmpiricalPriorCache(inFile string) ([][]float64, float64, float64)
func MakeDiploidBaseFlatPriorCache() [][]float64
func MakeDiploidBasePriorCache(delta float64, gamma float64) [][]float64
func MakeDiploidIndelPriorCache(kappa float64, delta float64) []float64
func MakeHaploidBasePriorCache(delta float64, gamma float64) [][]float64
func MakeHaploidIndelPriorCache(delta float64, kappa float64) []float64
func MateIsPosStrand(sam Sam) bool
func MateIsUnmapped(sam Sam) bool
func OpenBam(filename string) (*BamReader, Header)
func ParseExtra(s *Sam) error
func ProperlyAligned(sam Sam) bool
func QueryTag(s Sam, t string) (value interface{}, found bool, error error)
func Read(filename string) ([]Sam, Header)
func ReadFailsQc(sam Sam) bool
func ToString(aln Sam) string
func Write(filename string, data []Sam, header Header)
func WriteFromChan(data <-chan Sam, filename string, header Header, wg *sync.WaitGroup)
func WriteHeaderToFileHandle(file io.Writer, header Header)
func WriteSamPeToFileHandle(file io.Writer, pe SamPE)
func WriteToBamFileHandle(bw *BamWriter, s Sam, bin uint16)
func WriteToFileHandle(file io.Writer, aln Sam)
type AncientLikelihoodCache
type Bai
- func ReadBai(filename string) Bai
type BamReader
- func (r *BamReader) Close() error
type BamWriter
- func NewBamWriter(w io.Writer, h Header) *BamWriter
- func (bw *BamWriter) Close() error
type Consensus
- func PileConsensus(p Pile, substitutionsOnly bool, insertionThreshold float64) Consensus
- func (c Consensus) String() string
type DeletionType
type DiploidBase
- func DiploidBaseCallFromPile(p Pile, refBase dna.Base, priorCache [][]float64, homozygousCache [][]float64, ...) DiploidBase
type DiploidDeletion
- func DiploidDeletionCallFromPile(p Pile, priorCache []float64, homozygousIndelCache [][]float64, ...) DiploidDeletion
type DiploidInsertion
- func DiploidInsertionCallFromPile(p Pile, priorCache []float64, homozygousIndelCache [][]float64, ...) DiploidInsertion
type Grouping
type HaploidCall
- func HaploidCallFromPile(p Pile, refBase dna.Base, epsilon float64, lambda float64, ...) HaploidCall
type Header
- func GenerateHeader(chromSize []chromInfo.ChromInfo, additional []string, sortOrder SortOrder, ...) Header
- func GoReadSamPeToChan(filename string) (<-chan SamPE, Header)
- func GoReadToChan(filename string) (<-chan Sam, Header)
- func GoReadToChanRecycle(filename string, bufferSize int) (parsedRecords <-chan *Sam, recycledStructs chan<- *Sam, header Header)
- func ParseHeaderText(h Header) Header
- func ReadHeader(file *fileio.EasyReader) Header
type HeaderTagMap
type InsertionType
type MatePair
type Metadata
type Pile
- func (p *Pile) String() string
type Sam
- func ReadNext(reader *fileio.EasyReader) (Sam, bool)
- func SeekBamRegion(br *BamReader, bai Bai, chrom string, start, end uint32) []Sam
- func SeekBamRegionRecycle(br *BamReader, bai Bai, chrom string, start, end uint32, recycledSams []Sam) []Sam
- func (s Sam) GetChrom() string
- func (s Sam) GetChromEnd() int
- func (s Sam) GetChromStart() int
- func (s Sam) String() string
- func (s Sam) UpdateLift(c string, start int, end int)
- func (s Sam) WriteToFileHandle(file io.Writer)
type SamPE
type SingleCellAlignment
- func ToSingleCellAlignment(s Sam) SingleCellAlignment
type SortOrder
type Tag

Constants ¶

View Source

const (
	Base      consensusType = 0
	Insertion consensusType = 1
	Deletion  consensusType = 2
	Undefined consensusType = 3
)

Variables ¶

View Source

var ErrNonStdBase error = errors.New("sequence contains bases other than A,C,G,T,N. Other bases are not supported in gonomics")

Functions ¶

func DecodeBam ¶

func DecodeBam(r *BamReader, s *Sam) (binId uint32, err error)

DecodeBam decodes a single Sam from a bgzf Block and stores the decoded sam in to input *Sam. Note that this overwrites all fields in the input sam, therefore if any information must be saved between decode calls, the values should be copied (e.g. copy slices instead of :=)

The first return from DecodeBam is the binId that the decoded Sam was derived from. See sam.Bai for more information about bins and bam indexing. For most use cases, this value can be ignored.

DecodeBam will return ErrNonStdBase if the input record contains bases other than A, C, G, T, or N. The sam return will still be fully parsed and can be used downstream, however all unsupported bases will be set to dna.Nil.

At the end of the file, DecodeBam will return an empty sam and io.EOF.

func DiploidBaseString ¶

func DiploidBaseString(base DiploidBase) string

DiploidBaseString formats a DiploidBase type as a string.

func DiploidBaseToBases ¶

func DiploidBaseToBases(base DiploidBase) []dna.Base

DiploidBaseToBases converts a DiploidBase type into a slice of two dna.Bases.

func DiploidInsertionToSeqs ¶

func DiploidInsertionToSeqs(i DiploidInsertion) [][]dna.Base

DiploidInsertionToSeqs converts the sequences present in a DiploidInsertion struct from strings into a slice of slices of dna.Base, where the first slice of bases is represents the sequence of the first allele and the second slice of bases represents the second allele.

func Equal ¶

func Equal(a Sam, b Sam) bool

Equal returns true if the two input Sam structs are identical.

func GoPileup ¶

func GoPileup(reads <-chan Sam, header Header, includeNoData bool, readFilters []func(s Sam) bool, pileFilters []func(p Pile) bool) <-chan Pile

GoPileup inputs a channel of coordinate sorted Sam structs and generates a Pile for each base. The Pile is sent through the return channel when the Pile position is no longer being updated.

The input readFilter functions must all be true for a read to be included. The input pileFilter functions can be likewise be used to filter the output. All reads/piles are included if filters == nil.

func GoSyncPileups ¶

func GoSyncPileups(samples ...<-chan Pile) <-chan []Pile

GoSyncPileups inputs a slice of channels receiving Pile structs and syncs their outputs to a new output channel that organizes Piles into a slice. The returned slice maintains the order of the input slice. Any of the input samples that have no data for the returned position will have the RefIdx field set to -1.

func IsDuplicate ¶

func IsDuplicate(sam Sam) bool

IsDuplicate returns true if the input record was identified as a duplicate read by the aligner.

func IsForwardRead ¶

func IsForwardRead(sam Sam) bool

IsForwardRead returns true if the input record is the forward read in a mate pair.

func IsNotPrimaryAlign ¶

func IsNotPrimaryAlign(sam Sam) bool

IsNotPrimaryAlign returns true if the input record is a secondary alignment according to the aligner.

func IsPaired ¶

func IsPaired(sam Sam) bool

IsPaired returns true if the input record has a mate pair.

func IsPosStrand ¶

func IsPosStrand(sam Sam) bool

IsPosStrand returns true if the input record aligns to the positive strand.

func IsReverseRead ¶

func IsReverseRead(sam Sam) bool

IsReverseRead returns true if the input record is the reverse read in a mate pair.

func IsSupplementaryAlign ¶

func IsSupplementaryAlign(sam Sam) bool

IsSupplementaryAlign returns true if the input record was identified as a supplementary alignment by the aligner.

func IsUnmapped ¶

func IsUnmapped(sam Sam) bool

IsUnmapped returns true if the input record is unmapped.

func MakeDiploidBaseEmpiricalPriorCache ¶ added in v1.0.1

func MakeDiploidBaseEmpiricalPriorCache(inFile string) ([][]float64, float64, float64)

MakeDiploidBaseEmpiricalPriorCache is a helper function used in samAssembler before running DiploidBaseCallFromPile. The function reads a 4x10 diploid substitution matrix from an input file, as well as empirical values for the parameters epsilon and lambda, which are the second and third return, respectively. Epsilon measures the flat error rate, and lambda is the rate of postmortem cytosine deamination.

func MakeDiploidBaseFlatPriorCache ¶

func MakeDiploidBaseFlatPriorCache() [][]float64

MakeDiploidBaseFlatPriorCache returns an uninformative prior distribution for genotypes. With 10 genotypes, each genotype has an equal prior probability of 0.1. Answers are log transformed.

func MakeDiploidBasePriorCache ¶

func MakeDiploidBasePriorCache(delta float64, gamma float64) [][]float64

MakeDiploidBasePriorCache is a helper function used in samAssembler before running DiploidBaseCallFromPile. Constructs a matrix ([][]float64) of form answer[refBase][outGenotype], where index of refBase: 0=a, 1=c, 2=g, 3=t. Index of outGenotype follows the indices of the DiploidBase type. Answers are log transformed.

func MakeDiploidIndelPriorCache ¶

func MakeDiploidIndelPriorCache(kappa float64, delta float64) []float64

MakeDiploidIndelPriorCache is a helper function used in samAssembler before running DiploidInsertionCallFromPile and DiploidDeletionCallFromPile. Constructs a []float64, where the index corresponds to InsertionType / DeletionType, and the value refers to the log transformed prior probability density. parameterized on delta, the expected divergence rate, and kappa, the proportion of mutations expected to be INDELs.

func MakeHaploidBasePriorCache ¶

func MakeHaploidBasePriorCache(delta float64, gamma float64) [][]float64

MakeHaploidBasePriorCache is a helper function used in samAssembler before running HaploidCallFromPile. Constructs a matrix ([][]float64) of form answer[refBase][outputBase], where the index of both bases 0=a, 1=c, 2=g, 3=t. Values are log transformed.

func MakeHaploidIndelPriorCache ¶

func MakeHaploidIndelPriorCache(delta float64, kappa float64) []float64

MakeHaploidIndelPriorCache is a helper function used in before calling HaploidCallFromPile. Stores two values: answer[0] = p(Base), answer[1] = p(Indel).

func MateIsPosStrand ¶

func MateIsPosStrand(sam Sam) bool

MateIsPosStrand returns true if the input records mate pair aligns to the positive strand.

func MateIsUnmapped ¶

func MateIsUnmapped(sam Sam) bool

MateIsUnmapped returns true if the input records mate pair is unmapped.

func OpenBam ¶

func OpenBam(filename string) (*BamReader, Header)

OpenBam initiates a bgzf reader and parses the header info of the input bam file. OpenBam returns a fully initialized BamReader allocated with a bgzf Block. The second return is a Header struct parsed from plain header text which is stored in the bam file.

func ParseExtra ¶

func ParseExtra(s *Sam) error

ParseExtra generates the text representation of the Extra field. This is required if the file was read from a bam file and the Extra field is going to be modified.

func ProperlyAligned ¶

func ProperlyAligned(sam Sam) bool

ProperlyAligned returns true if the input record is properly aligned according to the aligner.

func QueryTag ¶

func QueryTag(s Sam, t string) (value interface{}, found bool, error error)

QueryTag returns the value of tag t for read s. Currently QueryTag is only supported if the sam was read from a bam file. If no tag data is present, or if the record was not from a bam file, QueryTag will return an error. The input tag t must be exactly 2 characters.

The returned value is an interface that may store any of: uint8, int8, uint16, int16, uint32, int32, float32, string, or a slice of interface{} with any of the previous underlying types. It is the callers responsibility to know/determine the type and perform the assertion.

The second return is false if the requested tag was not found.

func Read ¶

func Read(filename string) ([]Sam, Header)

Read the entire file into a Sam struct where each record is an index in Sam.Sam. Note that sam files can get very large such that storing the entire file in memory is not feasible. Most sam files should be read using the GoReadToChan function which streams sam records so only a small portion of the file is kept in memory at any given time. Reads as a bam if .bam suffix is present.

func ReadFailsQc ¶

func ReadFailsQc(sam Sam) bool

ReadFailsQc returns true if the input record failed quality control.

func ToString ¶

func ToString(aln Sam) string

ToString converts an Sam struct to a tab delimited string per file specs.

func Write ¶

func Write(filename string, data []Sam, header Header)

Write a header and sam alignments to a file. Note that this requires the entire slice of alignments to be present in memory at the same time. This can be avoided by repeated calls to WriteToFileHandle on alignments retrieved from a channel, such as the output from GoReadToChan.

func WriteFromChan ¶

func WriteFromChan(data <-chan Sam, filename string, header Header, wg *sync.WaitGroup)

WriteFromChan writes an incoming channel of Sam structs to a file. The input wait group will be decremented once the write finishes. This is necessary to ensure the main thread will not terminate before the write has finished. Note that the function sending Sam to the data channel and WriteFromChan must not be run on the same thread, or else the process will deadlock.

func WriteHeaderToFileHandle ¶

func WriteHeaderToFileHandle(file io.Writer, header Header)

WriteHeaderToFileHandle writes a sam header to the input file.

func WriteSamPeToFileHandle ¶ added in v1.0.1

func WriteSamPeToFileHandle(file io.Writer, pe SamPE)

WriteSamPeToFileHandle writes a both reads and any supplementary alignments from a SamPE struct to a fileio EasyWriter TODO: Add bam writing functionality

func WriteToBamFileHandle ¶

func WriteToBamFileHandle(bw *BamWriter, s Sam, bin uint16)

WriteToBamFileHandle writes a single Sam struct to a bam file.

func WriteToFileHandle ¶

func WriteToFileHandle(file io.Writer, aln Sam)

WriteToFileHandle writes a single Sam struct to the input file.

Types ¶

type AncientLikelihoodCache ¶

type AncientLikelihoodCache struct {
	EpsilonOverThree                                 []float64 // \frac{\epsilon}{3}
	OneMinusEpsilon                                  []float64 // 1 - \epsilon
	OneMinusEpsilonMinusLambda                       []float64 // 1 - \frac{\epsilon}{3} - \lambda
	EpsilonOverThreePlusLambda                       []float64 // \frac{\epsilon}{3} + \lambda
	PointFiveMinusEpsilonOverThree                   []float64 // 0.5 - \frac{\epsilon}{3}
	EpsilonOverThreePlusLambdaOverTwo                []float64 // \frac{\epsilon}{3} + \frac{\lambda}{2}
	PointFiveMinusEpsilonOverThreePlusLambdaOverTwo  []float64 // 0.5 - \frac{\epsilon}{3} + \frac{\lambda}{2}
	PointFiveMinusEpsilonOverThreeMinusLambdaOverTwo []float64 // 0.5 - \frac{\epsilon}{3} - \frac{\lambda}{2}
}

AncientLikelihoodCache stores the following short expressions from the likelihood functions to save on repeated logspace.Pow calls.

type Bai ¶

type Bai struct {
	// contains filtered or unexported fields
}

Bai is an index of the BAM format that allows for efficient random access of reads given a query genomic range. The Bai struct should not be accessed directly, but used through query functions present within the sam package of gonomics.

func ReadBai ¶

func ReadBai(filename string) Bai

ReadBai reads an entire bai file into a bytes buffer and parses the buffer into a Bai struct which can be used for efficient random access of reads in a bam file.

type BamReader ¶

type BamReader struct {
	// contains filtered or unexported fields
}

BamReader wraps a bgzf.BlockReader with a fully allocated bgzf.Block.

func (*BamReader) Close ¶

func (r *BamReader) Close() error

Close the BamReader and all underlying io.Readers.

type BamWriter ¶

type BamWriter struct {
	*bgzf.BlockWriter
	// contains filtered or unexported fields
}

BamWriter wraps a bgzf.BlockWriter and provides functions to write a Sam struct to a binary Bam file, including WriteToBamFileHandle.

func NewBamWriter ¶

func NewBamWriter(w io.Writer, h Header) *BamWriter

NewBamWriter creates a new bam writer to the input io.Writer. The magic bam bytes and header are immediately written to the BamWriter.

func (*BamWriter) Close ¶

func (bw *BamWriter) Close() error

Close writes any data remaining in the buffer and closes the underlying bgzf.BlockWriter.

type Consensus ¶

type Consensus struct {
	Base      dna.Base
	Insertion []dna.Base
	Deletion  int
	Type      consensusType
}

Consensus contains information on the most observed variant in a Pile struct.

func PileConsensus ¶

func PileConsensus(p Pile, substitutionsOnly bool, insertionThreshold float64) Consensus

PileConsensus returns a Consensus struct from an input Pile struct representing information about the consensus variant observed in that Pile. An insertion is called as the consensus if the count number for the max insertion if maxCount > insertionThreshold * (total counts to bases).

func (Consensus) String ¶

func (c Consensus) String() string

String for debug.

type DeletionType ¶

type DeletionType byte

DeletionType encodes the deletion genotype state, which can be one of the four constant values explained below.

const (
	DaDa    DeletionType = 0
	DaDb    DeletionType = 1
	DaB     DeletionType = 2
	BBNoDel DeletionType = 3
)

type DiploidBase ¶

type DiploidBase byte

const (
	AA DiploidBase = 0
	AC DiploidBase = 1
	AG DiploidBase = 2
	AT DiploidBase = 3
	CC DiploidBase = 4
	CG DiploidBase = 5
	CT DiploidBase = 6
	GG DiploidBase = 7
	GT DiploidBase = 8
	TT DiploidBase = 9
	NN DiploidBase = 10
)

func DiploidBaseCallFromPile ¶

func DiploidBaseCallFromPile(p Pile, refBase dna.Base, priorCache [][]float64, homozygousCache [][]float64, heterozygousCache [][]float64, ancientCache AncientLikelihoodCache, epsilon float64, lambda float64) DiploidBase

DiploidBaseCallFromPile determines which of 10 DiploidBase genotypes are present at the position of an input Pile, preconditioned on the assumption of a diploid base at that position (in other words, not considering INDELs). epsilon represents the error rate, a parameter for the likelihood function. lambda represents the rate of cytosine deamination from postmortem hydrolytic degradation.

type DiploidDeletion ¶

type DiploidDeletion struct {
	Type DeletionType
	Da   int
	Db   int
}

DiploidDeletion is a minimal internal format to encode Deletion variant genotypes.

func DiploidDeletionCallFromPile ¶

func DiploidDeletionCallFromPile(p Pile, priorCache []float64, homozygousIndelCache [][]float64, heterozygousIndelCache [][]float64, epsilon float64) DiploidDeletion

DiploidDeletionCallFromPile returns a DiploidDeletion struct from an input Pile after making a deletion variant call. Takes in a cache for the prior distribution values, and two likelihood caches. Epsilon defines the misclassification rate parameter.

type DiploidInsertion ¶

type DiploidInsertion struct {
	Type InsertionType
	Ia   string
	Ib   string
}

DiploidInsertion is a minimal internal format to encode Insertion variant genotypes.

func DiploidInsertionCallFromPile ¶

func DiploidInsertionCallFromPile(p Pile, priorCache []float64, homozygousIndelCache [][]float64, heterozygousIndelCache [][]float64, epsilon float64) DiploidInsertion

DiploidInsertionCallFromPile returns a DiploidInsertion struct from an input Pile after making an insertion variant call. Takes in a cache for the prior distribution values, and two likelihood caches. Epsilon defines the misclassification rate parameter.

type Grouping ¶

type Grouping string

Grouping defines how the reads are grouped, if they are not sorted.

const (
	None      Grouping = "none"
	Query     Grouping = "query"
	Reference Grouping = "reference"
)

type HaploidCall ¶

type HaploidCall struct {
	Base      dna.Base
	Insertion string
	Deletion  int
}

HaploidCall is a struct to represent the data of a haploid genotype called from a Pileup from a haploid genome.

func HaploidCallFromPile ¶

func HaploidCallFromPile(p Pile, refBase dna.Base, epsilon float64, lambda float64, haploidBasePriorCache [][]float64, haploidIndelPriorCache []float64, homoBaseCache [][]float64, heteroBaseCache [][]float64, homoIndelCache [][]float64, ancientCache AncientLikelihoodCache) HaploidCall

HaploidCallFromPile calls a haploid variant (including Base identity at the Pile position, and INDELs immediately after the position), for an input Pile.

type Header ¶

type Header struct {
	Text     []string
	Metadata Metadata
	Chroms   []chromInfo.ChromInfo // tags SQ - SN and SQ - LN
}

Header encodes the header of a sam file as both the raw header (Text), and semi-parsed fields (Metadata and Chroms).

func GenerateHeader ¶

func GenerateHeader(chromSize []chromInfo.ChromInfo, additional []string, sortOrder SortOrder, grouping Grouping) Header

GenerateHeader creates a header given some information about the input sam. The chromSize can be either from a chromsizes file or by calling fasta.ToChromInfo on a []Fasta. The sortOrder and groupings are defined in metadata.go. Often the most relevant are sam.Unsorted (for sortOrder) and sam.None (for grouping) respectively. Additional tag information should be generated prior to calling this function and passed as a []string which will be appended to the raw text, and parsed into a HeaderTagMap.

func GoReadSamPeToChan ¶ added in v1.0.1

func GoReadSamPeToChan(filename string) (<-chan SamPE, Header)

GoReadSamPeToChan takes a read-name sorted sam file and returns a chanel of SamPE and a Header

func GoReadToChan ¶

func GoReadToChan(filename string) (<-chan Sam, Header)

GoReadToChan streams the input file so that only a small portion of the file is kept in memory at a time. This function automatically handles channel creation, goroutine spawning, and header retrieval.

GoReadToChan will detect if the input file ends in ".bam" and automatically switch to a bam parser.

func GoReadToChanRecycle ¶

func GoReadToChanRecycle(filename string, bufferSize int) (parsedRecords <-chan *Sam, recycledStructs chan<- *Sam, header Header)

GoReadToChanRecycle streams the input file so that only a small portion of the file is kept in memory at a time. This function automatically handles channel creation, goroutine spawning, and header retrieval.

GoReadToChanRecycle will detect if the input file ends in ".bam" and automatically switch to a bam parser.

Unlike GoReadToChan, GoReadToChanRecycle has 2 channel returns, a receiver channel (parsedRecords) and a sender channel (recycledStructs). Parsed sam records can be accessed from the receiver channel. Once the received struct is no longer needed, it can be returned to this function via the sender channel. Compared to GoReadToChan, this system significantly reduces memory allocations as structs can be reused rather then discarded. The total number of Sam structs that are allocated can be set with the bufferSize input variable.

Note that if bufferSize number of structs are retained by the calling function and not returned to this function. No new records will be sent and the goroutines will deadlock.

One notable disadvantage is that each read is only valid until the struct is returned to this function. This means that any information that must be retained between reads must be copied by the calling function.

func ParseHeaderText ¶

func ParseHeaderText(h Header) Header

ParseHeaderText parses the raw text of in the header struct and fills the appropriate fields in the rest of the header struct.

func ReadHeader ¶

func ReadHeader(file *fileio.EasyReader) Header

ReadHeader processes the contiguous header from an EasyReader and advances the Reader past the header lines.

type HeaderTagMap ¶

type HeaderTagMap map[Tag][]map[Tag]string

HeaderTagMap organizes all tags into a map where the line tag (e.g. SQ) stores a slice where each element in the slice corresponds to one line that has the keyed line tag. For instance, the line tag 'SQ' occurs once per chromosome. Using the key 'SQ' in the tag map gives each SQ line as a slice indexed by order of occurrence.

Each element in the resulting slice is itself a map keyed by Tags present in the line. For example, the data stored in the 'SN' tag of the 5th 'SQ' line would be retrieved by TapMap[SQ][4][SN].

For convenience, the HeaderTagMap also stores tags that are further parsed in other fields. e.g. the version number can be retrieved by calling either Metadata.Version or by Metadata.AllTags[HD][0][VN]

Note that comment lines (line tag 'CO') are not stored in HeaderTagMap.

type InsertionType ¶

type InsertionType byte

InsertionType encodes the insertion genotype state, which can be one of the four constant values explained below.

const (
	IaIa    InsertionType = 0 // for insertionA insertionA, a homozygous insertion
	IaIb    InsertionType = 1 // for insertionA insertionB, or complex insertion
	IaB     InsertionType = 2 // for insertion base, or heterozygous insertion
	BBnoIns InsertionType = 3 // for base base, or no insertion
)

type MatePair ¶

type MatePair struct {
	Fwd Sam
	Rev Sam
}

MatePair wraps to paired alignments into a single struct.

type Metadata ¶

type Metadata struct {
	Version   string      // tag HD - VN
	SortOrder []SortOrder // tag HD - SO (for len == 1) or HD - SS (for len > 1)
	Grouping  Grouping    // tag HD - GO
	// TODO: Additional tag parsing (ReadGroup would be good)
	AllTags  HeaderTagMap // map of all tags present in file. Contains duplicates of parsed fields
	Comments []string     // tag CO
}

Metadata stores semi-parsed header data with several explicitly parsed fields (e.g. Version) and the rest of the header encoded in the AllTags field as a HeaderTagMap.

type Pile ¶

type Pile struct {
	RefIdx    int
	Pos       uint32         // 1-base (like Sam)
	CountF    [13]int        // Count[dna.Base] == Number of observed dna.Base, forward reads
	CountR    [13]int        // Count[dna.Base] == Number of observed dna.Base
	InsCountF map[string]int // key is insertion sequence as string, value is number of observations, forward reads. Real base appears to be before insert sequence.
	InsCountR map[string]int // key is insertion sequence as string, value is number of observations. Note that key is forward strand sequence.

	// Note that DelCountF/R DO NOT contribute to the total depth of a particular base.
	// They are only included to preserve multi-base deletion structure for downstream use.
	// Further note that DelCount is only recorded for the 5'-most base in the deletion.
	DelCountF map[int]int // key is the number of contiguous bases that are deleted, value is number of observations, forward reads
	DelCountR map[int]int // key is the number of contiguous bases that are deleted, value is number of observations, reverse reads
	// contains filtered or unexported fields
}

Pile stores the number of each base observed across multiple reads. The number of a particular base can be retrieved by using the dna.Base as the index for Pile.Count. E.g. the number of reads with A is Count[dna.A].

Linked list mechanics: In the pileup function the Pile struct is part of a circular doubly-linked list in which Pile.next/prev is the next/prev contiguous position. Discontiguous positions are not allowed. When adding base information to the pile struct, the pointer to the start of the read position should be kept, and a second pointer will increment along the list as bases are added. The pointer to the beginning of the list should only be incremented when a position is passed and set to zero.

func (*Pile) String ¶

func (p *Pile) String() string

String for debug.

type Sam ¶

type Sam struct {
	QName string        // query template name: [!-?A-~]{1,254}
	Flag  uint16        // bitwise flag, bits defined in info.go
	MapQ  uint8         // mapping quality
	RName string        // reference sequence name: \*|[:rname:∧ *=][:rname:]*
	Pos   uint32        // 1-based leftmost mapping position
	Cigar []cigar.Cigar // parsed cigar: originally string with \*|([0-9]+[MIDNSHPX=])+
	RNext string        // reference name of the mate/next read: \*|=|[:rname:∧ *=][:rname:]*
	PNext uint32        // position of the mate/next read
	TLen  int32         // observed template length
	Seq   []dna.Base    // parsed sequence: originally string with \*|[A-Za-z=.]+
	Qual  string        // ASCII of Phred-scaled base quality+33: [!-~]+ // TODO: parse to []Qual???
	Extra string        // TODO: parse to map or slice w/ index embedded in header???
	// contains filtered or unexported fields
}

Sam stores fields of a sam record in the corresponding data type denoted in the sam file specifications. The cigar and sequence fields are further parsed into a more complex data structure to facilitate ease of use.

func ReadNext ¶

func ReadNext(reader *fileio.EasyReader) (Sam, bool)

ReadNext takes an EasyReader and returns the next Sam record as well as a boolean flag indicating if the file is finished being read. If there is a Sam record to process the function will return the Sam record and 'false'. After processing all Sam records in the file, the function will return a blank Sam and 'true'. ReadNext will advance the reader past all header lines beginning with '@'.

func SeekBamRegion ¶

func SeekBamRegion(br *BamReader, bai Bai, chrom string, start, end uint32) []Sam

SeekBamRegion returns a slice of reads that overlap the input region. SeekBamRegion will advance the reader as necessary so beware continuing linear reading after a call to SeekBamRegion.

func SeekBamRegionRecycle ¶

func SeekBamRegionRecycle(br *BamReader, bai Bai, chrom string, start, end uint32, recycledSams []Sam) []Sam

SeekBamRegionRecycle functions identially to SeekBamRegion, but inputs a slice of Sam structs that which will be reused. Avoids excessive memory allocations on repeat seek calls.

func (Sam) GetChrom ¶

func (s Sam) GetChrom() string

func (Sam) GetChromEnd ¶

func (s Sam) GetChromEnd() int

func (Sam) GetChromStart ¶

func (s Sam) GetChromStart() int

func (Sam) String ¶

func (s Sam) String() string

String converts Sam to a string to satisfy the fmt.Stringer interface.

func (Sam) UpdateLift ¶

func (s Sam) UpdateLift(c string, start int, end int)

func (Sam) WriteToFileHandle ¶

func (s Sam) WriteToFileHandle(file io.Writer)

type SamPE ¶ added in v1.0.1

type SamPE struct {
	R1    Sam
	R2    Sam
	SupR1 []Sam
	SupR2 []Sam
}

SamPE is struct that contains 2 paired end Sam entries as well as a slice of Sam that represents all supplementary alignments for the read-pair

type SingleCellAlignment ¶

type SingleCellAlignment struct {
	Aln Sam
	Bx  []dna.Base
	Umi []dna.Base
}

SingleCellAlignment includes a Sam struct along with the parsed barcode and UMI for single-cell reads.

func ToSingleCellAlignment ¶

func ToSingleCellAlignment(s Sam) SingleCellAlignment

ToSingleCellAlignment parses the barcode and UMI from the QNAME field of an input sam from a single-cell read formatted with fastqFormat -singleCell.

type SortOrder ¶

type SortOrder string

Sort order defines whether the file is sorted and if so, how it was sorted.

const (
	Unknown         SortOrder = "unknown"
	Unsorted        SortOrder = "unsorted"
	QueryName       SortOrder = "queryname"
	Coordinate      SortOrder = "coordinate"
	Lexicographical SortOrder = "lexicographical"
	Natural         SortOrder = "natural"
	Umi             SortOrder = "umi"
)

type Tag ¶

type Tag [2]byte

Tag is a 2 byte identifier of data encoded in a sam header. Tags occur in sets where each header line begins with a tag which is further split into subtags. e.g. @SQ SN:ref LN:45.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL