Documentation ¶
Overview ¶
Package gtf contains functions for reading, writing, and manipulating GTF format files. More information on the GTF file format can be found at:
http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format
Structs in the gtf package are organized hierarchically: the Gene struct contains the underlying transcripts, exons, and other gene features associated with that gene.
Index ¶
- func AllAreEqual(a map[string]*Gene, b map[string]*Gene) bool
- func CdnaLength(t *Transcript) int
- func CdsBoolArray(g map[string]*Gene, c map[string]*chromInfo.ChromInfo) map[string][]bool
- func CdsLength(t *Transcript) int
- func EqualCds(a *Cds, b *Cds) bool
- func EqualExon(a *Exon, b *Exon) bool
- func EqualFiveUtr(a *FiveUtr, b *FiveUtr) bool
- func EqualGene(a *Gene, b *Gene) bool
- func EqualThreeUtr(a *ThreeUtr, b *ThreeUtr) bool
- func EqualTranscript(a *Transcript, b *Transcript) bool
- func ExonBoolArray(g map[string]*Gene, c map[string]*chromInfo.ChromInfo) map[string][]bool
- func FilterVariantCds(v *vcf.Vcf, g map[string]*Gene, c map[string]*chromInfo.ChromInfo) bool
- func FilterVariantExon(v *vcf.Vcf, g map[string]*Gene, c map[string]*chromInfo.ChromInfo) bool
- func FilterVariantFiveUtr(v *vcf.Vcf, g map[string]*Gene, c map[string]*chromInfo.ChromInfo) bool
- func FilterVariantGtf(v *vcf.Vcf, g map[string]*Gene, c map[string]*chromInfo.ChromInfo, exon bool, ...) bool
- func FilterVariantThreeUtr(v *vcf.Vcf, g map[string]*Gene, c map[string]*chromInfo.ChromInfo) bool
- func FindPromoter(genes []string, upstream int, downstream int, gtf map[string]*Gene, ...) []bed.Bed
- func FiveUtrBoolArray(g map[string]*Gene, c map[string]*chromInfo.ChromInfo) map[string][]bool
- func GeneToCanonicalBed(g Gene, c map[string]chromInfo.ChromInfo, upstream int, downstream int) bed.Bed
- func GeneToCanonicalTssBed(g Gene, c map[string]chromInfo.ChromInfo) bed.Bed
- func GeneToPromoterBed(g Gene, c map[string]chromInfo.ChromInfo, upstream int, downstream int) []bed.Bed
- func GeneToTssBed(g Gene, c map[string]chromInfo.ChromInfo) []bed.Bed
- func GenesToBedFirstTwoCodonBases(genes map[string]*Gene) []bed.Bed
- func GenesToCanonicalBeds(g map[string]*Gene, c map[string]chromInfo.ChromInfo, upstream int, ...) []bed.Bed
- func GenesToCanonicalTranscriptsTssBed(g map[string]*Gene, c map[string]chromInfo.ChromInfo) []bed.Bed
- func GenesToIntervalTree(genes map[string]*Gene) map[string]*interval.IntervalNode
- func GenesToPromoterBed(g map[string]*Gene, c map[string]chromInfo.ChromInfo, upstream int, ...) []bed.Bed
- func GenesToTssBed(g map[string]*Gene, c map[string]chromInfo.ChromInfo, merge bool) []bed.Bed
- func MoveAllCanonicalToZero(m map[string]*Gene)
- func MoveCanonicalToZero(g *Gene)
- func Read(filename string) map[string]*Gene
- func SortAllTranscripts(m map[string]*Gene)
- func SortTranscripts(g *Gene)
- func ThreeUtrBoolArray(g map[string]*Gene, c map[string]*chromInfo.ChromInfo) map[string][]bool
- func VariantArrayOverlap(v *vcf.Vcf, a map[string][]bool) bool
- func VariantToAnnotation(variant *vcfEffectPrediction, seq map[string][]dna.Base) string
- func VcfToVariant(v vcf.Vcf, tree map[string]*interval.IntervalNode, seq map[string][]dna.Base, ...) (*vcfEffectPrediction, error)
- func Write(filename string, records map[string]*Gene)
- func WriteToFileHandle(file io.Writer, gene *Gene)
- type Cds
- type Exon
- type FiveUtr
- type Gene
- type ThreeUtr
- type Transcript
Functions ¶
func AllAreEqual ¶
AllAreEqual returns true if all of the entries in a GTF map contain the same information, false otherwise.
func CdnaLength ¶
func CdnaLength(t *Transcript) int
CdnaLength returns the length of the cDNA in nucleotides.
func CdsBoolArray ¶
CdsBoolArray returns a map of chromosome names to bool slices. The bool is true if that position lies in a cds (protein-coding) region. A map of chromosome name to the information for that chromosome is needed to know the length of the returned bool slices.
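The per-chromosome bool slice idea can be sketched as follows. This is a minimal illustration, not the package's implementation: the `interval` type, the `boolArray` function, and the use of a plain `map[string]int` for chromosome sizes are all hypothetical stand-ins, and coordinates are assumed to be 1-based and inclusive as in a GTF file.

```go
package main

import "fmt"

// interval is a hypothetical stand-in for one feature record:
// a chromosome name plus a 1-based, inclusive start and end.
type interval struct {
	Chrom string
	Start int
	End   int
}

// boolArray marks every position covered by at least one interval.
// chromSizes maps chromosome name to length (an assumption standing in
// for ChromInfo); the returned slices are indexed 0-based.
func boolArray(regions []interval, chromSizes map[string]int) map[string][]bool {
	answer := make(map[string][]bool, len(chromSizes))
	for name, size := range chromSizes {
		answer[name] = make([]bool, size)
	}
	for _, r := range regions {
		mask, ok := answer[r.Chrom]
		if !ok {
			continue // chromosome length unknown, so nothing to mark
		}
		for pos := r.Start; pos <= r.End && pos <= len(mask); pos++ {
			if pos >= 1 {
				mask[pos-1] = true
			}
		}
	}
	return answer
}

func main() {
	sizes := map[string]int{"chr1": 10}
	regions := []interval{{"chr1", 3, 5}, {"chr1", 9, 12}}
	fmt.Println(boolArray(regions, sizes)["chr1"])
}
```

The same pattern would apply to exons, 5' UTRs, and 3' UTRs by changing which records are passed in.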
func CdsLength ¶
func CdsLength(t *Transcript) int
CdsLength returns the length of the Cds in nucleotides (after splicing).
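As a rough illustration of what "length after splicing" means, the sketch below sums the lengths of linked coding segments. The `cds` type here is a hypothetical, simplified stand-in (the real Cds fields are not shown in this section), and Start and End are assumed to be 1-based and inclusive.

```go
package main

import "fmt"

// cds is a hypothetical, simplified coding segment.
// Start and End are assumed to be 1-based and inclusive.
type cds struct {
	Start int
	End   int
	Next  *cds // next coding segment in the transcript, or nil
}

// splicedLength walks the linked segments and sums their lengths,
// ignoring the introns between them.
func splicedLength(first *cds) int {
	total := 0
	for c := first; c != nil; c = c.Next {
		total += c.End - c.Start + 1
	}
	return total
}

func main() {
	third := &cds{Start: 500, End: 529}
	second := &cds{Start: 300, End: 359, Next: third}
	first := &cds{Start: 100, End: 189, Next: second}
	fmt.Println(splicedLength(first))
}
```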
func EqualCds ¶
EqualCds returns true if two input Cds structs contain the same information, false otherwise.
func EqualExon ¶
EqualExon returns true if two input Exon structs contain the same information, false otherwise.
func EqualFiveUtr ¶
EqualFiveUtr returns true if two input FiveUtr structs contain the same information, false otherwise.
func EqualGene ¶
EqualGene returns true if two input Gene structs contain the same information, false otherwise.
func EqualThreeUtr ¶
EqualThreeUtr returns true if two input ThreeUtr structs contain the same information, false otherwise.
func EqualTranscript ¶
func EqualTranscript(a *Transcript, b *Transcript) bool
EqualTranscript returns true if two input Transcript structs contain the same information, false otherwise.
func ExonBoolArray ¶
ExonBoolArray returns a map of chromosome names to bool slices. The bool is true if that position lies in an exon. A map of chromosome name to the information for that chromosome is needed to know the length of the returned bool slices.
func FilterVariantCds ¶
FilterVariantCds takes a vcf record, a gene list from a gtf, and ChromInfo to know the length of chromosomes. The function returns true if the vcf record overlaps a cds (protein-coding sequence) in the gtf.
func FilterVariantExon ¶
FilterVariantExon takes a vcf record, a gene list from a gtf, and ChromInfo to know the length of chromosomes. The function returns true if the vcf record overlaps an exon in the gtf.
func FilterVariantFiveUtr ¶
FilterVariantFiveUtr takes a vcf record, a gene list from a gtf, and ChromInfo to know the length of chromosomes. The function returns true if the vcf record overlaps a 5' UTR in the gtf.
func FilterVariantGtf ¶
func FilterVariantGtf(v *vcf.Vcf, g map[string]*Gene, c map[string]*chromInfo.ChromInfo, exon bool, code bool, five bool, three bool) bool
FilterVariantGtf takes a vcf record, a gene list from a gtf, ChromInfo to know the length of chromosomes, and four flags selecting whether the function should search for exon, coding (cds), 5' UTR, or 3' UTR overlaps. If more than one of exon, code, five, and three is set to true, the function returns the logical OR of the selected checks: true if the vcf record overlaps any of the selected feature types.
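The logical OR over selected feature types can be sketched like this. It is only an illustration under assumptions: the `masks` struct, the `at` helper, and `overlapsSelected` are hypothetical names, and the sketch works on precomputed bool slices for a single chromosome rather than on vcf and gtf records.

```go
package main

import "fmt"

// masks groups hypothetical per-feature bool slices for one chromosome.
type masks struct {
	Exon, Cds, FiveUtr, ThreeUtr []bool
}

// at reports whether idx is inside mask and marked true.
func at(mask []bool, idx int) bool {
	return idx >= 0 && idx < len(mask) && mask[idx]
}

// overlapsSelected returns the logical OR of the checks the caller enabled.
// A check that is not enabled never contributes to the result.
func overlapsSelected(idx int, m masks, exon, code, five, three bool) bool {
	return (exon && at(m.Exon, idx)) ||
		(code && at(m.Cds, idx)) ||
		(five && at(m.FiveUtr, idx)) ||
		(three && at(m.ThreeUtr, idx))
}

func main() {
	m := masks{
		Exon: []bool{true, true, false},
		Cds:  []bool{false, true, false},
	}
	// Index 0 lies in an exon but not in a cds.
	fmt.Println(overlapsSelected(0, m, true, false, false, false))
	fmt.Println(overlapsSelected(0, m, false, true, false, false))
	fmt.Println(overlapsSelected(0, m, true, true, false, false))
}
```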
func FilterVariantThreeUtr ¶
FilterVariantThreeUtr takes a vcf record, a gene list from a gtf, and ChromInfo to know the length of chromosomes. The function returns true if the vcf record overlaps a 3' UTR in the gtf.
func FindPromoter ¶ added in v1.0.1
func FiveUtrBoolArray ¶
FiveUtrBoolArray returns a map of chromosome names to bool slices. The bool is true if that position lies in a 5' UTR. A map of chromosome name to the information for that chromosome is needed to know the length of the returned bool slices.
func GeneToCanonicalBed ¶
func GeneToCanonicalBed(g Gene, c map[string]chromInfo.ChromInfo, upstream int, downstream int) bed.Bed
GeneToCanonicalBed converts a Gene struct into a bed representing the promoter region of the canonical transcript. The user specifies the number of bases upstream and downstream of the TSS that define the promoter region.
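A strand-aware promoter window around a TSS might be computed along these lines. This is a sketch under stated assumptions, not the package's code: `region`, `clamp`, and `promoter` are hypothetical names, the TSS is taken as a 0-based position, the output is 0-based and half-open like BED, and the exact boundary convention used by the package may differ.

```go
package main

import "fmt"

// region is a hypothetical, minimal BED-like record: 0-based, half-open.
type region struct {
	Chrom string
	Start int
	End   int
}

// clamp restricts x to the closed range [lo, hi].
func clamp(x, lo, hi int) int {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

// promoter returns the window around tss (a 0-based position).
// On the negative strand, "upstream" points toward larger coordinates,
// so the two distances swap sides. The window is clamped to the chromosome.
func promoter(chrom string, tss int, positiveStrand bool, upstream, downstream, chromLen int) region {
	start, end := tss-upstream, tss+downstream+1
	if !positiveStrand {
		start, end = tss-downstream, tss+upstream+1
	}
	return region{chrom, clamp(start, 0, chromLen), clamp(end, 0, chromLen)}
}

func main() {
	fmt.Println(promoter("chr1", 1000, true, 500, 2000, 1000000))
	fmt.Println(promoter("chr1", 1000, false, 500, 2000, 1000000))
}
```

Clamping matters for genes near chromosome ends, which is one reason chromosome lengths are needed alongside the gene itself.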
func GeneToCanonicalTssBed ¶
GeneToCanonicalTssBed converts a single Gene struct into a Bed representing the TSS position of the canonical transcript.
func GeneToPromoterBed ¶
func GeneToPromoterBed(g Gene, c map[string]chromInfo.ChromInfo, upstream int, downstream int) []bed.Bed
GeneToPromoterBed produces a slice of beds from a gene containing the positions of promoters (TSS-500bp -> TSS+2kb) for all transcripts of the gene with the geneName in the Name field of the output Bed entries.
func GeneToTssBed ¶
GeneToTssBed returns the positions of all TSSs from a Gene as a slice of single base-pair bed entries.
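A single base-pair TSS entry depends on strand: the TSS is the transcript start on the positive strand and the transcript end on the negative strand. The sketch below illustrates that conversion with hypothetical `transcript` and `region` types, assuming 1-based inclusive transcript coordinates (as in GTF) and 0-based half-open output (as in BED).

```go
package main

import "fmt"

// transcript is a hypothetical, simplified record with 1-based,
// inclusive coordinates as found in a GTF file.
type transcript struct {
	Chrom          string
	Start          int
	End            int
	PositiveStrand bool
}

// region is a hypothetical BED-like record: 0-based, half-open.
type region struct {
	Chrom string
	Start int
	End   int
}

// tss returns a one-base region at the transcription start site.
func tss(t transcript) region {
	if t.PositiveStrand {
		return region{t.Chrom, t.Start - 1, t.Start}
	}
	return region{t.Chrom, t.End - 1, t.End}
}

func main() {
	fmt.Println(tss(transcript{"chr1", 100, 200, true}))
	fmt.Println(tss(transcript{"chr1", 100, 200, false}))
}
```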
func GenesToBedFirstTwoCodonBases ¶ added in v1.0.1
GenesToBedFirstTwoCodonBases takes a map[string]*Gene and returns a []bed.Bed.
func GenesToCanonicalBeds ¶
func GenesToCanonicalBeds(g map[string]*Gene, c map[string]chromInfo.ChromInfo, upstream int, downstream int) []bed.Bed
GenesToCanonicalBeds converts all genes in a map[string]*Gene to a []bed.Bed, where each bed represents the promoter region of the canonical transcript, defined by user-specified upstream and downstream distances from the TSS.
func GenesToCanonicalTranscriptsTssBed ¶
func GenesToCanonicalTranscriptsTssBed(g map[string]*Gene, c map[string]chromInfo.ChromInfo) []bed.Bed
GenesToCanonicalTranscriptsTssBed takes an input map of [geneId]*Gene structs, finds the canonical transcript of each gene (defined as the one with the longest coding sequence), and converts the TSS of that transcript into a Bed struct.
func GenesToIntervalTree ¶
func GenesToIntervalTree(genes map[string]*Gene) map[string]*interval.IntervalNode
GenesToIntervalTree builds a fractionally cascaded 2d interval tree for efficiently identifying genes that overlap a variant.
func GenesToPromoterBed ¶
func GenesToPromoterBed(g map[string]*Gene, c map[string]chromInfo.ChromInfo, upstream int, downstream int) []bed.Bed
GenesToPromoterBed produces a slice of beds from a set of genes containing the positions of all promoters for all transcripts for all genes with the geneID in the Name field of the output Bed entries.
func GenesToTssBed ¶
GenesToTssBed returns the position of all TSSs from a Gene map as a slice of single base-pair bed entries.
func MoveAllCanonicalToZero ¶
MoveAllCanonicalToZero applies MoveCanonicalToZero to every value in the map.
func MoveCanonicalToZero ¶
func MoveCanonicalToZero(g *Gene)
MoveCanonicalToZero does a single iteration of bubble sort to move the longest/canonical transcript to the first position in the slice. This is faster than SortTranscripts.
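A single bubble-sort pass that carries the longest element to index 0 can be sketched as below. The `transcript` type and its `CdsLen` field are hypothetical; how the package actually measures "longest" is not shown here, so coding-sequence length is used as an assumption.

```go
package main

import "fmt"

// transcript is a hypothetical, simplified record; CdsLen stands in
// for whatever length the package uses to pick the canonical transcript.
type transcript struct {
	ID     string
	CdsLen int
}

// moveLongestToFront makes one backward bubble pass. After the pass the
// longest transcript is at index 0; the rest are not fully sorted.
// This is O(n), versus O(n log n) for a full sort.
func moveLongestToFront(ts []transcript) {
	for i := len(ts) - 1; i > 0; i-- {
		if ts[i].CdsLen > ts[i-1].CdsLen {
			ts[i], ts[i-1] = ts[i-1], ts[i]
		}
	}
}

func main() {
	ts := []transcript{{"a", 3}, {"b", 7}, {"c", 5}, {"d", 9}, {"e", 2}}
	moveLongestToFront(ts)
	fmt.Println(ts)
}
```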
func SortAllTranscripts ¶
SortAllTranscripts applies SortTranscripts to every value in the map.
func SortTranscripts ¶
func SortTranscripts(g *Gene)
SortTranscripts sorts the longest transcript to the front so that the canonical/longest transcript is always g.Transcripts[0].
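A full descending sort by length could look like the following sketch, which uses the standard library's `sort.SliceStable`. The `transcript` type and `CdsLen` field are hypothetical, and the package's own ordering criterion and tie-breaking are assumptions here.

```go
package main

import (
	"fmt"
	"sort"
)

// transcript is a hypothetical, simplified record.
type transcript struct {
	ID     string
	CdsLen int
}

// sortByLength orders transcripts from longest to shortest.
// The stable sort keeps the input order for transcripts of equal length.
func sortByLength(ts []transcript) {
	sort.SliceStable(ts, func(i, j int) bool {
		return ts[i].CdsLen > ts[j].CdsLen
	})
}

func main() {
	ts := []transcript{{"a", 3}, {"b", 7}, {"c", 5}, {"d", 9}, {"e", 2}}
	sortByLength(ts)
	fmt.Println(ts)
}
```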
func ThreeUtrBoolArray ¶
ThreeUtrBoolArray returns a map of chromosome names to bool slices. The bool is true if that position lies in a 3' UTR. A map of chromosome name to the information for that chromosome is needed to know the length of the returned bool slices.
func VariantArrayOverlap ¶
VariantArrayOverlap takes a vcf record and a map of bool slices (chrom name maps to a bool for each base in that chrom). The bool slice encodes the presence/absence of some genomic feature, and true is returned if the vcf position overlaps that feature.
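The lookup described above amounts to indexing a per-chromosome bool slice at the variant's position. The sketch below uses a hypothetical `variant` type rather than a real vcf record and assumes a 1-based position (as in VCF) against 0-based slices; the package's own indexing convention is not confirmed here.

```go
package main

import "fmt"

// variant is a hypothetical stand-in for a vcf record.
// Pos is assumed to be 1-based, as in the VCF format.
type variant struct {
	Chrom string
	Pos   int
}

// overlaps reports whether the variant position is marked true in the
// bool slice for its chromosome. Unknown chromosomes and out-of-range
// positions are treated as no overlap.
func overlaps(v variant, features map[string][]bool) bool {
	mask, ok := features[v.Chrom]
	if !ok {
		return false
	}
	idx := v.Pos - 1
	return idx >= 0 && idx < len(mask) && mask[idx]
}

func main() {
	features := map[string][]bool{"chr1": {false, true, true, false}}
	fmt.Println(overlaps(variant{"chr1", 2}, features))
	fmt.Println(overlaps(variant{"chr1", 4}, features))
}
```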
func VariantToAnnotation ¶
VariantToAnnotation generates an annotation which can be appended to the INFO field of a VCF.
Annotation format is: GoEP= Genomic | Gene | cDNA | Protein | VariantType
Genomic, cDNA, and Protein annotations are in HGVS variant nomenclature format: https://varnomen.hgvs.org/
The sequence of the reference genome needs to be supplied as a map from chromosome name to chromosome sequence.
TODO: Not sensitive to UTR splice junctions
TODO: Remove reports of splice variants for terminal exons; NMD prediction?
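To make the five-field layout concrete, the sketch below assembles a string of that shape. Everything here is illustrative: `annotation` is a hypothetical helper, the field values in `main` are made-up placeholders, and the exact whitespace around the `|` separators in the package's real output is not specified above, so the sketch simply joins without spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// annotation joins the five fields into a GoEP-style INFO entry.
// Separator spacing is an assumption; the real output may differ.
func annotation(genomic, gene, cdna, protein, variantType string) string {
	fields := []string{genomic, gene, cdna, protein, variantType}
	return "GoEP=" + strings.Join(fields, "|")
}

func main() {
	// Placeholder values for illustration only, not real annotations.
	fmt.Println(annotation("chr1:g.100A>G", "GENE1", "c.10A>G", "p.Ile4Val", "Missense"))
}
```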
func VcfToVariant ¶
func VcfToVariant(v vcf.Vcf, tree map[string]*interval.IntervalNode, seq map[string][]dna.Base, allTranscripts bool) (*vcfEffectPrediction, error)
VcfToVariant determines the effects of a variant on the cDNA and amino acid sequence by querying genes in the tree made by GenesToIntervalTree. Note that if multiple genes are found to overlap a variant, this function will return the variant based on the first queried gene and return an error. All bases in the fasta record must be uppercase.
func WriteToFileHandle ¶
WriteToFileHandle is a helper function of Write that writes a single gene to an output file.
Types ¶
type Cds ¶
Cds contains the location and score information for Cds lines of a GTF file. Cds structs also point to the next and previous Cds in the transcript.
type Exon ¶
type Exon struct { Start int End int Score float64 ExonNumber string ExonID string Cds *Cds FiveUtr *FiveUtr ThreeUtr *ThreeUtr }
Exon contains information on the location, score, and relative order of exons in a GTF file.
type Gene ¶
type Gene struct { GeneID string GeneName string Transcripts []*Transcript }
Gene organizes all underlying data on a gene feature in a GTF file.
func (*Gene) GetChromEnd ¶
GetChromEnd returns the genomic coordinate where the gene ends.
func (*Gene) GetChromStart ¶
GetChromStart returns the genomic coordinate where the gene starts.
func (*Gene) WriteToFileHandle ¶
WriteToFileHandle writes the gene to an io.Writer.