ivc

Version: v0.9.0 Published: May 14, 2016 License: GPL-2.0 Imports: 16 Imported by: 3

README

IVC - An Integrated Variant Caller

  1. Overview

IVC is a tool for calling genomic variants from next-generation sequencing data. The tool is developed based on a novel approach to variant calling which leverages existing genetic variants to improve the accuracy of called variants, including new variants and hard-to-detect INDELs. By design, IVC integrates read alignment, alignment sorting, and variant calling into a unified process. The simplified workflow eliminates many intermediate steps and consequently reduces human intervention and errors.

IVC is written in the Go programming language (see https://golang.org). It currently supports Illumina paired-end reads. Other data formats will be supported soon.

  2. Install IVC

2.1 Download IVC source code with Go

Prerequisite: the Go environment is already set up properly.

Get IVC source code:

go get github.com/namsyvo/IVC

After these steps, the IVC source code should be in the directory $GOPATH/src/github.com/namsyvo/IVC.
Then go to the IVC directory, from which IVC can be run as a Go program:

cd $GOPATH/src/github.com/namsyvo/IVC

2.2 Download IVC source code without Go

Get IVC source code with pre-compiled binary executable files of IVC (compiled on GNU/Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64):

git clone https://github.com/namsyvo/IVC.git
cd IVC

Those binary executable files were obtained by compiling source code with Go:

go build main/ivc-index.go 
go build main/ivc.go

The source code can also be downloaded from the IVC releases page at https://github.com/namsyvo/IVC/releases

  3. Usage

3.1 Example command

IVC comes with a test dataset which includes the following directories:
./test_data/refs: includes a reference genome and a corresponding variant profile for NC_007194.1 (Aspergillus fumigatus Af293 chromosome 1, whole genome shotgun sequence, see http://www.ncbi.nlm.nih.gov/nuccore/AAHF00000000)
./test_data/reads: includes a set of 10,000 simulated paired-end reads generated with DWGSIM (see https://github.com/nh13/DWGSIM).

3.1.1. Creating and indexing reference genomes with variant profile:

go run main/ivc-index.go -R test_data/refs/chr1_ref.fasta -V test_data/refs/chr1_variant_prof.vcf -I test_data/indexes

The command "go run main/ivc-index.go" can be replaced by the command "./ivc-index", the binary produced by "go build main/ivc-index.go".

3.1.2. Calling variants from reads and the reference

go run main/ivc.go -R test_data/refs/chr1_ref.fasta -V test_data/refs/chr1_variant_prof.vcf -I test_data/indexes -1 test_data/reads/chr1_dwgsim_100_0.001-0.01.bwa.read1.fastq -2 test_data/reads/chr1_dwgsim_100_0.001-0.01.bwa.read2.fastq -O test_data/results/chr1_variant_calls.vcf

The command "go run main/ivc.go" can be replaced by the command "./ivc", the binary produced by "go build main/ivc.go".

3.2 Commands and options

3.2.1. Creating and indexing reference genomes with variant profile:

Required:

-R: reference genome (FASTA format).  
-V: known variant profile (VCF format).  
-I: directory for storing index.  

Options:

3.2.2. Calling Variants:

Required:

-R: reference genome (FASTA format).  
-V: known variant profile (VCF format).  
-I: directory for storing index.  
-1: the read file (for single-end reads) (FASTQ format).  
-2: the second end file (for paired-end reads) (FASTQ format).  
-O: variant call result file (VCF format).  

Options:

-d: threshold of alignment distances (float, default: determined by the program).  
-t: maximum number of CPUs to use (integer, default: number of CPUs on the host machine).  
-r: maximum number of iterations for random searching (integer, default: determined by the program).  
-s: substitution cost (float, default: 4). 
-o: gap open cost (float, default: 4.1). 
-e: gap extension cost (float, default: 1.0). 
-mode: searching mode for finding seeds (1: random (default), 2: deterministic).  
-start: starting position on reads for finding seeds (integer, default: 0).  
-step: step for searching in deterministic mode (integer, default: 5).  
-maxs: maximum number of seeds for single-end reads (default: 1024).  
-maxp: maximum number of paired-seeds for paired-end reads (default: 128).  
-lmin: minimum length of seeds for each end (default: 15).  
-lmax: maximum length of seeds for each end (default: 30).  
-debug: debug mode (boolean, default: false).

  4. Preparing data and performing experiments

4.1 Simulated data

IVC comes with a simulator which simulates mutant genomes based on the reference genome and its associated variant profile. Reads can then be generated from the mutant genome using other simulators, such as DWGSIM.

Get the mutant genome simulator:

git clone https://github.com/namsyvo/ivc-tools.git
cd ivc-tools/genome-simulator

Then follow the instructions to generate simulated mutant genomes and evaluate the called variants.

4.2 Real data

  5. Contact

Nam Sy Vo
nsvo1@memphis.edu

Documentation

Constants

const (
	NEW_SNP_RATE   = 0.001  // probability of new alleles
	NEW_INDEL_RATE = 0.0001 // probability of new indels
	INDEL_ERR_RATE = 0.0001 // probability of indel error
)

Global constants

Variables

var (
	PRINT_MEMSTATS = false

	PRINT_EDIT_DIST_INFO     = false
	PRINT_EDIT_DIST_MAT_INFO = false

	PRINT_VAR_CALL_INFO    = false
	PRINT_ALIGN_TRACE_INFO = false
	PRINT_UNALIGN_INFO     = false
)

Global variables for turning on/off info profiling

var (
	CPU_FILE  *os.File
	MEM_FILE  *os.File
	MEM_STATS *runtime.MemStats
)

Global variables for CPU and memory profiling

var (
	PARA *ParaInfo        // all parameters of the program
	L2E  []float64        // indel error rate corresponding to lengths of indels
	Q2C  map[byte]float64 // alignment cost based on Phred-scale quality
	Q2E  map[byte]float64 // error probability based on Phred-scale quality
	Q2P  map[byte]float64 // non-error probability based on Phred-scale quality
	MUT  = &sync.Mutex{}  // mutex lock for reading/writing from/to the map of variant calls
)

Global variables for calculating variant quality.
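
The Q2E and Q2P tables above are keyed by the quality byte. As a rough illustration of what such tables could contain, the sketch below applies the standard Phred relation (error probability = 10^(-Q/10)). The function name buildErrTables, the Phred+33 offset, and the maximum quality of 41 are assumptions made for this example, not taken from IVC's source.

```go
package main

import (
	"fmt"
	"math"
)

// phredOffset assumes Sanger / Illumina 1.8+ FASTQ encoding (ASCII 33 = Q0).
const phredOffset = 33

// buildErrTables is a hypothetical helper. It returns two lookup tables keyed
// by the raw FASTQ quality character: q2e holds the error probability
// 10^(-Q/10) and q2p holds its complement, 1 - 10^(-Q/10).
func buildErrTables(maxQ int) (q2e, q2p map[byte]float64) {
	q2e = make(map[byte]float64, maxQ+1)
	q2p = make(map[byte]float64, maxQ+1)
	for q := 0; q <= maxQ; q++ {
		c := byte(q + phredOffset)
		e := math.Pow(10, -float64(q)/10)
		q2e[c] = e
		q2p[c] = 1 - e
	}
	return q2e, q2p
}

func main() {
	q2e, q2p := buildErrTables(41)
	// 'I' is ASCII 73, which encodes Q40 under the Phred+33 assumption.
	fmt.Printf("%.4f %.4f\n", q2e['I'], q2p['I'])
}
```

Precomputing the tables once avoids calling math.Pow for every base of every read.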

var UNALIGN_READ_INFO = make([]*UnAlnReadInfo, 0)

Printing unaligned-reads info

var VarCall []*VarProf // number of elements will be set equal to number of cores to run parallel updates

Set of variant calls; each element covers a certain region of the multigenome.

Functions

func AlignCostVarLoci added in v0.9.0

func AlignCostVarLoci(read, ref, qual []byte, prob float64) float64

AlignCostVarLoci calculates the cost of alignment between a read and the reference at known loci.

func BuildMultiGenome added in v0.7.2

func BuildMultiGenome(genome_file, var_prof_file string, debug_mode bool) (chr_pos []int, chr_name [][]byte,
	seq []byte, var_prof map[string]map[int]VarProfInfo)

BuildMultiGenome builds a multi-sequence from a standard reference genome and a variant profile.

func GetEditTrace added in v0.7.1

func GetEditTrace(mess string, i, j int, read, ref byte)

func GetEditTraceKnownLoc added in v0.7.1

func GetEditTraceKnownLoc(mess string, i, j int, read []byte, ref byte)

func GetGenome added in v0.8.0

func GetGenome(file_name string) (chr_pos []int, chr_name [][]byte, seq []byte)

GetGenome gets the reference genome from FASTA files.
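
The return values of GetGenome (chromosome start positions, chromosome names, and one concatenated sequence) suggest a layout along the following lines. This is only a sketch under that assumption: parseFasta is a hypothetical helper that works on in-memory data, and IVC's own parsing may differ (for example in how it treats header text or separators between chromosomes).

```go
package main

import (
	"bytes"
	"fmt"
)

// parseFasta is a hypothetical helper, not IVC's GetGenome. It concatenates
// all FASTA records into one sequence and records, for each record, its
// header (without '>') and the offset at which its bases start.
func parseFasta(data []byte) (chrPos []int, chrName [][]byte, seq []byte) {
	for _, line := range bytes.Split(data, []byte("\n")) {
		line = bytes.TrimSpace(line) // also drops a trailing '\r'
		if len(line) == 0 {
			continue
		}
		if line[0] == '>' {
			chrName = append(chrName, line[1:])
			chrPos = append(chrPos, len(seq))
			continue
		}
		seq = append(seq, line...)
	}
	return chrPos, chrName, seq
}

func main() {
	data := []byte(">chr1 test\nACGT\nAC\n>chr2\nGGT\n")
	pos, names, seq := parseFasta(data)
	for i := range pos {
		fmt.Printf("%s starts at %d\n", names[i], pos[i])
	}
	fmt.Println(string(seq))
}
```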

func GetVarProfInfo added in v0.8.1

func GetVarProfInfo(file_name string) map[string]map[int]VarProfInfo

GetVarProfInfo gets the variant profile from VCF files.

func IndexN added in v0.3.1

func IndexN(s, sep []byte, n int) int

IndexN returns the index of a pattern in a slice of bytes.

func InitEditAlnMat added in v0.7.2

func InitEditAlnMat(arr_len int) ([][]float64, [][][]int)

InitEditAlnMat initializes variables for computing distance and alignment between reads and multi-genomes.

func InitTraceKMat added in v0.9.0

func InitTraceKMat(arr_len int) [][][]byte

InitTraceKMat initializes variables for computing distance and alignment between reads and multi-genomes.

func IntervalHasVariants added in v0.7.2

func IntervalHasVariants(A []int, i, j int) bool

IntervalHasVariants determines whether [i, j] contains variant positions, which are stored in array A. This function implements interpolation search. The array A must be sorted in increasing order.
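
To make the idea concrete, here is a minimal sketch of an interval test driven by interpolation search. The helper name intervalHasAny and its exact probing and clamping rules are assumptions for illustration; IVC's IntervalHasVariants may be implemented differently.

```go
package main

import "fmt"

// intervalHasAny is a hypothetical helper. It reports whether the sorted
// slice a contains at least one value v with lo <= v <= hi. It locates the
// first element >= lo by interpolation search and then checks it against hi.
func intervalHasAny(a []int, lo, hi int) bool {
	if len(a) == 0 || lo > hi {
		return false
	}
	l, r := 0, len(a)-1
	if a[r] < lo || a[l] > hi {
		return false
	}
	// Invariant: a[r] >= lo and every element left of l is < lo.
	for l < r {
		if a[l] >= lo {
			break // a[l] is the first element >= lo
		}
		// Here a[l] < lo <= a[r], so the divisor is positive.
		m := l + (lo-a[l])*(r-l)/(a[r]-a[l])
		if m >= r {
			m = r - 1 // keep the probe strictly inside so the range shrinks
		}
		if a[m] < lo {
			l = m + 1
		} else {
			r = m
		}
	}
	return a[l] <= hi
}

func main() {
	varPos := []int{2, 5, 9, 14}
	fmt.Println(intervalHasAny(varPos, 6, 8), intervalHasAny(varPos, 10, 20))
}
```

On positions that are spread roughly uniformly, interpolation probing tends to need fewer steps than plain binary search, which is presumably why it suits variant positions along a genome.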

func LoadMultiSeq added in v0.7.2

func LoadMultiSeq(file_name string) (chr_pos []int, chr_name [][]byte, multi_seq []byte)

LoadMultiSeq loads a multi-sequence from file.

func LoadVarProf added in v0.7.2

func LoadVarProf(file_name string) (variant map[int][][]byte, af map[int][]float32)

LoadVarProf loads a variant profile from file and returns a map of variants.

func PrintComparedReadRef added in v0.7.1

func PrintComparedReadRef(l_read_flank, l_ref_flank, r_read_flank, r_ref_flank []byte)

func PrintDisInfo added in v0.7.1

func PrintDisInfo(mess string, i, j int, d float64)

func PrintEditAlignInfo added in v0.7.0

func PrintEditAlignInfo(mess string, aligned_read, aligned_qual, aligned_ref []byte)

func PrintEditDisInput added in v0.7.0

func PrintEditDisInput(mess string, pos int, str_val ...[]byte)

func PrintEditDisMat added in v0.7.0

func PrintEditDisMat(mess string, D [][]float64, m, n int, read, ref []byte)

func PrintEditTraceMat added in v0.7.0

func PrintEditTraceMat(mess string, BT [][][]int, m, n int)

BT[i][j][0]: direction; 0: diagonal arrow (back to i-1,j-1), 1: up arrow (back to i-1,j), 2: left arrow (back to i,j-1).
BT[i][j][1]: matrix; 0: matrix for D, 1: matrix for IS, 2: matrix for IT.
BT[i][j][2]: number of shifts (equal to the length of the called variant) at a known variant location; can be any integer, for example 5 means back to i-5,j-1.

func PrintExtendTraceInfo added in v0.5.0

func PrintExtendTraceInfo(mess string, match []byte, e_pos, s_pos, match_num int, match_pos []int)

func PrintGetVariants added in v0.7.2

func PrintGetVariants(mess string, paired_prob, prob1, prob2 float64, vars1, vars2 []*VarInfo)

func PrintLoopTraceInfo added in v0.5.0

func PrintLoopTraceInfo(loop_num int, mess string)

func PrintMatchTraceInfo added in v0.5.0

func PrintMatchTraceInfo(pos, left_most_pos int, dis float64, left_var_pos []int, read []byte)

func PrintMemStats added in v0.4.0

func PrintMemStats(mesg string)

Printing memory information

func PrintPairedSeedInfo added in v0.5.1

func PrintPairedSeedInfo(mess string, match_pos_r1, match_pos_r2 int)

func PrintRefPosMap added in v0.7.2

func PrintRefPosMap(l_ref_pos_map, r_ref_pos_map []int)

func PrintSeedTraceInfo added in v0.5.0

func PrintSeedTraceInfo(mess string, e_pos, s_pos int, read []byte)

func PrintVarInfo added in v0.7.1

func PrintVarInfo(mess string, var_pos []int, var_val, var_qlt [][]byte)

func ProcessNoAlignReadInfo added in v0.5.1

func ProcessNoAlignReadInfo()

func RevComp added in v0.2.5

func RevComp(read, qual []byte, rev_comp_read, rev_qual []byte)

RevComp computes the reverse, reverse complement, and complement of a read.
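
For readers unfamiliar with the operation, the following sketch shows a reverse complement with the quality string reversed alongside it. The helper revComp allocates and returns new slices, which is an assumption of this example; the RevComp function above instead takes output slices as arguments.

```go
package main

import "fmt"

// complement maps each base to its Watson-Crick partner. Bytes other than
// A, C, G, T (for example N) are returned unchanged.
func complement(b byte) byte {
	switch b {
	case 'A':
		return 'T'
	case 'T':
		return 'A'
	case 'C':
		return 'G'
	case 'G':
		return 'C'
	}
	return b
}

// revComp is a hypothetical helper. It returns the reverse complement of
// read and the reversed quality string, assuming len(qual) == len(read).
func revComp(read, qual []byte) (rc, rq []byte) {
	n := len(read)
	rc = make([]byte, n)
	rq = make([]byte, n)
	for i := 0; i < n; i++ {
		rc[n-1-i] = complement(read[i])
		rq[n-1-i] = qual[i]
	}
	return rc, rq
}

func main() {
	rc, rq := revComp([]byte("AACGT"), []byte("ABCDE"))
	fmt.Println(string(rc), string(rq))
}
```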

func SaveMultiSeq added in v0.7.2

func SaveMultiSeq(file_name string, chr_pos []int, chr_name [][]byte, multi_seq []byte)

SaveMultiSeq saves a multi-sequence to file.

func SaveVarProf added in v0.7.2

func SaveVarProf(file_name string, chr_pos []int, chr_name [][]byte, var_prof map[string]map[int]VarProfInfo)

SaveVarProf saves a variant profile to file.

func Setup added in v0.7.2

func Setup(input_para *ParaInfo)

Setup reads input information and sets up parameters.

func SplitN added in v0.3.1

func SplitN(s, sep []byte, n int) ([][]byte, int)

SplitN splits a slice of bytes using a memory-efficient method.
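
One common way to split memory-efficiently in Go is to reuse a caller-supplied destination and hand out sub-slices of the input rather than copies. The sketch below shows that technique; splitInto and its behaviour (the last slot receives the unsplit remainder) are assumptions for illustration and may not match how SplitN is implemented.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitInto is a hypothetical helper. It splits s around sep into at most
// len(dst) fields, storing each field in dst as a sub-slice of s (no bytes
// are copied), and returns the number of fields written. When the limit is
// reached, the last field holds the unsplit remainder. sep must be non-empty.
func splitInto(dst [][]byte, s, sep []byte) int {
	n := 0
	for n < len(dst)-1 {
		i := bytes.Index(s, sep)
		if i < 0 {
			break
		}
		dst[n] = s[:i]
		n++
		s = s[i+len(sep):]
	}
	if n < len(dst) {
		dst[n] = s
		n++
	}
	return n
}

func main() {
	fields := make([][]byte, 8) // allocated once, reusable for every line
	line := []byte("chr1\t1042\t.\tA\tG")
	n := splitInto(fields, line, []byte("\t"))
	for _, f := range fields[:n] {
		fmt.Printf("%q ", f)
	}
	fmt.Println()
}
```

Because the fields alias the input, they are only valid until the input buffer is overwritten.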

Types

type EditAlnInfo added in v0.7.2

type EditAlnInfo struct {
	// contains filtered or unexported fields
}

Alignment information, serving as shared variables between functions of the alignment process.

func InitEditAlnInfo added in v0.7.2

func InitEditAlnInfo(arr_len int) *EditAlnInfo

InitEditAlnInfo allocates memory for shared variables of the alignment process.

type ParaInfo added in v0.2.5

type ParaInfo struct {
	//Input file names:
	Ref_file       string // reference multigenome
	Var_prof_file  string // variant profile
	Index_file     string // index of original reference genomes
	Rev_index_file string // index of reverse reference genomes
	Read_file_1    string // first end of read
	Read_file_2    string // second end of read
	Var_call_file  string // store Var call

	// Input paras:
	Search_mode int     // searching mode for finding seeds
	Start_pos   int     // starting position on reads for finding seeds
	Search_step int     // step for searching in deterministic mode
	Max_snum    int     // maximum number of seeds
	Max_psnum   int     // maximum number of paired-seeds
	Min_slen    int     // minimum length of seeds
	Max_slen    int     // maximum length of seeds
	Dist_thres  float64 // threshold for distances between reads and multigenomes
	Iter_num    int     // number of random iterations to find proper alignments
	Sub_cost    float64 // cost of substitution for Hamming and Edit distance
	Gap_open    float64 // cost of gap open for Edit distance
	Gap_ext     float64 // cost of gap extension for Edit distance
	Proc_num    int     // maximum number of CPUs used by Go
	Debug_mode  bool    // debug mode for output

	// Estimated paras:
	Read_len        int     // read length, calculated from read files
	Info_len        int     // maximum size of array to store read headers
	Max_ins         int     // maximum insert size of two aligned ends
	Err_rate        float32 // average sequencing error rate, estimated from reads with real reads
	Err_var_factor  int     // factor for standard variation of sequencing error rate
	Mut_rate        float32 // average mutation rate, estimated from reference genome
	Mut_var_factor  int     // factor for standard variation of mutation rate
	Iter_num_factor int     // factor for number of iterations
	Seed_backup     int     // number of backup bases from seeds
	Ham_backup      int     // number of backup bases from Hamming alignment
	Indel_backup    int     // number of backup bases from known indels
}

Parameter information

func SetupPara added in v0.7.2

func SetupPara(input_para *ParaInfo) *ParaInfo

SetupPara sets up values of parameters for the alignment process.

type ReadInfo added in v0.2.5

type ReadInfo struct {
	Read1, Read2                   []byte // first and second ends
	Qual1, Qual2                   []byte // quality info of the first and second ends
	Rev_read1, Rev_read2           []byte // reverse of the first and second ends
	Rev_comp_read1, Rev_comp_read2 []byte // reverse complement of the first and second ends
	Comp_read1, Comp_read2         []byte // complement of the first and second ends
	Rev_qual1, Rev_qual2           []byte // quality of reverse of the first and second ends
	Info1, Info2                   []byte // info of the first and second ends
}

Information of input reads

func InitReadInfo added in v0.5.0

func InitReadInfo(read_len, info_len int) *ReadInfo

InitReadInfo creates a ReadInfo object and initializes its content.

type SeedInfo added in v0.7.2

type SeedInfo struct {
	// contains filtered or unexported fields
}

Information of seeds between reads and the multigenome.

type UnAlnInfo added in v0.7.2

type UnAlnInfo struct {
	// contains filtered or unexported fields
}

Information of unaligned reads.

type UnAlnReadInfo added in v0.8.1

type UnAlnReadInfo struct {
	// contains filtered or unexported fields
}

UnAlnReadInfo represents information of unaligned reads, which serves as temporary variables.

type VarCallIndex added in v0.8.1

type VarCallIndex struct {
	Seq        []byte            // multi-sequence
	SeqLen     int               // length of multi-sequence
	ChrPos     []int             // position (first base) of the chromosome on whole-genome
	ChrName    [][]byte          // chromosome names
	Variants   map[int][][]byte  // variants (position, variants).
	VarAF      map[int][]float32 // allele frequency of variants (position, allele frequency)
	SameLenVar map[int]int       // indicate if variants has same length (SNPs or MNPs)
	DelVar     map[int]int       // length of deletions if variants are deletion
	RevFMI     *fmi.Index        // FM-index of reverse multi-sequence (to do forward search)
}

VarCallIndex represents preprocessed information of the reference genome and variant profile, including an FM-index of (the reverse of) the multigenome, which is used to speed up variant calling. This struct also carries the functions for calling variants.

func NewVariantCaller added in v0.7.2

func NewVariantCaller() *VarCallIndex

NewVariantCaller creates an instance of VarCallIndex and sets up its variables. This function will be called from the main program.

func (*VarCallIndex) CallVariants added in v0.8.1

func (VC *VarCallIndex) CallVariants()

CallVariants searches for variants and updates variant information in VarCallIndex. This function will be called from the main program.

func (*VarCallIndex) ExtendSeeds added in v0.8.1

func (VC *VarCallIndex) ExtendSeeds(s_pos, e_pos, m_pos int, read, qual []byte, edit_aln_info_1, edit_aln_info_2 *EditAlnInfo) ([]*VarInfo, int, int, float64)

ExtendSeeds performs alignment between extensions from seeds on reads and multigenomes and determines variants from the alignment of both left and right extensions.

func (*VarCallIndex) ForwardSearchFrom added in v0.8.1

func (VC *VarCallIndex) ForwardSearchFrom(pattern []byte, s_pos int) (int, int, int)

ForwardSearchFrom searches for exact matches between a pattern and the reference using the FM-index. It searches forward on the pattern from any position to match backward on the reference.

func (*VarCallIndex) LeftAlign added in v0.8.1

func (VC *VarCallIndex) LeftAlign(read, qual, ref []byte, pos int, D, IS, IT [][]float64,
	BT_D, BT_IS, BT_IT [][][]int, BT_K [][][]byte, ref_pos_map []int, del_ref bool) (float64, float64,
	int, int, int, []int, [][]byte, [][]byte, []int)

LeftAlign calculates the distance between a read and a ref in the backward direction. The read includes standard bases; the ref includes standard bases and "*" characters.
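
The D, IS, and IT matrices in the signature, together with the substitution, gap open, and gap extension costs from the options, point to an affine-gap edit distance. The sketch below is a textbook three-matrix (Gotoh-style) global alignment cost and is offered only as an illustration. The helper affineDist, the matrix names mat, gapS, and gapT, and the convention that a gap of length k costs open + k*ext are assumptions; IVC additionally works with base qualities, "*" characters, and known variant loci, none of which is modelled here.

```go
package main

import (
	"fmt"
	"math"
)

// min3 returns the smallest of three costs.
func min3(a, b, c float64) float64 {
	return math.Min(a, math.Min(b, c))
}

// affineDist is a hypothetical helper, not IVC's LeftAlign or RightAlign.
// mat[i][j], gapS[i][j] and gapT[i][j] hold the best cost of aligning s[:i]
// to t[:j] when the alignment ends in a (mis)match, in a base of s facing a
// gap, or in a base of t facing a gap, respectively.
func affineDist(s, t []byte, sub, open, ext float64) float64 {
	m, n := len(s), len(t)
	inf := math.Inf(1)
	mat := make([][]float64, m+1)
	gapS := make([][]float64, m+1)
	gapT := make([][]float64, m+1)
	for i := range mat {
		mat[i] = make([]float64, n+1)
		gapS[i] = make([]float64, n+1)
		gapT[i] = make([]float64, n+1)
		for j := range mat[i] {
			mat[i][j], gapS[i][j], gapT[i][j] = inf, inf, inf
		}
	}
	mat[0][0] = 0
	for i := 1; i <= m; i++ {
		gapS[i][0] = open + float64(i)*ext
	}
	for j := 1; j <= n; j++ {
		gapT[0][j] = open + float64(j)*ext
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			c := sub
			if s[i-1] == t[j-1] {
				c = 0
			}
			mat[i][j] = min3(mat[i-1][j-1], gapS[i-1][j-1], gapT[i-1][j-1]) + c
			// Opening a gap pays open+ext; extending one pays only ext.
			gapS[i][j] = min3(mat[i-1][j]+open, gapT[i-1][j]+open, gapS[i-1][j]) + ext
			gapT[i][j] = min3(mat[i][j-1]+open, gapS[i][j-1]+open, gapT[i][j-1]) + ext
		}
	}
	return min3(mat[m][n], gapS[m][n], gapT[m][n])
}

func main() {
	// Costs follow the documented defaults: -s 4, -o 4.1, -e 1.0.
	fmt.Println(affineDist([]byte("AACCGT"), []byte("AAGT"), 4, 4.1, 1.0))
}
```

Keeping separate matrices for the gap states is what lets one long indel cost less than several short ones.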

func (*VarCallIndex) LeftAlignEditTraceBack added in v0.8.1

func (VC *VarCallIndex) LeftAlignEditTraceBack(read, qual, ref []byte, m, n int, pos int,
	BT_Mat int, BT_D, BT_IS, BT_IT [][][]int, BT_K [][][]byte, ref_pos_map []int, del_ref bool) ([]int, [][]byte, [][]byte, []int)

LeftAlignEditTraceBack constructs the alignment between a read and a ref from LeftAlign. The read includes standard bases; the ref includes standard bases and "*" characters.

func (*VarCallIndex) OutputVarCalls added in v0.8.1

func (VC *VarCallIndex) OutputVarCalls()

OutputVarCalls determines variant calls and writes them to file in VCF format.

func (*VarCallIndex) ReadReads added in v0.8.1

func (VC *VarCallIndex) ReadReads(read_data chan *ReadInfo, read_signal chan bool)

ReadReads reads all reads from the input FASTQ files and puts them into the data channel.

func (*VarCallIndex) RightAlign added in v0.8.1

func (VC *VarCallIndex) RightAlign(read, qual, ref []byte, pos int, D, IS, IT [][]float64,
	BT_D, BT_IS, BT_IT [][][]int, BT_K [][][]byte, ref_pos_map []int, del_ref bool) (float64, float64,
	int, int, int, []int, [][]byte, [][]byte, []int)

RightAlign calculates the distance between a read and a ref in the forward direction. The read includes standard bases; the ref includes standard bases and "*" characters.

func (*VarCallIndex) RightAlignEditTraceBack added in v0.8.1

func (VC *VarCallIndex) RightAlignEditTraceBack(read, qual, ref []byte, m, n int, pos int,
	BT_Mat int, BT_D, BT_IS, BT_IT [][][]int, BT_K [][][]byte, ref_pos_map []int, del_ref bool) ([]int, [][]byte, [][]byte, []int)

RightAlignEditTraceBack constructs the alignment between a read and a ref from RightAlign. The read includes standard bases; the ref includes standard bases and "*" characters.

func (*VarCallIndex) SearchSeeds added in v0.8.1

func (VC *VarCallIndex) SearchSeeds(read []byte, s_pos int, m_pos []int) (int, int, int, bool)

SearchSeeds returns positions and distances of seeds between a read and the reference. It searches forward on the read to match backward on the reverse of the reference.

func (*VarCallIndex) SearchSeedsPE added in v0.8.1

func (VC *VarCallIndex) SearchSeedsPE(read_info *ReadInfo, seed_pos [][]int, rand_gen *rand.Rand) (*SeedInfo, *SeedInfo, bool)

SearchSeedsPE searches for all pairs of seeds which have proper chromosome distances.

func (*VarCallIndex) SearchVariants added in v0.8.1

func (VC *VarCallIndex) SearchVariants(read_data chan *ReadInfo, read_signal chan bool,
	var_info []chan *VarInfo, uar_info chan *UnAlnReadInfo, wg *sync.WaitGroup)

SearchVariants takes data from the data channel, searches for variants, and puts them into the results channel.
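
ReadReads and SearchVariants together describe a producer/consumer pipeline over channels. The sketch below shows that general pattern with one reader goroutine, several workers, and a sync.WaitGroup. The types read and call, the function names, and the use of channel closing to signal the end of input are assumptions of this example; IVC's signatures use a separate read_signal channel instead.

```go
package main

import (
	"fmt"
	"sync"
)

// read and call are hypothetical stand-ins for ReadInfo and VarInfo.
type read struct {
	id  int
	seq string
}

type call struct {
	readID int
	pos    int
}

// produce sends every read into out and closes it when the input is exhausted.
func produce(seqs []string, out chan<- *read) {
	for i, s := range seqs {
		out <- &read{id: i, seq: s}
	}
	close(out)
}

// consume drains in until it is closed and emits one result per read.
// The "work" here (taking the read length) is a placeholder.
func consume(in <-chan *read, out chan<- *call, wg *sync.WaitGroup) {
	defer wg.Done()
	for r := range in {
		out <- &call{readID: r.id, pos: len(r.seq)}
	}
}

// runPipeline wires one producer to the given number of workers and
// collects all results. Result order is not deterministic.
func runPipeline(seqs []string, workers int) []*call {
	reads := make(chan *read, workers)
	calls := make(chan *call, workers)
	var wg sync.WaitGroup
	go produce(seqs, reads)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go consume(reads, calls, &wg)
	}
	go func() {
		wg.Wait() // all workers done, so no more results can arrive
		close(calls)
	}()
	var res []*call
	for c := range calls {
		res = append(res, c)
	}
	return res
}

func main() {
	res := runPipeline([]string{"ACGT", "AC", "A"}, 2)
	fmt.Println(len(res), "results")
}
```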

func (*VarCallIndex) SearchVariantsPE added in v0.8.1

func (VC *VarCallIndex) SearchVariantsPE(read_info *ReadInfo, edit_aln_info_1, edit_aln_info_2 *EditAlnInfo, seed_pos [][]int,
	rand_gen *rand.Rand, var_info []chan *VarInfo, uar_info chan *UnAlnReadInfo)

SearchVariantsPE searches for variants from the alignment between paired-end reads and the multigenome. It uses a seed-and-extend strategy and looks for the best alignment candidates through several iterations.

func (*VarCallIndex) UpdateVariantProb added in v0.8.1

func (VC *VarCallIndex) UpdateVariantProb(var_info *VarInfo)

UpdateVariantProb updates probabilities of variants at a variant location using Bayesian update.
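
A Bayesian update at one location multiplies each candidate's prior by the likelihood of the observed base and renormalizes. The sketch below illustrates this with a deliberately simple likelihood model: the observed base is correct with probability 1 - e and is any particular other base with probability e/3. The helper bayesUpdate and this likelihood are assumptions for illustration; IVC's actual model also draws on mapping, alignment, and insert-size probabilities (see VarInfo).

```go
package main

import "fmt"

// bayesUpdate is a hypothetical helper, not IVC's UpdateVariantProb.
// prior maps each candidate allele to its prior probability, obs is the
// base observed in an aligned read, and errProb is that base's error
// probability. It returns the normalized posterior.
func bayesUpdate(prior map[string]float64, obs string, errProb float64) map[string]float64 {
	post := make(map[string]float64, len(prior))
	total := 0.0
	for allele, p := range prior {
		like := errProb / 3 // assumed: errors are uniform over the other 3 bases
		if allele == obs {
			like = 1 - errProb
		}
		post[allele] = p * like
		total += post[allele]
	}
	if total == 0 {
		return post // nothing is consistent with the observation
	}
	for allele := range post {
		post[allele] /= total
	}
	return post
}

func main() {
	// Reference allele A with a small prior for a new allele G.
	prob := map[string]float64{"A": 0.999, "G": 0.001}
	for i := 0; i < 3; i++ {
		prob = bayesUpdate(prob, "G", 0.001) // three reads supporting G
		fmt.Printf("after read %d: P(G) = %.6f\n", i+1, prob["G"])
	}
}
```

Each supporting read shifts probability mass toward the observed allele, so a variant with a small prior can still be called once enough high-quality reads agree.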

type VarInfo added in v0.7.2

type VarInfo struct {
	Pos     uint32  // position of variant (on the reference)
	Bases   []byte  // aligned bases to be the variant
	BQual   []byte  // quality sequences (in FASTQ format) of bases to be the variant
	Type    int     // type of the variant (0: sub, 1: ins, 2: del; other types will be considered in future)
	CDis    int     // chromosomal distance between alignment positions of two read-ends
	CDiff   int     // chromosomal distance between aligned pos and true pos
	MProb   float64 // probability of mapping read correctly (mapping quality)
	AProb   float64 // probability of aligning read correctly (alignment quality)
	IProb   float64 // probability of insert size to be correct (for paired-end reads)
	SPos1   int     // starting position on read1 of exact match (or ending position from backward search with FM-index)
	SPos2   int     // starting position on read2 of exact match (or ending position from backward search with FM-index)
	Strand1 bool    // strand (backward/forward) of read1 of exact match
	Strand2 bool    // strand (backward/forward) of read2 of exact match
	RInfo   []byte  // information sequences (in FASTQ format) of aligned reads (header of reads in FASTQ format)
}

VarInfo represents information of detected variants, which serves as temporary variables.

type VarProf added in v0.7.2

type VarProf struct {
	// VarProb stores all possible variants at each position and their confidence probabilities.
	// Prior probabilities will be obtained from reference genomes and variant profiles.
	// Posterior probabilities will be updated during alignment phase based on incoming aligned bases
	VarProb   map[uint32]map[string]float64   // probability of the variant call
	VarType   map[uint32]map[string]int       // type of variants (0: sub, 1: ins, 2: del; other types will be considered in future)
	VarRNum   map[uint32]map[string]int       // number of aligned reads corresponding to each variant
	ChrDis    map[uint32]map[string][]int     // chromosomal distance between two aligned read-ends
	ChrDiff   map[uint32]map[string][]int     // chromosomal distance between the aligned position and true position (for simulated data)
	MapProb   map[uint32]map[string][]float64 // probability of mapping read to be correct (mapping quality)
	AlnProb   map[uint32]map[string][]float64 // probability of aligning read to be correct (alignment quality)
	ChrProb   map[uint32]map[string][]float64 // probability of insert size to be correct (for paired-end reads)
	StartPos1 map[uint32]map[string][]int     // start position (on read) of alignment of the first end
	StartPos2 map[uint32]map[string][]int     // start position (on read) of alignment of the second end
	Strand1   map[uint32]map[string][]bool    // strand indicator of the first end ("true" if read has same strand with ref, "false" otherwise)
	Strand2   map[uint32]map[string][]bool    // strand indicator of the second end ("true" if read has same strand with ref, "false" otherwise)
	VarBQual  map[uint32]map[string][][]byte  // quality sequences (in FASTQ format) of aligned bases at the variant call position
	ReadInfo  map[uint32]map[string][][]byte  // information sequences (in FASTQ format) of aligned reads (header of reads in FASTQ format)
}

VarProf represents the variant profile and related info of the individual genome.

type VarProfInfo added in v0.8.1

type VarProfInfo struct {
	Variant [][]byte
	AleFreq []float32
}
