godon

v0.0.0-...-eeaf01f
Published: Sep 13, 2021 License: GPL-2.0

README

Godon is codon models software written in Go.

Godon development was supported by the Swiss National Science Foundation (grant numbers CR32I3_143768, IZLRZ3_163872).

What is special about Godon

  • Godon supports rate variation (see the manuscript). There are three models which support rate variation: branch-site (model BSG), M8 (model M8) and M0 (model M0G). You need to specify the number of discrete categories. Otherwise, there will be no rate variation in the model. Use --ncat-site-rate or --ncat-codon-rate for site rate variation and codon rate variation respectively.

  • Godon supports state aggregation (option --aggregate); see the paper for details. The paper used v0.5 (39bf774); the likelihood computation code has changed substantially since then.

  • A heuristic to avoid overestimating LRT statistics, which often causes false positives in PAML; it also corrects for LRT underestimation. Use godon test to enable it.

  • A heuristic for fast branch-length estimation via M0 (--m0-tree).

  • Multiple optimizers available: L-BFGS-B, downhill simplex, simulated annealing, SQP, and others via NLopt.

  • Markov chain Monte Carlo support (Metropolis-Hastings algorithm).

  • Export to machine-readable JSON format.

  • Multithreading support (unlike PAML).

  • Starting point specification (only some parameters in PAML) and randomization (disabled in PAML).

  • Testing multiple branches in one run for the branch-site model.

  • Wide range of models: M0, M1a, M2a, M7, M8, and branch-site.

  • Support for various genetic codes.

  • Checkpoints: if a long computation is interrupted, it can be resumed. To use this, specify a checkpoint file (--checkpoint). Warning: resuming might affect reproducibility with respect to the random number generator.
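
The LRT heuristic above adjusts a likelihood ratio test between nested models. For orientation only, here is a sketch of the plain textbook test it starts from, not of Godon's heuristic: given the maximized log-likelihoods of the null (H0) and alternative (H1) models, the statistic is D = 2(lnL1 - lnL0), compared here against a chi-square distribution with one degree of freedom. The function names and example log-likelihoods are hypothetical, and the appropriate null distribution depends on the model pair being tested.

```go
package main

import (
	"fmt"
	"math"
)

// lrtStatistic returns D = 2*(lnL1 - lnL0). Because H0 is nested in H1, the
// true statistic cannot be negative; a negative value only means the
// optimizer missed the maximum, so it is clamped to zero here.
func lrtStatistic(lnL0, lnL1 float64) float64 {
	d := 2 * (lnL1 - lnL0)
	if d < 0 {
		return 0
	}
	return d
}

// chi2Df1PValue returns P(X >= d) for X ~ chi-square with one degree of
// freedom. Since X = Z^2 for standard normal Z, this equals erfc(sqrt(d/2)).
func chi2Df1PValue(d float64) float64 {
	return math.Erfc(math.Sqrt(d / 2))
}

func main() {
	// Hypothetical maximized log-likelihoods, for illustration only.
	lnL0, lnL1 := -1234.5, -1230.2
	d := lrtStatistic(lnL0, lnL1)
	fmt.Printf("D = %.2f, p = %.4g\n", d, chi2Df1PValue(d))
}
```

Clamping negative statistics is a common safeguard against an H1 optimization that stopped short of the H0 optimum; it does not address the over- and underestimation problems that Godon's heuristic targets.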

Support

You can ask questions on the Bioinformatics Stack Exchange site; do not forget to use the [godon] tag. Use the issue tracker to report bugs.

Installation

The software was tested on GNU/Linux and Mac OS X.

You can fetch the latest statically compiled binary for GNU/Linux from the downloads section; do not forget to make it executable prior to running (chmod +x godon-master-linux-gnu-x86_64).

Compiling from source

Requirements:

Once you have got all of that you can run:

$ bin/install.sh

Ubuntu 16.04 installation

  1. Install Go v1.7 or later. You can start by installing Go v1.6 and then updating using godeb.

  2. Install dependencies: sudo apt-get install git libnlopt-dev libopenblas-dev build-essential gfortran

  3. Install Gonum BLAS: CGO_LDFLAGS="-lopenblas" go install github.com/gonum/blas/cgo

  4. (Optional) If your Go is older than v1.7, install go-lbfgsb.

  5. Install godon:

    bin/install.sh
    

Mac OS X installation using Homebrew

  1. Install Homebrew.
  2. Install dependencies: brew install go gcc nlopt (this may take more than an hour).
  3. If you don't have git, install it as well: brew install git.
  4. Install godon: curl -L https://bitbucket.org/Davydov/godon/raw/master/bin/install.sh | CC=gcc-7 bash. You need to use the gcc from Homebrew (gcc-7 in this case).
  5. (Optional) Add the binary directory of Go to the PATH variable. E.g., put export PATH=$PATH:$HOME/go/bin into your ~/.bash_profile.

Mac OS X installation (old)

  1. Make sure you have a C compiler, build tools and gfortran.

  2. Install Go (1.7 or later).

  3. Install NLopt.

  4. Get Godon source code with go get -d bitbucket.org/Davydov/godon/godon.

  5. Install godon. Depending on your installation, you may need to specify the paths to the NLopt library and include files, and to the Fortran library libgfortran (on the test system it was /usr/local/Cellar/gcc/6.2.0/lib/gcc/6). Run:

    CGO_CFLAGS="-I/path/to/nlopt/include" CGO_LDFLAGS="-L/path/to/libgfortran -L/path/to/nlopt/lib" $GOPATH/src/bitbucket.org/Davydov/godon/bin/install.sh
    

Running

Don't forget to check out the tutorial.

You can find sample datasets in godon/cmodel/testdata.

You can tell Godon to run a pair of models (M8 vs. M8a, or branch-site H1 vs. H0). In this case, if no foreground branch for the branch-site model is labeled with #1, Godon will test all branches. To force this behavior even when a branch is labeled #1, use --all-branches. You can exclude terminal branches with --no-leaves, and --m0-tree makes Godon use branch lengths estimated under M0.

$ godon test BS --m0-tree --all-branches EMGT00050000008747.Drosophila.002.fst EMGT00050000008747.Drosophila.002.nwk
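
The branch selection described above (test every branch when none is labeled #1, optionally skipping terminal branches) amounts to a simple tree walk. The sketch below is hypothetical: the Node type, its fields and candidateBranches are made up for illustration and are not the API of Godon's tree package.

```go
package main

import "fmt"

// Node is a hypothetical minimal tree node; Name is empty for internal nodes.
type Node struct {
	Name     string
	Children []*Node
}

// candidateBranches returns the nodes whose parent branch would be tested.
// Every non-root node defines one branch. If noLeaves is true, branches
// leading to terminal nodes are skipped, mirroring the idea of --no-leaves.
func candidateBranches(root *Node, noLeaves bool) []*Node {
	var out []*Node
	var walk func(n *Node)
	walk = func(n *Node) {
		for _, c := range n.Children {
			isLeaf := len(c.Children) == 0
			if !(noLeaves && isLeaf) {
				out = append(out, c)
			}
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	// ((A,B),C); the (A,B) clade is an unnamed internal node.
	ab := &Node{Children: []*Node{{Name: "A"}, {Name: "B"}}}
	root := &Node{Children: []*Node{ab, {Name: "C"}}}
	fmt.Println(len(candidateBranches(root, false))) // 4 branches
	fmt.Println(len(candidateBranches(root, true)))  // 1 internal branch
}
```

Each returned branch would then get its own H1 vs. H0 comparison with that branch as foreground.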

Maximize the likelihood of the branch-site model using the L-BFGS-B optimizer without optimizing branch lengths, using only a single CPU.

$ godon -p 1 -n BS EMGT00050000000025.Drosophila.001.fst EMGT00050000000025.Drosophila.001.nwk

Run MCMC using the M0 model with downhill simplex optimization.

$ godon -m mh M0 EMGT00050000000025.Drosophila.001.fst EMGT00050000000025.Drosophila.001.nwk

repository contents

  • bin installation script
  • bio reads fasta and translates genetic code
  • cmodel codon models
  • codon working with codon and transition matrices
  • godon is the MCMC sampler and maximum likelihood estimator for the M0 and branch-site models
  • misc various utilities
  • optimize implements MCMC, downhill simplex and other algorithms
  • dist functions related to discrete distributions, initially ported from PAML
  • tree is tree manipulation library

codon

  • codon_frequency.go — F0, F3X4
  • codon_sequences.go — codon alignment class
  • ematrix.go — matrix class which remembers its eigen decomposition
  • matrix.go — transition matrix routines
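
codon_frequency.go lists the F3X4 estimator. A minimal sketch of the idea: count nucleotide frequencies separately at each of the three codon positions, multiply them for every codon, set stop codons to zero and renormalize. The sketch assumes the standard genetic code, TCAG base ordering and in-frame sequences given as plain strings; it is not Godon's code, and the names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// bases gives the nucleotide order used to index codons (TCAG).
const bases = "TCAG"

// stdCode is the standard genetic code in TCAG order; '*' marks stop codons.
const stdCode = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// f3x4 returns codon frequencies indexed as 16*b1 + 4*b2 + b3 (TCAG order).
// Nucleotides are counted separately for each codon position; codons that
// contain anything other than T, C, A or G (gaps, N) are skipped.
func f3x4(seqs []string) [64]float64 {
	var counts [3][4]float64
	for _, s := range seqs {
		s = strings.ToUpper(s)
		for i := 0; i+3 <= len(s); i += 3 {
			var idx [3]int
			ok := true
			for p := 0; p < 3; p++ {
				idx[p] = strings.IndexByte(bases, s[i+p])
				ok = ok && idx[p] >= 0
			}
			if !ok {
				continue
			}
			for p := 0; p < 3; p++ {
				counts[p][idx[p]]++
			}
		}
	}
	// Convert counts to per-position nucleotide frequencies.
	var pos [3][4]float64
	for p := 0; p < 3; p++ {
		n := counts[p][0] + counts[p][1] + counts[p][2] + counts[p][3]
		for b := 0; b < 4; b++ {
			if n > 0 {
				pos[p][b] = counts[p][b] / n
			}
		}
	}
	// Multiply positional frequencies, skip stop codons, renormalize.
	var freq [64]float64
	total := 0.0
	for c := 0; c < 64; c++ {
		if stdCode[c] == '*' {
			continue
		}
		freq[c] = pos[0][c/16] * pos[1][(c/4)%4] * pos[2][c%4]
		total += freq[c]
	}
	if total > 0 {
		for c := range freq {
			freq[c] /= total
		}
	}
	return freq
}

func main() {
	f := f3x4([]string{"ATGAAACCC", "ATGAAGCCT"})
	fmt.Printf("ATG: %.4f\n", f[2*16+0*4+3]) // A=2, T=0, G=3
}
```

F3X4 needs only 9 free parameters (3 per position) instead of 60 for a full codon frequency table, at the price of assuming independence between positions.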

cmodel

  • aggregation.go — codon aggregation code
  • branch_site.go — branch site model
  • M0.go — M0 model
  • model.go — tree + alignment model base class
  • tools.go — misc helper functions

cmodel tests
  • likelihood_test.go — likelihood test (compare with codeml)
  • mcmc_test.go — MCMC benchmark
  • mcmcpar_test.go — test that likelihood is consistent during chain evaluation
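
M0.go implements the M0 model, which has a single ω for all sites and branches. In the standard codeml-style formulation, the instantaneous rate from codon i to codon j is zero if they differ at more than one position; otherwise it is π_j, multiplied by κ for a transition and by ω for a nonsynonymous change. A minimal sketch of that off-diagonal rate, assuming the standard genetic code and TCAG codon indexing; it is not taken from Godon's source, and m0Rate is a hypothetical name. A complete rate matrix would also set each diagonal entry to minus its row sum and rescale the matrix.

```go
package main

import "fmt"

// stdCode is the standard genetic code in TCAG order; '*' marks stop codons.
const stdCode = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// isTransition reports whether two different bases (TCAG indices) form a
// transition: T<->C (0,1) or A<->G (2,3).
func isTransition(a, b int) bool {
	return a/2 == b/2
}

// m0Rate returns the unscaled off-diagonal rate from codon i to codon j
// (TCAG indices 0..63) given kappa, omega and codon frequencies pi.
func m0Rate(i, j int, kappa, omega float64, pi [64]float64) float64 {
	if i == j || stdCode[i] == '*' || stdCode[j] == '*' {
		return 0
	}
	ci := [3]int{i / 16, (i / 4) % 4, i % 4}
	cj := [3]int{j / 16, (j / 4) % 4, j % 4}
	diffs, pos := 0, -1
	for p := 0; p < 3; p++ {
		if ci[p] != cj[p] {
			diffs++
			pos = p
		}
	}
	if diffs != 1 {
		return 0 // only single-nucleotide changes are allowed
	}
	q := pi[j]
	if isTransition(ci[pos], cj[pos]) {
		q *= kappa
	}
	if stdCode[i] != stdCode[j] {
		q *= omega // nonsynonymous change
	}
	return q
}

func main() {
	var pi [64]float64
	for c := range pi {
		if stdCode[c] != '*' {
			pi[c] = 1.0 / 61
		}
	}
	// TTT (F, index 0) -> TTC (F, index 1): synonymous transition, pi_j*kappa.
	fmt.Printf("%.5f\n", m0Rate(0, 1, 2.0, 0.5, pi))
	// TTT (F) -> TTA (L, index 2): nonsynonymous transversion, pi_j*omega.
	fmt.Printf("%.5f\n", m0Rate(0, 2, 2.0, 0.5, pi))
}
```

Under this formulation, models such as M8 or branch-site reuse the same rate but vary ω across site classes or branches.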

optimize

  • adaptive.go — adaptive parameter class
  • lbfgsb.go — L-BFGS-B optimizer
  • mh.go — Metropolis-Hastings & simulated annealing implementations
  • nlopt_callback.go — NLopt callback wrapper
  • nlopt.go — NLopt wrapper
  • optimizer.go — Optimizer and Optimizable interfaces
  • parameter.go — float64 parameter class
  • prior.go — prior functions
  • proposal.go — proposal functions
  • simplex.go — simplex method
  • utils.go — helper functions
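
simplex.go implements the downhill simplex (Nelder-Mead) method. A compact sketch of its reflection, expansion, contraction and shrink steps with the usual textbook coefficients (1, 2, 0.5, 0.5) follows; it minimizes a simple quadratic rather than a likelihood, runs a fixed number of iterations instead of checking convergence, and nelderMead is a hypothetical name rather than Godon's optimize API.

```go
package main

import (
	"fmt"
	"sort"
)

// vertex is one point of the simplex together with its function value.
type vertex struct {
	x []float64
	v float64
}

// nelderMead minimizes f from x0 with the downhill simplex method, using
// reflection 1, expansion 2, contraction 0.5 and shrink 0.5. It runs a fixed
// number of iterations and returns the best point found.
func nelderMead(f func([]float64) float64, x0 []float64, step float64, iters int) []float64 {
	n := len(x0)
	s := make([]vertex, n+1)
	for i := range s {
		x := append([]float64(nil), x0...)
		if i > 0 {
			x[i-1] += step // offset one coordinate per extra vertex
		}
		s[i] = vertex{x, f(x)}
	}
	// towards returns a + t*(b-a).
	towards := func(a, b []float64, t float64) []float64 {
		r := make([]float64, n)
		for k := range r {
			r[k] = a[k] + t*(b[k]-a[k])
		}
		return r
	}
	byValue := func() { sort.Slice(s, func(i, j int) bool { return s[i].v < s[j].v }) }
	for it := 0; it < iters; it++ {
		byValue() // s[0] is best, s[n] is worst
		c := make([]float64, n) // centroid of all but the worst vertex
		for _, p := range s[:n] {
			for k := range c {
				c[k] += p.x[k] / float64(n)
			}
		}
		worst := s[n]
		xr := towards(c, worst.x, -1) // reflection
		fr := f(xr)
		switch {
		case fr < s[0].v:
			xe := towards(c, worst.x, -2) // expansion
			if fe := f(xe); fe < fr {
				s[n] = vertex{xe, fe}
			} else {
				s[n] = vertex{xr, fr}
			}
			continue
		case fr < s[n-1].v:
			s[n] = vertex{xr, fr}
			continue
		case fr < worst.v:
			xc := towards(c, worst.x, -0.5) // outside contraction
			if fc := f(xc); fc <= fr {
				s[n] = vertex{xc, fc}
				continue
			}
		default:
			xc := towards(c, worst.x, 0.5) // inside contraction
			if fc := f(xc); fc < worst.v {
				s[n] = vertex{xc, fc}
				continue
			}
		}
		// Contraction failed: shrink all vertices towards the best one.
		for i := 1; i <= n; i++ {
			x := towards(s[0].x, s[i].x, 0.5)
			s[i] = vertex{x, f(x)}
		}
	}
	byValue()
	return s[0].x
}

func main() {
	f := func(x []float64) float64 {
		return (x[0]-1)*(x[0]-1) + (x[1]+2)*(x[1]+2)
	}
	x := nelderMead(f, []float64{0, 0}, 0.5, 200)
	fmt.Printf("minimum near (%.4f, %.4f)\n", x[0], x[1])
}
```

The method needs only function values, no gradients, which makes it a robust fallback when likelihood derivatives are unavailable or noisy.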

misc

  • brexp exports branch lengths and node labels in various formats
  • brmatch matches branch labels between two trees
  • norm is a sampler for a model of multiple normal distributions

Directories

Path           Synopsis
bio            Package bio provides functions related to the genetic code.
checkpoint     checkpoint creates CheckpointIO which provides various operations with checkpoints.
cmodel         Package cmodel provides codon evolution models.
codon          Package codon is the package for working with codons, codon transition matrices and codon frequencies.
dist           Package dist implements functions for discrete distributions.
godon          Godon implements a number of codon models (including M0 and branch-site).
misc
misc/brexp     Brexp is a simple tool which helps when working with trees in Newick format.
misc/brmatch   Brmatch prints branch label (*label) correspondence between two topologically identical trees.
misc/dbeta     DBeta returns values of a discrete beta distribution.
misc/dgamma    DGamma returns values of a discrete gamma distribution.
misc/gcode     gcode is a tool to generate files with genetic code in Go format from an ASN.1 file.
misc/norm      Norm is an optimizer for a normal distribution.
optimize       Package optimize is a collection of optimizers and MCMC samplers.
tree           Package tree implements tree structure and Newick parsing.
