GZ-GAMMA:
Estimation of the Expected Number of Substitutions at each Amino Acid (Nucleotide) Site and the Parameter for Rate Variation among Sites. |
|||
| (c) Copyright December, 1997 by Jianzhi Zhang and the Pennsylvania State University. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed. It is distributed free of charge by: | |||
|
Jianzhi Zhang Current Address: Associate Professor of Ecology and Evolutionary Biology University of Michigan Ann Arbor, MI |
Xun Gu Current Address: Department of Zoology/Genetics 332 Science II Hall Iowa State University Ames, IA 5001 E-mail: xgu@iastate.edu |
||
| Suggested citation:
Gu, X. and J. Zhang (1997) A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113 |
|||
| Introduction
GZ-gamma estimates the expected number of substitutions at each amino acid (nucleotide) site, and the gamma shape parameter for rate variation among sites, using a combination of ancestral sequence inference and maximum likelihood estimation. The phylogenetic relationships of the homologous sequences must be known. This package contains two programs, both written in C: gz-aa.exe for amino acid sequences and gz-DNA.exe for DNA sequences. The programs run on IBM PC compatible computers with the Windows 95 and Windows NT operating systems. |
|||
| Installation
First make sure that the diskette you have received contains the following files:
  gz-aa.c     (source code)
  gz-DNA.c    (source code)
  gz-aa.exe   (executable file)
  gz-DNA.exe  (executable file)
  jtt.pro     (JTT substitution matrix, for amino acid sequences)
  atp6.aa     (an example data file for amino acid sequences)
  cox1.dna    (an example data file for DNA sequences)
  manual      (this file)
  alpha       (output file from running gz-aa.exe)
To install GZ-gamma on your computer's hard disk drive (drive "C" in this example), create a directory to hold the files of this package. To do this, type the following:
  c:\md GZ-gamma (Enter)
To copy the GZ-gamma files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (drive "A" in this example). Then enter the following command:
  c:\copy a:*.* c:\GZ-gamma\*.* (Enter) |
|||
| Input file
To use the program, you need one input file containing the amino acid (or nucleotide) sequences and the tree topology of these sequences (see atp6.aa for an example). The first line of the file contains two numbers: the number of sequences and the number of amino acid or nucleotide sites (sequence length). The second line is the name of the first sequence, the third line is the first sequence itself, and so on. Each sequence must occupy a single line with no line breaks. Only the capital letters for the 20 amino acids (or 4 nucleotides) are allowed in the sequences. Gaps and any other symbols must be removed beforehand. The last line of the file is the tree topology of the sequences. The tree format is the same as that used in the PHYLIP package (Felsenstein 1995). Note that the tree is unrooted, so a trifurcation rather than a bifurcation is required at the deepest branching node. For example, the topology of a tree with eight sequences can be expressed as (((1,3),2),6,((4,7),(5,8)))
|
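The input layout described above can be checked before running the program. The following Python sketch is not part of the GZ-gamma package; the function names `parse_gz_input` and `root_child_count` are hypothetical, and the code assumes that the two header numbers share the first line, that blank lines can be ignored, and that a trailing semicolon on the tree line is optional.

```python
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
NUCLEOTIDES = frozenset("ACGT")


def parse_gz_input(text, alphabet):
    """Split input text into (names, sequences, tree) and validate it.

    Assumed layout: header line, then name/sequence line pairs,
    then one final line holding the tree topology.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("input is empty")
    header = lines[0].split()
    if len(header) < 2:
        raise ValueError("first line must hold two numbers")
    n_seq, n_sites = int(header[0]), int(header[1])
    if len(lines) != 2 * n_seq + 2:
        raise ValueError("expected %d non-blank lines" % (2 * n_seq + 2))
    names = lines[1:1 + 2 * n_seq:2]   # lines 2, 4, ... of the file
    seqs = lines[2:2 + 2 * n_seq:2]    # lines 3, 5, ... of the file
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError("%s: length %d, expected %d"
                             % (name, len(seq), n_sites))
        bad = set(seq) - alphabet
        if bad:
            raise ValueError("%s: symbols not allowed: %s"
                             % (name, "".join(sorted(bad))))
    tree = lines[-1].rstrip(";")
    return names, seqs, tree


def root_child_count(tree):
    """Count the subtrees joined at the outermost pair of parentheses."""
    depth = 0
    commas = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            commas += 1
    return commas + 1
```

A tree line passes the unrooted-tree requirement when `root_child_count(tree)` returns 3, as it does for the example topology above.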
|||
| Computation
Open the MS-DOS prompt in Windows 95 or Windows NT. For amino acid sequences, type
  c:\GZ-gamma\gz-aa filename
or, for DNA sequences, type
  c:\GZ-gamma\gz-DNA filename
where filename is the name of the data file. For the atp6.aa data, for example, type
  c:\GZ-gamma\gz-aa atp6.aa
The detailed procedure for the computation is described in Gu and Zhang (1997). First, the ancestral sequence at each node is inferred by a fast Bayesian approach developed by Zhang and Nei (1997); the JTT-f model of amino acid substitution is used for amino acid sequences, and the Kimura two-parameter model is used for DNA sequences. Second, the expected number of substitutions at each site is estimated by the maximum likelihood approach under the Poisson model for amino acids and the Jukes-Cantor model for nucleotides. Third, the ML estimate of the gamma shape parameter (alpha) is obtained from the distribution of the expected number of substitutions. Note that the parsimony estimate of the gamma shape parameter (alpha) is obtained from the distribution of the minimum-required number of substitutions. |
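To illustrate how a shape parameter can be recovered from per-site substitution numbers, the sketch below uses a method-of-moments estimator. This is a deliberately simpler stand-in and is not the maximum likelihood procedure the program performs. It assumes that the per-site numbers follow a negative binomial distribution with mean m and variance m + m^2/alpha, which gives alpha = m^2 / (variance - m). The function name `alpha_by_moments` is hypothetical.

```python
import math
import statistics


def alpha_by_moments(counts):
    """Moment estimate of the gamma shape parameter from per-site numbers.

    Assumes a negative binomial distribution of substitutions per site,
    so that variance = mean + mean**2 / alpha.
    """
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if var <= mean:
        # No excess variance over the Poisson expectation:
        # the data show no evidence of rate variation among sites.
        return math.inf
    return mean * mean / (var - mean)
```

A small alpha indicates strong rate variation among sites, and the estimate grows without bound as the variance approaches the mean.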
|||
| Output file
The output of gz-aa.exe or gz-DNA.exe is written to a file named "alpha". The estimate of the gamma shape parameter (alpha) is given in the first line. In the lines that follow, the first column (#) gives the position numbers of the amino acid (nucleotide) sites; the second column (m') gives the minimum-required number of substitutions inferred by the conventional parsimony method (Fitch 1971); the third column (m) gives the minimum-required number of substitutions inferred by the method of Zhang and Nei (1997); and the fourth column (k) gives the expected number of substitutions estimated by the method of Gu and Zhang (1997), which is used for estimating alpha. |
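The output file can be read back for further analysis. The Python sketch below is a tolerant reader rather than a specification of the file: the exact layout of "alpha" is not documented here, so the code assumes only that the first line contains the alpha estimate as its first number and that each data line has four whitespace-separated numeric fields. The function name `parse_alpha_output` is hypothetical.

```python
import re

# Matches integers, decimals, and numbers in exponent notation.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_alpha_output(text):
    """Return (alpha, rows), where each row is (site, m_prime, m, k)."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("output is empty")
    match = _NUMBER.search(lines[0])
    if match is None:
        raise ValueError("no number found on the first line")
    alpha = float(match.group())
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and anything that is not a data row
        try:
            site = int(fields[0])
            m_prime, m, k = (float(f) for f in fields[1:])
        except ValueError:
            continue  # skip a column-header line such as "# m' m k"
        rows.append((site, m_prime, m, k))
    return alpha, rows
```

The k values returned in `rows` are the ones to plot against site position or to average over a region of interest.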
|||
| Usefulness
The program produces two results: the estimate of the gamma shape parameter (alpha) for rate variation among sites, and the expected number of substitutions at each amino acid (or nucleotide) site. These results are useful in molecular evolutionary analysis:
(1) Distance estimation
(2) Divergence time dating between genes and species
(3) Phylogenetic reconstruction. The estimate of alpha helps rule out the possibility that the inferred phylogenetic tree is misleading because rate variation among sites was neglected. An iteration is suggested as follows: first, estimate alpha with the current program from the tree reconstructed under the assumption of a uniform rate among sites. Second, re-compute the distance matrix, taking the gamma distribution of rates among sites into account, and infer the phylogenetic tree again.
(4) Profile of rate variability among sites. The output file (alpha) can be imported into most commercially available software (e.g., Excel), so the profile of rate variability among sites can easily be presented graphically by plotting k against site position.
(5) Comparison of evolutionary rates between different regions (domains) |
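For the distance re-computation suggested in item (3), the standard gamma-corrected distances can be computed from the proportion of differing sites p and the estimated alpha. The Python sketch below uses the textbook gamma distances for the Poisson model (amino acids) and the Jukes-Cantor model (nucleotides). These formulas are not taken from this manual, the function names are hypothetical, and other substitution models require different corrections.

```python
def gamma_distance_poisson(p, alpha):
    """Gamma distance for amino acids under the Poisson model.

    d = alpha * ((1 - p) ** (-1 / alpha) - 1)
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def gamma_distance_jc(p, alpha):
    """Gamma distance for nucleotides under the Jukes-Cantor model.

    d = (3 / 4) * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)
```

As alpha becomes large, both functions approach the corresponding uniform-rate distances, so a large estimated alpha changes the distance matrix very little.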
|||
| References Gu, X. and J. Zhang (1997) A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113 |
|||
| Department of Biology |
Eberly College of Science | |