ADAPTSITE: Programs for detecting natural selection at single amino acid sites

(c) Copyright August 8 2000 by Yoshiyuki Suzuki and the Pennsylvania State University.
Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.

Introduction:

adaptsite is designed for detecting positive and negative selection at single amino acid sites. It is composed of four programs: adaptsite-p, adaptsite-d, adaptsite-t, and adaptsite-l.

adaptsite-p computes, for each codon site, the average numbers of synonymous (ss), nonsynonymous (sn), conservative (sc), and radical (sr) sites as well as the total numbers of synonymous (cs), nonsynonymous (cn), conservative (cc), and radical (cr) substitutions throughout the phylogenetic tree. Here, conservative and radical refer to whether the substitution preserves (conservative) or changes (radical) the charge property of the amino acid.

adaptsite-d computes, for each codon site, the average numbers of synonymous (ss) and nonsynonymous (sn) sites as well as the total numbers of synonymous (cs) and nonsynonymous (cn) substitutions throughout the phylogenetic tree.

adaptsite-p and adaptsite-d differ in the method used to infer ancestral nucleotide sequences at interior nodes of the phylogenetic tree. adaptsite-p uses the maximum parsimony method, and adaptsite-d uses a two-step distance-based Bayesian method.

adaptsite-t computes, for each codon site, the probability (p-value) of obtaining the observed or more biased values for cs, cn, cc, and cr under the assumption of selective neutrality, using the output of adaptsite-p or adaptsite-d.

adaptsite-l estimates the ratio of the rate of nonsynonymous (rn) to synonymous (rs) substitution using the maximum likelihood method.

The detailed algorithms are described in the suggested references listed below. The programs are written in C and run on UNIX and LINUX (on PC) operating systems.
If you find bugs, please e-mail the author (Yoshiyuki Suzuki, yis1@psu.edu).

Files:

README             : this file
Makefile-p         : makefile for adaptsite-p
Makefile-p.2       : alternative makefile for adaptsite-p
Makefile-d         : makefile for adaptsite-d
Makefile-t         : makefile for adaptsite-t
Makefile-l         : makefile for adaptsite-l
adaptsite-p        : executable file for adaptsite-p
adaptsite-p.h      : header file for adaptsite-p
adaptsite-p.1.c    : source file for adaptsite-p
adaptsite-p.2.c    : source file for adaptsite-p
adaptsite-p.2-1.c  : source file for adaptsite-p
adaptsite-p.2-2.c  : source file for adaptsite-p
adaptsite-p.3.c    : source file for adaptsite-p
adaptsite-p.3-1.c  : source file for adaptsite-p
adaptsite-p.3-2.c  : source file for adaptsite-p
adaptsite-p.4.c    : source file for adaptsite-p
adaptsite-p.4-1.c  : source file for adaptsite-p
adaptsite-p.4-2.c  : source file for adaptsite-p
adaptsite-p.5.c    : source file for adaptsite-p
adaptsite-p.6.c    : source file for adaptsite-p
adaptsite-d        : executable file for adaptsite-d
adaptsite-d.h      : header file for adaptsite-d
adaptsite-d.1.c    : source file for adaptsite-d
adaptsite-d.2.c    : source file for adaptsite-d
adaptsite-t        : executable file for adaptsite-t
adaptsite-t.c      : source file for adaptsite-t
adaptsite-l        : executable file for adaptsite-l
adaptsite-l.c      : source file for adaptsite-l
smpl.aln.1         : sample alignment file (for adaptsite-p and adaptsite-d)
smpl.aln.2         : sample alignment file (for njtree), result from "align smpl.aln"
smpl.tre.1         : sample tree file (result from "njtree smpl.aln.2 -d20 -b0")
smpl.tre.2         : sample tree file (result from "formtre smpl.tre.1")
smpl.res.p         : result of "adaptsite-p smpl.tre.2 smpl.aln.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0"
smpl.res.d         : result of "adaptsite-d smpl.tre.2 smpl.aln.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 poisson"
smpl.res.p.t       : result of "adaptsite-t smpl.p"
smpl.res.d.t       : result of "adaptsite-t smpl.d"
smpl.res.l         : result of "adaptsite-l smpl.tre.2 smpl.aln.1 1.0 0.00001"
formaln.c          : source file for formaln
formaln            : executable file for changing alignment format from the CLUSTAL W format to the njtree readable format
formtre.c          : source file for formtre
formtre            : executable file for changing tree format from the njtree format to the adaptsite readable format
anc-gene.p         : executable file for anc-gene.p (modified from anc-gene written by Dr. Jianzhi Zhang)
anc-gene.p.c       : source file for anc-gene.p (modified from anc-gene.c written by Dr. Jianzhi Zhang)
anc-gene.j         : executable file for anc-gene.j (modified from anc-gene written by Dr. Jianzhi Zhang)
anc-gene.j.c       : source file for anc-gene.j (modified from anc-gene.c written by Dr. Jianzhi Zhang)
poisson.pro        : Poisson model for substitution matrix of amino acids (written by Dr. Jianzhi Zhang)
jtt.pro            : JTT model for substitution matrix of amino acids (written by Dr. Jianzhi Zhang)
njtree             : executable file for reconstructing phylogenetic trees by using the neighbor-joining method (written by Dr. Naoko Takezaki as njboot, available in lintree)

How to install:

To install adaptsite-p, type

%cp Makefile-p Makefile
%make

If this procedure fails to install adaptsite-p, please try

%cp Makefile-p.2 Makefile
%make

To install adaptsite-d, type

%cp Makefile-d Makefile
%make
%cc anc-gene.p.c -lm -o anc-gene.p
%cc anc-gene.j.c -lm -o anc-gene.j

To install adaptsite-t, type

%cp Makefile-t Makefile
%make

To install adaptsite-l, type

%cp Makefile-l Makefile
%make

How to use:

------ adaptsite-p ------

To estimate ss, sn, sc, sr, cs, cn, cc, and cr throughout the phylogenetic tree for each codon site, you need a tree file and an alignment file.

------ adaptsite-d ------

To estimate ss, sn, cs, and cn throughout the phylogenetic tree for each codon site, you need a tree file and an alignment file.

------ Making an alignment file ------

The alignment file can be made by using the computer program CLUSTAL W. Please remove the first three lines of the output file from CLUSTAL W. The alignment has to start with the first position of a codon, and all introns and non-coding regions should be eliminated from the alignment.

adaptsite-p can handle gap sites to some extent, but I recommend that you exclude all gaps from the alignment, either by excluding all codon sites that contain gaps or by excluding all sequences that contain gaps. adaptsite-d cannot handle gap sites at all, so all gap sites should be excluded as indicated above.

------ Making a tree file ------

The tree file can be made by using the computer program njtree. This program does not read the CLUSTAL W alignment directly. Please remove the first three lines of the output file from CLUSTAL W, and type

%formaln filename

where filename is the modified CLUSTAL W output file. You will obtain an alignment file that is readable by njtree.

The program njtree is the same as the program njboot, which is available in the program package lintree. You can get this package from the World-Wide Web site http://mep.bio.psu.edu. Please read the manual of lintree to use njtree. To reconstruct a phylogenetic tree, please type

%njtree filename [options]

where filename is the njtree readable alignment file.

In the current version, the adaptsite programs do not read the output tree file from njtree, so please convert the njtree file to an adaptsite readable file. Please type

%formtre filename

where filename is the output file from njtree.

------ Estimation of ss, sn, sc, sr, cs, cn, cc, and cr ------

Now, you can estimate ss, sn, sc, sr, cs, cn, cc, and cr.
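
Before running the estimation, the gap exclusion recommended under "Making an alignment file" can be done by hand or with a short script. The following Python sketch is an illustration only and is not part of the adaptsite package. The function name drop_gap_codons is hypothetical, and the alignment is assumed to be already loaded as a mapping from sequence name to aligned nucleotide string (reading the CLUSTAL W file itself is not shown).

```python
def drop_gap_codons(alignment, gap_chars="-"):
    """Return a copy of the alignment without codon sites that contain a gap.

    alignment: dict mapping sequence name -> aligned nucleotide string.
    All sequences must have the same length, a multiple of three, and
    start at the first position of a codon.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of three")

    # Keep the start offset of every codon site that is gap-free in all sequences.
    kept = []
    for start in range(0, length, 3):
        codons = [seq[start:start + 3] for seq in alignment.values()]
        if not any(ch in gap_chars for codon in codons for ch in codon):
            kept.append(start)

    return {name: "".join(seq[s:s + 3] for s in kept)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    example = {"a": "ATGAAATTT", "b": "ATG-AATTT", "c": "ATGAAATTC"}
    for name, seq in drop_gap_codons(example).items():
        print(name, seq)
```

Removing whole codon sites, rather than single gap columns, keeps the reading frame intact, which matters because the alignment has to start at the first codon position.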

To estimate these quantities by inferring the ancestral nucleotide sequences with the maximum parsimony method, type

%adaptsite-p treefile alignmentfile mu[12]

mu[12] indicates the relative mutation rates among nucleotides, which are used for estimating the numbers of synonymous and nonsynonymous sites for codons. The first to the twelfth numbers indicate the relative mutation rates of T->C, T->A, T->G, C->T, C->A, C->G, A->T, A->C, A->G, G->T, G->C, and G->A, respectively. For example, if you type

%adaptsite-p treefile alignmentfile 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

the mutation matrix is the one-parameter model

             To
From    T    C    A    G
T       -    a    a    a
C       a    -    a    a
A       a    a    -    a
G       a    a    a    -

and if you type

%adaptsite-p treefile alignmentfile 5.0 1.0 1.0 5.0 1.0 1.0 1.0 1.0 5.0 1.0 1.0 5.0

the mutation matrix is the two-parameter model with a transition/transversion rate ratio of 5.0

             To
From    T    C    A    G
T       -    5a   a    a
C       5a   -    a    a
A       a    a    -    5a
G       a    a    5a   -

and so on. Any model that can be expressed with these twelve rates can be specified.

Similarly, if you want to estimate ss, sn, cs, and cn by inferring the ancestral nucleotide sequences with the distance-based Bayesian method, type

%adaptsite-d treefile alignmentfile mu[12] su

Here, su indicates the substitution matrix of amino acids you want to use for inferring ancestral amino acid sequences. adaptsite-d calls anc-gene.p or anc-gene.j to estimate the ancestral nucleotide sequences. These programs estimate the ancestral nucleotide sequences by first estimating the ancestral amino acid sequences. You can specify the Poisson model or the JTT model for estimating amino acid sequences. If you want to use the Poisson model, type

%adaptsite-d treefile alignmentfile mu[12] poisson

and if you want to use the JTT model, type

%adaptsite-d treefile alignmentfile mu[12] jtt

The output from adaptsite-p contains 16 columns.
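
Because the mu[12] argument described above follows a fixed order, the twelve values for a one- or two-parameter model can be generated rather than typed by hand. The following Python sketch is an illustration only and is not part of the adaptsite package; the names mu12 and format_mu are hypothetical helpers.

```python
# Order of the twelve rates as listed in this README:
# T->C, T->A, T->G, C->T, C->A, C->G, A->T, A->C, A->G, G->T, G->C, G->A
BASES = "TCAG"
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def mu12(ts_tv_ratio=1.0):
    """Return the twelve relative mutation rates in README order.

    Transitions (T<->C, A<->G) get ts_tv_ratio; transversions get 1.0.
    A ratio of 1.0 gives the one-parameter model.
    """
    rates = []
    for src in BASES:
        for dst in BASES:
            if src == dst:
                continue  # no rate for a base mutating to itself
            rates.append(ts_tv_ratio if (src, dst) in TRANSITIONS else 1.0)
    return rates


def format_mu(rates):
    """Format the rates as space-separated command-line arguments."""
    return " ".join(f"{r:.1f}" for r in rates)


if __name__ == "__main__":
    # Prints the argument string for the two-parameter example with ratio 5.0.
    print(format_mu(mu12(5.0)))
```

With a ratio of 5.0 this reproduces the argument list of the two-parameter example above (5.0 1.0 1.0 5.0 1.0 1.0 1.0 1.0 5.0 1.0 1.0 5.0).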

The 16 columns of the adaptsite-p output are, from the first to the 16th: the codon position, the number of equally parsimonious sets of ancestral codons, ss, cs, cs/ss, sn, cn, cn/sn, (cn/sn)/(cs/ss), sc, cc, cc/sc, sr, cr, cr/sr, and (cr/sr)/(cc/sc).

From these numbers, you can test the neutrality for each codon site by using adaptsite-t. adaptsite-t computes the probability (p-value) of obtaining the observed or more biased numbers for cs, cn, cc, and cr under neutral evolution for each codon site. To use adaptsite-t, type

%adaptsite-t output

where output indicates the output file from adaptsite-p or adaptsite-d.

For adaptsite-p output, the output from adaptsite-t contains 21 columns. The first column indicates the codon position. The second and the third columns indicate the p-values for the one-tailed tests. The fourth and sixth columns indicate the p-value and 1-p, respectively, for the two-tailed test. When cn/sn is larger than cs/ss, these values are shown as positive values, and when cs/ss is larger than cn/sn, they are shown as negative values. The fifth column indicates whether the codon site is positively selected (A), negatively selected (P), or neither (N) according to the two-tailed test. The significance level is set at 5%. Note that "N" does not mean that the codon site is evolving neutrally. The 7th to 11th columns, 12th to 16th columns, and 17th to 21st columns indicate the corresponding results for conservative, radical, and conservative/radical substitutions, respectively.

The output from adaptsite-d contains nine columns. They are, from the first to the ninth: the codon position, the number of equally parsimonious sets of ancestral codons, ss, cs, cs/ss, sn, cn, cn/sn, and (cn/sn)/(cs/ss).

From these numbers, you can test the neutrality for each codon site by using adaptsite-t. adaptsite-t computes the probability (p-value) of obtaining the observed or more biased numbers for cs and cn under neutral evolution for each codon site.
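
As a rough illustration of what "the observed or more biased numbers" can mean, the following Python sketch computes a one-tailed p-value for an excess of nonsynonymous substitutions. It assumes the binomial model described in Suzuki and Gojobori (1999), in which each substitution at a site is nonsynonymous with probability sn/(ss+sn) under neutrality. It is not taken from the adaptsite-t source; the function names are hypothetical, and the use of integer counts for cs and cn is an assumption of this sketch.

```python
from math import comb


def upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def one_tailed_p_excess_nonsyn(ss, sn, cs, cn):
    """p-value for observing cn or more nonsynonymous substitutions.

    ss, sn: numbers of synonymous and nonsynonymous sites (may be fractional).
    cs, cn: integer counts of synonymous and nonsynonymous substitutions
            (assumption: real output values would need rounding first).
    """
    n = cs + cn              # total substitutions observed at the site
    p_nonsyn = sn / (ss + sn)  # neutral expectation for a nonsynonymous change
    return upper_tail(cn, n, p_nonsyn)


if __name__ == "__main__":
    # Hypothetical site: 1 synonymous site, 2 nonsynonymous sites,
    # 0 synonymous and 3 nonsynonymous substitutions.
    print(one_tailed_p_excess_nonsyn(1.0, 2.0, 0, 3))
```

A small p-value from this tail would suggest more nonsynonymous change than expected under neutrality; the opposite tail would be computed in the same way with the roles of synonymous and nonsynonymous counts exchanged.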
To use adaptsite-t, type

%adaptsite-t output

where output indicates the output file from adaptsite-d.

In this case, the output from adaptsite-t contains six columns. The first column indicates the codon position. The second and the third columns indicate the p-values for the one-tailed tests. The fourth and sixth columns indicate the p-value and 1-p, respectively, for the two-tailed test. When cn/sn is larger than cs/ss, these values are shown as positive values, and when cs/ss is larger than cn/sn, they are shown as negative values. The fifth column indicates whether the codon site is positively selected (A), negatively selected (P), or neither (N) according to the two-tailed test. The significance level is set at 5%. Note that "N" does not mean that the codon site is evolving neutrally.

------ adaptsite-l ------

adaptsite-l also requires an alignment file and a tree file. Type

%adaptsite-l treefile alignmentfile [eqcodfre[61]] ts/tv rn/rs

eqcodfre is optional; with it you can assign the equilibrium frequencies of the 61 sense codons. The order is TTT, TTC, TTA, TTG, ..., CTT, ..., ATT, ..., GTT, ..., TCT, ..., TAT, ..., TGT, ..., GGG. ts/tv is the ratio of the rate of transitional to transversional substitutions. rn/rs is the ratio of the rate of nonsynonymous to synonymous substitutions.

In the output file, the first column indicates the site number, the second column is the maximum likelihood estimate of rn/rs, and the character in the last column indicates the type of selection. N and P indicate negative and positive selection, respectively.

Suggested references:

Suzuki Y. and Gojobori T. (1999) A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315-1328.

Suzuki Y. (1999) Molecular evolution of pathogenic viruses. Ph.D. dissertation. The Graduate University for Advanced Studies. Hayama, Japan.

Suzuki Y., Gojobori T., and Nei M. (2001) ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 17:660-661.

adaptsite is distributed free of charge by:

Yoshiyuki Suzuki, M.D. Ph.D.
Department of Biology
Institute of Molecular Evolutionary Genetics
The Pennsylvania State University
311 Mueller Laboratory
University Park, PA 16802, USA
Tel: 814-865-1034
Fax: 814-863-7336
E-mail: yis1@psu.edu

ADAPTSITE.1.4
Last modified on June 2, 2007.