HON-NEW: A Method for Computing Conservative and Radical Nonsynonymous Distances

Jianzhi Zhang
Laboratory of Host Defenses
National Institute of Allergy and Infectious Diseases
National Institutes of Health
Building 10, room 11N104
9000 Rockville Pike
Bethesda, MD 20892
Tel.: 301-402-1668
Fax: 301-402-4369
E-mail: jzhang@niaid.nih.gov

Suggested Citation
Zhang J. (2000) Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50:56-68.

Introduction
HON-NEW estimates conservative and radical nonsynonymous distances between protein-coding DNA sequences. The method modifies the original method of Hughes, Ota, and Nei (1990) by taking the transition bias into account. Three amino acid classifications are provided: charge, polarity, and that of Miyata and Yasunaga. You can also define conservative and radical amino acid changes yourself (see next paragraph). The program is written in C and runs on IBM PC compatible computers under the Windows 95 operating system.

You can define amino acid groups so that changes between groups are radical and changes within a group are conservative. To do so, create a file named self.div. The first line gives the number of amino acid groups (e.g., 3 in the case of charge, where the groups are [-,0,+]). The second line gives the number of amino acids in the first group, a space, and the amino acids in that group, using one-letter amino acid codes. The next line gives the same information for the second group, and so on. If there are n groups in total, only the first n-1 groups need to be listed, because the last group can be derived from them. See charge.div for an example.

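The self.div layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `format_div` and `parse_div` are hypothetical, the amino acids of a group are assumed to be written as one unbroken string, and the charge grouping used here is illustrative and may differ from the charge.div shipped with the program. Compare with charge.div before relying on it.

```python
# Sketch of the self.div layout. The exact whitespace rules are an
# assumption; check charge.div for the format the program really expects.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes


def format_div(groups):
    """Return self.div text for a list of amino acid groups.

    Only the first n-1 groups are written; the last one is implied.
    """
    lines = [str(len(groups))]
    for group in groups[:-1]:
        lines.append(f"{len(group)} {group}")
    return "\n".join(lines) + "\n"


def parse_div(text):
    """Recover all n groups, deriving the last one from the first n-1."""
    lines = text.split("\n")
    n_groups = int(lines[0])
    groups = []
    for line in lines[1:n_groups]:
        count, members = line.split()
        if int(count) != len(members):
            raise ValueError(f"count mismatch in line: {line!r}")
        groups.append(members)
    used = set("".join(groups))
    # Whatever is not listed belongs to the last group.
    groups.append("".join(a for a in AMINO_ACIDS if a not in used))
    return groups


# Illustrative charge grouping: negative, neutral, positive.
charge = ["DE", "ACFGILMNPQSTVWY", "HKR"]
text = format_div(charge)
print(text, end="")
```

Parsing the generated text with `parse_div` returns the same three groups, which shows why the last group never has to be written out.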
Installation
First make sure that the diskette you have received contains the following files.

Input file
To use the program, you need an input file containing the protein-coding DNA sequences with stop codons removed (see rnase.seq for an example). The first line of the file contains two numbers: the number of sequences and the number of nucleotides per sequence (the sequence length). The second line is the name of the first sequence, the third line is the first sequence itself, and so on for the remaining sequences. Only A, G, C, T, a, g, c, and t are allowed. Sequences must be aligned beforehand, and gaps must be removed.

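Before running the program, it can help to check an input file against the rules above. The following Python sketch does that; it is not part of HON-NEW. The function name `read_input` and the two example sequences are hypothetical, and the sketch assumes that the two header numbers are separated by whitespace and that each sequence occupies a single line.

```python
ALLOWED = set("AGCTagct")


def read_input(text):
    """Parse and validate input text in the layout described above.

    Returns a dict mapping sequence names to sequences.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seq, length = (int(x) for x in lines[0].split())
    body = lines[1:]
    if len(body) != 2 * n_seq:
        raise ValueError(f"expected {n_seq} name/sequence pairs")
    seqs = {}
    for i in range(n_seq):
        name, seq = body[2 * i], body[2 * i + 1]
        if len(seq) != length:
            raise ValueError(f"{name}: length {len(seq)}, expected {length}")
        bad = set(seq) - ALLOWED
        if bad:
            raise ValueError(f"{name}: characters not allowed: {sorted(bad)}")
        seqs[name] = seq
    return seqs


# Made-up example data, not taken from rnase.seq.
example = """2 12
seqA
ATGGCTAAAGGT
seqB
ATGGCCAAGGGT
"""
sequences = read_input(example)
print(sorted(sequences))
```

A gap character or an ambiguity code such as N makes `read_input` raise a `ValueError`, mirroring the restriction to A, G, C, and T.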
Computation
To compute C, R, c, r, etc., type

    c:\hon-new\hon-new filename

For example, to try the rnase.seq data, type

    c:\hon-new\hon-new rnase.seq

You will be asked to input the transition/transversion ratio (Ts/Tv), which should be estimated beforehand. To use the original method of Hughes, Ota, and Nei (1990), input Ts/Tv=0.5. The variances and covariances of the distances are computed according to Ota and Nei (1994).

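The documentation does not say how Ts/Tv should be estimated. One crude possibility, shown here only as a sketch, is to count the transitional and transversional differences between two aligned sequences and take their ratio. This ignores multiple substitutions at the same site, so a model-based estimate is preferable for divergent sequences. The function names and the two example sequences are hypothetical.

```python
PURINES = {"A", "G"}  # C and T are the pyrimidines


def count_ts_tv(seq1, seq2):
    """Count transitional and transversional differences between two
    aligned, gap-free sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            ts += 1
        else:
            tv += 1
    return ts, tv


def ts_tv_ratio(seq1, seq2):
    """Raw Ts/Tv ratio from observed differences (no multiple-hit correction)."""
    ts, tv = count_ts_tv(seq1, seq2)
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Made-up sequences with two transitions and one transversion.
ratio = ts_tv_ratio("ATGGCTAAAGGT", "ATGGCCAAGGCT")
print(ratio)
```

Each nucleotide has one possible transitional change and two possible transversional changes, which is why a ratio of 0.5 corresponds to no transition bias.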
Output file
The program writes several output files with different formats.
(1) outfile: the most useful file. It includes C, R, c, r, pc, pr, and their variances.
(2) cr.rst: includes the covariances in addition to the quantities given in outfile.
(3) sn.rst: includes S, N, s, n, ps, pn, ds, dn, and their variances and covariances.

References & Notations
Ts/Tv: transition/transversion ratio, often written R (not to be confused with the R defined below). Ts/Tv=0.5 means no transition bias. Note that this ratio is not the transition/transversion rate ratio, which is often denoted by kappa. Under Kimura's model, kappa = 2(Ts/Tv).
S: number of synonymous sites of a sequence.
N: number of nonsynonymous sites of a sequence.
s: number of synonymous differences between two sequences.
n: number of nonsynonymous differences between two sequences.
ps: p-distance (proportion) of synonymous difference.
pn: p-distance (proportion) of nonsynonymous difference.
ds: Jukes-Cantor distance of synonymous difference.
dn: Jukes-Cantor distance of nonsynonymous difference.
C: number of conservative nonsynonymous sites of a sequence.
R: number of radical nonsynonymous sites of a sequence.
c: number of conservative nonsynonymous differences between two sequences.
r: number of radical nonsynonymous differences between two sequences.
pc: p-distance (proportion) of conservative nonsynonymous difference.
pr: p-distance (proportion) of radical nonsynonymous difference.

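The relation between the p-distances and the Jukes-Cantor distances listed above can be illustrated with the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The sketch below assumes that ps = s/S and pn = n/N and that ds and dn are obtained from them with this standard formula; the function names and the numbers in the example are hypothetical, not output from HON-NEW.

```python
import math


def p_distance(differences, sites):
    """Proportion of differences, e.g. ps = s / S or pn = n / N."""
    return differences / sites


def jukes_cantor(p):
    """Standard Jukes-Cantor correction of a p-distance."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Made-up counts: 30 synonymous differences over 100 synonymous sites.
ps = p_distance(30, 100)
ds = jukes_cantor(ps)
print(round(ds, 4))
```

The corrected distance is always at least as large as the p-distance, and the gap widens as p approaches 0.75, where the correction is no longer defined.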
Department of Biology, Eberly College of Science