NG-NEW: A Modified Nei-Gojobori Method for Computing Synonymous and Nonsynonymous Distances

(c) Copyright March 1998 by Jianzhi Zhang and the Pennsylvania State University. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.

NG-NEW is distributed free of charge by

    Jianzhi Zhang
    Institute of Molecular Evolutionary Genetics and Department of Biology
    322 Mueller Laboratory
    The Pennsylvania State University
    University Park, PA 16802, USA
    Telephone: 814-8657030
    Fax: 814-8637336
    Email: zhang@imeg.bio.psu.edu

Suggested citation:

    Zhang J, Rosenberg HF, Nei M (1998) Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713.

1. Introduction

NG-NEW is designed for estimating synonymous and nonsynonymous distances between protein-coding DNA sequences. The method is modified from the original Nei and Gojobori (1986) method to take the transition bias into account. The program is written in C and can be used on IBM PC compatible computers running the Windows 95 operating system.

2. Installation

First make sure that the diskette you have received contains the following files.

    ng-new.c      (source code)
    ng-new2.c     (source code)
    ng-new.exe    (executable file)
    ng-new2.exe   (executable file)
    manual        (this file)
    rnase.seq     (an example data file)
    outfile       (output file)

To install NG-NEW on your computer's hard disk drive (the "C" drive is used here as an example), create a directory to hold the files of this package. To do this, type the following

    c:\md ng-new (Enter)

To copy the NG-NEW files onto your hard disk drive, insert the floppy disk containing the programs into your floppy drive (the "A" drive is used here as an example).
Then, enter the following command

    c:\copy a:*.* c:\ng-new\*.* (Enter)

The programs ng-new.exe and ng-new2.exe differ in that the latter does not compute the covariance matrix, which makes the analysis of large data sets (>100 sequences) feasible.

3. Input file

To use the program, you need an input file containing the protein-coding DNA sequences (see rnase.seq for an example). This file begins with two numbers: the number of sequences and the number of nucleotides per sequence (sequence length). The second line is the name of the first sequence, the third line is the first sequence itself, and so on. Only A, G, C, T, a, g, c, and t are allowed. Sequences should be aligned beforehand and gaps should be removed. The sequences should include only protein-coding regions, with stop codons removed.

4. Notations

    R:  transition/transversion ratio. R=0.5 means no transition bias.
        Note that R is not the transition/transversion rate ratio
        (which is often denoted by kappa). Under Kimura's model, 2R=kappa.
    S:  number of synonymous sites of a sequence.
    N:  number of nonsynonymous sites of a sequence.
    s:  number of synonymous differences between two sequences.
    n:  number of nonsynonymous differences between two sequences.
    ps: p-distance (proportion) of synonymous difference.
    pn: p-distance (proportion) of nonsynonymous difference.
    ds: Jukes-Cantor distance of synonymous difference.
    dn: Jukes-Cantor distance of nonsynonymous difference.

5. Computation

To compute S, N, s, n, ps, pn, ds, and dn, type

    c:\ng-new\ng-new filename

For example, to try the rnase.seq data, type

    c:\ng-new\ng-new rnase.seq

You will be asked to input the transition/transversion ratio (R), which should be estimated beforehand. If you want to use the original Nei-Gojobori method, input R=0.5. The variances and covariances of distances are computed according to Ota and Nei (1994).

6. Output file

There are several output files with different formats.

    outfile: the most useful file; it includes S, N, s, n, ps, pn, ds, dn,
             and variances.
    sn.rst:  includes covariances in addition to the quantities given in
             outfile.
    s.dis:   used as an input file for bn-bs.exe.
    n.dis:   used as an input file for bn-bs.exe.

The files sn.rst, s.dis, and n.dis are generated only when ng-new.exe is used.
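
Checking an input file before a run. The layout described under "Input file" (a header with the number of sequences and the sequence length, followed by alternating name and sequence lines) can be checked before the program is started. The following is a minimal Python sketch, not part of the NG-NEW distribution. The function name read_ngnew_input is hypothetical, and the sketch assumes that each sequence occupies a single line, that the sequence length is a multiple of three, and that stop codons are those of the standard genetic code (TAA, TAG, TGA); none of these details is confirmed by the manual.

```python
# Hypothetical helper; NOT part of NG-NEW. Assumes one line per sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def read_ngnew_input(text):
    """Parse and validate text laid out as described in the Input file section.

    Returns a list of (name, sequence) tuples with sequences in upper case.
    Raises ValueError on the first problem found.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("input is empty")

    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("header must hold two numbers")
    n_seq, length = int(header[0]), int(header[1])

    if length % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")

    body = lines[1:]
    if len(body) != 2 * n_seq:
        raise ValueError("expected %d name/sequence pairs" % n_seq)

    records = []
    for i in range(n_seq):
        name = body[2 * i]
        seq = body[2 * i + 1].upper()
        if len(seq) != length:
            raise ValueError("%s: length %d, expected %d" % (name, len(seq), length))
        if set(seq) - set("AGCT"):
            raise ValueError("%s: only A, G, C, T are allowed" % name)
        # Look for stop codons in reading frame 1.
        for j in range(0, length, 3):
            if seq[j:j + 3] in STOP_CODONS:
                raise ValueError("%s: stop codon at position %d" % (name, j + 1))
        records.append((name, seq))
    return records


# Made-up two-sequence example (not taken from rnase.seq).
example = "2 6\nseqA\nATGGCC\nseqB\natggct\n"
for name, seq in read_ngnew_input(example):
    print(name, seq)
```

A check of this kind reports gaps, ambiguous bases, and leftover stop codons before the transition/transversion ratio is requested.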
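
Relating the quantities in the output. The quantities listed under "Notations" are connected by simple formulas: ps = s/S, pn = n/N, and ds and dn are obtained from ps and pn by the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). Under Kimura's model, kappa = 2R. The following minimal Python sketch illustrates these relations only; it is not the NG-NEW source code, the function names are hypothetical, and the numbers in the example are made up. The harder part of the method, counting S, N, s, and n from the codons under a given R, is not shown.

```python
import math


def kappa_from_R(R):
    """Rate ratio kappa from the ratio R under Kimura's model (2R = kappa)."""
    return 2.0 * R


def p_distance(differences, sites):
    """Proportion of differences, e.g. ps = s / S or pn = n / N."""
    if sites <= 0:
        raise ValueError("number of sites must be positive")
    return differences / sites


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for a proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Made-up values for illustration: s = 10 differences over S = 100 sites.
ps = p_distance(10, 100)
ds = jukes_cantor(ps)
print("ps =", ps, "ds =", round(ds, 4))
print("kappa for R = 0.5:", kappa_from_R(0.5))
```

The correction always gives a distance at least as large as the p-distance, and it cannot be applied when the proportion of differences reaches 0.75.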