ON THE DISTRIBUTION OF DINUCLEOTIDES IN NUCLEIC ACID SEQUENCES

Shepelev V. A.
Institute of Molecular Genetics, Russian Acad. Sci., 123182, Moscow, Kurchatov Sq., Russia; Fax: 1960221; E-mail: spl@img.ras.ru
The distribution of dinucleotides in nucleic acid sequences can be described by a set of dinucleotide frequencies as well as by relative frequencies (odds-ratios). It is well known that the odds-ratio generally differs from unity. Dedicated studies gave rise to the concept of a genome signature, which implies that species and taxa have characteristic odds-ratio values. The so-called empirical distribution function provides a more detailed description of the dinucleotide distribution. Under the assumption that the letters are independent, theoretical distributions have been derived in explicit form. Another approach is based on the distribution of the waiting times for different dinucleotides. Examples of the distributions for large mammalian and human viruses have been given for different alphabets. Special features of the distributions for a wide variety of genomes have also been shown.
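
As an illustration of the two quantities mentioned above, the following minimal Python sketch computes dinucleotide odds-ratios and the waiting times between successive occurrences of a dinucleotide. It assumes the commonly used definition of the odds-ratio, f_xy / (f_x * f_y), which the abstract itself does not spell out; the function names, the use of overlapping dinucleotides, and the single-strand counting are likewise assumptions made for illustration, not the author's procedure.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: odds-ratio} for the dinucleotides observed in seq.

    Assumed definition: rho_xy = f_xy / (f_x * f_y), where f_xy is the
    frequency of the overlapping dinucleotide xy and f_x, f_y are the
    single-letter frequencies. Under letter independence rho_xy is near 1.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two letters")
    mono = Counter(seq)                                  # single-letter counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))     # overlapping pairs
    ratios = {}
    for pair, count in di.items():
        x, y = pair
        f_xy = count / (n - 1)
        f_x = mono[x] / n
        f_y = mono[y] / n
        ratios[pair] = f_xy / (f_x * f_y)
    return ratios


def waiting_times(seq, pair):
    """Return distances between successive start positions of `pair` in seq.

    Occurrences are allowed to overlap (an assumption of this sketch).
    """
    seq = seq.upper()
    pair = pair.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == pair]
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    example = "ACGTACGT"  # toy sequence, not real genomic data
    for pair, rho in sorted(dinucleotide_odds_ratios(example).items()):
        print(pair, round(rho, 3))
    print(waiting_times(example, "AC"))
```

The empirical distribution function referred to in the abstract could then be estimated from such values collected over many sequence windows; how the windows are chosen is left open here.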