ON THE DISTRIBUTION OF DINUCLEOTIDES IN NUCLEIC ACID SEQUENCES
Shepelev V. A.
Institute of Molecular Genetics, Russian Acad. Sci., 123182, Moscow, Kurchatov Sq., Russia; Fax: 1960221; E-mail: spl@img.ras.ru
The distribution of dinucleotides in nucleic acid sequences can be described
by a set of dinucleotide frequencies as well as by relative frequencies (odds ratios).
It is well known that the odds ratios of dinucleotides generally differ from unity.
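As an illustration, the sketch below computes single-strand odds ratios. It assumes the common relative-abundance definition rho_XY = f_XY / (f_X * f_Y), where f_X and f_XY are observed mononucleotide and dinucleotide frequencies. The abstract does not state the exact formula, and genome-signature work often symmetrizes the counts with the reverse complement, so treat this as an assumption. The function name `odds_ratios` and its `alphabet` parameter are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def odds_ratios(seq: str, alphabet: str = "ACGT") -> dict[str, float]:
    """Hypothetical helper: rho_XY = f_XY / (f_X * f_Y) on a single strand.

    Assumed definition (not taken from the source). Letters outside
    `alphabet` are skipped, and any pair containing them is dropped.
    """
    seq = seq.upper()
    # Mononucleotide counts over valid letters only.
    mono = Counter(c for c in seq if c in alphabet)
    # Overlapping dinucleotide counts (positions i, i+1), valid pairs only.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in alphabet and seq[i + 1] in alphabet
    )
    n1, n2 = sum(mono.values()), sum(di.values())
    if n2 == 0:
        raise ValueError("sequence has no valid dinucleotides")
    rho = {}
    for x, y in product(alphabet, repeat=2):
        fx, fy = mono[x] / n1, mono[y] / n1
        # Undefined (NaN) when a constituent letter never occurs.
        rho[x + y] = (di[x + y] / n2) / (fx * fy) if fx and fy else float("nan")
    return rho
```

Values near 1 mean the dinucleotide occurs about as often as independent letters would predict. A well-known deviation is CpG suppression: in vertebrate genomes rho_CG is typically well below 1. The same function accepts a reduced alphabet, e.g. `odds_ratios(seq.translate(str.maketrans("AGCT", "RRYY")), "RY")` for purines and pyrimidines.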
Specialized studies gave rise to the concept of the genome signature, which
holds that species and taxa have characteristic odds-ratio values. The so-called
empirical distribution function provides a more detailed description of
the dinucleotide distribution. Under the assumption that letters are independent,
the theoretical distributions have been derived in explicit form. Another
approach is based on the distribution of the waiting times for different
dinucleotides. Example distributions are given for large mammalian and human
viruses over different alphabets. Characteristic features of the distributions
are also shown for a wide variety of genomes.
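The waiting-time approach can be sketched as follows. The abstract does not define its quantities, so this sketch makes several assumptions. A "waiting time" is taken to be the distance between successive, possibly overlapping, occurrences of a dinucleotide. The empirical distribution function is taken over those distances. The theoretical mean for independent letters is computed with the classical overlap (autocorrelation) formula for the expected time until a word first appears. That formula is a standard result and is not necessarily the derivation used in this work. All function names are hypothetical.

```python
from __future__ import annotations


def waiting_times(seq: str, word: str) -> list[int]:
    """Hypothetical helper: gaps between successive starts of `word` (overlaps allowed)."""
    seq, word = seq.upper(), word.upper()
    starts = [i for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)]
    return [b - a for a, b in zip(starts, starts[1:])]


def empirical_cdf(samples: list[int], t: int) -> float:
    """Hypothetical helper: fraction of samples <= t (empirical distribution function)."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s <= t) / len(samples)


def expected_first_wait(word: str, p: dict[str, float]) -> float:
    """Mean position at which `word` first ends, for i.i.d. letters with probabilities p.

    Overlap formula: sum, over every k for which the length-k prefix of `word`
    equals its length-k suffix, of 1 / P(prefix of length k).
    """
    total = 0.0
    for k in range(1, len(word) + 1):
        if word[:k] == word[-k:]:
            prob = 1.0
            for c in word[:k]:
                prob *= p[c]
            total += 1.0 / prob
    return total
```

With equiprobable letters, AA and CG both occur with probability 1/16 per position, so in a long random sequence the mean gap is about 16 for both. Their mean first waiting times still differ (20 versus 16), because AA can overlap itself and its occurrences cluster. So even under independence the waiting-time distribution depends on the dinucleotide, not only on its frequency.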