INVESTIGATION OF NON-COMPLETE DIRECT DNA REPEATS BY THE USE OF REDUNDANCY PROFILES

Borovina T., Nazipova N., Kislyuk O.
Institute of Mathematical Problems of Biology, Russian Academy of Sciences, 142292, Pushchino, Moscow Region, Russia; E-mail: nnn@impb.serpukhov.su
Using a two-step analysis, we show that the Epstein-Barr virus genome contains 3 previously unknown regions of non-complete (imperfect) direct repeats.

1. Computation of a redundancy profile for the entire genome. The complete genome is subdivided into adjacent fragments (frames) of equal length. The redundancy value (in the information-theoretic sense) of each fragment characterizes the degree of discrepancy between the fragment and a randomly organized text of the same length and a/t/g/c content. The complete genome can then be described by the set of redundancy values of its adjacent fragments. We call this set of values the redundancy profile of the genome. If the a/t/g/c content (and also the di- and trinucleotide content) of a fragment is similar to the average for the entire genome, a high redundancy value for the fragment may be explained by the presence of perfect or imperfect direct repeats of various lengths.

2. Some fragments with a redundancy level significantly higher than the average over the entire set of fragments were examined with the repeat detection program MOTIF.
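
The two steps above can be illustrated with a short Python sketch that computes a redundancy profile and picks out the frames that stand well above the average. It is not the program used in this work. The redundancy formula (1 - H_k / (k * H_1), which compares the observed k-gram entropy with the value expected for a random text of the same base composition), the function names, the word length k, and the n_sigma selection threshold are all assumptions made for the example. For short frames the estimate is only approximate.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(fragment, k=2):
    """Assumed redundancy measure: 1 - H_k / (k * H_1).

    H_1 is the single-nucleotide entropy of the fragment and H_k the entropy
    of its overlapping k-grams. For a long random text with the same base
    composition H_k is close to k * H_1, so the value is close to 0.
    """
    h1 = shannon_entropy(Counter(fragment))
    if h1 == 0.0:
        # A fragment made of a single letter: treated as fully redundant
        # (a convention chosen for this sketch).
        return 1.0
    kmers = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    return 1.0 - shannon_entropy(kmers) / (k * h1)


def redundancy_profile(genome, frame_len, k=2):
    """Redundancy of each adjacent, non-overlapping frame of equal length.

    A trailing piece shorter than frame_len is ignored.
    """
    return [
        redundancy(genome[start:start + frame_len], k)
        for start in range(0, len(genome) - frame_len + 1, frame_len)
    ]


def select_high_redundancy_frames(profile, n_sigma=2.0):
    """Indices of frames whose redundancy exceeds mean + n_sigma * std.

    The threshold rule is hypothetical; the abstract does not state one.
    """
    threshold = mean(profile) + n_sigma * pstdev(profile)
    return [i for i, r in enumerate(profile) if r > threshold]
```

The indices returned by select_high_redundancy_frames would mark the frames to pass on to a repeat detection program; MOTIF itself is not reproduced here.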

We used three approaches to calculating DNA redundancy: redundancy in terms of Shannon entropy, Lempel-Ziv complexity, and our own method, which estimates redundancy as the high-frequency component of the l-gram graph.
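
As a hedged illustration of the second approach, the sketch below counts the phrases in the Lempel-Ziv (1976) exhaustive parsing of a sequence. A fragment that parses into fewer phrases than a random text of the same length is more redundant. The function name is hypothetical, the straightforward search is slow on long inputs, and how the phrase count was converted into a redundancy value in this work is not stated here, so no normalization is attempted.

```python
def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of the string s.

    Each phrase is the shortest substring starting at the current position
    that does not occur earlier in the text (overlap with the phrase itself
    is allowed, except for its last character). A final incomplete phrase
    is counted as well.
    """
    n = len(s)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Extend the candidate phrase while it can still be copied from
        # the text preceding its last character.
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases
```

For example, the tandem repeat "ACGT" repeated ten times parses into five phrases (A, C, G, T, and one long copy), whereas a less regular sequence of the same length yields more.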

This work was supported by grant N 58 of the Russian State Scientific Program BioDiversity and by grant N 96-07-89177 of the Russian Foundation for Basic Research.