Ponomarenko J. V., Furman D. P., Mischenko T. M., Katokhina L. V., Valuev V. V., Peregoedova E. L., Frolov A. S., Podkolodny N. L., Ponomarenko M. P., Kolchanov N. A.
Institute of Cytology & Genetics, 630090, Novosibirsk, Russia; FAX: +7(3832)356-558; E-mail: jpon@bionet.nsc.ru; Institute of Computational Mathematics & Mathematical Geophysics, Novosibirsk, Russia
Recent evaluations of genome annotation algorithms have shown the
need to improve the recognition accuracy of functional DNA/RNA
sites (Burset, 1996; Fickett, 1997). Thus, it is timely to search for additional
sources of experimental data applicable to recognition of the functional
sites from their sequences. We propose compiling the activity values of
the sites together with the physico-chemical and conformational properties of DNA/RNA.
With the previously described linear-additive approximation (Kolchanov,
1998), these data make it possible to predict the activity of functional sites
from their sequences. First, we have described in a data base over 240
experiments on promoters, protein-binding sites, mRNA leaders, pre-mRNA
processing sites, and many other DNA and RNA sites with the activities
characterized quantitatively in terms of kinetic and equilibrium constants,
lifetime and helical bend of DNA/protein complexes, cutting efficiencies,
reporter gene expression, transcription or translation levels, etc.
In the second data base, we have compiled over 30 complete sets of dinucleotide
values of propeller, twist, tip, tilt, bend, wedge, direction, inclination,
rise, depth, width, dist, size, persistence length, entropy, enthalpy, free
energy, melting temperature, and other DNA and RNA properties. Then, we
have documented this large body of experimental data in a reference base
holding article titles, authors, journals, abstracts, and figure and table numbers.
Currently, this base comprises over 60 articles. Next, the linear-additive
approximation (Kolchanov, 1998) was applied to determine those mean values
of the DNA/RNA properties in the neighborhood of the functional sites that
can be used to predict activities of the sites from their sequences. The resulting
prediction programs have been stored in the C-code data base. Finally, the above data
bases on (1) the functional DNA/RNA site activities, (2) the conformational
and physico-chemical DNA/RNA properties, (3) the relevant references, and
(4) the C-code programs predicting site activities from their sequences
have been integrated using the SRS query language (Etzold, 1993). The result is
the distributed knowledge base ACTIVITY for functional DNA/RNA site
activities, available at
http://wwwmgs.bionet.nsc.ru/systems/Activity/.
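
The prediction step described above can be illustrated with a minimal sketch. It is not the C code stored in ACTIVITY, and it is only one simple reading of the linear-additive approximation: a dinucleotide property is averaged over a window around a site, and the activity is modelled as a linear function of that mean. The property table, the function names, and the training pairs are all hypothetical and chosen only for illustration; a real analysis would use a measured property set (for example twist or melting temperature) and the measured activities.

```python
BASES = "ACGT"

# ASSUMPTION: a toy dinucleotide "property" (number of G/C bases in the step).
# It stands in for a real property table and is not taken from ACTIVITY.
TOY_PROPERTY = {a + b: float((a in "GC") + (b in "GC"))
                for a in BASES for b in BASES}


def mean_property(seq, start, end, table=TOY_PROPERTY):
    """Mean of a dinucleotide property over the window seq[start:end]."""
    window = seq[start:end].upper()
    steps = [window[i:i + 2] for i in range(len(window) - 1)]
    if not steps:
        raise ValueError("window must contain at least two nucleotides")
    return sum(table[step] for step in steps) / len(steps)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_activity(seq, start, end, intercept, slope, table=TOY_PROPERTY):
    """Predict a site activity from the mean property of its window."""
    return intercept + slope * mean_property(seq, start, end, table)


if __name__ == "__main__":
    # Hypothetical training set: (site sequence, activity in arbitrary units).
    training = [("ATATATAT", 0.5), ("ACGTACGT", 1.4), ("GCGCGCGC", 2.6)]
    xs = [mean_property(seq, 0, len(seq)) for seq, _ in training]
    ys = [activity for _, activity in training]
    intercept, slope = fit_line(xs, ys)
    print(predict_activity("GGCCATAT", 0, 8, intercept, slope))
```

In this sketch the window boundaries and the property are fixed in advance. The abstract instead describes searching for the property and the neighborhood whose mean value best predicts the measured activity, which would amount to repeating such a fit over many candidate properties and windows and keeping the best-performing ones.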
This work was supported by Russian Human Genome, the Russian Foundation
for Basic Research, and SB RAS Young Scientists grants.