THE SUBCLASS APPROACH FOR MUTATIONAL SPECTRA ANALYSIS: APPLICATION OF THE SEM ALGORITHM
Rogozin I. B., Glazko G. V., Milanesi L.
Institute of Cytology and Genetics, 630090, Novosibirsk, 10 Lavrentyev Ave., Russia; E-mail: rogozin@bionet.nsc.ru, fax: (3832) 356568; 2 Istituto di Tecnologie Biomediche Avanzate CNR, via Fratelli Cervi 93, 20090 Segrate, Milano, Italy
The analysis and comparison of mutational spectra is an important
problem in molecular biology. It is well known that spontaneous and induced
mutations are largely confined to certain regions of nucleotide sequences,
so mutability varies significantly along a sequence. To
analyze a mutational spectrum, we have applied an algorithm based on
the SEM subclass approach (Simulation, Expectation, Maximization). The
algorithm classifies mutational sites according to their mutation
probabilities, assigning each site to exactly one class. Each
class is approximated by a Poisson or binomial distribution, so any
real mutational spectrum is regarded as a mixture of Poisson or binomial
distributions. The separation process runs iteratively: each iteration
includes simulation, maximization and estimation procedures. The quality
of the classification is evaluated with the χ² test. The algorithm
has been checked on random spectra with preset parameters and on real mutational
spectra from the Database of Mutational Spectra (Rogozin et al., 1992).
Of the 19 real mutational spectra analyzed, 17 can be
divided into two or more classes, one of which contains mutation hotspots.
For the G:C->A:T mutational spectra induced by Sn1 alkylating mutagens
(11 spectra), the classification accuracy was 0.95. Analysis of
the classification errors suggests that at least some
of them are caused by specific features of mutagenesis itself. Good
correlation with real data was shown for other spectra of induced and spontaneous
mutations. The program implementing the SEM algorithm is available on the
Web server (http://www.itba.mi.cnr.it/webmutation).
The Poisson distribution is widely used to approximate mutational spectra,
although the binomial distribution could also be used for this purpose. We
will discuss the applicability of these standard distributions to
mutational spectra classification, based on an analysis of molecular genetic
systems for mutation detection and of real mutational spectra.
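
The iterative classification described above can be illustrated with a minimal
sketch. It is not the program available on the Web server. It assumes a
mixture of Poisson classes only and follows a common stochastic EM
formulation: compute class probabilities for each site, draw one class label
per site, then re-estimate class parameters from the drawn labels. The
function names, the starting values, the burn-in averaging and the example
counts are all hypothetical choices made for illustration, and the χ²
evaluation of the result is omitted.

```python
import math
import random


def class_posteriors(k, weights, rates):
    """Posterior probability of each Poisson class for a site with k mutations."""
    # Log space for stability; the k! term is identical for all classes and cancels.
    logs = [math.log(w) + k * math.log(r) - r for w, r in zip(weights, rates)]
    top = max(logs)
    probs = [math.exp(v - top) for v in logs]
    total = sum(probs)
    return [p / total for p in probs]


def sem_poisson_mixture(counts, n_classes=2, n_iter=200, burn_in=100, seed=0):
    """Illustrative SEM for a Poisson mixture; returns (weights, rates, labels)."""
    rng = random.Random(seed)
    n = len(counts)
    mean = sum(counts) / n
    # Assumed starting point: equal weights, rates spread around the overall mean.
    weights = [1.0 / n_classes] * n_classes
    rates = [max(mean * (j + 0.5), 1e-3) for j in range(n_classes)]
    sum_w = [0.0] * n_classes
    sum_r = [0.0] * n_classes
    classes = range(n_classes)

    for it in range(n_iter):
        # Estimation and simulation: draw one class label per site from its posterior.
        labels = [
            rng.choices(classes, weights=class_posteriors(k, weights, rates))[0]
            for k in counts
        ]
        # Maximization: re-estimate each class from the sites currently assigned to it.
        for j in classes:
            members = [k for k, z in zip(counts, labels) if z == j]
            if members:  # an empty class keeps its previous rate
                rates[j] = max(sum(members) / len(members), 1e-3)
            weights[j] = max(len(members) / n, 1e-9)
        # Keep classes ordered by rate so that class indices stay comparable.
        order = sorted(classes, key=lambda j: rates[j])
        rates = [rates[j] for j in order]
        weights = [weights[j] for j in order]
        # Average the parameters over the iterations after burn-in.
        if it >= burn_in:
            for j in classes:
                sum_w[j] += weights[j]
                sum_r[j] += rates[j]

    kept = n_iter - burn_in
    weights = [s / kept for s in sum_w]
    rates = [s / kept for s in sum_r]
    # Final assignment: each site goes to its most probable class.
    final_labels = []
    for k in counts:
        post = class_posteriors(k, weights, rates)
        final_labels.append(post.index(max(post)))
    return weights, rates, final_labels


# Made-up spectrum: 20 weakly mutable sites followed by 4 hotspot-like sites.
counts = [0, 1, 2, 1, 0, 3, 1, 2, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 1,
          14, 18, 12, 16]
weights, rates, labels = sem_poisson_mixture(counts)
print("class rates:", rates)
print("site labels:", labels)
```

In this sketch the class with the highest estimated rate plays the role of the
hotspot class. A binomial variant would replace the Poisson term in
`class_posteriors` and the rate estimate in the maximization step.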