THE SUBCLASS APPROACH FOR MUTATIONAL SPECTRA ANALYSIS: APPLICATION OF THE SEM ALGORITHM

Rogozin I. B., Glazko G. V., Milanesi L.
Institute of Cytology and Genetics, 630090, Novosibirsk, 10 Lavrentyev Ave., Russia; E-mail: rogozin@bionet.nsc.ru, fax: (3832) 356568; 2 Istituto di Tecnologie Biomediche Avanzate CNR, via Fratelli Cervi 93, 20090 Segrate, Milano, Italy
The analysis and comparison of mutational spectra is an important problem in molecular biology. It is well known that spontaneous and induced mutations are largely confined to certain regions of nucleotide sequences; thus, mutability varies significantly along a sequence. To analyze a mutational spectrum, we have applied an algorithm based on the SEM (Simulation, Expectation, Maximization) subclass approach. The algorithm classifies mutational sites according to their mutation probabilities, with each site assigned to exactly one class. Each class is approximated by a Poisson or binomial distribution, so any real mutational spectrum is regarded as a mixture of Poisson or binomial distributions. The separation proceeds iteratively; each iteration includes simulation, maximization, and estimation procedures. The χ² test is used to evaluate the quality of the classification. The algorithm has been tested on random spectra with preset parameters and on real mutational spectra from the Database of Mutational Spectra (Rogozin et al., 1992). Of the 19 real mutational spectra analyzed, 17 can be divided into two or more classes, one of which contains mutation hotspots. For the G:C->A:T mutational spectra induced by Sn1 alkylating mutagens (11 spectra), the classification accuracy was 0.95. Analysis of the classification errors suggests that at least some of them are caused by specific features of mutagenesis itself. Good correlation with real data was also shown for other spectra of induced and spontaneous mutations. The program implementing the SEM algorithm is available on the Web server (http://www.itba.mi.cnr.it/webmutation).
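The iterative separation described above can be illustrated with a minimal sketch of a stochastic EM loop for a Poisson mixture. This is not the authors' program: the function name `sem_poisson_mixture`, the initialisation of the rates, the smoothing of the class weights, the fixed iteration count, and the final hard assignment by largest posterior probability are all assumptions made for illustration. The input is taken to be the number of mutations observed at each site.

```python
import math
import random


def poisson_pmf(k, rate):
    """Poisson probability of k events at the given rate (computed in log space)."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def sem_poisson_mixture(counts, n_classes=2, n_iter=200, seed=0):
    """Hypothetical SEM sketch: split per-site mutation counts into Poisson classes."""
    rng = random.Random(seed)
    n = len(counts)
    lo, hi = min(counts), max(counts)
    # Assumed initialisation: rates spread evenly over the observed range.
    rates = [max(lo + (hi - lo) * (j + 0.5) / n_classes, 1e-3)
             for j in range(n_classes)]
    weights = [1.0 / n_classes] * n_classes

    def posterior(k):
        # Expectation: probability of each class given the count at one site.
        p = [w * poisson_pmf(k, r) for w, r in zip(weights, rates)]
        total = sum(p)
        return [x / total for x in p] if total > 0.0 else list(weights)

    for _ in range(n_iter):
        # Simulation: draw exactly one class label per site from its posterior.
        labels = [rng.choices(range(n_classes), weights=posterior(k))[0]
                  for k in counts]
        # Maximization: re-estimate each class from the sites assigned to it.
        for j in range(n_classes):
            members = [k for k, z in zip(counts, labels) if z == j]
            # Assumed add-one smoothing so that an empty class can recover.
            weights[j] = (len(members) + 1) / (n + n_classes)
            if members:
                rates[j] = max(sum(members) / len(members), 1e-3)

    # Report classes in order of increasing rate; the last one is the
    # candidate hotspot class.
    order = sorted(range(n_classes), key=lambda j: rates[j])
    final_labels = []
    for k in counts:
        post = posterior(k)
        best = max(range(n_classes), key=lambda j: post[j])
        final_labels.append(order.index(best))
    return [rates[j] for j in order], [weights[j] for j in order], final_labels
```

For example, `sem_poisson_mixture([0, 1, 0, 2, 1, 0, 1, 1, 0, 2, 1, 0, 15, 18, 20])` returns the estimated rates, the class weights, and one class label per site, with the three heavily mutated sites expected to fall into the higher-rate class. A goodness-of-fit check such as the χ² test mentioned above would then be applied to the fitted mixture; that step is not shown here.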

The Poisson distribution is widely used to approximate a mutational spectrum, although the binomial distribution could also be used for this purpose. We will discuss the possibility of applying standard distributions to mutational spectra classification, based on an analysis of molecular genetic systems for mutation detection and on an analysis of real mutational spectra.
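One reason the two distributions can be interchangeable is the standard result that a binomial distribution with many trials n and a small per-trial probability p is closely approximated by a Poisson distribution with rate np. The sketch below checks this numerically. The helper name `max_pmf_gap` and the interpretation of n as the number of sequenced mutants and p as the per-site mutation probability are assumptions made for illustration, not details taken from the abstract.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of k events at the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def max_pmf_gap(n, p, k_max=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p) for k <= k_max."""
    rate = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, rate))
               for k in range(min(n, k_max) + 1))


# Same expected count (2 mutations per site) in both cases.
gap_many_trials = max_pmf_gap(1000, 0.002)  # many mutants, rare event per site
gap_few_trials = max_pmf_gap(10, 0.2)       # few mutants, frequent event per site
```

With the expected count held fixed, the gap shrinks as n grows and p falls, so the choice between the two distributions matters mainly for small samples or highly mutable sites.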