PARETO-OPTIMAL ALIGNMENTS OF SYMBOL SEQUENCES

Roytberg M. A.
Institute of the Mathematical Problems in Biology, Russian Academy of Science, 142292, Pushchino, Moscow Region, Russia, E-mail: roytberg@impb.serpukhov.su
The common way to align sequences is to define an alignment scoring function and to search for an optimal alignment, i.e. the alignment having the maximal score. Usually the scoring function is a monotonous (linear) function of some natural "elementary" characteristics of an alignment, e.g. the number of matches, the number of mismatches, and the number of deleted/inserted symbols or fragments (gaps). However the coefficients of the scoring function often have no biological motivation and thus it is difficult to decide which of the alignments, that are optimal for different values of coefficients, indeed is the "correct" one.

To overcome this difficulty we propose to ascribe to each alignment not a scalar scoring function but a vector one. Let A be an alignment, H1, ..., Hn be its "elementary" characteristics, a1, ..., an be the positive coefficients and


be the scalar scoring function. Then we propose to consider n-component vector scoring function

.

DEFINITION. An alignment A dominates over an alignment B if each component of the vector V(A) is greater or equal than the corresponding component of the vector V(B) and at least one inequality is strict.

An alignment A is called Pareto-optimal, if it is not dominated by any other alignment.

The notion of Pareto-optimal alignment is dual to the concept of decomposition of the parametric space.

An algorithm constructing the set of all Pareto-optimal alignments for various vector scoring functions is presented. The time required to construct all Pareto-optimal alignments of two sequences W1 of length L1 and W2 of length L2 is proportional to , where M is the maximal possible number of Pareto-optimal alignments for the beginnings of sequences W1 and W2.