The algorithm is shown to find the optimal consensus for sequences simulated with high mismatch and indel probabilities. On simulated data it appeared to be robust to the error rate parameters used in the algorithm. When the sequences were noisy (an error rate of about 0.1), the consensus sampler was still able to restore the true consensus, while a profile HMM trained with expectation maximization usually failed to do so. The reason is that if indels are a priori likely, a profile HMM gets stuck in a local mode in which artificial, highly conserved match states are created by indels. If indels are a priori unlikely, a profile HMM tends to misalign the sequences and thus produces artificially dispersed distributions in the match states, leading again to a local mode.
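As an illustration of the kind of simulated input described above, the following minimal sketch generates noisy copies of a known consensus. It is an assumption-laden toy, not the simulation procedure used in this work: the function names `noisy_copy` and `simulate`, the DNA alphabet, and the way the error rate is split between substitutions, insertions, and deletions are all hypothetical choices made for the example.

```python
import random

ALPHABET = "ACGT"  # assumed alphabet; the text does not fix one


def noisy_copy(consensus, p_sub, p_ins, p_del, rng):
    """Return one noisy copy of `consensus` (hypothetical error model).

    At each consensus position a random base may be inserted first
    (probability p_ins); the position itself is then deleted (p_del),
    substituted by a different base (p_sub), or copied unchanged.
    """
    out = []
    for base in consensus:
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            continue  # deletion: emit nothing for this position
        if r < p_del + p_sub:
            # substitution: pick any base other than the true one
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(consensus, n, p_sub=0.1, p_ins=0.05, p_del=0.05, seed=0):
    """Generate `n` noisy copies with a seeded, reproducible generator."""
    rng = random.Random(seed)
    return [noisy_copy(consensus, p_sub, p_ins, p_del, rng) for _ in range(n)]


if __name__ == "__main__":
    for s in simulate("ACGTTGCAACGT", n=5):
        print(s)
```

Because the copies differ in length, recovering the consensus from such data requires aligning the sequences, which is where the indel prior discussed above becomes important.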
By allowing wild cards in the consensus, in particular 'N', the algorithm is capable of detecting conserved regions of the alignment. When tested on E. coli promoters, the algorithm found a consensus for the -35 and -10 regions and aligned the sequences to those regions. For yeast introns, the alignment produced conserved regions at the branch point and at the 5' and 3' splice sites. Since indels do not play any role in these functional sites, the HMM alignment also succeeded in finding those conserved patterns.
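To make the role of wild cards concrete, the sketch below scores a sequence against a consensus that contains ambiguity codes and lists the stretches of the consensus that are not 'N'. It assumes the standard IUPAC nucleotide codes; the text only names 'N' explicitly, and the helper names `mismatches` and `conserved_regions` as well as the toy consensus string are hypothetical and are not output of the algorithm.

```python
# Standard IUPAC nucleotide ambiguity codes (assumed wild-card alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def mismatches(seq, consensus):
    """Count positions where `seq` is not allowed by the consensus symbol.

    Both strings must already be aligned and of equal length.
    """
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have equal length")
    return sum(1 for s, c in zip(seq, consensus) if s not in IUPAC[c])


def conserved_regions(consensus):
    """Return half-open (start, end) intervals of maximal non-'N' runs."""
    regions, start = [], None
    for i, c in enumerate(consensus):
        if c != "N" and start is None:
            start = i                      # a conserved run begins
        elif c == "N" and start is not None:
            regions.append((start, i))     # the run ended at i
            start = None
    if start is not None:
        regions.append((start, len(consensus)))
    return regions


if __name__ == "__main__":
    toy = "TTGACANNNTATAAT"  # toy consensus for illustration only
    print(conserved_regions(toy))
    print(mismatches("TTGACAGCGTACAAT", toy))
```

An 'N' position never contributes a mismatch, so variable spacer sequence between two conserved blocks does not penalize an otherwise well-aligned sequence.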
We describe the method and illustrate it with examples.