A SIMPLE AUTOMATIC TOOL FOR DRAFT ANNOTATION OF BACTERIAL GENOME SEQUENCE

Sorokin A.
Laboratoire Genetique Microbienne, INRA CRJ, Domaine de Vilvert, Jouy-en-Josas 78352 cedex, France; E-mail: sorokine@biotec.jouy.inra.fr
The exponential growth of bacterial genome sequence data, together with intensive functional studies, creates a pressing need for simple, user-friendly automatic tools for whole-genome sequence annotation. I wrote a Perl program that filters the data obtained by tFASTx comparison of a bacterial genome sequence with a protein database. The program extracts, from the tFASTx output and the relevant database files, the sequences, functional descriptions and coordinates of the proteins encoded by the genome under annotation. Redundancy present in the tFASTx output files is removed automatically. Finally, the program generates a complete genome annotation in the form of clickable .html tables that can be conveniently browsed with Netscape Navigator. The genes found are ordered alphabetically, by position in the genome, or by functional category. Iterative extension of the protein databases used for the homology search increases the precision and thoroughness of the analysis. The tool was tested on automatic annotation of the complete B. subtilis genome and of the genomes of several other Gram-positive bacteria that are currently being sequenced. The results show that the automatic annotation achieves a precision of about 90% relative to manual analysis, in terms of the number of predicted genes. The program also includes a keyword-based classification feature that organises the genes found into biochemical or biological functional categories. The list of categories and the relevant keywords can be modified by the user. This flexibility makes the program an indispensable tool for comparing bacteria on the whole-genome scale.
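
The redundancy removal and keyword-based classification described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and does not reproduce the program itself. The abstract does not state the redundancy criterion, so the rule used here (among hits that overlap by more than half of the shorter one, keep the highest-scoring) is an assumption, as are the Hit record, the function names, the category table and the gene names in the usage note.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database match on the genome (hypothetical record layout)."""
    name: str
    start: int
    end: int
    score: float
    description: str


# User-editable table: category -> keywords searched in the description.
# The entries are illustrative only.
CATEGORIES = {
    "transport": ["transporter", "permease"],
    "replication": ["dna polymerase", "helicase"],
}


def remove_redundancy(hits, max_overlap=0.5):
    """Keep the best-scoring hit among strongly overlapping ones (assumed rule)."""
    kept = []
    # Visit hits from best to worst score so a weaker hit is dropped
    # when it overlaps an already accepted, stronger one.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        hit_len = hit.end - hit.start + 1
        redundant = False
        for other in kept:
            other_len = other.end - other.start + 1
            # Number of shared positions; zero or negative means no overlap.
            overlap = min(hit.end, other.end) - max(hit.start, other.start) + 1
            if overlap > max_overlap * min(hit_len, other_len):
                redundant = True
                break
        if not redundant:
            kept.append(hit)
    # Return the surviving hits in genome order.
    return sorted(kept, key=lambda h: h.start)


def classify(description, categories=CATEGORIES):
    """Return the first category with a keyword found in the description."""
    text = description.lower()
    for category, keywords in categories.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "unclassified"


def annotate(hits, categories=CATEGORIES):
    """Filter redundant hits, then attach a functional category to each."""
    return [
        (h.name, h.start, h.end, classify(h.description, categories))
        for h in remove_redundancy(hits)
    ]
```

As a usage example, calling annotate() on four made-up hits (geneA at 100-1400 with score 900 and description "DNA polymerase III alpha subunit"; geneA_dup at 150-1350 with score 400; geneB at 2000-2900 described as an "ABC transporter"; geneC at 3000-3300 described as a "hypothetical protein") would drop geneA_dup as redundant and label the remaining three as replication, transport and unclassified. Editing the CATEGORIES table changes the classification without touching the code, which mirrors the user-modifiable category list mentioned in the abstract.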