GeneMark, developed in 1993, was the first gene-finding method recognized as an efficient and accurate tool for genome projects. It was used to annotate the first completely sequenced bacterium, Haemophilus influenzae, and the first completely sequenced archaeon, Methanococcus jannaschii. The GeneMark algorithm uses species-specific inhomogeneous Markov chain models of protein-coding DNA sequence as well as homogeneous Markov chain models of non-coding DNA. Parameters of the models are estimated from training sets of sequences of known type. The major step of the algorithm computes the a posteriori probability that a sequence fragment carries genetic code in one of six possible frames (including three frames in the complementary DNA strand) or is “non-coding”.
GeneMark is documented as the most accurate prokaryotic gene finder.
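The posterior computation described above can be illustrated with a minimal sketch. This is not the GeneMark implementation: it assumes first-order chains, a uniform prior over the seven hypotheses, add-one pseudocounts, and toy training data, and all function names (train_markov, log_likelihood, frame_posteriors) are hypothetical. The complementary-strand frames are scored by applying the coding model to the reverse complement of the fragment, and the frame labels are a simplified convention.

```python
import math

ALPHABET = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def train_markov(seqs, period, pseudo=1.0):
    """Estimate first-order transition probabilities from training sequences.

    period=3 yields an inhomogeneous (3-periodic) chain with one transition
    table per codon position; period=1 yields a homogeneous chain.
    Training sequences are assumed to start at codon position 0.
    """
    counts = [{a: {b: pseudo for b in ALPHABET} for a in ALPHABET}
              for _ in range(period)]
    for seq in seqs:
        for i in range(1, len(seq)):
            counts[i % period][seq[i - 1]][seq[i]] += 1
    # Normalize each row of counts into a probability distribution.
    return [{a: {b: n / sum(row.values()) for b, n in row.items()}
             for a, row in table.items()}
            for table in counts]


def log_likelihood(seq, model, offset=0):
    """Log-probability of seq given its first base.

    Position i of seq is treated as codon phase (i + offset) % period.
    """
    period = len(model)
    return sum(math.log(model[(i + offset) % period][seq[i - 1]][seq[i]])
               for i in range(1, len(seq)))


def frame_posteriors(fragment, coding, noncoding):
    """Posterior over six coding frames plus "non-coding" (uniform prior)."""
    rc = reverse_complement(fragment)
    scores = {"non-coding": log_likelihood(fragment, noncoding)}
    for k in range(3):
        scores[f"+{k}"] = log_likelihood(fragment, coding, k)  # direct strand
        scores[f"-{k}"] = log_likelihood(rc, coding, k)        # complementary strand
    top = max(scores.values())  # subtract the maximum for numerical stability
    total = sum(math.exp(s - top) for s in scores.values())
    return {h: math.exp(s - top) / total for h, s in scores.items()}


# Toy training sets, invented purely for illustration.
coding = train_markov(["GCA" * 20], period=3)
noncoding = train_markov(["ATTATAATATTTAATA" * 4], period=1)
post = frame_posteriors("GCAGCAGCAGCA", coding, noncoding)
print(max(post, key=post.get))  # prints +0 for this toy data
```

In practice such a score would be computed in a sliding window along the sequence, with higher-order models and priors chosen for the species at hand.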
The GeneMark.hmm-P and GeneMark.hmm-E programs predict genes and intergenic regions in a sequence as a whole. They use hidden Markov models reflecting the “grammar” of gene organization.
The GeneMark.hmm (P and E) programs identify the most likely parse of the whole DNA sequence into protein-coding genes (with possible introns) and intergenic regions.
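The idea of a most likely parse can be sketched with the Viterbi algorithm on a deliberately tiny hidden Markov model. This is an illustration only, not the GeneMark.hmm model: it assumes just two states ("intergenic" and "coding"), single-nucleotide emissions, and made-up probabilities, whereas the real programs model far richer structure (reading frames, both strands and, for GeneMark.hmm-E, introns). The names viterbi and segments are hypothetical.

```python
import math
from itertools import groupby

STATES = ("intergenic", "coding")


def viterbi(seq, start, trans, emit):
    """Return the most probable state path for seq under a simple HMM."""
    if not seq:
        return []
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for base in seq[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: prev[r] + math.log(trans[r][s]))
            best[s] = prev[p] + math.log(trans[p][s]) + math.log(emit[s][base])
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def segments(path):
    """Collapse a per-base path into (state, start, end) runs, end exclusive."""
    out, pos = [], 0
    for state, run in groupby(path):
        n = len(list(run))
        out.append((state, pos, pos + n))
        pos += n
    return out


# Invented parameters: "coding" is GC-rich, "intergenic" is AT-rich.
start = {"intergenic": 0.5, "coding": 0.5}
trans = {"intergenic": {"intergenic": 0.9, "coding": 0.1},
         "coding": {"intergenic": 0.1, "coding": 0.9}}
emit = {"intergenic": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

seq = "ATATATAT" + "GCGCGCGC" + "TATATATA"
print(segments(viterbi(seq, start, trans, emit)))
# [('intergenic', 0, 8), ('coding', 8, 16), ('intergenic', 16, 24)]
```

Because the whole sequence is parsed at once, the boundaries between regions are chosen jointly instead of window by window.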
To analyze ESTs and cDNAs you can use GeneMark-E.
Mark Borodovsky, Georgia Institute of Technology, Atlanta, Georgia, USA
- Linux / Mac OS X
:: MORE INFORMATION
Borodovsky M. and McIninch J.
GeneMark: parallel gene recognition for both DNA strands,
Computers & Chemistry, 1993, Vol. 17, No. 2, pp. 123-133.
Besemer J., Lomsadze A. and Borodovsky M.,
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.
Nucleic Acids Research, 2001, Vol. 29, No. 12, pp. 2607-2618.