EasyGene is a prokaryotic gene finder that estimates the statistical significance of each predicted gene. It is based on a hidden Markov model (HMM) that is estimated automatically for each new genome. To train the HMM, EasyGene automatically extracts a high-quality set of genes from the genome by extending similarities to Swiss-Prot sequences. Each putative gene, an open reading frame (ORF), is then scored with the HMM, and its statistical significance is calculated from both its score and its length. The significance measure for an ORF is the expected number of ORFs in one megabase of random sequence that reach the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third-order Markov chain. Smaller values therefore indicate more significant ORFs.
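The idea of an "expected number of ORFs per megabase of random sequence" can be illustrated with a small Monte Carlo sketch. It is not EasyGene's implementation and rests on assumptions of our own. ORF length in codons stands in for the HMM score. ORFs run from ATG to the first standard stop codon (TAA, TAG, TGA) on both strands. The expected count is estimated by simulating a third-order Markov sequence, which is not necessarily how EasyGene computes it. All function names (`train_markov3`, `sample_markov3`, `orf_lengths`, `expected_per_mb`) are hypothetical.

```python
import random
from collections import defaultdict
from itertools import accumulate

# Hypothetical sketch: ORF length stands in for EasyGene's HMM score,
# and the null distribution is estimated by simulation.
START, STOPS = "ATG", {"TAA", "TAG", "TGA"}  # assumed start/stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def train_markov3(genome):
    """Estimate P(next base | previous 3 bases) from 4-mer counts in the genome."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(3, len(genome)):
        counts[genome[i - 3:i]][genome[i]] += 1
    # Store each context's successors as (bases, cumulative weights) for sampling.
    return {ctx: (list(nxt), list(accumulate(nxt.values())))
            for ctx, nxt in counts.items()}


def sample_markov3(model, length, rng):
    """Draw a random sequence with the same third-order statistics as the genome."""
    contexts = list(model)
    seq = list(rng.choice(contexts))
    while len(seq) < length:
        entry = model.get("".join(seq[-3:]))
        if entry is None:  # context seen only at the very end of the genome
            seq.extend(rng.choice(contexts))
            continue
        bases, cum = entry
        seq.append(rng.choices(bases, cum_weights=cum)[0])
    return "".join(seq[:length])


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orf_lengths(seq):
    """Lengths in codons (stop excluded) of ATG-initiated ORFs on one strand."""
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG after the previous stop: longest ORF
            elif start is not None and codon in STOPS:
                lengths.append((i - start) // 3)
                start = None
    return lengths


def expected_per_mb(query_codons, genome, sim_bp=1_000_000, seed=0):
    """Estimate how many ORFs at least query_codons long occur per Mb of null sequence."""
    rng = random.Random(seed)
    null_seq = sample_markov3(train_markov3(genome), sim_bp, rng)
    lengths = orf_lengths(null_seq) + orf_lengths(reverse_complement(null_seq))
    hits = sum(1 for n in lengths if n >= query_codons)
    return hits * 1_000_000 / sim_bp
```

For example, `expected_per_mb(200, genome_sequence)` estimates how many ORFs of at least 200 codons a megabase of third-order Markov sequence would contain by chance. The smaller the result, the less likely such an ORF is to be a random occurrence. Resolving very small expected counts would need a much larger simulation or an analytic model of the null distribution.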
:: MORE INFORMATION
Large-scale prokaryotic gene prediction and comparison to genome annotation.
P. Nielsen and A. Krogh.
Bioinformatics: 21:4322-4329, 2005.
EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance.
T. S. Larsen and A. Krogh.
BMC Bioinformatics: 4:21, 2003.