NLProt is a tool for finding protein names in natural-language text. It is based on Support Vector Machines (SVMs) trained on contextual features of named entities in scientific language. In addition, simple filtering rules and a protein-name dictionary are used to improve performance. NLProt reached a precision (accuracy) of 70% at a recall (coverage) of 85% when run on the 166 most recent abstracts of EMBL and Cell.
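
The precision and recall figures quoted above can be read as follows: precision is the fraction of predicted protein names that are correct, and recall is the fraction of annotated protein names that were found. The snippet below is a minimal sketch of that calculation, assuming exact-match scoring on (start, end) character spans. The function name `precision_recall` and the example spans are hypothetical and are not part of NLProt, and the scoring scheme used in the NLProt evaluation may differ.

```python
def precision_recall(predicted, annotated):
    """Return (precision, recall) for two collections of (start, end) spans.

    A prediction counts as correct only if it matches an annotated span
    exactly (an assumption made for this illustration).
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical spans: four annotated names, five predictions, three matches.
annotated = {(0, 4), (10, 15), (22, 30), (41, 45)}
predicted = {(0, 4), (10, 15), (22, 30), (50, 55), (60, 62)}

precision, recall = precision_recall(predicted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example, three of five predictions are correct and three of four annotated names are recovered, so a tagger can trade one measure against the other by predicting more or fewer names.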
- Linux / Mac OS X
:: MORE INFORMATION
NLProt: extracting protein names and sequences from papers.
Mika S, Rost B.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W634-7.