Entity Identification in Molecular Biology Using a Stochastic Tagger
Author Information
Author(s): Shuhei Kinoshita, K. Bretonnel Cohen, Philip V. Ogren, Lawrence Hunter
Primary Institution: Center for Computational Pharmacology, University of Colorado School of Medicine
Hypothesis
Can a part-of-speech tagger be effectively used for entity identification in molecular biology?
Conclusion
A part-of-speech tagger can be enhanced with post-processing rules to create a competitive entity identification system.
Supporting Evidence
- The base system achieved a precision of 68.0% and recall of 77.2%.
- With post-processing, precision improved to 80.3% and recall to 80.5%.
- The F-measure increased from 72.3% to 80.4% with post-processing.
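The reported F-measures agree with the balanced F-measure, i.e. the harmonic mean of precision and recall. The short check below assumes that definition (beta = 1) and recomputes both values from the precision and recall figures above.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (balanced F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall in percent, as reported above.
base = f_measure(68.0, 77.2)   # base tagger
post = f_measure(80.3, 80.5)   # tagger plus post-processing rules
print(f"base F = {base:.1f}%, post-processed F = {post:.1f}%")
# prints: base F = 72.3%, post-processed F = 80.4%
```

Because the harmonic mean is pulled toward the smaller of the two values, the post-processing gain in F comes mostly from the large jump in precision (68.0% to 80.3%).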
Takeaway
The researchers adapted a part-of-speech tagger to find gene names in scientific text, then improved its accuracy by adding rules that correct the tagger's mistakes.
Methodology
The study used a stochastic part-of-speech tagger with post-processing rules to identify gene mentions in biomedical literature.
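To show how per-token tagger output can be turned into gene mentions and then corrected by a rule, here is a minimal sketch. The tag names `GENE` and `O` and the parenthesis rule are hypothetical illustrations, not the authors' actual tag set or rules: consecutive gene-tagged tokens are grouped into spans, and a span that opens a parenthesis is extended over a closing parenthesis the tagger left outside it (or dropped if it stays unbalanced).

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets

def tags_to_spans(tags: List[str], gene_tag: str = "GENE") -> List[Span]:
    """Group runs of consecutive tokens carrying the (hypothetical) gene tag into spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == gene_tag and start is None:
            start = i  # a mention begins
        elif tag != gene_tag and start is not None:
            spans.append((start, i))  # a mention ends before token i
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # mention runs to the end of the sentence
    return spans

def fix_parentheses(tokens: List[str], spans: List[Span]) -> List[Span]:
    """Hypothetical post-processing rule: repair or drop mentions with unbalanced parentheses."""
    fixed: List[Span] = []
    for start, end in spans:
        words = tokens[start:end]
        unclosed = words.count("(") - words.count(")")
        # Extend the mention over a ")" the tagger left just outside it.
        if unclosed == 1 and end < len(tokens) and tokens[end] == ")":
            end += 1
            unclosed = 0
        if unclosed == 0:
            fixed.append((start, end))  # keep only balanced mentions
    return fixed

tokens = ["The", "p53", "(", "TP53", ")", "gene", "is", "mutated"]
tags = ["O", "GENE", "GENE", "GENE", "O", "O", "O", "O"]
raw = tags_to_spans(tags)            # [(1, 4)] -> "p53 ( TP53"
fixed = fix_parentheses(tokens, raw)  # [(1, 5)] -> "p53 ( TP53 )"
print([" ".join(tokens[s:e]) for s, e in fixed])
```

Rules of this kind operate on the tagger's output rather than retraining the model, which is what makes them a cheap way to correct systematic boundary errors.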
Potential Biases
The system's performance may be influenced by the specific training data used.
Limitations
The study did not rigorously compare the performance of different taggers.
Statistical Information
P-Value
p<0.05
Digital Object Identifier (DOI)