BioCreAtIvE Task1A: entity identification with a stochastic tagger
2005

Entity Identification in Molecular Biology Using a Stochastic Tagger

Sample size: 5000 | Publication | 10 minutes | Evidence: moderate

Author Information

Author(s): Kinoshita Shuhei, Cohen K Bretonnel, Ogren Philip V, Hunter Lawrence

Primary Institution: Center for Computational Pharmacology, University of Colorado School of Medicine

Hypothesis

Can a part-of-speech tagger be effectively used for entity identification in molecular biology?

Conclusion

A part-of-speech tagger can be enhanced with post-processing rules to create a competitive entity identification system.

Supporting Evidence

  • The base system achieved a precision of 68.0% and recall of 77.2%.
  • With post-processing, precision improved to 80.3% and recall to 80.5%.
  • The F-measure increased from 72.3% to 80.4% with post-processing.
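
The reported F-measures are consistent with the precision and recall figures if the F-measure is taken to be the balanced harmonic mean of the two. That weighting is an assumption, since the summary does not state it. A minimal sketch of the check, with the function name f_measure chosen only for illustration:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumes equal weighting of precision and recall (not stated in the summary).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the bullet points above, in percent.
base = f_measure(68.0, 77.2)
post = f_measure(80.3, 80.5)

print(round(base, 1), round(post, 1))  # prints: 72.3 80.4
```

Both rounded values match the reported figures, so the three bullet points are internally consistent under this definition.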

Takeaway

The researchers adapted a part-of-speech tagger to find gene names in scientific texts, and they improved its accuracy by adding post-processing rules that correct its errors.

Methodology

The study used a stochastic part-of-speech tagger with post-processing rules to identify gene mentions in biomedical literature.
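
The general shape of such a pipeline can be sketched as follows. This is a toy illustration, not the authors' system: it swaps in a most-frequent-tag lookup, which is far simpler than a real stochastic tagger. The GENE tag name, the training sentences, and the single post-processing rule (a number directly after a gene token joins the mention) are all hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag for each lowercased token."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token.lower()][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def tag_tokens(model, tokens, default="NN"):
    """Tag each token; unseen tokens fall back to a default tag."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


def extend_over_numbers(tagged):
    """Hypothetical rule: a digit token right after a GENE token becomes GENE."""
    out = []
    for tok, tag in tagged:
        if tok.isdigit() and out and out[-1][1] == "GENE":
            tag = "GENE"
        out.append((tok, tag))
    return out


def gene_mentions(tagged):
    """Join consecutive GENE tokens into mention strings."""
    mentions, current = [], []
    for tok, tag in tagged:
        if tag == "GENE":
            current.append(tok)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


# Toy training data (invented for this sketch).
train = [
    [("the", "DT"), ("cyclin", "GENE"), ("gene", "NN")],
    [("cyclin", "GENE"), ("binds", "VBZ"), ("DNA", "NN")],
]
model = train_unigram_tagger(train)

raw = tag_tokens(model, ["cyclin", "2", "binds", "DNA"])
fixed = extend_over_numbers(raw)

print(gene_mentions(raw))    # prints: ['cyclin']
print(gene_mentions(fixed))  # prints: ['cyclin 2']
```

The point of the sketch is the division of labor: the tagger proposes gene tokens from learned statistics, and a rule applied afterwards repairs a boundary error the tagger makes on its own.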

Potential Biases

The system's performance may be influenced by the specific training data used.

Limitations

The study did not rigorously compare the performance of different taggers.

Statistical Information

P-Value

p<0.05

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S4
