A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations
2005

Identifying Named Entities in Biomedical Text

Publication (evidence: moderate)

Author Information

Author(s): Shipra Dingare, Malvina Nissim, Jenny Finkel, Christopher Manning, Claire Grover

Primary Institution: University of Edinburgh

Hypothesis

Can a maximum entropy-based system effectively identify named entities in biomedical abstracts?

Conclusion

The system achieved an F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation, indicating its effectiveness in named entity recognition.

Supporting Evidence

  • The system achieved an exact match F-score of 83.2% in the BioCreative evaluation.
  • It achieved an F-score of 70.1% in the BioNLP evaluation.
  • The study highlighted the importance of data quality in achieving better performance.
  • Annotation inconsistencies were identified as a major source of errors in the evaluations.
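
For readers unfamiliar with exact-match scoring, the following is a minimal sketch of how such an F-score can be computed. It assumes entities are represented as (start, end, type) tuples and that a prediction counts only if all three fields match a gold entity. The function name and the sample spans are illustrative, not taken from the paper or from either evaluation's official scorer.

```python
def exact_match_f_score(gold, predicted):
    """Return (precision, recall, F-score) under exact-match scoring.

    Each entity is a hypothetical (start, end, type) tuple; a predicted
    entity is correct only if an identical tuple appears in the gold set.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Illustrative data only: the second prediction has a boundary error,
# so it earns no credit even though it overlaps a gold entity.
gold = [(0, 2, "GENE"), (5, 6, "GENE"), (9, 11, "GENE")]
predicted = [(0, 2, "GENE"), (5, 7, "GENE"), (9, 11, "GENE"), (13, 14, "GENE")]
precision, recall, f_score = exact_match_f_score(gold, predicted)
```

Because a boundary error costs both a false positive and a false negative, exact-match scores are sensitive to the annotation inconsistencies noted above.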

Takeaway

This study built a computer program that finds the names of genes and proteins in biomedical research abstracts. It scored an F-score of 83.2% in one evaluation (BioCreative) and 70.1% in another (BioNLP).

Methodology

The study used a maximum entropy Markov model to classify words in biomedical abstracts, incorporating local features and external resources.
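
As a rough illustration of what "local features" can look like in a model of this kind, the sketch below builds a feature dictionary for one token. In a maximum entropy Markov model, the label of each word is predicted from features of that word and its context together with the previous label. The specific features and names here (`word_shape`, `token_features`, the three-character affixes) are assumptions chosen for illustration and are not the authors' feature set; external resources such as gazetteers are not shown.

```python
import re


def word_shape(word):
    """Collapse a word to a coarse shape, e.g. 'IL-2' -> 'XX-d'."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    # Digits are replaced last so the inserted 'd' is not rewritten to 'x'.
    shape = re.sub(r"[0-9]", "d", shape)
    return shape


def token_features(tokens, i, prev_label):
    """Return hypothetical local features for tokens[i].

    prev_label is the label assigned to the previous token, which is
    what makes the classifier a Markov model rather than an
    independent per-token classifier.
    """
    word = tokens[i]
    return {
        "word": word.lower(),
        "shape": word_shape(word),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<END>",
        "prev_label": prev_label,
    }


features = token_features(["the", "IL-2", "gene"], 1, "O")
```

A feature dictionary like this would then be passed to a maximum entropy (multinomial logistic regression) classifier, one token at a time, left to right.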

Potential Biases

The training and evaluation data may not be representative, and their quality may bias the results.

Limitations

The performance was affected by inconsistent data annotation and varying task difficulty between evaluations.

Digital Object Identifier (DOI)

10.1002/cfg.457
