Identifying Named Entities in Biomedical Text
Author Information
Author(s): Shipra Dingare, Malvina Nissim, Jenny Finkel, Christopher Manning, Claire Grover
Primary Institution: University of Edinburgh
Hypothesis
Can a maximum entropy-based system effectively identify named entities in biomedical abstracts?
Conclusion
The system achieved an F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation, indicating its effectiveness in named entity recognition.
Supporting Evidence
- The system achieved an exact match F-score of 83.2% in the BioCreative evaluation.
- It performed with an F-score of 70.1% in the BioNLP evaluation.
- The study highlighted the importance of data quality in achieving better performance.
- Annotation inconsistencies were identified as a major source of errors in the evaluations.
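To make the "exact match" criterion concrete, the following is a minimal sketch of how an exact-match F-score can be computed over entity spans. The function name, the (start, end, label) span format, and the toy data are assumptions for illustration; this is not the official scoring script of either evaluation.

```python
def exact_match_f_score(gold, predicted):
    """Return the F-score when a prediction counts only if its
    boundaries and label match a gold span exactly.

    gold, predicted: iterables of (start, end, label) tuples
    (a hypothetical span format chosen for this sketch).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Toy data: the second prediction is off by one token at the right boundary,
# so it counts as both a false positive and a false negative.
gold = {(0, 2, "GENE"), (5, 6, "GENE"), (9, 11, "GENE")}
predicted = {(0, 2, "GENE"), (5, 7, "GENE"), (9, 11, "GENE")}
score = exact_match_f_score(gold, predicted)
```

Under exact matching, a single boundary disagreement is penalized twice, which is one reason annotation inconsistencies in the data can weigh heavily on the reported scores.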
Takeaway
This study created a computer program that helps find names of genes and proteins in biomedical research abstracts, and it did a pretty good job at it.
Methodology
The study used a maximum entropy Markov model to classify words in biomedical abstracts, incorporating local features and external resources.
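The general idea of a maximum entropy Markov model can be sketched as follows: each word is classified from features of its local context plus the label assigned to the previous word, with label probabilities given by a softmax over weighted features. Everything in this sketch is an assumption for illustration: the feature set, the label inventory, the hand-set toy weights, and the greedy left-to-right decoding (used here for brevity in place of a search over label sequences). It does not reproduce the study's actual features, external resources, or trained model.

```python
import math

LABELS = ["O", "GENE"]  # hypothetical label set for this sketch


def local_features(tokens, i, prev_label):
    """Build illustrative local features for the word at position i."""
    word = tokens[i]
    return [
        f"word={word.lower()}",
        f"suffix3={word[-3:].lower()}",
        f"has_digit={any(c.isdigit() for c in word)}",
        f"has_upper={any(c.isupper() for c in word)}",
        f"prev_label={prev_label}",  # the Markov part: condition on previous label
    ]


def label_distribution(features, weights):
    """Maximum entropy (softmax) distribution over labels given features."""
    scores = {
        label: sum(weights.get((f, label), 0.0) for f in features)
        for label in LABELS
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {label: math.exp(s - max_score) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: v / total for label, v in exp_scores.items()}


def greedy_tag(tokens, weights):
    """Tag left to right, feeding each chosen label into the next decision."""
    labels = []
    prev = "<START>"
    for i in range(len(tokens)):
        dist = label_distribution(local_features(tokens, i, prev), weights)
        prev = max(dist, key=dist.get)
        labels.append(prev)
    return labels


# Hand-set toy weights; a real system would learn these from annotated data.
TOY_WEIGHTS = {
    ("has_digit=True", "GENE"): 2.0,
    ("has_upper=True", "GENE"): 0.5,
    ("prev_label=GENE", "GENE"): 0.5,
    ("has_digit=False", "O"): 1.0,
}

tokens = ["Mutations", "in", "BRCA1", "were", "found"]
tags = greedy_tag(tokens, TOY_WEIGHTS)
```

With these toy weights only "BRCA1" is tagged as GENE. In practice the weights would be estimated from annotated abstracts, and features drawn from external resources would be added alongside the local ones.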
Potential Biases
The results may be biased by the quality of the training and evaluation data, which may not be representative.
Limitations
The performance was affected by inconsistent data annotation and varying task difficulty between evaluations.