Protein Name Tagging Guidelines: Lessons Learned
Author Information
Author(s): Inderjeet Mani, Zhangzhi Hu, Seok Bae Jang, Ken Samuel, Matthew Krause, Jon Phillips, Cathy H. Wu
Primary Institution: Georgetown University
Hypothesis
The study aims to develop standardized guidelines for tagging protein names in biomedical literature to improve information extraction.
Conclusion
The revised tagging guidelines significantly improved inter-coder reliability in annotating protein names from MEDLINE abstracts.
Supporting Evidence
- Inter-coder consistency on protein tags, measured across three annotators on 300 MEDLINE abstracts, was 0.868 F-measure.
- The study developed a dictionary of 691,000 protein names to assist in tagging.
- Revised guidelines improved inter-coder reliability metrics significantly.
Takeaway
This study created rules that help people tag protein names in research papers more consistently, making the tagged information easier to find and use.
Methodology
The study involved tagging protein names in 300 MEDLINE abstracts by multiple coders and assessing inter-coder reliability.
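As a rough illustration of how an inter-coder F-measure can be computed, the sketch below compares the tag sets of two coders and averages the score over all coder pairs. It is not the study's own procedure: the exact-match criterion, the (abstract_id, start, end) tag representation, and the function names pairwise_f_measure and mean_pairwise_f are all assumptions made for this example.

```python
from itertools import combinations


def pairwise_f_measure(tags_a, tags_b):
    """F-measure between two coders' tag sets.

    Assumption: a tag is a hashable item such as (abstract_id, start, end),
    and two tags agree only when they match exactly. tags_a is treated as
    the reference; with exact matching the score is symmetric.
    """
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 1.0  # both coders tagged nothing: treat as full agreement
    matches = len(a & b)
    if matches == 0:
        return 0.0
    precision = matches / len(b)
    recall = matches / len(a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(coders):
    """Average the pairwise F-measure over every pair of coders."""
    pairs = list(combinations(coders.values(), 2))
    return sum(pairwise_f_measure(x, y) for x, y in pairs) / len(pairs)


# Hypothetical tags: (abstract_id, start_offset, end_offset)
coders = {
    "coder1": {(1, 0, 5), (1, 10, 15), (2, 3, 8)},
    "coder2": {(1, 0, 5), (1, 10, 15), (2, 4, 8)},  # one boundary differs
    "coder3": {(1, 0, 5), (1, 10, 15), (2, 3, 8)},
}
print(round(mean_pairwise_f(coders), 3))
```

Exact matching is the strictest choice. A looser criterion, such as counting overlapping spans as agreement, would yield higher scores on the same annotations.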
Potential Biases
The subjective nature of tagging and the coders' varying expertise may bias the annotations.
Limitations
The guidelines may not cover all naming conventions, and the complexity of protein names can lead to inconsistencies.
Participant Demographics
Coders included biologists and researchers with varying levels of experience in the field.