Protein Name Tagging Guidelines: Lessons Learned

Sample size: 300 publications

Evidence: moderate

Author Information

Author(s): Inderjeet Mani, Zhangzhi Hu, Seok Bae Jang, Ken Samuel, Matthew Krause, Jon Phillips, Cathy H. Wu

Primary Institution: Georgetown University

The study aims to develop standardized guidelines for tagging protein names in biomedical literature to improve information extraction.

The revised tagging guidelines significantly improved inter-coder reliability in annotating protein names from MEDLINE abstracts.

Inter-coder consistency among three annotators tagging protein names in 300 MEDLINE abstracts was 0.868 (F-measure).
The study developed a dictionary of 691,000 protein names to assist in tagging.
Revised guidelines improved inter-coder reliability metrics significantly.
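
The summary does not say how the protein name dictionary was applied during tagging. As one plausible illustration only, the sketch below uses greedy longest-match lookup against a tiny made-up dictionary; the function name `tag_proteins`, the `max_len` parameter, and the toy entries are all hypothetical and are not taken from the study.

```python
import re

def tag_proteins(text, dictionary, max_len=4):
    """Greedy longest-match lookup of dictionary names in text.

    Assumption: dictionary entries are lowercase strings of up to
    max_len whitespace-separated tokens. Returns (start, end, surface)
    character spans. This is an illustrative sketch, not the study's method.
    """
    # Tokenize on runs of letters, digits, and hyphens, keeping offsets.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9-]+", text)]
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            if text[start:end].lower() in dictionary:
                spans.append((start, end, text[start:end]))
                i += n
                break
        else:
            # No dictionary entry starts at this token.
            i += 1
    return spans

# Toy dictionary (hypothetical entries; the real one has about 691,000 names).
toy_dictionary = {"p53", "tumor necrosis factor", "tumor necrosis factor alpha"}
sentence = "Expression of tumor necrosis factor alpha and p53 was measured."
found = tag_proteins(sentence, toy_dictionary)
print([surface for _, _, surface in found])
```

Preferring the longest match keeps "tumor necrosis factor alpha" as a single tag rather than stopping at the shorter entry, which is one of the boundary decisions that tagging guidelines typically have to settle.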

This study created rules that help people tag protein names in research papers more consistently, making that information easier to find and use.

The study involved tagging protein names in 300 MEDLINE abstracts by multiple coders and assessing inter-coder reliability.
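
To make the reliability measure concrete, here is a minimal sketch of a pairwise F-measure between coders, averaged over all coder pairs. It assumes each tag is an exact character span `(abstract_id, start, end)` and that only exact span matches count as agreement; the summary does not state the study's matching criteria, and the data below is invented for illustration.

```python
from itertools import combinations

def f_measure(tags_a, tags_b):
    """F-measure between two coders' tag sets (exact span match).

    Coder A is treated as the reference, but with exact matching the
    score is symmetric: F = 2 * matched / (len(A) + len(B)).
    """
    if not tags_a and not tags_b:
        return 1.0  # both coders tagged nothing: full agreement
    matched = len(tags_a & tags_b)
    if matched == 0:
        return 0.0
    precision = matched / len(tags_b)
    recall = matched / len(tags_a)
    return 2 * precision * recall / (precision + recall)

def mean_pairwise_f(annotations):
    """Average F-measure over every unordered pair of coders."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(f_measure(a, b) for a, b in pairs) / len(pairs)

# Invented example: three coders, two abstracts ("a1", "a2").
coders = {
    "coder_1": {("a1", 0, 5), ("a1", 10, 18), ("a2", 3, 9)},
    "coder_2": {("a1", 0, 5), ("a1", 10, 18), ("a2", 4, 9)},  # boundary differs on a2
    "coder_3": {("a1", 0, 5), ("a2", 3, 9)},                  # missed one name
}
score = mean_pairwise_f(coders)
print(round(score, 3))
```

In this toy data, a one-character boundary disagreement counts as a complete miss, which shows why clear guidelines on name boundaries matter for the reported score.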

Potential bias due to the subjective nature of tagging and the varying expertise of coders.

The guidelines may not cover all naming conventions, and the complexity of protein names can still lead to inconsistent tagging.

Coders included biologists and researchers with varying levels of experience in the field.
