Protein name tagging guidelines: lessons learned
2005

Sample size: 300 publications
Evidence: moderate

Author Information

Author(s): Inderjeet Mani, Zhangzhi Hu, Seok Bae Jang, Ken Samuel, Matthew Krause, Jon Phillips, Cathy H. Wu

Primary Institution: Georgetown University

Hypothesis

The study aims to develop standardized guidelines for tagging protein names in biomedical literature to improve information extraction.

Conclusion

The revised tagging guidelines significantly improved inter-coder reliability in annotating protein names from MEDLINE abstracts.

Supporting Evidence

  • Inter-coder consistency among three annotators tagging protein names in 300 MEDLINE abstracts reached an F-measure of 0.868.
  • The study developed a dictionary of 691,000 protein names to assist in tagging.
  • The revised guidelines significantly improved inter-coder reliability metrics.
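The summary does not say how the 691,000-name dictionary was used to assist tagging. The following is only a minimal sketch, assuming a dictionary-based pre-tagging pass that proposes candidate spans for a human coder to accept or reject. `PROTEIN_NAMES` and `pretag` are hypothetical names, the four entries merely stand in for the real dictionary, and matching is a simple greedy longest match over whitespace-separated tokens.

```python
from typing import List, Set, Tuple

# Hypothetical miniature stand-in for the study's 691,000-name dictionary.
PROTEIN_NAMES: Set[str] = {
    "p53",
    "insulin",
    "tumor necrosis factor",
    "tumor necrosis factor receptor",
}


def pretag(text: str, names: Set[str]) -> List[Tuple[int, int, str]]:
    """Propose (start_token, end_token, surface_text) spans via greedy longest match.

    Matching is case-insensitive over whitespace-separated tokens; punctuation
    attached to a token will prevent a match in this simplified version.
    """
    if not names:
        return []
    lowered_names = {n.lower() for n in names}
    tokens = text.split()
    lowered = [t.lower() for t in tokens]
    max_len = max(len(n.split()) for n in lowered_names)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so a full name beats any shorter
        # dictionary entry that is its prefix.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + length]) in lowered_names:
                match = (i, i + length, " ".join(tokens[i:i + length]))
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return spans
```

For example, `pretag("Binding of tumor necrosis factor receptor to p53 was not observed", PROTEIN_NAMES)` returns `[(2, 6, 'tumor necrosis factor receptor'), (7, 8, 'p53')]`. Preferring the longest match keeps a name from being cut short at a contained dictionary entry; a practical tagger would also need tokenization that handles punctuation, hyphens, and Greek letters.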

Takeaway

This study created rules that help people tag protein names in research papers more consistently, making that information easier to find and use.

Methodology

Multiple coders tagged protein names in 300 MEDLINE abstracts, and inter-coder reliability was assessed by comparing their annotations.
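The summary reports agreement among three annotators as an F-measure but does not spell out the computation. One common approach, assumed here rather than taken from the paper, is to score each pair of coders with an exact-match F-measure over their tagged spans and average the pairs. With exact matching the pairwise score is symmetric, since F = 2|A ∩ B| / (|A| + |B|). The span encoding `(abstract_id, start, end)` and the function names `f_measure` and `mean_pairwise_f` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical span encoding: (abstract_id, start_char, end_char).
Span = Tuple[int, int, int]


def f_measure(reference: Set[Span], other: Set[Span]) -> float:
    """Exact-match F-measure of `other` against `reference`."""
    if not reference and not other:
        return 1.0  # both coders tagged nothing: treat as full agreement
    matched = len(reference & other)
    if matched == 0:
        return 0.0
    precision = matched / len(other)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotations: Dict[str, Set[Span]]) -> float:
    """Average the F-measure over every unordered pair of coders."""
    if len(annotations) < 2:
        raise ValueError("need at least two coders to measure agreement")
    pairs = list(combinations(sorted(annotations), 2))
    scores = [f_measure(annotations[a], annotations[b]) for a, b in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    coders = {
        "coder_a": {(1, 0, 4), (1, 10, 15), (2, 3, 8)},
        "coder_b": {(1, 0, 4), (2, 3, 8)},
        "coder_c": {(1, 0, 4), (1, 10, 15), (2, 3, 8), (2, 20, 25)},
    }
    print(round(mean_pairwise_f(coders), 3))  # prints 0.775
```

Averaging pairs treats all three coders symmetrically; an alternative is to score each coder against an adjudicated key, which would require a gold standard the pairwise approach does not need.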

Potential Biases

Tagging is partly subjective, and the coders' varying expertise may introduce bias.

Limitations

The guidelines may not cover all naming conventions, and the complexity of protein names can lead to inconsistencies.

Participant Demographics

Coders included biologists and researchers with varying levels of experience in the field.

Digital Object Identifier (DOI)

10.1002/cfg.452
