A Framework for Finding Gene Identifiers in Documents
Author Information
Author(s): William W. Cohen, Einat Minkov
Primary Institution: Carnegie Mellon University
Hypothesis
Can a graph-based method improve the accuracy of gene identifier ranking in biological literature?
Conclusion
Combining multiple named entity recognition systems in a graph-based approach significantly enhances the accuracy of gene identifier ranking.
Supporting Evidence
- The graph-based approach outperformed any of its component NER systems.
- Mean average precision improved by nearly 80% over the best-performing single NER system.
- The study utilized data from the BioCreAtIvE challenge to evaluate the methods.
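Mean average precision (MAP) is the metric behind the 80% figure. Below is a minimal sketch of the standard computation. For each document, it averages the precision at every rank where a correct identifier appears, with correct identifiers that are never retrieved counting as zero. It then averages those scores over documents. It also reads "improved by nearly 80%" as a relative gain, (new - baseline) / baseline, which is an assumption. The function names and toy identifiers are illustrative and are not taken from the study.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant items that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(rankings: Dict[str, List[str]],
                           gold: Dict[str, Set[str]]) -> float:
    """Mean of per-document average precision over all gold-annotated documents."""
    scores = [average_precision(rankings.get(doc, []), ids)
              for doc, ids in gold.items()]
    return sum(scores) / len(scores) if scores else 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`; 0.8 means +80%."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy rankings with hypothetical identifiers, not BioCreAtIvE data.
    rankings = {"doc1": ["gene_A", "gene_C", "gene_B"], "doc2": ["gene_D", "gene_E"]}
    gold = {"doc1": {"gene_A", "gene_B"}, "doc2": {"gene_E"}}
    print(mean_average_precision(rankings, gold))  # about 0.667
```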
Takeaway
This study shows how computers can help scientists identify which genes a research paper mentions by combining several automatic gene-name recognizers and ranking the candidate gene identifiers.
Methodology
The study used a graph-based approach to combine the outputs of multiple named entity recognition (NER) systems, and it evaluated both the combined ranking and the individual NER systems on datasets from the BioCreAtIvE challenge.
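The summary does not specify how the graph is built or searched. The sketch below is one plausible setup, and every part of it is an assumption. A document node links to each mention an NER system extracted, and edge weights count how many systems agree on that mention. Each mention links to the identifiers whose synonym it matches in a hypothetical lexicon. A random walk with restart from the document then scores the identifier nodes. The names `build_graph`, `walk_scores` and `rank_ids`, the restart probability, and the step count are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical graph layout: node -> {neighbor: edge weight}.
Graph = Dict[str, Dict[str, float]]


def build_graph(doc_id: str, ner_outputs: Dict[str, List[str]],
                lexicon: Dict[str, Set[str]]) -> Graph:
    """Link a document to the mentions each NER system found, and each
    mention to the identifiers whose (lowercased) synonym it matches."""
    graph: Graph = defaultdict(dict)

    def link(a: str, b: str, w: float = 1.0) -> None:
        # Undirected edge; repeated links accumulate weight.
        graph[a][b] = graph[a].get(b, 0.0) + w
        graph[b][a] = graph[b].get(a, 0.0) + w

    doc = "doc:" + doc_id
    for mentions in ner_outputs.values():
        for mention in {m.lower() for m in mentions}:
            # A mention found by k systems ends up with edge weight k.
            link(doc, "mention:" + mention)
    for node in list(graph[doc]):
        for gene_id in lexicon.get(node[len("mention:"):], set()):
            link(node, "id:" + gene_id)
    return dict(graph)


def walk_scores(graph: Graph, start: str, restart: float = 0.15,
                steps: int = 20) -> Dict[str, float]:
    """Random walk with restart from `start`; returns node -> probability."""
    prob: Dict[str, float] = {start: 1.0}
    for _ in range(steps):
        nxt: Dict[str, float] = defaultdict(float)
        for node, p in prob.items():
            nxt[start] += restart * p  # jump back to the document
            nbrs = graph.get(node, {})
            total = sum(nbrs.values())
            if total == 0:
                nxt[start] += (1 - restart) * p  # dead end: also return
                continue
            for nbr, w in nbrs.items():
                nxt[nbr] += (1 - restart) * p * w / total
        prob = dict(nxt)
    return prob


def rank_ids(graph: Graph, doc_id: str, **walk_kwargs) -> List[str]:
    """Gene identifiers ordered by walk probability, highest first."""
    scores = walk_scores(graph, "doc:" + doc_id, **walk_kwargs)
    ids = {n[len("id:"):]: s for n, s in scores.items() if n.startswith("id:")}
    return sorted(ids, key=ids.get, reverse=True)


if __name__ == "__main__":
    lexicon = {"shh": {"gene_A"}, "p53": {"gene_B"}}  # hypothetical synonym table
    ner = {"system_1": ["Shh", "p53"], "system_2": ["Shh"]}  # hypothetical NER output
    g = build_graph("d1", ner, lexicon)
    print(rank_ids(g, "d1"))  # ['gene_A', 'gene_B']
```

In this sketch, agreement between systems raises an identifier's score only through the heavier document-to-mention edge. This is a deliberately simple way to combine NER outputs and is not necessarily the authors' weighting.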
Potential Biases
The study may not account for the variability in gene nomenclature across different model organisms.
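The summary does not say how mentions are matched to identifiers. Assuming a synonym-lexicon lookup, the sketch below shows why nomenclature variability matters. Normalizing case and punctuation recovers spelling variants, such as "IL-2" versus the mouse-style symbol "Il2". It cannot bridge genuinely different symbols, such as the human TP53 and the mouse Trp53. The function names and placeholder identifiers are hypothetical.

```python
import re
from typing import Dict, Set


def normalize_gene_name(name: str) -> str:
    """Crude lookup key: lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def lookup(mention: str, lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Identifiers whose synonym normalizes to the same key as `mention`."""
    key = normalize_gene_name(mention)
    found: Set[str] = set()
    for synonym, gene_ids in lexicon.items():
        if normalize_gene_name(synonym) == key:
            found |= gene_ids
    return found


if __name__ == "__main__":
    # Hypothetical mouse-style lexicon; the identifiers are placeholders.
    lexicon = {"Il2": {"mouse_gene_1"}, "Trp53": {"mouse_gene_2"}}
    print(lookup("IL-2", lexicon))  # {'mouse_gene_1'}: spelling variant recovered
    print(lookup("TP53", lexicon))  # set(): the human-style symbol is missed
```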
Limitations
Reported performance may be overstated because the evaluation datasets were simplified: they consisted of abstracts rather than full papers.
Participant Demographics
The study focused on mouse-relevant MEDLINE abstracts.
Statistical Information
P-Value
p<0.005
Statistical Significance
p<0.005
Digital Object Identifier (DOI)