A Framework for Finding Gene Identifiers in Documents

Sample size: 400 publications | 10 minutes | Evidence: high

Author Information

Author(s): William W. Cohen, Einat Minkov

Primary Institution: Carnegie Mellon University

Hypothesis

Can a graph-based method improve the accuracy of gene identifier ranking in biological literature?

Conclusion

Combining multiple named entity recognition systems in a graph-based approach significantly enhances the accuracy of gene identifier ranking.

Supporting Evidence

The graph-based approach outperformed any of its component NER systems.
Mean average precision improved by nearly 80% over the best-performing single NER system.
The study utilized data from the BioCreAtIvE challenge to evaluate the methods.
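
For readers unfamiliar with the metric named above, here is a minimal sketch of how mean average precision (MAP) is commonly computed for ranked lists of gene identifiers. The function names and the toy identifiers (G1, G2, ...) are illustrative assumptions, not taken from the study, and the study's own evaluation script may differ in details such as tie handling.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Relevant items that never appear in the ranking contribute zero.
    return total / len(relevant)


def mean_average_precision(
    rankings: Sequence[Sequence[str]], gold: Sequence[Set[str]]
) -> float:
    """Mean of per-document average precision."""
    pairs = list(zip(rankings, gold))
    if not pairs:
        return 0.0
    return sum(average_precision(r, g) for r, g in pairs) / len(pairs)


# Toy data (hypothetical identifiers): one ranked list per document.
toy_rankings = [["G1", "G7", "G3"], ["G9", "G2"]]
toy_gold = [{"G1", "G3"}, {"G2"}]
toy_map = mean_average_precision(toy_rankings, toy_gold)
```

In the toy data, the first document scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3.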

Takeaway

This study shows how computers can help scientists find the names of genes mentioned in research papers by using smart methods to rank them.

Methodology

The study used a graph-based approach to combine outputs from multiple named entity recognition systems and evaluated their performance on datasets from the BioCreAtIvE challenge.
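
To make the idea of combining NER outputs in a graph more concrete, the following is a rough sketch only. This summary does not describe the graph schema or the scoring procedure, so everything here is an assumption: the node naming, the unit edge weights, the use of a random walk with restart as the scoring method, and the restart probability of 0.5. It is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected weighted graph from (node, node, weight) triples."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for src, dst, weight in edges:
        graph[src][dst] = graph[src].get(dst, 0.0) + weight
        graph[dst][src] = graph[dst].get(src, 0.0) + weight
    return dict(graph)


def walk_with_restart(
    graph: Graph, start: str, restart: float = 0.5, steps: int = 20
) -> Dict[str, float]:
    """Approximate random-walk-with-restart scores by repeated propagation."""
    scores: Dict[str, float] = {start: 1.0}
    for _ in range(steps):
        nxt: Dict[str, float] = defaultdict(float)
        nxt[start] += restart  # total mass is 1, so this share returns to start
        for node, mass in scores.items():
            neighbours = graph.get(node, {})
            total = sum(neighbours.values())
            if total == 0:
                nxt[start] += (1 - restart) * mass  # dead end: send mass back
                continue
            for nbr, weight in neighbours.items():
                nxt[nbr] += (1 - restart) * mass * weight / total
        scores = dict(nxt)
    return scores


def rank_genes(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Keep only gene-identifier nodes, highest score first."""
    genes = [(n, s) for n, s in scores.items() if n.startswith("gene:")]
    return sorted(genes, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy graph: two NER systems (A and B) tag mentions in one
# abstract; each mention links to a candidate gene identifier.
toy_edges = [
    ("doc:1", "mention:A:p53", 1.0),
    ("doc:1", "mention:B:Trp53", 1.0),
    ("doc:1", "mention:A:Brca1", 1.0),
    ("mention:A:p53", "gene:GENE_A", 1.0),
    ("mention:B:Trp53", "gene:GENE_A", 1.0),
    ("mention:A:Brca1", "gene:GENE_B", 1.0),
]
toy_scores = walk_with_restart(build_graph(toy_edges), start="doc:1")
toy_ranking = rank_genes(toy_scores)
```

In this toy graph, the identifier supported by mentions from both hypothetical NER systems receives more walk mass than the one supported by a single mention. That captures the intuition behind combining systems, though not necessarily the exact mechanism used in the study.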

Potential Biases

The study may not account for the variability in gene nomenclature across different model organisms.

Limitations

The performance may be overstated due to simplifications in the evaluation datasets, which were abstracts rather than full papers.

Participant Demographics

The study focused on mouse-relevant MEDLINE abstracts.

Statistical Information

P-Value

p<0.005

Digital Object Identifier (DOI)

10.1186/1471-2105-7-440
