Assessment of Text Mining Methods for Gene Lists
Author Information
Author(s): Lynette Hirschman, Marc Colosimo, Alexander Morgan, Alexander Yeh
Primary Institution: The MITRE Corporation
Hypothesis
Can automated systems effectively generate normalized gene lists from biological texts?
Conclusion
The study shows that several systems can successfully perform gene normalization across different organisms, with performance depending on each organism's gene naming conventions.
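Gene normalization means turning the gene mentions in an abstract into a list of unique gene identifiers from a model organism database. The sketch below shows that step as a simple dictionary lookup in Python. It is not the method of any system in the study. The lexicon, its entries, and the `GENE:` identifiers are hypothetical, and the sketch assumes the gene mentions have already been found in the text.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical synonym lexicon mapping gene names to hypothetical IDs.
# A real lexicon would be built from a model organism database's synonym lists.
SYNONYMS: Dict[str, str] = {
    "ABC1": "GENE:0001",
    "alpha binding protein": "GENE:0001",  # a second synonym for the same gene
    "XYZ2": "GENE:0002",
}


def normalize_name(name: str) -> str:
    """Lowercase and drop every non-alphanumeric character, so that
    spelling variants such as 'ABC-1' and 'abc1' map to the same key."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Index the lexicon by normalized key so lookups ignore case and punctuation.
INDEX: Dict[str, str] = {normalize_name(k): v for k, v in SYNONYMS.items()}


def gene_list(mentions: Iterable[str]) -> List[str]:
    """Map gene mentions (assumed already found by a tagger) to a sorted,
    de-duplicated list of identifiers; unknown mentions are skipped."""
    ids = {INDEX[key] for key in map(normalize_name, mentions) if key in INDEX}
    return sorted(ids)


print(gene_list(["ABC-1", "Alpha-binding protein", "xyz2", "unknown gene"]))
# prints ['GENE:0001', 'GENE:0002']
```

Stripping all punctuation is a deliberately crude choice. It merges harmless variants such as "ABC-1" and "abc1", but it can also merge names that refer to different genes. This is one small example of why organism-specific naming conventions make the task harder.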
Supporting Evidence
- The top-scoring system for Yeast achieved an F-measure of 0.92.
- For Fly, the best F-measure was 0.82, while for Mouse it was 0.79.
- Performance varied considerably by organism, reflecting differences in gene naming conventions.
Takeaway
This study looked at how well computers can find and list the genes mentioned in scientific papers. It found that genes from some organisms are easier to identify than genes from others, largely because of differences in how those genes are named.
Methodology
The study compared systems that generated gene lists from the abstracts of scientific papers, scoring each system's output against a gold-standard gene list for each abstract.
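The F-measures reported for Yeast, Fly and Mouse combine precision and recall. The Python sketch below shows one way to score generated gene lists against a gold standard. It assumes the balanced F-measure, which is the harmonic mean of precision and recall. It also pools counts across all abstracts (micro-averaging), which may not match the study's exact scoring procedure. The abstract and gene identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def evaluate(system: Dict[str, Set[str]],
             gold: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) over all abstracts, pooling
    true positives, false positives and false negatives (micro-averaging)."""
    tp = fp = fn = 0
    for abstract_id in set(system) | set(gold):
        predicted = system.get(abstract_id, set())
        expected = gold.get(abstract_id, set())
        tp += len(predicted & expected)  # identifiers the system got right
        fp += len(predicted - expected)  # identifiers it listed wrongly
        fn += len(expected - predicted)  # identifiers it missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Hypothetical example: one extra identifier in abstract1, one missed in abstract2.
system = {"abstract1": {"G1", "G2"}, "abstract2": {"G3"}}
gold = {"abstract1": {"G1"}, "abstract2": {"G3", "G4"}}
print(evaluate(system, gold))
```

In this example the system gets two identifiers right, lists one wrongly and misses one. Precision, recall and F-measure therefore all come out at about 0.667.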
Limitations
The study faced challenges with the quality of training data and the complexity of gene naming conventions.