A critical assessment of text mining methods in molecular biology
2005

Assessment of Text Mining Methods for Gene Lists

Sample size: 5000 publications
Evidence: moderate

Author Information

Author(s): Lynette Hirschman, Marc Colosimo, Alexander Morgan, Alexander Yeh

Primary Institution: The MITRE Corporation

Hypothesis

Can automated systems effectively generate normalized gene lists from biological texts?

Conclusion

The study shows that various systems can successfully perform gene normalization tasks across different organisms, with performance varying based on organism-specific naming conventions.

Supporting Evidence

  • The top scoring system for Yeast achieved an F-measure of 0.92.
  • For Fly, the best F-measure was 0.82, while for Mouse it was 0.79.
  • The performance varied significantly based on the organism and its naming conventions.

Takeaway

This study looked at how well computers can find and list genes mentioned in scientific papers. It found that genes from some organisms, such as yeast, are easier to identify than genes from others, such as fly and mouse.

Methodology

The study involved comparing systems that generated gene lists from abstracts of scientific papers, using a gold standard for evaluation.
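
The scoring idea behind the reported F-measures can be sketched as follows. This is a minimal illustration, not the evaluation software used in the study: it assumes each system returns a set of normalized gene identifiers per abstract, that scores are pooled across abstracts, and that the F-measure is the balanced harmonic mean of precision and recall. The function name `score_gene_lists` and the identifiers in the example are hypothetical.

```python
def score_gene_lists(predicted, gold):
    """Compare predicted gene lists to a gold standard.

    Both arguments map an abstract ID to a collection of gene identifiers.
    Returns (precision, recall, f_measure), pooled over all abstracts.
    """
    tp = fp = fn = 0
    for doc_id in set(predicted) | set(gold):
        pred = set(predicted.get(doc_id, ()))
        ref = set(gold.get(doc_id, ()))
        tp += len(pred & ref)   # genes correctly listed
        fp += len(pred - ref)   # genes listed but not in the gold standard
        fn += len(ref - pred)   # gold-standard genes that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical identifiers, for illustration only.
gold = {"abs1": {"GENE_A", "GENE_B"}, "abs2": {"GENE_C"}}
predicted = {"abs1": {"GENE_A", "GENE_X"}, "abs2": {"GENE_C"}}

precision, recall, f_measure = score_gene_lists(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

In this toy case the system finds two of the three gold-standard genes and adds one spurious gene, so precision, recall, and F-measure all equal 2/3.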

Limitations

The study faced challenges with the quality of training data and the complexity of gene naming conventions.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S11
