A critical assessment of text mining methods in molecular biology
2005

Evaluation of BioCreAtIvE Assessment of Task 2

Sample size: 15 publications
Evidence: moderate

Author Information

Author(s): Christian Blaschke, E. A. Leon, Martin Krallinger, Alfonso Valencia

Primary Institution: National Center of Biotechnology, CNB-CSIC

Hypothesis

How effectively can text mining tools extract Gene Ontology annotations from full text articles?

Conclusion

Text mining tools show promise in extracting Gene Ontology annotations, but they still fall short of the performance needed for practical applications.

Supporting Evidence

  • More than 15,000 individual results were provided by the participants.
  • The dataset generated is publicly available for future training of information extraction methods.
  • Three main strategies were identified among participants: pattern matching, machine learning, and hybrid approaches.
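To make the first of these strategies concrete, the following is a minimal sketch of dictionary-style pattern matching against Gene Ontology term names. It is not taken from any participating system: the tiny `GO_TERMS` lexicon and the `find_go_terms` helper are hypothetical and for illustration only, and a real system would also need to handle synonyms, morphological variants, and the association between a term and a specific protein.

```python
import re

# Hypothetical mini-lexicon mapping GO identifiers to term names.
# Illustrative only; a real system would load the full ontology.
GO_TERMS = {
    "GO:0006915": "apoptotic process",
    "GO:0005634": "nucleus",
    "GO:0016301": "kinase activity",
}


def find_go_terms(sentence, lexicon=GO_TERMS):
    """Return sorted GO ids whose term name occurs as whole words in the sentence."""
    hits = []
    for go_id, name in lexicon.items():
        # Escape the name so it is matched literally, and anchor on word boundaries.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append(go_id)
    return sorted(hits)


if __name__ == "__main__":
    text = "The protein localizes to the nucleus and shows kinase activity."
    print(find_go_terms(text))
```

Exact matching of this kind would miss terms that authors paraphrase, which may be one reason participants also explored machine learning and hybrid approaches.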

Takeaway

This study examined how well computers can read scientific papers to find information about proteins. The authors found that computers are getting better at this but still need to improve.

Methodology

The study involved a community-wide competition where participants submitted results for the automatic extraction of Gene Ontology annotations from full text articles.

Limitations

The lack of a high-quality training set and the complexity of GO terms made the task challenging.

Participant Demographics

Nine teams participated in the evaluation.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S16
