Data-poor categorization and passage retrieval for Gene Ontology Annotation in Swiss-Prot
2005

Improving Gene Ontology Annotation with Text Mining

Sample size: 640 publications

Evidence: moderate

Author Information

Author(s): Ehrler Frédéric, Geissbühler Antoine, Jimeno Antonio, Ruch Patrick

Primary Institution: University of Geneva

Hypothesis

Can text mining methods effectively categorize and retrieve passages for Gene Ontology Annotation in data-poor conditions?

Conclusion

The developed system achieved competitive performance in passage retrieval and text categorization, suggesting it could benefit various information extraction tasks.

Supporting Evidence

  • The system achieved the best combination of recall and precision for both passage retrieval and text categorization.
  • However, its text categorization results were far below those reported in other data-poor text categorization experiments.
  • The top proposed term was relevant in fewer than 20% of cases.
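
The figures above rest on standard ranking metrics. As a minimal sketch of how such numbers are typically computed (not the study's own evaluation code), the Python below measures how often the top proposed term is relevant, plus set-based precision and recall. The function names and the toy term labels are hypothetical and chosen only for illustration.

```python
def precision_at_1(ranked_terms_per_item, relevant_terms_per_item):
    """Fraction of items whose top-ranked proposed term is relevant."""
    if not ranked_terms_per_item:
        return 0.0
    hits = 0
    for ranked, relevant in zip(ranked_terms_per_item, relevant_terms_per_item):
        # Only the first (highest-scored) proposal counts here.
        if ranked and ranked[0] in relevant:
            hits += 1
    return hits / len(ranked_terms_per_item)


def precision_recall(proposed, relevant):
    """Set-based precision and recall for one item."""
    true_positives = len(proposed & relevant)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: four items, each with a ranked list of proposed
# terms and a set of terms judged relevant. Not taken from the study.
ranked_lists = [
    ["term_a", "term_b"],
    ["term_c", "term_d"],
    ["term_e", "term_f"],
    ["term_g", "term_h"],
]
relevant_sets = [
    {"term_a"},            # top term is relevant
    {"term_d"},            # relevant term is ranked second
    {"term_x"},            # no proposed term is relevant
    {"term_h", "term_y"},  # relevant term is ranked second
]

top1 = precision_at_1(ranked_lists, relevant_sets)
p, r = precision_recall({"term_a", "term_b"}, {"term_b", "term_c", "term_d"})
print(f"precision at rank 1: {top1:.2f}, precision: {p:.2f}, recall: {r:.2f}")
```

Measuring only the top-ranked term is a strict criterion: a system can propose the right term at rank two or three and still score zero for that item.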

Takeaway

This study shows how computers can help scientists find the right information about proteins, even when there's not a lot of data to work with.

Methodology

The study used a classifier that computes a distance between each sentence and the Gene Ontology categories, and evaluated performance using precision and recall.
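
To illustrate the idea of ranking sentences by their distance to a Gene Ontology category, here is a minimal Python sketch. It assumes a bag-of-words cosine distance between each sentence and the category label, which is a stand-in chosen for simplicity; the summary does not specify the distance the authors actually used. The function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_distance(tokens_a, tokens_b):
    """Return 1 minus the cosine similarity of two bag-of-words token lists."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty text is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank_sentences(sentences, go_term):
    """Sort (distance, sentence) pairs from closest to farthest from the GO label."""
    go_tokens = tokenize(go_term)
    scored = [(cosine_distance(tokenize(s), go_tokens), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0])


# Hypothetical example sentences, not taken from the study's corpus.
sentences = [
    "The protein was purified from yeast cells.",
    "We show that the enzyme has protein kinase activity in vitro.",
    "Samples were stored at low temperature.",
]
ranked = rank_sentences(sentences, "protein kinase activity")
best_distance, best_sentence = ranked[0]
print(best_sentence)
```

The closest sentence would be returned as the passage supporting the annotation. A plain word-overlap measure like this one misses synonyms and paraphrases, so it should be read as a baseline rather than a complete approach.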

Limitations

The text categorization results were far below those reported in other data-poor text categorization experiments, indicating a need for better methods in data-poor scenarios.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S23
