Data-poor categorization and passage retrieval for Gene Ontology Annotation in Swiss-Prot

2005

Improving Gene Ontology Annotation with Text Mining

Sample size: 640 publications

Evidence: moderate

Author Information

Author(s): Ehrler Frédéric, Geissbühler Antoine, Jimeno Antonio, Ruch Patrick

Primary Institution: University of Geneva

Hypothesis

Can text mining methods effectively categorize and retrieve passages for Gene Ontology Annotation in data-poor conditions?

Conclusion

The developed system achieved competitive performance in passage retrieval and text categorization, suggesting it could benefit various information extraction tasks.

Supporting Evidence

The system achieved the best combination of recall and precision for both passage retrieval and text categorization.
Text categorization results were far below those in other data-poor text categorization experiments.
The top proposed term was relevant in less than 20% of cases.

Takeaway

This study shows how computers can help scientists find the right information about proteins, even when there's not a lot of data to work with.

Methodology

The study used a classifier to compute distances between sentences and Gene Ontology categories, evaluating performance based on precision and recall.
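
The idea of ranking Gene Ontology categories by their distance to a sentence, then scoring the result with precision and recall, can be illustrated with a minimal sketch. The summary does not say which distance measure or features the classifier used, so the cosine distance over bag-of-words counts below is an assumption made for illustration, not the authors' method. The function names (`tokenize`, `cosine_distance`, `rank_categories`, `precision_recall`), the example sentence, and the three-term category table are likewise hypothetical toy inputs, not data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def rank_categories(sentence, categories):
    """Return category ids ordered from nearest to farthest from the sentence."""
    s_vec = Counter(tokenize(sentence))
    scored = [
        (cosine_distance(s_vec, Counter(tokenize(label))), cid)
        for cid, label in categories.items()
    ]
    return [cid for _, cid in sorted(scored)]


def precision_recall(proposed, relevant):
    """Precision and recall of the proposed category ids against the relevant set."""
    hits = len(set(proposed) & set(relevant))
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy inputs (assumed for illustration only, not taken from the study).
GO_CATEGORIES = {
    "GO:0006915": "apoptotic process",
    "GO:0005524": "ATP binding",
    "GO:0006412": "translation",
}
sentence = "The protein promotes the apoptotic process in an ATP dependent manner."
relevant = {"GO:0006915"}

ranking = rank_categories(sentence, GO_CATEGORIES)
top1_scores = precision_recall(ranking[:1], relevant)
top2_scores = precision_recall(ranking[:2], relevant)
print(ranking, top1_scores, top2_scores)
```

Scoring only the first ranked category (`ranking[:1]`) mirrors the "top proposed term" measure reported under Supporting Evidence, while widening the cut-off trades precision for recall.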

Limitations

The text categorization results were far below those reported in other data-poor text categorization experiments, indicating a need for better methods when no training data are available.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S23
