Mining Protein Function from Text Using Support Vector Machines
Author Information
Author(s): Simon B Rice, Goran Nenadic, Benjamin J Stapley
Primary Institution: University of Manchester
Hypothesis
Can a supervised machine-learning approach effectively assign Gene Ontology terms to human proteins based on text mining?
Conclusion
A machine-learning approach to mining protein function predictions from text can perform well only if sufficient training data is available and a significant amount of supporting data is used for prediction.
Supporting Evidence
- The study evaluated the performance of text mining systems in assigning Gene Ontology terms to proteins.
- Results showed that the method performed better with a larger set of relevant documents.
- Precision of selected supporting text was variable, ranging from 3% to 50%.
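For readers unfamiliar with the metric, precision here is the fraction of selected supporting passages that are actually correct. The short sketch below shows the arithmetic only. The function name and the counts are hypothetical, chosen to reproduce the two ends of the reported range, and are not figures from the study.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of selected items that are correct: TP / (TP + FP)."""
    selected = true_positives + false_positives
    if selected == 0:
        raise ValueError("precision is undefined when nothing was selected")
    return true_positives / selected


# Hypothetical counts: 3 correct passages out of 100 selected, and 10 out of 20.
low_end = precision(true_positives=3, false_positives=97)
high_end = precision(true_positives=10, false_positives=10)
```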
Takeaway
This study shows that using lots of documents helps computers figure out what proteins do by reading about them, but they struggle when there's not enough information.
Methodology
The study used a supervised machine-learning approach in which support vector machines assign Gene Ontology terms to proteins based on co-occurring terms extracted from documents.
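The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bag-of-words representation, a single binary classifier for one Gene Ontology term, and a plain linear SVM trained by subgradient descent on the hinge loss. The toy documents, the function names, and the hyperparameters are all hypothetical.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words term counts for the text associated with one protein."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Signed distance-like score; positive means 'assign the term'."""
    return sum(weights.get(t, 0.0) * c for t, c in features.items()) + bias


def train_linear_svm(examples, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on L2-regularised hinge loss (labels are +1 or -1)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            margin = label * score(weights, bias, features)
            # Regularisation step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + lr * label * count
                bias += lr * label
    return weights, bias


def predict(weights, bias, text):
    """True if the classifier would assign the term to this text."""
    return score(weights, bias, featurize(text)) > 0.0


# Hypothetical training texts for one term (here, a kinase-related function).
training = [
    (featurize("kinase phosphorylates substrate serine"), +1),
    (featurize("phosphorylation by kinase domain"), +1),
    (featurize("membrane transporter moves ions"), -1),
    (featurize("ion channel in membrane"), -1),
]
weights, bias = train_linear_svm(training)

assigns_term = predict(weights, bias, "kinase phosphorylation of serine")
rejects_term = predict(weights, bias, "membrane ion transporter")
```

Covering many Gene Ontology terms would need some multi-label scheme, for example one such classifier per term, which is a further assumption here. With only a handful of training texts per term, a classifier like this has very few informative terms to rely on.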
Limitations
The method works poorly on single documents and short passages, and its performance depends heavily on the availability of relevant training data.
Digital Object Identifier (DOI)