Large-scale prediction of long disordered regions in proteins using random forests
2009
Predicting Long Disordered Regions in Proteins Using Random Forests
Sample size: 352
publication
Evidence: high
Author Information
Author(s): Han Pengfei, Zhang Xiuzhen, Norton Raymond S, Feng Zhi-Ping
Primary Institution: RMIT University
Hypothesis
Can a random forest algorithm trained on amino acid indices and physicochemical features accurately predict long disordered regions in proteins?
Conclusion
The IUPforest-L algorithm can accurately predict long disordered regions in proteins using a random forest model.
Supporting Evidence
- IUPforest-L achieved an area under the ROC curve (AUC) of 89.5% in 10-fold cross-validation tests.
- The algorithm outperformed existing predictors in blind tests.
- IUPforest-L is efficient for large-scale proteome studies.
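The AUC reported above can be read as the probability that a randomly chosen disordered example is scored higher than a randomly chosen ordered one. The following is a minimal sketch of that calculation and of a 10-fold split, not the authors' evaluation code; the function names `roc_auc` and `k_fold_indices`, the contiguous fold layout, and the toy labels and scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example has the
    higher score. Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_items, k=10):
    """Split indices 0..n_items-1 into k contiguous folds whose sizes
    differ by at most one (hypothetical helper; real pipelines usually
    shuffle or stratify before splitting)."""
    folds = []
    start = 0
    for i in range(k):
        size = n_items // k + (1 if i < n_items % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy data: 1 = disordered, 0 = ordered (made-up values).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
toy_folds = k_fold_indices(23, k=10)
```

In a 10-fold test, each fold is held out once while the model is trained on the other nine, and the held-out scores are pooled or averaged before computing the AUC.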
Takeaway
Scientists created a new tool to help find parts of proteins that don't have a fixed shape but are still important for their function.
Methodology
The study trained a random forest model on features derived from amino acid indices and other physicochemical properties of the protein sequence to predict long disordered regions.
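To make the idea of sequence-derived features concrete, the sketch below turns a protein sequence into per-residue feature vectors. It is an illustration under stated assumptions, not the IUPforest-L feature set: the Kyte-Doolittle hydropathy scale stands in for "an amino acid index", the charged-residue fraction stands in for "a physicochemical feature", and the function name `window_features` and the 31-residue window are hypothetical choices.

```python
# Kyte-Doolittle hydropathy scale, used here only as one example of an
# amino acid index (an assumption, not necessarily an index used in the study).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residues that carry a charge at neutral pH.
CHARGED = set("DEKR")


def window_features(sequence, window=31):
    """Return one (mean_hydropathy, charged_fraction) tuple per residue.

    Each tuple summarizes a window centered on the residue. Windows are
    truncated at the sequence termini, and residues missing from the
    index table are skipped when averaging hydropathy.
    """
    seq = sequence.upper()
    half = window // 2
    features = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        known = [KYTE_DOOLITTLE[a] for a in segment if a in KYTE_DOOLITTLE]
        mean_h = sum(known) / len(known) if known else 0.0
        charged = sum(1 for a in segment if a in CHARGED) / len(segment)
        features.append((mean_h, charged))
    return features


# Toy usage on a short made-up sequence with a 3-residue window.
toy_features = window_features("AKA", window=3)
```

Vectors like these would then be passed, together with per-residue order or disorder labels, to a random forest implementation such as scikit-learn's `RandomForestClassifier`; that training step is omitted here.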
Limitations
The algorithm may perform less well on short disordered regions, and its accuracy depends on the quality of the training data.
Statistical Information
Statistical Significance
p<0.05