Large-scale prediction of long disordered regions in proteins using random forests
2009

Predicting Long Disordered Regions in Proteins Using Random Forests

Sample size: 352

Evidence: high

Author Information

Author(s): Han Pengfei, Zhang Xiuzhen, Norton Raymond S, Feng Zhi-Ping

Primary Institution: RMIT University

Hypothesis

Can a random forest based algorithm effectively predict long disordered regions in proteins?

Conclusion

The IUPforest-L algorithm can accurately predict long disordered regions in proteins using a random forest model.

Supporting Evidence

  • IUPforest-L achieved an area under the ROC curve (AUC) of 89.5% in 10-fold cross-validation tests.
  • The algorithm outperformed existing predictors in blind tests.
  • IUPforest-L is efficient for large-scale proteome studies.
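The area under the ROC curve cited above can be read as the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered one. The following is a minimal sketch of that calculation in Python; the function name `roc_auc` and the toy scores and labels are illustrative assumptions, not data or code from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted disorder scores (higher means more likely disordered).
    labels: 1 for disordered, 0 for ordered.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: four residues, two of them disordered.
print(roc_auc([0.8, 0.6, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 89.5% indicates that most disordered residues were ranked above ordered ones.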

Takeaway

Scientists created a new tool to help find parts of proteins that don't have a fixed shape but are still important for their function.

Methodology

The study used a random forest model trained on amino acid indices and physicochemical features to predict long disordered regions.
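To illustrate what turning a sequence into physicochemical features can look like, here is a minimal Python sketch that computes two per-residue features over a sliding window: mean Kyte-Doolittle hydropathy and the fraction of charged residues. The choice of these two features, the window length of 31, and the name `window_features` are assumptions made for illustration; the summary does not specify the actual feature set or window used by IUPforest-L.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

CHARGED = set("DEKR")


def window_features(sequence, window=31):
    """Return one (mean_hydropathy, charged_fraction) tuple per residue.

    The window is centred on each residue and truncated at the
    sequence ends. The window length is an illustrative assumption.
    """
    half = window // 2
    features = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        values = [KYTE_DOOLITTLE[aa] for aa in segment if aa in KYTE_DOOLITTLE]
        mean_h = sum(values) / len(values) if values else 0.0
        charged = sum(aa in CHARGED for aa in segment) / len(segment)
        features.append((mean_h, charged))
    return features


# Hypothetical usage on a short made-up sequence.
for row in window_features("MKKEEDSPAAGLLIV", window=5)[:3]:
    print(row)
```

Feature rows of this kind could then be passed, together with ordered/disordered labels, to a random forest implementation such as scikit-learn's `RandomForestClassifier`.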

Limitations

The algorithm may not perform as well on short disordered regions and relies on the quality of training data.

Statistical Information

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-10-8
