A comparison of random forests, boosting and support vector machines for genomic selection
2011

Comparing Machine Learning Methods for Genomic Selection

Sample size: 3226
Evidence: moderate

Author Information

Author(s): Ogutu Joseph O, Piepho Hans-Peter, Schulz-Streeck Torben

Primary Institution: University of Hohenheim

Hypothesis

Which machine learning method provides the best predictive accuracy for genomic breeding values?

Conclusion

Boosting predicted genomic breeding values most accurately, followed by support vector machines and then random forests.

Supporting Evidence

  • Boosting achieved a correlation of 0.547 with true breeding values.
  • Support vector machines had a correlation of 0.497.
  • Random forests had the lowest correlation at 0.483.

Takeaway

This study compared three machine learning methods for predicting the genetic merit (breeding value) of plants and animals from their genetic markers, and found that boosting was the most accurate.

Methodology

The study applied random forests, boosting, and support vector machines to simulated data to predict genomic breeding values. Accuracy was assessed with Pearson correlation and 5-fold cross-validation.
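
The evaluation scheme described above (5-fold cross-validation scored by Pearson correlation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code. The toy genotypes, the function names, and the placeholder marginal-regression model are all assumptions made for the example; the placeholder stands in for any of the three methods and is not one of them. Averaging the per-fold correlations is one common convention, and the summary does not say which convention the study used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    # A constant vector has no defined correlation; report 0.0 in that case.
    return sxy / denom if denom > 0 else 0.0


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(fit, predict, X, y, k=5, seed=0):
    """Mean Pearson correlation between predictions and targets over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(pearson(preds, [y[i] for i in test_idx]))
    return sum(scores) / k


def fit_marginal(X, y):
    """Hypothetical placeholder model: one least-squares slope per marker."""
    n, m = len(X), len(X[0])
    y_mean = sum(y) / n
    means, slopes = [], []
    for j in range(m):
        col = [row[j] for row in X]
        mean_j = sum(col) / n
        sxx = sum((v - mean_j) ** 2 for v in col)
        sxy = sum((v - mean_j) * (t - y_mean) for v, t in zip(col, y))
        means.append(mean_j)
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return y_mean, means, slopes


def predict_marginal(model, row):
    """Predict a value as the training mean plus summed marker contributions."""
    y_mean, means, slopes = model
    return y_mean + sum(b * (v - m) for b, v, m in zip(slopes, row, means))


# Toy data (assumed): 200 individuals, 20 markers coded 0/1/2,
# with the first three markers carrying additive effects plus noise.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(200)]
y = [row[0] + row[1] + row[2] + rng.gauss(0, 0.5) for row in X]

accuracy = cross_validated_accuracy(fit_marginal, predict_marginal, X, y)
print(round(accuracy, 3))
```

Swapping in a different method only requires passing a different pair of fit and predict functions; the fold assignment and the scoring stay the same, which keeps the comparison between methods like for like.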

Limitations

The simulated data may not fully capture complex interactions between markers, which could affect the performance of the methods.

Participant Demographics

The dataset included 3226 individuals across five generations, of which 2326 were both phenotyped and genotyped.

Digital Object Identifier (DOI)

10.1186/1753-6561-5-S3-S11
