Comparing Machine Learning Methods for Genomic Selection
Author Information
Author(s): Ogutu Joseph O, Piepho Hans-Peter, Schulz-Streeck Torben
Primary Institution: University of Hohenheim
Hypothesis
Which machine learning method provides the best predictive accuracy for genomic breeding values?
Conclusion
Boosting showed the highest accuracy for predicting genomic breeding values, followed closely by support vector machines and random forests.
Supporting Evidence
- Boosting achieved a correlation of 0.547 with true breeding values.
- Support vector machines had a correlation of 0.497.
- Random forests had the lowest correlation at 0.483.
Takeaway
This study compared three computer methods for predicting the breeding value of plants and animals from their genetic markers, and found that boosting gave the most accurate predictions.
Methodology
The study applied random forests, boosting, and support vector machines to simulated data to predict genomic breeding values, and assessed accuracy using Pearson correlation and 5-fold cross-validation.
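To make the evaluation scheme concrete, the following is a minimal sketch of 5-fold cross-validation scored with Pearson correlation. It is not the authors' code. The names `pearson`, `kfold_indices`, `cv_accuracy`, and `fit_marker_sum` are hypothetical, the toy data are randomly generated for illustration, and the learner is a deliberately simple stand-in (linear regression on summed marker codes) rather than any of the three methods in the study. The summary does not say how folds were formed or how scores were aggregated, so shuffled folds and a mean over folds are assumptions.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, fit, k=5, seed=0):
    """Mean over folds of Pearson r between held-out predictions and targets.

    `fit(X_train, y_train)` must return a function mapping one marker row
    to a predicted value. Averaging over folds is an assumption here.
    """
    scores = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(X[i]) for i in test]
        scores.append(pearson(preds, [y[i] for i in test]))
    return mean(scores)


def fit_marker_sum(X_train, y_train):
    """Hypothetical stand-in learner: simple linear regression of the
    target on the sum of marker codes. Not a method from the study."""
    s = [sum(row) for row in X_train]
    ms, my = mean(s), mean(y_train)
    sxx = sum((a - ms) ** 2 for a in s)
    slope = sum((a - ms) * (b - my) for a, b in zip(s, y_train)) / sxx
    return lambda row: my + slope * (sum(row) - ms)


# Toy data (illustrative only): 100 individuals, 20 markers coded 0/1/2,
# target = additive marker sum plus Gaussian noise.
rng = random.Random(1)
X = [[rng.choice((0, 1, 2)) for _ in range(20)] for _ in range(100)]
y = [sum(row) + rng.gauss(0, 1) for row in X]

accuracy = cv_accuracy(X, y, fit_marker_sum)
print(f"5-fold cross-validated Pearson r: {accuracy:.3f}")
```

Any learner that follows the same `fit` interface, such as a random forest, a boosting model, or a support vector machine, could be passed to `cv_accuracy` in place of `fit_marker_sum`, so that all methods are compared on identical folds.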
Limitations
The simulated data may not fully capture complex interactions between markers, which could affect the performance of the methods.
Participant Demographics
The simulated dataset included 3226 individuals across five generations, of which 2326 were both phenotyped and genotyped.