PCA-based population structure inference with generic clustering algorithms
2009

Using PCA for Population Structure Inference

Sample size: 1,064
Evidence: moderate

Author Information

Author(s): Lee Chih, Abdool Ali, Huang Chun-Hsi

Primary Institution: University of Connecticut

Hypothesis

Can PCA and generic clustering algorithms effectively infer population structure from genotype data?

Conclusion

The proposed PCA-based approach is faster and more scalable than the traditional STRUCTURE algorithm for population structure inference.

Supporting Evidence

  • PCA reduced the number of variables from around 5,000 to at most 70.
  • Soft K-means performed comparably to STRUCTURE on the distant dataset.
  • The BIC score produced identical predictions to STRUCTURE on simulated datasets.
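
To make the soft K-means result above more concrete, here is a minimal sketch of the general technique, not the authors' implementation. It follows the common textbook formulation in which each point receives a responsibility for every cluster through a softmax over negative squared distances. The function name, the stiffness parameter beta, the iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def soft_kmeans(X, k, beta=1.0, n_iter=50, seed=0):
    """Soft K-means sketch: returns (responsibilities, centres).

    beta (stiffness), n_iter and seed are illustrative defaults,
    not values taken from the study.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centres at k distinct data points.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        # Softmax over -beta * distance; subtracting the row minimum
        # keeps the largest exponent at zero and avoids underflow.
        resp = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
        resp /= resp.sum(axis=1, keepdims=True)
        # Each centre becomes the responsibility-weighted mean of all points.
        centres = (resp.T @ X) / (resp.sum(axis=0)[:, None] + 1e-12)
    return resp, centres

# Toy data, fabricated for illustration: two tight groups of four points.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
resp_toy, centres_toy = soft_kmeans(X_toy, k=2)
hard_labels = resp_toy.argmax(axis=1)  # hard assignment, if one is needed
```

Unlike hard K-means, every row of the responsibility matrix sums to 1, so an individual can be assigned fractionally to several clusters.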

Takeaway

This study shows that using PCA can help group people based on their genetic data quickly, which is useful for understanding population structures.

Methodology

The study used PCA to reduce genotype data dimensions and applied K-means, soft K-means, and spectral clustering algorithms to infer population structure.
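
The pipeline described above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the study's code: genotypes are assumed to be coded as 0/1/2 allele counts, PCA is computed through a plain SVD, the number of retained components is fixed by hand rather than chosen by a significance test, and K-means uses a deterministic farthest-point initialisation chosen only for reproducibility. The names pca_scores and kmeans and the simulated genotype matrix are hypothetical.

```python
import numpy as np

def pca_scores(G, n_components):
    """Project individuals (rows of G) onto the top principal components."""
    # Centre each marker (column) before decomposing.
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with farthest-point initialisation."""
    # Illustrative initialisation: start at the first point, then
    # repeatedly add the point farthest from the centres chosen so far.
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d2.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centre; keep the old one if a cluster is empty.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

# Simulated genotypes, fabricated for illustration: two groups of 30
# individuals, 200 markers, each group with its own allele frequencies.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = rng.uniform(0.1, 0.9, size=200)
G = np.vstack([rng.binomial(2, freq_a, size=(30, 200)),
               rng.binomial(2, freq_b, size=(30, 200))]).astype(float)

scores = pca_scores(G, n_components=2)   # reduced representation
labels = kmeans(scores, k=2)             # inferred group per individual
```

Clustering the low-dimensional scores instead of the raw markers is what keeps the approach cheap: the clustering step sees only a few columns per individual.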

Potential Biases

The choice of p-value threshold for selecting significant PCs may bias the clustering results.

Limitations

The study's results may be affected by noisy and non-informative principal components.

Participant Demographics

The study included 1,064 individuals from 51 populations.

Statistical Information

P-Value

0.05

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-10-S1-S73
