Improving Gene Expression Clustering with Measurement Error
Author Information
Author(s): Liu Xuejun, Lin Kevin K, Andersen Bogi, Rattray Magnus
Primary Institution: Nanjing University of Aeronautics and Astronautics
Hypothesis
Including probe-level measurement error in clustering models will enhance the clustering performance of gene expression data.
Conclusion
Including probe-level measurement error improves the performance of model-based clustering of gene expression data and yields more biologically meaningful clusters.
Supporting Evidence
- The inclusion of probe-level measurement error significantly improved clustering performance on simulated datasets.
- PUMA-CLUST outperformed standard clustering methods as measured by the adjusted Rand index.
- PUMA-CLUST identified biologically meaningful clusters more frequently than MCLUST.
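For readers unfamiliar with the adjusted Rand index mentioned above, the following is a minimal sketch of how it can be computed from two label vectors. It is an illustration only, not the authors' evaluation code; the function name and the example labels are assumptions made for this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items.

    Illustrative helper (hypothetical name, not from the paper).
    Returns 1.0 for identical partitions and values near 0.0 for
    agreement no better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency table cells and its row/column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in both, or in each, partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Cluster labels are arbitrary, so a pure relabeling still scores 1.0.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the index corrects for chance agreement, it can be compared across clusterings with different numbers of clusters, which is presumably why it suits a comparison between methods.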
Takeaway
This study shows that when scientists group genes based on their activity, accounting for the uncertainty in each measurement helps them do a better job.
Methodology
The study used an augmented Gaussian mixture model that incorporates probe-level measurement error to improve clustering performance.
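The idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the PUMA-CLUST implementation: it assumes diagonal cluster covariances, treats each observation's measurement variance as known, and fits the model with a plain EM loop. Under those assumptions, an observation x with measurement variance s^2 belonging to cluster k has marginal density N(x | mu_k, sigma_k^2 + s^2). The function name, parameters, and initialization scheme are all hypothetical.

```python
import numpy as np


def fit_noisy_gmm(X, S2, k, n_iter=50, seed=0, var_floor=1e-6):
    """EM for a diagonal Gaussian mixture with known per-entry noise.

    X  : (n, d) observed expression values
    S2 : (n, d) measurement variances for each entry (assumed known)
    k  : number of clusters
    Returns mixing weights, cluster means, cluster variances and
    the (n, k) responsibility matrix.
    """
    X = np.asarray(X, dtype=float)
    S2 = np.asarray(S2, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Farthest-point initialization of the means (an assumption of
    # this sketch, chosen only to keep the example simple).
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    mu = np.array(centers)                      # (k, d)
    sigma2 = np.tile(X.var(axis=0), (k, 1))     # (k, d)
    sigma2 = np.maximum(sigma2, var_floor)
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: cluster variance and measurement variance add up.
        var = sigma2[None, :, :] + S2[:, None, :]          # (n, k, d)
        diff = X[:, None, :] - mu[None, :, :]
        log_r = np.log(pi)[None, :] - 0.5 * np.sum(
            np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=2
        )
        log_r -= log_r.max(axis=1, keepdims=True)          # stabilize
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)                  # (n, k)

        # Posterior of the latent noise-free value for each cluster.
        post_var = sigma2[None] * S2[:, None] / var
        post_mean = (S2[:, None] * mu[None] + sigma2[None] * X[:, None]) / var

        # M-step: noisy entries are shrunk toward the cluster mean,
        # so they influence the parameters less.
        nk = r.sum(axis=0) + 1e-12
        w = r[:, :, None]
        mu = (w * post_mean).sum(axis=0) / nk[:, None]
        sigma2 = (w * ((post_mean - mu[None]) ** 2 + post_var)).sum(axis=0)
        sigma2 = np.maximum(sigma2 / nk[:, None], var_floor)
        pi = nk / nk.sum()

    return pi, mu, sigma2, r
```

A small usage example on synthetic data (the data-generating values here are made up for illustration):

```python
rng = np.random.default_rng(1)
truth = np.repeat([-3.0, 3.0], 60)[:, None] * np.ones((1, 4))
S2 = rng.uniform(0.1, 1.0, size=truth.shape) ** 2
X = truth + rng.normal(0.0, 0.5, truth.shape) + rng.normal(0.0, np.sqrt(S2))
pi, mu, sigma2, r = fit_noisy_gmm(X, S2, k=2)
labels = r.argmax(axis=1)
```

Setting every entry of S2 to zero reduces this sketch to an ordinary diagonal Gaussian mixture, which makes it easy to compare clustering with and without the measurement error term.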
Potential Biases
Potential biases may arise from the specific datasets used and the assumptions made in the model.
Limitations
The study relies on simulated datasets and a single real-world dataset, which may limit the generalizability of the findings.
Participant Demographics
The study analyzed gene expression data from a mouse time-course dataset.
Statistical Information
P-Value
p<0.05
Digital Object Identifier (DOI)