New Method for Analyzing GeneChip Data
Author Information
Author(s): Simon Katz, Rafael A. Irizarry, Xue Lin, Mark Tripputi, Mark W. Porter
Primary Institution: Gene Logic Inc.
Hypothesis
Can a biologically diverse reference database improve the summarization of Affymetrix GeneChip data?
Conclusion
A biologically diverse reference database can effectively train a model for estimating probe set intensities while maintaining the characteristics of the original algorithm.
Supporting Evidence
- The refRMA workflow produces data with characteristics similar to those of Classic RMA.
- The model can be applied to naïve organ types and benchmark data with respectable results.
- The training set included a balanced pool of over 6,000 samples.
Takeaway
This study shows that a model trained on a large, biologically diverse collection of samples can be used to summarize new GeneChip arrays, even when the new samples differ from those used to train the model.
Methodology
The study developed a new version of the Robust Multi-chip Averaging (RMA) algorithm called refRMA, which uses a large training set for summarizing individual arrays.
Potential Biases
Bias may arise from the reliance on a specific training set and from variability in sample preparation.
Limitations
The refRMA model may not perform as well as Classic RMA on heavily manipulated data or highly tissue-specific probe sets.
Participant Demographics
The training set included samples from 144 different organ types across four pathology states.