A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database

2006

New Method for Analyzing GeneChip Data

Sample size: 1614
Publication
Evidence: moderate

Author Information

Author(s): Simon Katz, Rafael A. Irizarry, Xue Lin, Mark Tripputi, Mark W. Porter

Primary Institution: Gene Logic Inc.

Hypothesis

Can a biologically diverse reference database improve the summarization of Affymetrix GeneChip data?

Conclusion

A biologically diverse reference database can effectively train a model for estimating probe set intensities while maintaining the characteristics of the original algorithm.

Supporting Evidence

The refRMA workflow produces data with characteristics similar to those of Classic RMA.
The model can be applied to naïve organ types (those not represented in the training set) and to benchmark data with respectable results.
The training set included a balanced pool of over 6,000 samples.

Takeaway

This study shows that a model trained on a large, varied collection of samples can be used to process gene expression data from new samples, even when those samples differ from the ones used to build the model.

Methodology

The study developed refRMA, a new version of the Robust Multi-chip Averaging (RMA) algorithm. The model is trained once on a large reference set, and each new array is then summarized individually rather than together with a batch of other arrays.
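The train-once, summarize-individually idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that training stores two things, a reference intensity distribution for quantile normalization and per-probe effects estimated by median polish, and that a new array is summarized using only those stored values. Background correction is omitted, and the function names (train_reference, summarize_single_array) and the probe_set_ids layout are hypothetical.

```python
import numpy as np

def quantile_normalize_to(values, ref_sorted):
    """Map one array's values onto a fixed reference distribution by rank."""
    ranks = np.argsort(np.argsort(values))
    return ref_sorted[ranks]

def median_polish(mat, n_iter=10):
    """Tukey median polish of a probes x arrays matrix.
    Returns (overall, probe_effects, array_effects)."""
    resid = np.array(mat, dtype=float)
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    overall = 0.0
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        shift = np.median(col)
        col -= shift
        overall += shift
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        shift = np.median(row)
        row -= shift
        overall += shift
    return overall, row, col

def train_reference(train_raw, probe_set_ids):
    """train_raw: probes x arrays raw intensities from the training set.
    probe_set_ids: one probe set label per probe (row)."""
    log_train = np.log2(train_raw)
    # Reference distribution: mean of the sorted values across training arrays.
    ref_sorted = np.sort(log_train, axis=0).mean(axis=1)
    normed = np.column_stack([
        quantile_normalize_to(log_train[:, j], ref_sorted)
        for j in range(log_train.shape[1])
    ])
    # Probe effects are estimated once per probe set and then frozen.
    probe_effects = np.zeros(log_train.shape[0])
    for ps in np.unique(probe_set_ids):
        idx = probe_set_ids == ps
        _, row, _ = median_polish(normed[idx])
        probe_effects[idx] = row
    return ref_sorted, probe_effects

def summarize_single_array(raw, ref_sorted, probe_effects, probe_set_ids):
    """Summarize one new array using only the stored training parameters."""
    normed = quantile_normalize_to(np.log2(raw), ref_sorted)
    adjusted = normed - probe_effects
    return {
        ps.item(): float(np.median(adjusted[probe_set_ids == ps]))
        for ps in np.unique(probe_set_ids)
    }
```

In this sketch a new array never needs the training data itself, only ref_sorted and probe_effects, so adding an array does not require re-running the summarization on earlier arrays.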

Potential Biases

Results may be biased by reliance on a specific training set and by variability in sample preparation.

Limitations

The refRMA model may not perform as well as Classic RMA on heavily manipulated data or on highly tissue-specific probe sets.

Participant Demographics

The training set included samples from 144 different organ types across four pathology states.

Digital Object Identifier (DOI)

10.1186/1471-2105-7-464
