A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database
2006

New Method for Analyzing GeneChip Data

Sample size: 1614
Evidence: moderate

Author Information

Author(s): Simon Katz, Rafael A. Irizarry, Xue Lin, Mark Tripputi, Mark W. Porter

Primary Institution: Gene Logic Inc.

Hypothesis

Can a biologically diverse reference database improve the summarization of Affymetrix GeneChip data?

Conclusion

A biologically diverse reference database can effectively train a model for estimating probe set intensities while maintaining the characteristics of the original algorithm.

Supporting Evidence

  • The refRMA workflow produces similar data characteristics to Classic RMA.
  • The model can be applied to naïve organ types and benchmark data with respectable results.
  • The training set included a balanced pool of over 6,000 samples.

Takeaway

This study shows that a model built from a large, varied collection of samples can be used to process new gene expression arrays one at a time, even when the new samples differ from the ones used to create the model.

Methodology

The study developed refRMA, a new version of the Robust Multi-chip Averaging (RMA) algorithm that uses a large training set so that each array can be summarized individually.
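
The summary above does not spell out the computation, so the following is only a minimal sketch of what "train once, then summarize individual arrays" could look like. It assumes the usual RMA ingredients (log2 transform, quantile normalization, median-polish probe effects) and assumes that the training set supplies a frozen reference distribution and frozen probe effects. The function names `fit_reference` and `summarize_array` are hypothetical, background correction is omitted, and the median polish is simplified; this is not the authors' implementation.

```python
import numpy as np

def _probe_effects(y, n_iter=10):
    """Simplified median polish on a (probes x arrays) block; returns probe (row) effects."""
    resid = y.copy()
    row = np.zeros(y.shape[0])
    for _ in range(n_iter):
        r = np.median(resid, axis=1)      # sweep out row (probe) medians
        resid -= r[:, None]
        row += r
        c = np.median(resid, axis=0)      # sweep out column (array) medians
        resid -= c[None, :]
    return row - np.median(row)           # center so effects have median zero

def fit_reference(train, probeset_ids, n_iter=10):
    """Assumed training step: train is (n_probes, n_arrays) of positive intensities."""
    log_train = np.log2(train)
    # Reference distribution: mean of the sorted values across training arrays.
    ref_quantiles = np.sort(log_train, axis=0).mean(axis=1)
    # Quantile-normalize every training array to that reference.
    ranks = np.argsort(np.argsort(log_train, axis=0), axis=0)
    normed = ref_quantiles[ranks]
    probe_effects = np.zeros(train.shape[0])
    for ps in np.unique(probeset_ids):
        rows = np.where(probeset_ids == ps)[0]
        probe_effects[rows] = _probe_effects(normed[rows], n_iter)
    return ref_quantiles, probe_effects

def summarize_array(intensities, probeset_ids, ref_quantiles, probe_effects):
    """Assumed single-array step: uses only the frozen training parameters."""
    log_x = np.log2(intensities)
    ranks = np.argsort(np.argsort(log_x))
    normed = ref_quantiles[ranks]         # map this array onto the reference distribution
    adjusted = normed - probe_effects     # remove the frozen probe effects
    return {ps.item(): float(np.median(adjusted[probeset_ids == ps]))
            for ps in np.unique(probeset_ids)}
```

In this sketch, `fit_reference` is run once on the training matrix, and `summarize_array` then needs only the new array plus the two stored vectors, so adding an array does not require reprocessing the others.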

Potential Biases

Results may be biased by the reliance on a specific training set and by variability in sample preparation.

Limitations

The refRMA model may not perform as well as Classic RMA on heavily manipulated data or on highly tissue-specific probe sets.

Participant Demographics

The training set included samples from 144 different organ types across four pathology states.

Digital Object Identifier (DOI)

10.1186/1471-2105-7-464
