Errors from Removing Duplicate Ditags in LongSAGE Analysis
Author Information
Author(s): Emmersen Jeppe, Heidenblut Anna M, Høgh Annabeth Laursen, Hahn Stephan A, Welinder Karen G, Nielsen Kåre L
Primary Institution: Aalborg University
Hypothesis
Removing all duplicate ditags in LongSAGE analysis may introduce significant measurement errors.
Conclusion
Removing all duplicate ditags leads to large errors in LongSAGE datasets, which may affect the interpretation of gene expression data.
Supporting Evidence
- The algorithm identified individual artifact ditags that originated from rare nucleotide variations.
- The analysis showed that removing duplicate ditags could introduce errors of up to 3-fold in LongSAGE.
- The study found that the removal of duplicates significantly affected the tag counts of abundant transcripts.
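The effect on abundant transcripts can be illustrated with a minimal sketch. It assumes each ditag is stored as a pair of tag strings and that a ditag read in either orientation counts as the same ditag. The function names, the single-letter placeholder tags, and the toy data are all hypothetical. They are not the study's data and do not reproduce its reported figures.

```python
from collections import Counter

def count_tags(ditags, remove_duplicates=False):
    """Count tag occurrences across ditags given as (tag_a, tag_b) pairs."""
    if remove_duplicates:
        # Assumption: (a, b) and (b, a) are the same ditag; keep one copy of each.
        ditags = {tuple(sorted(d)) for d in ditags}
    counts = Counter()
    for tag_a, tag_b in ditags:
        counts[tag_a] += 1
        counts[tag_b] += 1
    return counts

def fold_error(ditags, tag):
    """Ratio of the raw tag count to the count after duplicate removal."""
    raw = count_tags(ditags)[tag]
    deduplicated = count_tags(ditags, remove_duplicates=True)[tag]
    return raw / deduplicated

# Toy library: tag "A" is abundant, so it repeatedly pairs with the same partners.
toy_ditags = (
    [("A", "A")] * 3
    + [("A", "B")] * 2
    + [("A", "C")]
    + [("B", "C")]
)

for tag in ("A", "B", "C"):
    print(tag, fold_error(toy_ditags, tag))
# A 2.25
# B 1.5
# C 1.0
```

In this toy library the abundant tag loses more than half of its count when duplicates are discarded, while the rarest tag is unaffected. An abundant tag is more likely to form the same ditag more than once by chance, so it is the one most exposed to duplicate removal.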
Takeaway
When scientists measure gene activity, they sometimes throw away duplicate pieces of data, but this can make their results less accurate instead of more accurate.
Methodology
An algorithm was developed to analyze the occurrence of SAGE tags in different ditag combinations across multiple LongSAGE libraries.
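The study's algorithm is not described in detail here, so the following is only a rough sketch of the general idea: tabulating, for each tag, which partner tags it forms ditags with and how often, pooled over several libraries. The input layout (a dictionary of library name to a list of tag pairs), the function names, and the placeholder tags are assumptions made for illustration.

```python
from collections import defaultdict

def partner_table(libraries):
    """Map each tag to {partner_tag: number of ditags}, pooled over all libraries."""
    table = defaultdict(lambda: defaultdict(int))
    for ditags in libraries.values():
        for tag_a, tag_b in ditags:
            table[tag_a][tag_b] += 1
            if tag_a != tag_b:
                # Record the pairing from the other tag's side as well.
                table[tag_b][tag_a] += 1
    return {tag: dict(partners) for tag, partners in table.items()}

def summarize(table):
    """For each tag: (number of distinct partners, total ditags containing the tag)."""
    return {
        tag: (len(partners), sum(partners.values()))
        for tag, partners in table.items()
    }

# Hypothetical input: two small libraries with single-letter placeholder tags.
libraries = {
    "lib1": [("A", "B"), ("A", "C"), ("A", "B")],
    "lib2": [("A", "D"), ("E", "B")],
}

table = partner_table(libraries)
summary = summarize(table)
print(summary["A"])  # (3, 4)
print(summary["B"])  # (2, 3)
```

A table of this kind makes it possible to compare tags that appear in many different ditag combinations with tags that only ever appear alongside one partner. How such a comparison should be interpreted is left to the original publication.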
Potential Biases
Discarding naturally occurring duplicate ditags can bias tag counts and lead to misinterpretation of gene expression.
Limitations
The study primarily focused on pancreatic acinar cells and may not generalize to all LongSAGE datasets.
Participant Demographics
The study analyzed LongSAGE libraries derived from pancreatic acinar cells.
Statistical Information
P-Value
p<0.05
Statistical Significance
p<0.05