Errors from Removing Duplicate Ditags in LongSAGE Analysis

Sample size: 10 | Publication | Evidence: high

Author Information

Author(s): Emmersen Jeppe, Heidenblut Anna M, Høgh Annabeth Laursen, Hahn Stephan A, Welinder Karen G, Nielsen Kåre L

Primary Institution: Aalborg University

Hypothesis

Removing all duplicate ditags in LongSAGE analysis may introduce significant measurement errors.

Conclusion

Removing all duplicate ditags leads to large errors in LongSAGE datasets, which may affect the interpretation of gene expression data.

Supporting Evidence

The algorithm identified individual artifact ditags that originated from rare nucleotide variations.
Analysis showed that removing all duplicate ditags could lead to errors of up to threefold in LongSAGE.
The study found that removing duplicates substantially reduced the tag counts of abundant transcripts.
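A small simulation can illustrate why removing every duplicate hurts abundant transcripts in particular. The sketch below is a toy model under stated assumptions. It pairs tags at random in proportion to made-up abundances, with no PCR step, so every repeat it produces is a naturally occurring duplicate. It then compares tag counts before and after keeping only one copy of each distinct ditag. The tag names, abundances, library size, and the helpers `make_ditags` and `count_tags` are all hypothetical. The resulting fold error depends only on these invented parameters and is not comparable with the up-to-threefold error reported for real LongSAGE data.

```python
import random
from collections import Counter


def make_ditags(tag_freqs, n_ditags, seed=0):
    """Simulate ditags by pairing tags at random (hypothetical toy model).

    Assumption: both tags of a ditag are drawn independently in proportion
    to transcript abundance; PCR amplification is not modelled, so every
    repeated ditag produced here is a naturally occurring duplicate.
    """
    rng = random.Random(seed)
    tags = list(tag_freqs)
    weights = [tag_freqs[t] for t in tags]
    ditags = []
    for _ in range(n_ditags):
        left, right = rng.choices(tags, weights=weights, k=2)
        # Store ditags unordered so that A-B and B-A count as the same ditag.
        ditags.append(tuple(sorted((left, right))))
    return ditags


def count_tags(ditags):
    """Count every tag occurrence in an iterable of (left, right) ditags."""
    counts = Counter()
    for left, right in ditags:
        counts[left] += 1
        counts[right] += 1
    return counts


# Hypothetical abundance profile: one highly expressed transcript and 200 rare ones.
freqs = {"ABUNDANT": 0.2}
freqs.update({f"RARE{i:03d}": 0.8 / 200 for i in range(200)})

ditags = make_ditags(freqs, n_ditags=20_000)
raw = count_tags(ditags)
dedup = count_tags(set(ditags))  # keep a single copy of each distinct ditag

fold_error = raw["ABUNDANT"] / dedup["ABUNDANT"]
print(f"raw={raw['ABUNDANT']} dedup={dedup['ABUNDANT']} fold={fold_error:.1f}")
```

With 200 rare partners available, the abundant tag can contribute at most 202 counts after deduplication (two from the ABUNDANT/ABUNDANT ditag and one from each mixed ditag), however many times it was actually sequenced.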

Takeaway

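When scientists measure gene activity, they often throw away duplicate pieces of data as presumed lab artifacts, but many of those duplicates are real, so discarding them all can make the results less accurate instead of more accurate, especially for very active genes.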

Methodology

An algorithm was developed to analyze the occurrence of SAGE tags in different ditag combinations across multiple LongSAGE libraries.
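The summary does not spell out the published algorithm, which works across multiple libraries and detects artifact ditags arising from rare nucleotide variations. As a simpler, hypothetical illustration of why not every repeated ditag is an artifact, the sketch below swaps in a different technique. It estimates each tag's frequency from the data, computes how many copies of each ditag random pairing would predict, and flags only ditags seen far more often than that. The function names, the `fold` and `min_count` thresholds, and the toy library are assumptions chosen for illustration. This is not the authors' method.

```python
from collections import Counter


def expected_ditag_count(a, b, tag_freq, n_ditags):
    """Expected copies of the unordered ditag {a, b} under random tag pairing."""
    if a == b:
        return n_ditags * tag_freq[a] ** 2
    # Two orderings (a-b and b-a) map to the same unordered ditag.
    return n_ditags * 2 * tag_freq[a] * tag_freq[b]


def flag_excess_ditags(ditags, fold=10.0, min_count=2):
    """Return {ditag: (observed, expected)} for ditags far above random expectation.

    Hypothetical heuristic, not the published algorithm: repeats that random
    pairing explains (typically between abundant tags) are kept, and only
    strong excesses are reported as candidate PCR artifacts.
    """
    ditags = [tuple(sorted(d)) for d in ditags]  # orientation-independent
    n_ditags = len(ditags)
    tag_counts = Counter()
    for left, right in ditags:
        tag_counts[left] += 1
        tag_counts[right] += 1
    tag_freq = {tag: c / (2 * n_ditags) for tag, c in tag_counts.items()}

    flagged = {}
    for (a, b), observed in Counter(ditags).items():
        expected = expected_ditag_count(a, b, tag_freq, n_ditags)
        if observed >= min_count and observed > fold * expected:
            flagged[(a, b)] = (observed, expected)
    return flagged


# Toy library (made-up tags): one abundant tag that naturally pairs with itself,
# many singleton ditags, and one rare ditag repeated six times.
library = (
    [("ABUNDANT", "ABUNDANT")] * 40
    + [("ABUNDANT", f"T{i}") for i in range(120)]
    + [(f"T{i}", f"U{i}") for i in range(200)]
    + [("X1", "X2")] * 6
)
print(flag_excess_ditags(library))
```

Here the 40 ABUNDANT/ABUNDANT ditags are kept because a tag making up about 27% of all tags already predicts roughly 27 such copies by chance, whereas the six X1/X2 copies far exceed the roughly 0.05 expected and are flagged.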

Potential Biases

Discarding naturally occurring duplicate ditags can bias tag counts and lead to misinterpretation of gene expression.

Limitations

The study primarily focused on pancreatic acinar cells and may not generalize to all LongSAGE datasets.

Participant Demographics

The study analyzed LongSAGE libraries derived from pancreatic acinar cells.

Statistical Information

P-Value

p<0.05

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-8-92
