Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers
2010

Unsupervised Binning of Environmental Genomic Fragments

publication Evidence: high

Author Information

Author(s): Yang Bin, Peng Yu, Leung Henry Chi-Ming, Yiu Siu-Ming, Chen Jing-Chi, Chin Francis Yuk-Lun

Primary Institution: State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University

Hypothesis

Can an unsupervised method effectively bin DNA fragments without using reference datasets?

Conclusion

The proposed unsupervised binning method accurately classifies DNA fragments without relying on reference genomes or marker information.

Supporting Evidence

  • The method can bin DNA fragments with various lengths and species ratios without reference datasets.
  • The method is robust against sequencing errors: binning accuracy decreases by less than 1% at sequencing error rates of up to 5%.

Takeaway

This study presents a new way to group DNA pieces from different species without needing to know anything about them beforehand.

Methodology

The method uses l-mer frequency distributions and a modified Chebychev distance for clustering DNA fragments.
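
As a rough illustration of the two ingredients named above, the sketch below computes a normalized l-mer frequency vector for a fragment and compares two such vectors with the standard Chebyshev (Chebychev) distance, that is, the largest absolute difference in any coordinate. It is a minimal sketch under stated assumptions and not the authors' implementation: this summary does not describe the paper's modification of the distance or its l-mer selection step, so neither is reproduced here. The function names and the default l = 4 are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def lmer_frequencies(fragment, l=4):
    """Return normalized frequencies of all 4**l l-mers over A, C, G, T.

    The default l=4 is an illustrative assumption. Windows containing
    other symbols (for example N) are ignored.
    """
    fragment = fragment.upper()
    # Count every overlapping window of length l.
    counts = Counter(fragment[i:i + l] for i in range(len(fragment) - l + 1))
    # Fixed lexicographic ordering so vectors from different fragments align.
    keys = ["".join(p) for p in product(ALPHABET, repeat=l)]
    total = sum(counts[k] for k in keys)
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


def chebyshev_distance(u, v):
    """Standard (unmodified) Chebyshev distance: max coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    frag_a = lmer_frequencies("ACGTACGTACGTTTGA", l=2)
    frag_b = lmer_frequencies("GGGCCCGCGCGGCCGC", l=2)
    print(chebyshev_distance(frag_a, frag_b))
```

Pairwise distances of this kind could then be passed to any clustering routine. Which clustering algorithm the authors use is not stated in this summary.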

Potential Biases

Bias may arise from the choice of l-mers and from the clustering algorithm used.

Limitations

The method's performance may vary with the complexity of the species mix and with DNA fragment length.

Digital Object Identifier (DOI)

10.1186/1471-2105-11-S2-S5
