ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use

2008

ParaKMeans: A Fast K-means Clustering Software for Laboratories


Author Information

Author(s): Piotr Kraj, Ashok Sharma, Nikhil Garge, Robert Podolsky, Richard A. McIndoe

Primary Institution: Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta, GA USA

Can a parallelized K-means algorithm improve clustering performance for large datasets in laboratory settings?

ParaKMeans significantly speeds up clustering of large datasets and is user-friendly for laboratory use.

ParaKMeans provides significant performance gains over a wide range of datasets using as few as seven nodes.
The average time taken to cluster each dataset was reduced from 24.33 minutes to 3.03 minutes using 7 nodes.
ParaKMeans was significantly faster than the Cluster program in all tested combinations of genes and arrays.
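
To put the reported timings in perspective, the short calculation below derives the speedup and the relative time reduction from the two averages quoted above. It is only arithmetic on those two figures; the variable names are illustrative, and the baseline configuration behind the 24.33-minute figure is not stated in this summary.

```python
# Average clustering times quoted in the summary (minutes).
baseline_minutes = 24.33   # before using 7 nodes (baseline setup not specified here)
seven_node_minutes = 3.03  # with 7 nodes

# Speedup is the ratio of the two averages.
speedup = baseline_minutes / seven_node_minutes

# Relative reduction in average run time.
reduction = 1 - seven_node_minutes / baseline_minutes

print(f"speedup: {speedup:.2f}x, time reduction: {reduction:.1%}")
# prints "speedup: 8.03x, time reduction: 87.5%"
```

Because this is a ratio of averages over several datasets, it describes the overall trend and not the speedup for any single dataset.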

ParaKMeans is a computer program that helps scientists group similar data together much faster by using many computers at once.

The software implements a parallelized K-means clustering algorithm using a client-server model with web services for distance calculations.
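
The general idea of parallelizing K-means by distributing the distance calculations can be sketched as follows. This is a minimal illustration, not the ParaKMeans code: a local thread pool stands in for the remote compute nodes and web services, and the function names (`assign_chunk`, `parallel_kmeans`), the Euclidean distance, and the way the data is split are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def assign_chunk(chunk, centroids):
    """Worker step (stand-in for a compute node): assign each point in the
    chunk to its nearest centroid and return labels plus partial sums."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in chunk:
        # Euclidean distance is an assumption of this sketch.
        best = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        labels.append(best)
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return labels, sums, counts


def parallel_kmeans(points, centroids, n_workers=4, max_iter=100):
    """Client step: split the data, farm out the distance work, then merge
    the partial sums into new centroids until the labels stop changing."""
    size = max(1, math.ceil(len(points) / n_workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    centroids = [list(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    labels = None
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            results = list(pool.map(assign_chunk, chunks,
                                    [centroids] * len(chunks)))
            new_labels = [lab for r in results for lab in r[0]]
            for j in range(k):
                count = sum(r[2][j] for r in results)
                if count:  # keep the old centroid if a cluster is empty
                    centroids[j] = [sum(r[1][j][d] for r in results) / count
                                    for d in range(dim)]
            if new_labels == labels:
                break
            labels = new_labels
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = parallel_kmeans(points, [(0, 0), (10, 10)], n_workers=2)
print(labels)
```

Only the per-chunk labels, sums, and counts travel back to the client, so the amount of data exchanged per iteration depends on the number of clusters and dimensions for the sums and counts, plus one label per point. In a real deployment the workers would be separate machines; Python threads are used here only to keep the example self-contained.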

Performance may vary with the number of compute nodes and the size of the dataset.

p<0.0001

p<0.05
