A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
Author Information
Author(s): Tonny J. Oyana
Primary Institution: Southern Illinois University
Hypothesis
The linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
Conclusion
The FES-k-means algorithm produces clusters similar to those of the original k-means method, but much faster, which makes it efficient for analyzing large geospatial datasets.
Supporting Evidence
- The FES-k-means algorithm allows for efficient analysis of large geospatial data.
- It produces clusters similar to those of the original k-means method, but much faster.
- The study identified a robust pattern of elevated blood lead levels among children that was missed in previous analyses.
Takeaway
This study introduced a faster way to cluster data that helps reveal patterns in large health datasets, such as geographic concentrations of disease.
Methodology
The study tested the FES-k-means algorithm on two real datasets and one synthetic dataset, using a two-step approach in which the data are trained before they are clustered.
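To make the idea of training before clustering concrete, here is a minimal sketch. It is not the FES-k-means algorithm itself, whose internals this summary does not describe. It assumes a generic stand-in: standard (Lloyd's) k-means is first run on a random subsample to obtain starting centroids, and those centroids then seed k-means on the full set of points. All function names, parameters, and the demo coordinates are hypothetical.

```python
import random


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for x, y in points:
        dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            moved.append((sum(p[0] for p in members) / len(members),
                          sum(p[1] for p in members) / len(members)))
        else:
            moved.append(old)  # an empty cluster keeps its old centroid
    return moved


def lloyd(points, centroids, max_iter=100):
    """Standard k-means iterations from the given starting centroids."""
    for _ in range(max_iter):
        moved = update(points, assign(points, centroids), centroids)
        if moved == centroids:  # no centroid moved: converged
            break
        centroids = moved
    return centroids, assign(points, centroids)


def two_step_kmeans(points, k, sample_size, seed=0):
    """Assumed two-step scheme: train on a subsample, then cluster all points."""
    rng = random.Random(seed)
    # Step 1 (training): k-means on a random subsample of the points.
    sample = rng.sample(points, min(sample_size, len(points)))
    trained, _ = lloyd(sample, rng.sample(sample, k))
    # Step 2 (clustering): k-means on all points, seeded by step 1.
    return lloyd(points, trained)


# Hypothetical point data: two small groups of (x, y) coordinates.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = two_step_kmeans(points, k=2, sample_size=6)
```

In this sketch, the second step starts close to a solution and so needs few iterations, which is one plausible reason a training step can speed up clustering of large point datasets.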
Limitations
The algorithm is limited to handling point data and may not effectively cluster other data types such as lines or polygons.
Participant Demographics
The datasets included georeferenced data on adult asthma in Buffalo, New York, and elevated blood lead levels linked with housing unit ages in Chicago.