Improving Geocoded Data Quality in Health Research
Author Information
Author(s): Daniel W. Goldberg, John P. Wilson, Craig A. Knoblock, Beate Ritz, Myles G. Cockburn
Primary Institution: University of Southern California
Hypothesis
What is the most cost-effective method for improving geocoded data quality in health-related datasets?
Conclusion
Manual geocode correction is a feasible and cost-effective method for improving the quality of geocoded data.
Supporting Evidence
- Geocode correction improved the overall match rate from 79.3% to 95%.
- 12,280 records (55%) were successfully improved through manual correction.
- The average processing time per record was 69 seconds.
- Spatial shifts averaged 9.9 km between original and corrected geocodes.
- The number of geocodes at building-centroid accuracy increased from 0 to 2,261.
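The 9.9 km figure is a mean distance between each record's original and corrected coordinates. The summary does not say which distance measure the authors used, so the sketch below assumes great-circle (haversine) distance on a spherical Earth with a 6,371 km radius. The function names are hypothetical and the coordinates in the usage note are illustrative, not study data.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point rounding cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_shift_km(pairs) -> float:
    """Average shift between original and corrected geocodes.

    pairs: iterable of ((orig_lat, orig_lon), (corr_lat, corr_lon)).
    Raises statistics.StatisticsError if pairs is empty.
    """
    return mean(haversine_km(o[0], o[1], c[0], c[1]) for o, c in pairs)
```

For example, one degree of longitude along the equator, `haversine_km(0, 0, 0, 1)`, is about 111.2 km. Projected planar distances would give slightly different values.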
Takeaway
This study shows that correcting location data by hand can substantially improve its quality and is worth the time it takes.
Methodology
The study involved manually correcting geocodes in five health-related datasets using a web-based interactive approach.
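Because the correction is manual, its cost scales with reviewer time. The sketch below shows that arithmetic using the 69-second average per-record time reported in the Supporting Evidence. The function names and the hourly rate are hypothetical. The summary reports no wage or total-cost figures, and it does not say whether the 69-second average covers all records reviewed or only the improved ones.

```python
def labor_hours(n_records: int, seconds_per_record: float) -> float:
    """Total reviewer time in hours for n_records at an average per-record time."""
    return n_records * seconds_per_record / 3600.0


def cost_per_improved_record(
    n_reviewed: int,
    n_improved: int,
    seconds_per_record: float,
    hourly_rate: float,  # hypothetical wage; not reported in the study summary
) -> float:
    """Labor cost of reviewing n_reviewed records, spread over the records improved."""
    if n_improved <= 0:
        raise ValueError("n_improved must be positive")
    total_cost = labor_hours(n_reviewed, seconds_per_record) * hourly_rate
    return total_cost / n_improved
```

If the 69-second average applied only to the 12,280 improved records, `labor_hours(12280, 69)` gives about 235 hours. Any records that were reviewed but not improved would add to that total.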
Potential Biases
Results may be biased by their reliance on the accuracy of the Google Maps API and of the initial geocoding process.
Limitations
The study lacks ground truth data for the addresses and relies on the accuracy of the Google Maps API.
Participant Demographics
Participants included four full-time staff and three volunteer graduate students.