Re-identification of home addresses from spatial locations anonymized by Gaussian skew
2008

Re-identification of Home Addresses from Anonymized Spatial Data

Sample size: 10,000 | Publication | Evidence: high

Author Information

Author(s): Cassa Christopher A, Wieland Shannon C, Mandl Kenneth D

Primary Institution: Children's Hospital Informatics Program, Children's Hospital Boston

Hypothesis

Can multiple anonymized versions of the same data set be used to re-identify original geographic locations?

Conclusion

Multiple versions of the same data, each anonymized by Gaussian skew, can be used to ascertain original geographic locations.

Supporting Evidence

  • With ten anonymized copies, the average distance from the re-identified address to the original decreased from 0.7 km to 0.2 km.
  • With fifty anonymized copies, the average distance decreased from 0.7 km to 0.1 km.
  • The study demonstrates that averaging multiple anonymized data sets can significantly weaken privacy protections.
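The size of these reductions is roughly what simple averaging predicts. The derivation below is a sketch under stated assumptions, not the authors' own analysis: it assumes each copy adds independent, isotropic two-dimensional Gaussian noise to the true point, and the symbols p (true location), ε_i (skew applied to copy i), σ (per-axis standard deviation), and n (number of copies) are introduced here for illustration.

```latex
% Average of n independently skewed copies of the true point p
\bar{p}_n = \frac{1}{n}\sum_{i=1}^{n} (p + \varepsilon_i),
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_2)

% The residual error is still Gaussian, with variance reduced by a factor of n
\bar{p}_n - p \sim \mathcal{N}\!\left(0, \tfrac{\sigma^2}{n} I_2\right)

% Its length is Rayleigh-distributed, so the mean error shrinks as 1/\sqrt{n}
\mathbb{E}\,\lVert \bar{p}_n - p \rVert
  = \sigma\sqrt{\frac{\pi}{2n}}
  = \frac{\mathbb{E}\,\lVert \varepsilon_1 \rVert}{\sqrt{n}}

% Starting from a mean skew of 0.7 km
n = 10:\ \frac{0.7}{\sqrt{10}} \approx 0.22\ \text{km},
\qquad
n = 50:\ \frac{0.7}{\sqrt{50}} \approx 0.10\ \text{km}
```

Under these assumptions the predicted values agree with the reported 0.2 km and 0.1 km to one decimal place.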

Takeaway

If you have many copies of the same secret address, each shifted by a small random amount, you can average them to work out approximately where the real address is.

Methodology

The study generated 10,000 artificial geocoded patient addresses, anonymized them with Gaussian and uniform skew methods, and averaged the anonymized copies of each address to assess re-identification risk.
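The averaging procedure described above can be sketched as a small simulation. This is a minimal illustration, not the study's code: it assumes a flat local grid measured in kilometres, independent Gaussian noise on each axis, and a noise scale chosen so that a single copy lies 0.7 km from the original on average. The function names (gaussian_skew, reidentify, mean_error_km), the 20 km square, and the default of 2,000 addresses are all hypothetical choices made for this example.

```python
import math
import random


def gaussian_skew(point, sigma_km, rng):
    """Return one anonymized copy: the point plus independent Gaussian noise per axis."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))


def reidentify(copies):
    """Estimate the original location as the centroid of the anonymized copies."""
    n = len(copies)
    return (sum(p[0] for p in copies) / n, sum(p[1] for p in copies) / n)


def mean_error_km(n_copies, n_addresses=2000, mean_skew_km=0.7, seed=0):
    """Mean distance (km) between the centroid estimate and the true address."""
    rng = random.Random(seed)
    # For 2D isotropic Gaussian noise the displacement length is Rayleigh-distributed
    # with mean sigma * sqrt(pi / 2); invert that to get the per-axis sigma.
    sigma_km = mean_skew_km / math.sqrt(math.pi / 2)
    total = 0.0
    for _ in range(n_addresses):
        # Hypothetical address placed uniformly in a 20 km x 20 km square.
        original = (rng.uniform(0.0, 20.0), rng.uniform(0.0, 20.0))
        copies = [gaussian_skew(original, sigma_km, rng) for _ in range(n_copies)]
        estimate = reidentify(copies)
        total += math.hypot(estimate[0] - original[0], estimate[1] - original[1])
    return total / n_addresses


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(mean_error_km(n), 3))
```

With these assumed settings, the printed errors should come out close to 0.7 km for one copy and fall toward roughly 0.2 km and 0.1 km for ten and fifty copies, following the 1/sqrt(n) scaling of an averaged Gaussian. A uniform skew could be tried by swapping the noise draw inside gaussian_skew.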

Potential Biases

Bias may arise from the specific anonymization methods used and from the assumptions made about data access.

Limitations

The study primarily focuses on two anonymization methods and may not account for other potential vulnerabilities.

Participant Demographics

The study used artificially generated geocoded values for patients in Boston, MA.

Digital Object Identifier (DOI)

10.1186/1476-072X-7-45
