Automated De-identification of Free-text Medical Records

Sample size: 2434
Evidence: high

Author Information

Author(s): Neamatullah Ishna, Douglass Margaret M, Lehman Li-wei H, Reisner Andrew, Villarroel Mauricio, Long William J, Szolovits Peter, Moody George B, Mark Roger G, Clifford Gari D

Primary Institution: Massachusetts Institute of Technology

Hypothesis

Can automated methods effectively de-identify free-text medical records to preserve patient confidentiality?

Conclusion

In terms of recall, the de-identification system outperforms a single human de-identifier, and it can be customized for other kinds of free-text medical records.

Supporting Evidence

The algorithm achieved an overall recall of 0.967 and precision of 0.749.
On the test corpus, the algorithm's estimated recall was 0.943.
The software was able to de-identify all patient names in the gold standard corpus.
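The recall and precision figures above follow the standard definitions based on true positives, false positives, and false negatives. The sketch below shows how such figures are computed. The function names and the counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of terms flagged as PHI that really were PHI."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of true PHI terms that the system flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical counts, for illustration only (not taken from the study).
tp, fp, fn = 90, 30, 10
print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.9

# Combining the recall (0.967) and precision (0.749) reported above.
print(round(f_measure(0.749, 0.967), 3))  # 0.844
```

The last value is derived here from the two reported numbers and is not a figure stated in this summary. A recall well above precision is what one would expect from a system tuned to remove too much rather than too little, since a missed identifier costs more than an over-scrubbed word.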

Takeaway

This study created a computer program that can automatically remove personal information from medical records so that patients' identities stay private.

Methodology

The study used Perl-based software that combines lexical look-up tables, regular expressions, and heuristics to identify and remove protected health information (PHI) from free-text medical records.
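The three ingredients named above can be combined roughly as in the following sketch. It is written in Python rather than Perl and is not the authors' implementation: the name list, the patterns, the title heuristic, and the `[**LABEL**]` placeholder format are all assumptions made for illustration.

```python
import re

# Hypothetical look-up table; a real system would load much larger
# dictionaries of names, locations, and so on.
KNOWN_NAMES = {"smith", "johnson", "garcia"}

# Hypothetical regular expressions for two PHI categories.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
]

# Hypothetical context heuristic: a capitalized word that follows a
# title is treated as a name even if it is not in the dictionary.
TITLE_CONTEXT = re.compile(r"\b((?:Dr|Mr|Mrs|Ms)\.?\s+)([A-Z][a-z]+)")


def deidentify(text: str) -> str:
    """Replace suspected PHI with bracketed category placeholders."""
    # 1. Regular-expression pass for well-structured identifiers.
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[**{label}**]", text)

    # 2. Context heuristic: keep the title, mask the word after it.
    text = TITLE_CONTEXT.sub(lambda m: m.group(1) + "[**NAME**]", text)

    # 3. Dictionary look-up on every remaining alphabetic token.
    def mask_known(m: re.Match) -> str:
        word = m.group(0)
        return "[**NAME**]" if word.lower() in KNOWN_NAMES else word

    return re.sub(r"\b[A-Za-z]+\b", mask_known, text)


note = "Pt seen by Dr. Lee on 3/14; call 617-555-0123. Smith family at bedside."
print(deidentify(note))
# Pt seen by Dr. [**NAME**] on [**DATE**]; call [**PHONE**]. [**NAME**] family at bedside.
```

In this sketch "Lee" is caught only by the title heuristic and "Smith" only by the dictionary, which shows why the approaches are layered. The dictionary step would also mask an ordinary word that happens to match a listed name, which is one way a system of this kind trades precision for recall.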

Potential Biases

The algorithm may miss some PHI due to reliance on dictionaries and context rules.

Limitations

The algorithm's accuracy is high but may not be sufficient for public dissemination of medical data.

Participant Demographics

The study involved nursing notes from 163 randomly selected patients.

Statistical Information

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1472-6947-8-32
