Automated De-identification of Free-text Medical Records
Author Information
Author(s): Neamatullah Ishna, Douglass Margaret M, Lehman Li-wei H, Reisner Andrew, Villarroel Mauricio, Long William J, Szolovits Peter, Moody George B, Mark Roger G, Clifford Gari D
Primary Institution: Massachusetts Institute of Technology
Hypothesis
Can automated methods effectively de-identify free-text medical records to preserve patient confidentiality?
Conclusion
In terms of recall, the de-identification system outperformed human de-identifiers, and it is suitable for a range of free-text medical records.
Supporting Evidence
- The algorithm achieved an overall recall of 0.967 and precision of 0.749.
- On the test corpus, the algorithm's estimated recall was 0.943.
- The software was able to de-identify all patient names in the gold standard corpus.
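For readers less familiar with the metrics above, the following is a minimal sketch of how recall and precision are computed from counts of PHI instances. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the study's data.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real PHI instances that the system removed."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of removed terms that really were PHI."""
    return true_positives / (true_positives + false_positives)


# Illustrative counts only (NOT taken from the study):
# 90 PHI instances removed, 10 missed, 30 non-PHI terms removed by mistake.
example_recall = recall(true_positives=90, false_negatives=10)
example_precision = precision(true_positives=90, false_positives=30)
```

High recall with lower precision, as reported here, means that few PHI instances are missed at the cost of also removing some terms that are not PHI.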
Takeaway
This study created a computer program that can automatically remove personal information from medical records so that patients' identities stay private.
Methodology
The study used Perl-based software that combines lexical look-up tables, regular expressions, and heuristics to identify and remove protected health information (PHI) from medical records.
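The general approach can be sketched in a few lines. This is a simplified illustration written in Python rather than Perl, not the authors' software: the name dictionary, the regular expressions, the title heuristic, and the placeholder format are all assumptions made for this example.

```python
import re

# Hypothetical look-up table; a real system would load large name dictionaries.
KNOWN_NAMES = {"smith", "jones"}

# Assumed patterns for two kinds of PHI: phone numbers and dates.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b")

# Context heuristic (an assumption): a capitalized word after a title is a name.
TITLE_RE = re.compile(r"\b((?:Mr|Mrs|Ms|Dr)\.?)\s+[A-Z][a-z]+")

WORD_RE = re.compile(r"\b[A-Za-z]+\b")


def _mask_known_name(match: re.Match) -> str:
    """Replace a word only if it appears in the look-up table."""
    word = match.group(0)
    return "[**NAME**]" if word.lower() in KNOWN_NAMES else word


def deidentify(text: str) -> str:
    """Replace suspected PHI with placeholders, one rule family at a time."""
    text = PHONE_RE.sub("[**PHONE**]", text)      # regular expressions
    text = DATE_RE.sub("[**DATE**]", text)
    text = TITLE_RE.sub(r"\1 [**NAME**]", text)   # context heuristic
    text = WORD_RE.sub(_mask_known_name, text)    # dictionary look-up
    return text


note = "Pt seen by Dr. Patel on 3/14/2005; call Smith at 617-555-0100."
masked = deidentify(note)
```

In this sketch a surname that is absent from the dictionary and has no title in front of it passes through unchanged, which illustrates why an approach built on dictionaries and context rules can miss some PHI.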
Potential Biases
The algorithm may miss some PHI due to reliance on dictionaries and context rules.
Limitations
The algorithm's accuracy is high but may not be sufficient for public dissemination of medical data.
Participant Demographics
The study involved nursing notes from 163 randomly selected patients.
Statistical Information
Statistical Significance
p<0.05