Automatic Abbreviation Definition Identification
Author Information
Author(s): Sohn Sunghwan, Comeau Donald C, Kim Won, Wilbur W John
Primary Institution: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Hypothesis
An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed.
Conclusion
We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result.
Supporting Evidence
- The algorithm produced 97% precision and 85% recall on the Medstract corpus.
- On a manually annotated set of 1250 MEDLINE records, the algorithm achieved 96.5% precision and 83.2% recall.
- The algorithm identifies abbreviation-definition pairs without requiring human judgment.
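For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision and recall are conventionally computed from counts of extracted pairs. The function name and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw counts.

    precision = correct extracted pairs / all extracted pairs
    recall    = correct extracted pairs / all pairs present in the gold standard
    """
    extracted = true_positives + false_positives
    in_gold = true_positives + false_negatives
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / in_gold if in_gold else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not the study's data).
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.3f} recall={r:.3f}")
```

High precision with lower recall, as in the figures above, means that most extracted pairs are correct while some pairs present in the text are not extracted.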
Takeaway
This study created a computer program that helps find the meanings of short forms used in medical texts, making it easier to understand them.
Methodology
The algorithm employs various strategies to identify abbreviation-definition pairs and estimates their accuracy using pseudo-precision.
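To make the idea of a matching strategy concrete, here is a minimal sketch of one simple strategy: matching each character of a parenthesized short form to the first letter of the words immediately preceding it. This is an illustrative simplification, not the authors' algorithm; the function names, the regular expression, and the short-form length limit are assumptions, and the sketch does not compute pseudo-precision.

```python
import re


def first_letter_match(short_form, preceding_words):
    """Return the trailing words whose initials spell short_form, or None.

    Hypothetical single strategy: one word per short-form character,
    no skipped words, case-insensitive comparison.
    """
    letters = [c.lower() for c in short_form if c.isalnum()]
    n = len(letters)
    if n == 0 or len(preceding_words) < n:
        return None
    candidate = preceding_words[-n:]
    if all(word[0].lower() == c for word, c in zip(candidate, letters)):
        return " ".join(candidate)
    return None


def find_pairs(sentence):
    """Find (short form, long form) pairs of the shape 'long form (SF)'."""
    pairs = []
    # Assumed short-form pattern: 2 to 10 alphanumerics starting with a letter.
    for m in re.finditer(r"\(([A-Za-z][A-Za-z0-9]{1,9})\)", sentence):
        words = re.findall(r"[A-Za-z0-9]+", sentence[:m.start()])
        long_form = first_letter_match(m.group(1), words)
        if long_form is not None:
            pairs.append((m.group(1), long_form))
    return pairs


print(find_pairs("Patients with chronic obstructive pulmonary disease (COPD) were enrolled."))
```

A single strategy like this is strict by design. A system that combines several strategies, as described above, could try stricter ones first and fall back to looser ones, attaching an accuracy estimate to each.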
Limitations
The algorithm may miss pairs in which the short form contains characters that have no match in the long form, or in which matching would require skipping more than one non-stopword between words in the long form.
Statistical Information
Statistical Significance
p<0.05