Abbreviation definition identification based on automatic precision estimates
2008

Automatic Abbreviation Definition Identification

Sample size: 1250 publications
Evidence: high

Author Information

Author(s): Sohn Sunghwan, Comeau Donald C, Kim Won, Wilbur W John

Primary Institution: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

Hypothesis

An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed.

Conclusion

We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result.

Supporting Evidence

  • The algorithm produced 97% precision and 85% recall on the Medstract corpus.
  • On a manually annotated set of 1250 MEDLINE records, the algorithm achieved 96.5% precision and 83.2% recall.
  • The algorithm identifies abbreviation-definition pairs without requiring human judgment.
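For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how precision and recall are conventionally computed for extracted abbreviation-definition pairs. The function name `precision_recall` and the toy pairs are illustrative assumptions, not data or code from the study.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for sets of (short form, long form) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # pairs that are both extracted and correct
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold annotations and system output (not from the paper).
gold_pairs = {
    ("PCR", "polymerase chain reaction"),
    ("DNA", "deoxyribonucleic acid"),
    ("NLM", "National Library of Medicine"),
    ("NIH", "National Institutes of Health"),
}
predicted_pairs = {
    ("PCR", "polymerase chain reaction"),
    ("DNA", "deoxyribonucleic acid"),
    ("NLM", "Library of Medicine"),  # wrong long-form boundary, counts as an error
}

precision, recall = precision_recall(predicted_pairs, gold_pairs)
```

In this toy case two of the three extracted pairs are correct and two of the four annotated pairs are found, so precision is 2/3 and recall is 1/2.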

Takeaway

This study created a computer program that helps find the meanings of short forms used in medical texts, making it easier to understand them.

Methodology

The algorithm employs various strategies to identify abbreviation-definition pairs and estimates their accuracy using pseudo-precision.
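To make the idea of a matching strategy concrete, here is a minimal sketch of one very simple strategy: matching each letter of a parenthesized short form to the first letter of the words immediately before it. This is an illustration only, not the authors' algorithm; the function name `find_first_letter_pairs`, the regular expressions, and the example sentence are assumptions, and the sketch does not compute pseudo-precision.

```python
import re


def find_first_letter_pairs(text):
    """Return (short form, long form) pairs found by strict first-letter matching.

    A short form is any 2-10 letter token in parentheses. It is accepted only
    if its letters equal the initials of the same number of words directly
    preceding the parenthesis (case-insensitive, no words skipped).
    """
    pairs = []
    for match in re.finditer(r"\(([A-Za-z]{2,10})\)", text):
        short_form = match.group(1)
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(short_form):]
        if len(candidate) == len(short_form) and all(
            word[0].lower() == letter.lower()
            for word, letter in zip(candidate, short_form)
        ):
            pairs.append((short_form, " ".join(candidate)))
    return pairs


sentence = (
    "We used the polymerase chain reaction (PCR) on samples from "
    "the National Library of Medicine (NLM)."
)
pairs = find_first_letter_pairs(sentence)
```

Here the strict strategy recovers "PCR" but misses "NLM", because it cannot skip the stopword "of". A single rigid strategy trades recall for simplicity, which is one motivation for combining several strategies and estimating how reliable each one is.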

Limitations

The algorithm may miss pairs in which the short form contains characters that have no match in the long form, as well as pairs that require skipping more than one non-stopword between words in the long form.

Statistical Information

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-9-402
