Machine Learning for Recognizing Chemical Names in Text
Author Information
Author(s): Jonathan D. Wren
Primary Institution: The University of Oklahoma
Hypothesis
A first-order Markov Model (MM) could be used to effectively discern chemical names.
Conclusion
The study showed that a first-order Markov Model can recognize chemical names in large text databases with high precision and recall.
Supporting Evidence
- The Markov Model achieved ~93% recall and ~99% precision on smaller test sets.
- The method processed 13.1 million MEDLINE records with an average precision of 82.7%.
- The number of spelling variants for a chemical name correlates with how frequently the name appears in the literature.
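For readers unfamiliar with the two metrics quoted above, the short sketch below shows how precision and recall are computed from raw counts. The counts are hypothetical, chosen only to land near the reported small-test-set figures; they are not taken from the study.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of terms flagged as chemical that really are chemical names."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real chemical names that the model managed to flag."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts for illustration only (not from the study):
# 100 real chemical names, 93 found, 7 missed, 1 non-chemical term flagged.
tp, fp, fn = 93, 1, 7
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
```

Precision penalizes false alarms while recall penalizes misses, which is why the two figures can differ between the small test sets and the full MEDLINE run.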
Takeaway
This study shows that a computer program can learn to find chemical names in a lot of text, helping scientists organize information better.
Methodology
A first-order Markov Model was trained on chemical names and tested on MEDLINE records to evaluate its performance in recognizing chemical terms.
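The general idea can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes the model works on character-to-character transitions, that a second model trained on ordinary words serves as the comparison, and that add-alpha smoothing handles unseen transitions. The class name, the boundary markers, the smoothing parameters, and the tiny training lists are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers


class CharMarkovModel:
    """First-order Markov model over characters with add-alpha smoothing."""

    def __init__(self, alpha: float = 0.1, alphabet_size: int = 100):
        self.alpha = alpha                  # assumed smoothing constant
        self.alphabet_size = alphabet_size  # assumed number of possible symbols
        self.counts = defaultdict(Counter)  # counts[prev][curr] = transitions seen

    def fit(self, words):
        """Count every adjacent character pair in the training words."""
        for word in words:
            chars = [START, *word.lower(), END]
            for prev, curr in zip(chars, chars[1:]):
                self.counts[prev][curr] += 1
        return self

    def log_prob(self, word: str) -> float:
        """Sum of log transition probabilities for the word."""
        chars = [START, *word.lower(), END]
        total = 0.0
        for prev, curr in zip(chars, chars[1:]):
            row = self.counts.get(prev, Counter())
            num = row[curr] + self.alpha
            den = sum(row.values()) + self.alpha * self.alphabet_size
            total += math.log(num / den)
        return total


def is_chemical(word, chem_model, background_model) -> bool:
    """Flag the word if the chemical model explains it better than the background."""
    return chem_model.log_prob(word) > background_model.log_prob(word)


# Toy training data for illustration only; a real system would need far more.
chem = CharMarkovModel().fit([
    "acetaminophen", "methylphenidate", "dichloromethane", "cyclohexanone",
    "benzaldehyde", "trimethylamine", "chlorobenzene", "ethylbenzene",
])
background = CharMarkovModel().fit([
    "the", "patient", "treatment", "study", "results", "showed",
    "significant", "increase", "group", "clinical", "with", "were",
])

for term in ["methylbenzene", "patients"]:
    print(term, is_chemical(term, chem, background))
```

Comparing two likelihoods rather than thresholding one keeps the decision independent of word length, since both models score the same number of transitions. With so few transitions to score, very short terms give the comparison little evidence to work with.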
Limitations
The model may overestimate the number of unique chemical names due to 'tag-along' prefixes and suffixes, and it struggles with short terms.