A scalable machine-learning approach to recognize chemical names within large text databases

2006

Machine Learning for Recognizing Chemical Names in Text

Sample size: 13,100,000 publications

Evidence: high

Author Information

Author(s): Jonathan D. Wren

Primary Institution: The University of Oklahoma

A first-order Markov Model (MM) can be used to effectively discern chemical names in text.

The study demonstrated that a Markov Model can accurately recognize chemical names within large text databases with high precision and recall rates.

The Markov Model achieved ~93% recall and ~99% precision on smaller test sets.
The method processed 13.1 million MEDLINE records with an average precision of 82.7%.
The study found that the number of spelling variants for a chemical name correlates with its frequency in literature.
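The precision and recall figures above follow the standard definitions: precision is the fraction of names the model flags that really are chemical names, and recall is the fraction of true chemical names the model finds. The minimal sketch below computes both from a set of predicted terms and a set of gold-standard terms. The function name and the example terms are hypothetical and are not taken from the study's test sets.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted terms against gold-standard terms."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # terms flagged AND truly chemical
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: one false positive ("patients"), two misses.
p, r = precision_recall({"benzene", "ethanol", "patients"},
                        {"benzene", "ethanol", "methanol", "toluene"})
print(p, r)
```

Here two of three flagged terms are correct (precision 2/3) and two of four true chemical names are found (recall 0.5).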

This study shows that a computer program can learn to find chemical names in very large collections of text, helping scientists organize the literature more effectively.

A first-order Markov Model was trained on chemical names and tested on MEDLINE records to evaluate its performance in recognizing chemical terms.
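To illustrate what a first-order Markov model does here, the sketch below scores a word by the probability of each character given only the preceding character. Several details are assumptions and are not confirmed as the study's exact method: modeling at the character level, comparing against a second model trained on non-chemical words, add-one smoothing, the `#` boundary marker, and a length-normalized log-likelihood ratio with a threshold of 0. The word lists are tiny hypothetical examples, not the study's training data.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical start/end-of-word marker

def transitions(word):
    """Character pairs (prev, next) of a lowercased word padded with boundary markers."""
    seq = BOUNDARY + word.lower() + BOUNDARY
    return list(zip(seq, seq[1:]))

class CharMarkovModel:
    """First-order model: P(word) = product of P(c_i | c_{i-1}) over its transitions."""

    def __init__(self, words, vocab_size, alpha=1.0):
        self.alpha = alpha            # add-alpha (Laplace) smoothing for unseen pairs
        self.vocab_size = vocab_size  # number of possible next symbols
        self.pair_counts = Counter()
        self.prev_counts = Counter()
        for w in words:
            for prev, nxt in transitions(w):
                self.pair_counts[(prev, nxt)] += 1
                self.prev_counts[prev] += 1

    def log_prob(self, word):
        """Sum of smoothed log transition probabilities for the word."""
        total = 0.0
        for prev, nxt in transitions(word):
            num = self.pair_counts[(prev, nxt)] + self.alpha
            den = self.prev_counts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total

def train(chemical_words, other_words):
    """Fit a chemical-name model and a background model over a shared symbol set."""
    symbols = {c for w in chemical_words + other_words for c in w.lower()}
    vocab_size = len(symbols) + 2  # + boundary marker + one slot for unseen characters
    return (CharMarkovModel(chemical_words, vocab_size),
            CharMarkovModel(other_words, vocab_size))

def chemical_score(word, chem_model, other_model):
    """Per-transition log-likelihood ratio; a score above 0 leans 'chemical'."""
    n = len(word) + 1  # number of transitions scored
    return (chem_model.log_prob(word) - other_model.log_prob(word)) / n

# Toy, hypothetical training lists (not the study's data).
chem_words = ["methylphenidate", "acetylsalicylic", "chloroform", "benzene", "ethanol",
              "dimethylsulfoxide", "tetrahydrofuran", "hydrochloride", "phenylalanine",
              "methanol"]
other_words = ["patients", "treatment", "clinical", "study", "results", "increased",
               "protein", "analysis", "significant", "response"]

chem_model, other_model = train(chem_words, other_words)
for w in ["chloroform", "patients"]:
    print(w, round(chemical_score(w, chem_model, other_model), 2))
```

Dividing by the number of transitions keeps long names from being penalized simply for having more factors. In practice the decision threshold would be tuned on held-out labeled text rather than fixed at 0.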

The model may overestimate the number of unique chemical names due to 'tag-along' prefixes and suffixes, and it struggles with short terms.
