Cascaded classifiers for confidence-based chemical named entity recognition
2008

Sample size: 42 publications
Evidence: moderate

Author Information

Author(s): Peter Corbett, Ann Copestake

Primary Institution: University of Cambridge

Hypothesis

Can a system be developed to effectively recognize chemical named entities using confidence-based extraction methods?

Conclusion

The study demonstrates that chemical named entities can be extracted with good performance, and the extraction properties can be adjusted to meet specific task requirements.

Supporting Evidence

  • The system achieved an F score of 80.7% on chemistry papers and 83.2% on PubMed abstracts.
  • With the confidence threshold set to favor precision, the system achieved 57.6% recall at 95% precision.
  • The system can be tuned for high precision or high recall based on the application's needs.
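
The precision/recall trade-off in the points above can be illustrated with a minimal sketch. It assumes each candidate entity carries a confidence score and that a cut-off decides which candidates are kept. The function name, the scores, and the gold count are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def precision_recall(
    scored: List[Tuple[float, bool]], n_gold: int, threshold: float
) -> Tuple[float, float]:
    """Precision and recall when only candidates at or above `threshold` are kept.

    `scored` holds (confidence, is_correct) pairs for candidate entities;
    `n_gold` is the number of entities in the gold-standard annotation.
    """
    kept = [correct for conf, correct in scored if conf >= threshold]
    true_positives = sum(kept)
    # With nothing kept there are no false positives, so report precision as 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical candidates, invented for illustration only.
scored = [
    (0.95, True), (0.90, True), (0.70, True),
    (0.60, False), (0.40, True), (0.20, False),
]
n_gold = 5  # assumption: one gold entity was never proposed as a candidate

for t in (0.8, 0.5, 0.1):
    p, r = precision_recall(scored, n_gold, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold keeps fewer, more reliable candidates (higher precision, lower recall). Lowering it does the opposite.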

Takeaway

This study created a system that helps computers find names of chemicals in texts, and it can be adjusted either to find as many names as possible or to return only the names it is most confident about.

Methodology

The system uses character-based n-grams and Maximum Entropy Markov Models to recognize chemical names and assign confidence scores.
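
As a rough illustration of what character-based n-grams look like, the sketch below splits a token into overlapping character windows. The n-gram length of 4, the boundary markers, and the function name are assumptions made for this example; the summary does not state the paper's exact settings. The Maximum Entropy Markov Model stage is not shown.

```python
from typing import List


def char_ngrams(token: str, n: int = 4) -> List[str]:
    """Return the overlapping character n-grams of a token.

    The token is wrapped in "^" and "$" (an assumed convention) so that
    prefixes and suffixes such as "-ol" yield distinct n-grams.
    """
    padded = "^" + token + "$"
    # range() is empty when the padded token is shorter than n.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(char_ngrams("ethanol"))
```

Features like these could let a classifier pick up chemical-looking substrings even in names it has never seen whole.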

Limitations

The system may struggle with certain aspects of chemical nomenclature, and its performance depends on the quality of the training data.

Digital Object Identifier (DOI)

10.1186/1471-2105-9-S11-S4
