Cascaded Classifiers for Chemical Named Entity Recognition
Author Information
Author(s): Corbett Peter, Copestake Ann
Primary Institution: University of Cambridge
Hypothesis
Can a system be developed to effectively recognize chemical named entities using confidence-based extraction methods?
Conclusion
The study demonstrates that chemical named entities can be extracted with good performance, and the extraction properties can be adjusted to meet specific task requirements.
Supporting Evidence
- At a threshold set for balanced precision and recall, the system achieved an F score of 80.7% on chemistry papers and 83.2% on PubMed abstracts.
- When tuned for high precision, the system achieved 57.6% recall at 95% precision.
- The system can be tuned for high precision or high recall based on the application's needs.
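The trade-off described in the points above can be illustrated with a small sketch. It assumes each candidate entity carries a confidence score and a flag saying whether it matches the gold standard. The function name `evaluate`, the toy scores, and the gold count are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def evaluate(
    candidates: List[Tuple[float, bool]],
    total_gold: int,
    threshold: float,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F) for candidates kept at a threshold.

    candidates: (confidence, is_correct) pairs, one per proposed entity.
    total_gold: number of entities in the gold standard (assumed known).
    """
    # Keep only candidates whose confidence meets the threshold.
    kept = [is_correct for confidence, is_correct in candidates
            if confidence >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_gold if total_gold else 0.0
    denominator = precision + recall
    # F is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


# Hypothetical toy data: six proposed entities, five gold entities
# (one gold entity was never proposed at any confidence).
toy_candidates = [
    (0.95, True), (0.90, True), (0.80, True),
    (0.60, False), (0.40, True), (0.20, False),
]

for threshold in (0.7, 0.1):
    p, r, f = evaluate(toy_candidates, total_gold=5, threshold=threshold)
    print(f"threshold={threshold}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

On this toy data, raising the threshold keeps fewer but more reliable entities (higher precision, lower recall), while lowering it does the opposite. This is the kind of adjustment the summary refers to.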
Takeaway
This study created a system that helps computers find the names of chemicals in text. It can be adjusted either to find as many names as possible or to report only the names it is most confident about.
Methodology
The system uses character-based n-grams and Maximum Entropy Markov Models to recognize chemical names and assign confidence scores.
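As a rough illustration of what character-based n-grams look like, the sketch below splits a token into overlapping character sequences. The function name `char_ngrams` and the `^` and `$` boundary markers are assumptions made for this example. It does not reproduce the paper's actual feature set or its Maximum Entropy Markov Model.

```python
from typing import List


def char_ngrams(token: str, n: int = 3) -> List[str]:
    """Return the overlapping character n-grams of a token.

    The token is padded with '^' and '$' (an assumption of this sketch)
    so that prefixes and suffixes produce distinct n-grams.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    padded = f"^{token}$"
    # Slide a window of width n across the padded token.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(char_ngrams("ethanol"))
# ['^et', 'eth', 'tha', 'han', 'ano', 'nol', 'ol$']
```

Sequences such as `ol$` recur across many chemical names, which suggests why sub-word character patterns can help separate chemical terms from ordinary words.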
Limitations
The system may struggle with certain aspects of chemical nomenclature, and its performance depends on the quality of the training data.