ChemicalTagger: A Tool for Semantic Text-Mining in Chemistry
Author Information
Author(s): Hawizy Lezan, Jessop David M, Adams Nico, Murray-Rust Peter
Primary Institution: Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge
Hypothesis
Can structured scientific data be extracted from unstructured scientific literature using text-mining techniques?
Conclusion
ChemicalTagger can effectively parse chemical experimental text and has been successfully deployed to identify solvents with over 99.5% precision.
Supporting Evidence
- ChemicalTagger achieved machine-annotator agreements of 88.9% for phrase recognition.
- The tool has been deployed for over 10,000 patents.
- It identified solvents from their linguistic context with >99.5% precision.
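For readers unfamiliar with the metric, precision is the fraction of items the tool labelled as solvents that really are solvents. The short Python sketch below shows the arithmetic only; the counts are hypothetical values chosen for illustration and are not taken from the study.

```python
def precision(true_positives, false_positives):
    """Return the fraction of predicted items that are correct."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("precision is undefined when nothing was predicted")
    return true_positives / predicted


# Hypothetical counts (not from the paper): 2000 tokens labelled as solvents,
# of which 1995 were judged correct and 5 incorrect.
p = precision(true_positives=1995, false_positives=5)
print(f"precision = {p:.2%}")
```

Under these assumed counts the result exceeds the 99.5% threshold quoted above; a real evaluation would take the counts from manually checked output.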
Takeaway
ChemicalTagger is a computer program that helps scientists find and organize information from chemistry papers, making that information easier to understand and reuse.
Methodology
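ChemicalTagger uses a modular architecture that combines several NLP components: the OSCAR chemical entity recognizer, domain-specific regular expressions, and general English taggers assign part-of-speech tags to tokens, and an ANTLR grammar then structures the tagged tokens into tree-based phrases.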
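As a rough illustration of the tag-then-parse idea described above, the following Python sketch tags tokens with a few hand-written regular expressions and then looks for a simple phrase pattern that marks a solvent by its linguistic context. The tag names, the tiny lexicon, and the `tag_tokens` and `find_solvents` helpers are assumptions made for this example; they are not ChemicalTagger's actual API, tag set, or grammar.

```python
import re

# Hypothetical tag rules for illustration; the first matching pattern wins.
TAG_RULES = [
    ("QUANTITY", re.compile(r"^\d+(\.\d+)?$")),
    ("UNIT", re.compile(r"^(mL|g|mg|mmol|h|min)$")),
    ("ACTION", re.compile(r"^(dissolved|stirred|added|heated|filtered)$", re.IGNORECASE)),
    ("CHEMICAL", re.compile(r"^(methanol|ethanol|toluene|water|benzaldehyde)$", re.IGNORECASE)),
    ("IN", re.compile(r"^in$", re.IGNORECASE)),
]


def tag_tokens(sentence):
    """Split on whitespace, strip punctuation, and tag each token."""
    tagged = []
    for raw in sentence.split():
        token = raw.strip(".,;()")
        if not token:
            continue
        tag = next((name for name, pat in TAG_RULES if pat.match(token)), "OTHER")
        tagged.append((token, tag))
    return tagged


def find_solvents(tagged):
    """Return chemicals that appear in the pattern 'dissolved in <CHEMICAL>'."""
    solvents = []
    for i in range(len(tagged) - 2):
        (word, tag), (_, next_tag), (chem, chem_tag) = tagged[i:i + 3]
        if (tag == "ACTION" and word.lower() == "dissolved"
                and next_tag == "IN" and chem_tag == "CHEMICAL"):
            solvents.append(chem)
    return solvents


sentence = "The aldehyde was dissolved in methanol (10 mL) and stirred for 2 h."
tagged = tag_tokens(sentence)
print(tagged)
print(find_solvents(tagged))
```

The sketch keeps tagging and phrase recognition in separate functions to mirror the modular design: either stage could be swapped for a stronger component without changing the other.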
Potential Biases
Potential biases in the training data could affect the accuracy of the extraction.
Limitations
The extraction tools may not be perfect and depend on the quality of the input text.
Participant Demographics
The study involved trained chemists with formal backgrounds in different areas of chemistry.
Statistical Information
P-Value
0.5
Statistical Significance
p<0.05