Optimizing Named Entity Recognition for Chemistry Using Workflows
Author Information
Author(s): Kolluru BalaKrishna, Hawizy Lezan, Murray-Rust Peter, Tsujii Junichi, Ananiadou Sophia
Primary Institution: National Centre for Text Mining, University of Manchester
Hypothesis
Can reconfigurable workflows improve the accuracy of named entity recognition in chemistry?
Conclusion
Using reconfigurable workflows improved the accuracy of named entity recognition in chemistry texts.
Supporting Evidence
- The workflow-based system increased the F-score from 82.35% to 84.44% on the Sciborg corpus, a gain of 2.09 percentage points.
- On the PubMed corpus, the workflow recorded an F-score of 84.84%, compared with 84.23% for OSCAR, a gain of 0.61 percentage points.
- Eliminating noise from tokenization improved named entity recognition accuracy.
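For readers unfamiliar with the metric, the F-scores above can be read as a combination of precision and recall over recognized entities. The following is a minimal sketch, assuming the balanced F1 form (the summary does not state which variant the study used); the function name `f_score` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall.

    Assumption: the study's F-score is the balanced F1; this is not
    confirmed by the summary.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or annotated.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With illustrative counts of 8 correct entities, 2 spurious ones, and 2 missed ones, precision and recall are both 0.8, so `f_score(8, 2, 2)` is approximately 0.8.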
Takeaway
This study shows that reconfiguring the components of a chemical name recognition pipeline, for example to reduce tokenization noise, makes it possible to find chemical names in text more accurately.
Methodology
The study used two reconfigurable workflows to analyze chemical named entity recognition performance on the Sciborg and PubMed corpora.
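The idea of a reconfigurable workflow can be illustrated with a toy pipeline in which the tokenizer and the tagger are interchangeable components. This is a sketch under stated assumptions, not the study's implementation: every name here (`whitespace_tokenizer`, `punctuation_stripping_tokenizer`, `lexicon_tagger`, `run_workflow`, the `CHEM` label, and the two-word lexicon) is hypothetical.

```python
from typing import Callable, List, Tuple

Tokenizer = Callable[[str], List[str]]
Tagger = Callable[[List[str]], List[str]]

# Hypothetical toy lexicon; a real system would use a trained model.
TOY_LEXICON = {"ethanol", "benzene"}


def whitespace_tokenizer(text: str) -> List[str]:
    """Split on whitespace only, leaving punctuation attached to tokens."""
    return text.split()


def punctuation_stripping_tokenizer(text: str) -> List[str]:
    """Split on whitespace, then strip leading and trailing sentence punctuation."""
    tokens = [token.strip(".,;:") for token in text.split()]
    return [token for token in tokens if token]


def lexicon_tagger(tokens: List[str]) -> List[str]:
    """Label a token CHEM if it appears in the toy lexicon, otherwise O."""
    return ["CHEM" if token.lower() in TOY_LEXICON else "O" for token in tokens]


def run_workflow(text: str, tokenizer: Tokenizer, tagger: Tagger) -> List[Tuple[str, str]]:
    """Run one workflow configuration: tokenize, tag, and pair tokens with labels."""
    tokens = tokenizer(text)
    return list(zip(tokens, tagger(tokens)))
```

Running `run_workflow("Dissolve in ethanol.", whitespace_tokenizer, lexicon_tagger)` leaves the final token as `ethanol.`, which the toy tagger labels `O`. Swapping in `punctuation_stripping_tokenizer` yields `ethanol`, which is labeled `CHEM`. Only one component changed, which is the kind of comparison a reconfigurable workflow makes cheap.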
Potential Biases
The training data may be biased by the manual annotation process.
Limitations
The study's findings may not generalize to all types of chemical texts due to the specific corpora used.
Participant Demographics
The study involved chemical literature from the Sciborg project and PubMed.
Statistical Information
P-Value
0.42
Statistical Significance
p<0.05
Digital Object Identifier (DOI)