Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry
2011

Optimizing Named Entity Recognition for Chemistry Using Workflows

Sample size: 42 publications | 10 minutes | Evidence: moderate

Author Information

Author(s): Kolluru BalaKrishna, Hawizy Lezan, Murray-Rust Peter, Tsujii Junichi, Ananiadou Sophia

Primary Institution: National Centre for Text Mining, University of Manchester

Hypothesis

Can reconfigurable workflows improve the accuracy of named entity recognition in chemistry?

Conclusion

Using reconfigurable workflows improved the accuracy of named entity recognition in chemistry texts.

Supporting Evidence

  • The workflow-based system increased the F-score from 82.35% to 84.44% on the Sciborg corpus, a gain of 2.09 percentage points.
  • On the PubMed corpus, the workflow recorded an F-score of 84.84%, compared with 84.23% for OSCAR, a gain of 0.61 percentage points.
  • Eliminating noise from tokenization improved named entity recognition accuracy.
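The F-scores quoted above combine precision and recall into a single number. As a minimal sketch of how such a score is typically computed for entity recognition, assuming exact-match comparison of entity spans (the summary does not say which matching criterion the study used), the function name `f_score` and the toy spans below are illustrative only and are not data from the paper:

```python
def f_score(gold, predicted):
    """Return (precision, recall, F1) for exact-match entity spans.

    Assumption: an entity counts as correct only when its span matches
    a gold span exactly. The study may have used a different criterion.
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy (start, end) character offsets; hypothetical values for illustration.
gold = {(0, 7), (12, 19), (25, 31), (40, 48)}
predicted = {(0, 7), (12, 19), (25, 30)}

precision, recall, f1 = f_score(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Because F1 is a harmonic mean, a recognizer cannot raise it by trading away recall for precision or the reverse; both have to hold up.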

Takeaway

This study shows that by changing how we identify chemical names in texts, we can find them more accurately.

Methodology

The study used two reconfigurable workflows to analyze chemical named entity recognition performance on the Sciborg and PubMed corpora.
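To make the idea of a reconfigurable workflow concrete, the sketch below treats the tokenizer and the recognizer as interchangeable components and runs the same text through two configurations. It is a toy illustration under stated assumptions, not the authors' workflow or OSCAR's API: every name here (`whitespace_tokenizer`, `punctuation_stripping_tokenizer`, `lexicon_recognizer`, `run_workflow`, `TOY_LEXICON`) is hypothetical, and the lexicon lookup stands in for a real recognizer.

```python
from typing import Callable, List

# Hypothetical component types: any tokenizer or recognizer with these
# signatures can be plugged into the workflow.
Tokenizer = Callable[[str], List[str]]
Recognizer = Callable[[List[str]], List[str]]

PUNCTUATION = ".,;:()"


def whitespace_tokenizer(text: str) -> List[str]:
    """Split on whitespace only, leaving punctuation attached to tokens."""
    return text.split()


def punctuation_stripping_tokenizer(text: str) -> List[str]:
    """Split on whitespace, then strip leading and trailing punctuation."""
    stripped = (token.strip(PUNCTUATION) for token in text.split())
    return [token for token in stripped if token]


# Toy lexicon standing in for a trained chemical entity recognizer.
TOY_LEXICON = {"ethanol", "benzene"}


def lexicon_recognizer(tokens: List[str]) -> List[str]:
    """Return the tokens that appear in the toy lexicon."""
    return [token for token in tokens if token.lower() in TOY_LEXICON]


def run_workflow(text: str, tokenizer: Tokenizer, recognizer: Recognizer) -> List[str]:
    """Run one workflow configuration: tokenize, then recognize entities."""
    return recognizer(tokenizer(text))


text = "The sample was dissolved in ethanol, then washed with benzene."

# Same recognizer, two tokenizer configurations.
noisy = run_workflow(text, whitespace_tokenizer, lexicon_recognizer)
clean = run_workflow(text, punctuation_stripping_tokenizer, lexicon_recognizer)
print(noisy, clean)
```

Swapping only the tokenizer changes what the recognizer can match, which is the kind of effect a reconfigurable workflow makes easy to isolate and measure.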

Potential Biases

Potential bias in the training data due to the manual annotation process.

Limitations

The study's findings may not generalize to all types of chemical texts due to the specific corpora used.

Participant Demographics

The study involved chemical literature from the Sciborg project and PubMed.

Statistical Information

P-Value

0.42

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1371/journal.pone.0020181
