Improving Protein Name Recognition with NE Dictionaries
Author Information
Author(s): Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou
Primary Institution: University of Manchester
Hypothesis
Can adding named entities to a dictionary improve the performance of statistical named entity recognition without retraining?
Conclusion
The study shows that adding known named entities to a dictionary can significantly enhance protein name recognition performance.
Supporting Evidence
- The F-score improved from 73.14 to 73.78 after adding protein names to the dictionary.
- Further enriching the dictionary with protein names from the test set raised the F-score to 78.72.
- The approach allows users to enhance performance without retraining the model.
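The F-scores above are, in standard NER evaluation, the balanced F-measure: the harmonic mean of precision and recall. A minimal sketch of that computation follows. It assumes entity-level match counts as input; the function name `f_score` and the counts in the usage line are hypothetical and do not come from the study.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score (F1), as a percentage, from entity-level match counts."""
    predicted = true_positives + false_positives  # entities the system proposed
    gold = true_positives + false_negatives       # entities in the annotated data
    if predicted == 0 or gold == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / gold
    # Harmonic mean of precision and recall, scaled to 0-100 like the scores above.
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(f_score(true_positives=75, false_positives=25, false_negatives=25))  # 75.0
```

Because the score is a harmonic mean, a dictionary that adds correct matches raises recall, but the F-score rises only if precision does not drop by a comparable amount.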
Takeaway
This study found that adding protein names to a dictionary helps a computer system recognize them more accurately, without having to retrain the system from scratch.
Methodology
A hybrid approach combining dictionary-based and statistical methods was used to improve named entity recognition without retraining the model.
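One way such a hybrid can work is sketched below: a greedy longest-match dictionary lookup marks candidate protein names, and those matches are merged with the labels a statistical tagger produced. This is an illustrative sketch, not the authors' method. The `statistical_tags` input stands in for a trained model's output, and the merge rule (a dictionary match overrides the statistical label) and the BIO tag names are assumptions. Because the dictionary is plain data, new names can be added at run time without retraining anything.

```python
from typing import List, Set, Tuple


def dictionary_matches(tokens: List[str], dictionary: Set[Tuple[str, ...]],
                       max_len: int = 5) -> List[str]:
    """Greedy longest-match lookup; returns BIO tags ('B-protein', 'I-protein', 'O').

    Dictionary entries are tuples of lowercased tokens (an assumed convention).
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word names win.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + length])
            if span in dictionary:
                tags[i] = "B-protein"
                for j in range(i + 1, i + length):
                    tags[j] = "I-protein"
                i += length
                break
        else:
            i += 1  # no entry starts at this token
    return tags


def merge_tags(statistical_tags: List[str], dict_tags: List[str]) -> List[str]:
    """Assumed merge rule: a dictionary match overrides the statistical label."""
    return [d if d != "O" else s for s, d in zip(statistical_tags, dict_tags)]


tokens = ["The", "p53", "tumor", "suppressor", "binds", "MDM2", "."]
stat = ["O", "B-protein", "O", "O", "O", "O", "O"]  # hypothetical model output
names = {("mdm2",)}  # extend this set to add names; the model is untouched
print(merge_tags(stat, dictionary_matches(tokens, names)))
# ['O', 'B-protein', 'O', 'O', 'O', 'B-protein', 'O']
```

Letting dictionary matches take precedence is the simplest choice; a real system might instead feed the matches to the model as features, which keeps the model in control of ambiguous names.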
Potential Biases
The approach relies on a curated dictionary, which may not cover all protein names.
Limitations
The method's performance may be limited by the inherent ambiguity and variability of protein names.
Participant Demographics
The training data consisted of 2,000 abstracts manually annotated by a biologist.
Statistical Information
P-Value
0.0001
Statistical Significance
p<0.05
Digital Object Identifier (DOI)