Identifying Genes and Proteins in Biomedical Text
Author Information
Author(s): Jenny Finkel, Shipra Dingare, Christopher D. Manning, Malvina Nissim, Beatrice Alex, Claire Grover
Primary Institution: Stanford University
Hypothesis
Can a maximum-entropy-based system effectively identify gene and protein names in biomedical abstracts?
Conclusion
The system achieved a precision of 0.83 and a recall of 0.84 in the open evaluation, which suggests that combining diverse features with external knowledge sources is effective for identifying gene and protein names.
Supporting Evidence
- The system achieved a precision of 0.83 and recall of 0.84 in the open evaluation.
- In the closed evaluation, the system achieved a precision of 0.78 and recall of 0.85.
- The study highlights the importance of using diverse features for effective named entity recognition.
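Precision and recall are often combined into a single F1 score, the harmonic mean of the two. The sketch below applies that standard formula to the rounded figures listed above. The resulting values are derived here for illustration and are not figures reported in this summary.

```python
def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall values taken from the bullets above.
open_f1 = f1_score(0.83, 0.84)
closed_f1 = f1_score(0.78, 0.85)

print(f"open evaluation F1:   {open_f1:.3f}")
print(f"closed evaluation F1: {closed_f1:.3f}")
```

Because the inputs are rounded to two decimals, the results (roughly 0.835 and 0.813) are approximations only.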
Takeaway
This study created a computer program that finds the names of genes and proteins in biomedical texts, making it easier to process large amounts of literature quickly.
Methodology
A maximum-entropy-based system was developed to identify gene and protein names using a variety of features and external resources.
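As a rough illustration of the general idea, the sketch below trains a tiny binary maximum-entropy (logistic regression) classifier that labels each token as gene or non-gene. Everything in it is an assumption made for illustration: the feature set, the GAZETTEER set standing in for an external resource, the toy training sentences, and the training loop are hypothetical and are not the authors' system.

```python
import math
import re
from collections import defaultdict

# Hypothetical gazetteer standing in for an external knowledge source.
GAZETTEER = {"p53", "brca1", "insulin"}


def word_shape(token):
    """Collapse character runs into classes, e.g. 'BRCA1' -> 'Xd'."""
    shape = re.sub(r"[A-Z]+", "X", token)
    shape = re.sub(r"[a-z]+", "x", shape)
    return re.sub(r"[0-9]+", "d", shape)


def features(tokens, i):
    """Binary features for token i: word, shape, suffix, context, gazetteer."""
    tok = tokens[i]
    feats = {
        "bias",
        "word=" + tok.lower(),
        "shape=" + word_shape(tok),
        "suffix3=" + tok[-3:].lower(),
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"),
    }
    if tok.lower() in GAZETTEER:
        feats.add("in_gazetteer")
    return feats


def predict(weights, feats):
    """Probability that the token is a gene name under the current weights."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=50, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            error = label - predict(weights, feats)
            for f in feats:
                weights[f] += lr * error
    return weights


# Toy training data (invented): 1 marks a gene/protein name, 0 anything else.
SENTENCES = [
    (["The", "p53", "protein", "binds", "DNA"], [0, 1, 0, 0, 0]),
    (["BRCA1", "mutations", "increase", "risk"], [1, 0, 0, 0]),
    (["Insulin", "regulates", "glucose", "uptake"], [1, 0, 0, 0]),
]
EXAMPLES = [
    (features(toks, i), label)
    for toks, labels in SENTENCES
    for i, label in enumerate(labels)
]
WEIGHTS = train(EXAMPLES)

# Score the tokens of an unseen sentence.
new_sentence = ["Loss", "of", "p53", "function"]
scores = [predict(WEIGHTS, features(new_sentence, i)) for i in range(len(new_sentence))]
for tok, score in zip(new_sentence, scores):
    print(f"{tok}\t{score:.2f}")
```

A real system would use a much larger annotated corpus, a richer label scheme for multi-token names, and regularization. The point here is only that heterogeneous clues (spelling, context, dictionary lookups) can all enter the same model as features.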
Potential Biases
Results may be biased by the system's reliance on external resources and by the quality of the training data.
Limitations
The system's performance is limited by the quality of training data and the inherent complexity of biomedical text.