Identifying Genes and Proteins in Biomedical Text

Sample size: 10,000 publications
Evidence: moderate

Author Information

Author(s): Jenny Finkel, Shipra Dingare, Christopher D. Manning, Malvina Nissim, Beatrice Alex, Claire Grover

Primary Institution: Stanford University

Hypothesis

Can a maximum-entropy based system effectively identify gene and protein names in biomedical abstracts?

Conclusion

The system achieved precision of 0.78 to 0.83 and recall of 0.84 to 0.85 in identifying gene and protein names, demonstrating the effectiveness of combining diverse features with external knowledge sources.

Supporting Evidence

The system achieved a precision of 0.83 and recall of 0.84 in the open evaluation.
In the closed evaluation, the system achieved a precision of 0.78 and recall of 0.85.
The study highlights the importance of using diverse features for effective named entity recognition.
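
The precision and recall figures above can be combined into a single balanced score. The sketch below computes the standard F1 measure (the harmonic mean of precision and recall) from the numbers quoted in this summary. The F1 values are derived here for illustration and are not quoted from the publication; the function name `f1_score` is a placeholder.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as quoted in this summary.
open_f1 = f1_score(0.83, 0.84)
closed_f1 = f1_score(0.78, 0.85)

print(f"open: {open_f1:.3f}, closed: {closed_f1:.3f}")
# prints "open: 0.835, closed: 0.813"
```

Because the harmonic mean penalizes imbalance, the closed evaluation's lower precision pulls its F1 down even though its recall is slightly higher.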

Takeaway

This study created a computer program that helps find names of genes and proteins in medical texts, making it easier to process a lot of information quickly.

Methodology

A maximum-entropy based system was developed to identify gene and protein names using a variety of features and external resources.
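
As a rough illustration of how a maximum-entropy tagger scores a token, the sketch below extracts a few binary features and turns weighted feature sums into label probabilities. Everything in it is an assumption made for illustration: the feature set, the two-entry gazetteer standing in for an external resource, the label names, and the hand-set weights. It is not the authors' feature set or implementation, and a real system would learn the weights from annotated data.

```python
import math

# Hypothetical mini-gazetteer standing in for an external resource.
GAZETTEER = {"p53", "brca1"}


def token_features(tokens, i):
    """Return the set of binary features active for tokens[i]."""
    word = tokens[i]
    feats = {
        "word=" + word.lower(),
        "suffix3=" + word[-3:].lower(),
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
    }
    if any(ch.isdigit() for ch in word):
        feats.add("has_digit")
    if any(ch.isupper() for ch in word):
        feats.add("has_upper")
    if word.lower() in GAZETTEER:
        feats.add("in_gazetteer")
    return feats


def maxent_probs(feats, weights, labels):
    """P(label | feats) = exp(sum of active weights) / normaliser."""
    scores = {
        y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels
    }
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


# Hand-set illustrative weights (assumed, not learned).
WEIGHTS = {
    ("in_gazetteer", "GENE"): 2.0,
    ("has_digit", "GENE"): 1.0,
    ("has_upper", "GENE"): 0.5,
}
LABELS = ["GENE", "O"]

tokens = ["Mutations", "in", "BRCA1", "are", "common"]
probs = maxent_probs(token_features(tokens, 2), WEIGHTS, LABELS)
print(probs)
```

With these toy weights, "BRCA1" fires the gazetteer, digit, and uppercase features and receives a high GENE probability, while "in" fires no weighted feature and splits evenly between the two labels.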

Potential Biases

Results may be biased by the system's reliance on external resources and by the quality of the training data.

Limitations

The system's performance is limited by the quality of training data and the inherent complexity of biomedical text.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S5
