BioCreative II Gene Normalization Overview
Author Information
Author(s): Morgan Alexander A, Lu Zhiyong, Wang Xinglong, Cohen Aaron M, Fluck Juliane, Ruch Patrick, Divoli Anna, Fundel Katrin, Leaman Robert, Hakenberg Jörg, Sun Chengjie, Liu Heng-hui, Torres Rafael, Krauthammer Michael, Lau William W, Liu Hongfang, Hsu Chun-Nan, Schuemie Martijn, Cohen K Bretonnel, Hirschman Lynette
Primary Institution: Stanford University
Hypothesis
The gene normalization task aims to link genes and gene products mentioned in the literature to entries in biological databases.
Conclusion
The study reports major advances on the BioCreative II gene normalization task, with pooled system performance comparable to that of human experts.
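Pooling means combining the gene identifiers proposed by many submitted runs. The sketch below assumes each run is a mapping from a document ID to the set of gene identifiers it predicted. This is a hypothetical representation, and the document and gene IDs are placeholders. It shows two simple pooling rules. The first is a union of all runs, which maximizes recall in the spirit of the 'maximum recall' system. The second is a vote threshold, which trades recall for precision. It is an illustration only, not the combination method used in the challenge.

```python
from collections import Counter

# Hypothetical representation: document ID -> set of predicted gene identifiers.
Run = dict[str, set[str]]


def pool_union(runs: list[Run]) -> Run:
    """Maximum-recall style pooling: keep every identifier that any run proposed."""
    pooled: Run = {}
    for run in runs:
        for doc, ids in run.items():
            pooled.setdefault(doc, set()).update(ids)
    return pooled


def pool_majority(runs: list[Run], min_votes: int) -> Run:
    """Simple voting: keep an identifier only if at least `min_votes` runs proposed it."""
    votes: dict[str, Counter] = {}
    for run in runs:
        for doc, ids in run.items():
            # Each run contributes at most one vote per identifier per document.
            votes.setdefault(doc, Counter()).update(ids)
    return {
        doc: {gid for gid, n in counter.items() if n >= min_votes}
        for doc, counter in votes.items()
    }


# Three hypothetical runs over two documents.
runs = [
    {"doc1": {"G1", "G2"}},
    {"doc1": {"G1"}, "doc2": {"G3"}},
    {"doc1": {"G1", "G4"}, "doc2": {"G3"}},
]
print(pool_union(runs))        # doc1 -> G1, G2, G4; doc2 -> G3
print(pool_majority(runs, 2))  # doc1 -> G1; doc2 -> G3
```

With `min_votes=1` the vote rule reduces to the union. Raising the threshold keeps only identifiers that several runs agree on.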
Supporting Evidence
- Inter-annotator agreement was measured at over 90%.
- Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81.
- The best composite system achieved an F-measure of 0.92 with 10-fold cross-validation.
- A 'maximum recall' system, based on the pooled responses of all participants, identified 763 out of 785 identifiers (a recall of about 0.97).
- Twenty groups submitted one to three runs each, for a total of 54 runs.
- The study involved a total of 543 abstracts.
- The results show promise for tools that link the literature with biological databases.
- Significant progress was made compared to the first BioCreative challenge.
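The F-measures above are balanced F-measures, the harmonic mean of precision P and recall R. Two other figures in this summary can be checked by hand. The 'maximum recall' system's 763 of 785 identifiers corresponds to a recall of about 0.97. The 543 abstracts are the 281 training abstracts plus the 262 test documents described under Methodology.

```latex
F = \frac{2PR}{P + R}, \qquad
R_{\text{max}} = \frac{763}{785} \approx 0.972, \qquad
281 + 262 = 543
```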
Takeaway
This study describes a competition in which teams tried to match gene names in scientific papers to entries in a database. They did very well, and combining their systems came close to the performance of human experts.
Methodology
The study involved analyzing 281 expert-annotated abstracts for training and a blind test set of 262 documents, measuring inter-annotator agreement and system performance.
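Both inter-annotator agreement and system performance can be expressed by comparing two sets of (document, gene identifier) pairs. The following is a minimal sketch. It assumes each annotation or system output is such a set and that precision, recall and the balanced F-measure are micro-averaged over all pairs. The exact scoring rules of the challenge are not given here, and the function name `score` is hypothetical.

```python
# Hypothetical representation: a set of (document_id, gene_id) pairs.
Pairs = set[tuple[str, str]]


def score(predicted: Pairs, gold: Pairs) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and balanced F-measure over all pairs."""
    tp = len(predicted & gold)  # pairs found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    # When tp > 0, precision + recall > 0, so the division is safe.
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


# Placeholder data: two documents, four reference identifiers.
gold = {("d1", "G1"), ("d1", "G2"), ("d2", "G3"), ("d2", "G4")}
pred = {("d1", "G1"), ("d2", "G3"), ("d2", "G5")}
print(score(pred, gold))  # precision 2/3, recall 1/2, F = 4/7
```

Because the balanced F-measure is symmetric in its two arguments, the same function can express agreement between two annotators by treating either one as the reference.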
Potential Biases
The study may have biases due to the selection of abstracts and the reliance on expert annotations.
Limitations
The task was simplified compared to real curation processes, focusing only on abstracts rather than full texts.
Participant Demographics
Twenty teams participated in the challenge, submitting a total of 54 runs.
Statistical Information
P-Value
0.10
Confidence Interval
90% confidence level
Statistical Significance
p<0.05
Digital Object Identifier (DOI)