Recognizing Protein and Gene Names from Text

Sample size: 10000 publications
Evidence: high

Author Information

Author(s): Zhou GuoDong, Shen Dan, Zhang Jie, Su Jian, Tan SoonHeng

Primary Institution: Institute for Infocomm Research

Hypothesis

Can an ensemble of classifiers improve the recognition of protein and gene names in biomedical texts?

Conclusion

The proposed system achieved the best performance among the competing systems, with a balanced F-measure of 82.58 in recognizing protein and gene names.

Supporting Evidence

The system outperformed 10 other systems in the BioCreative competition.
It achieved a balanced F-measure of 82.58.
The ensemble approach combined different classifiers to improve recognition accuracy.
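
The balanced F-measure cited above is the harmonic mean of precision and recall. The sketch below shows how such a score is computed. The function name `f_measure` and the input values are illustrative assumptions; the precision and recall behind the reported 82.58 are not given in this summary.

```python
def f_measure(precision, recall, beta=1.0):
    """Return the F-measure for the given precision and recall.

    With beta=1.0 this is the balanced F-measure (harmonic mean).
    Inputs may be on a 0-1 or a 0-100 scale, as long as both use the same one.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was recognized correctly
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical scores on a 0-100 scale, not taken from the paper.
example_score = f_measure(80.0, 90.0)
print(round(example_score, 2))
```

Because the harmonic mean is pulled toward the lower of its two inputs, a system cannot reach a high balanced F-measure by trading away recall for precision or the reverse.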

Takeaway

This study created a smart system that helps computers understand names of proteins and genes in scientific texts, making it easier for scientists to find important information.

Methodology

The system uses an ensemble of classifiers, including a Support Vector Machine (SVM) and discriminative Hidden Markov Models (DHMMs), followed by post-processing modules that further improve performance.
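
One common way to combine several sequence classifiers is a per-token majority vote. The sketch below illustrates that idea only. This summary does not say how the classifiers are combined, so the voting scheme, the tie-breaking rule, the function name `majority_vote`, and the B/I/O label sequences are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several classifiers.

    predictions: list of label sequences, one per classifier, all of equal length.
    Ties are broken in favor of the earliest classifier in the list (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one prediction sequence")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all prediction sequences must have the same length")

    combined = []
    for labels in zip(*predictions):  # labels proposed for one token
        counts = Counter(labels)
        top_count = max(counts.values())
        # First label, in classifier order, that reaches the highest vote count.
        combined.append(next(l for l in labels if counts[l] == top_count))
    return combined


# Hypothetical outputs of three classifiers for a four-token sentence
# (B = begins a name, I = inside a name, O = outside any name).
svm_labels = ["B", "I", "O", "O"]
dhmm_a_labels = ["B", "I", "I", "O"]
dhmm_b_labels = ["B", "O", "O", "O"]

print(majority_vote([svm_labels, dhmm_a_labels, dhmm_b_labels]))
```

With an odd number of classifiers and only two competing labels per token, ties cannot occur; the tie-breaking rule matters only when classifiers disagree three ways or an even number of them is used.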

Limitations

The system's performance may be affected by the ambiguity in public resources and the complexity of biomedical names.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S7
