Gene/protein name recognition based on support vector machine using dictionary as features
2005

Gene/Protein Name Recognition Using Support Vector Machine

Sample size: 10000 sentences

Evidence: moderate

Author Information

Author(s): Mitsumori Tomohiro, Fation Sevrani, Murata Masaki, Doi Kouichi, Doi Hirohumi

Primary Institution: Nara Institute of Science and Technology

Hypothesis

Can an automated recognition system based on the SVM algorithm effectively identify gene and protein names in biomedical literature?

Conclusion

The SVM algorithm is robust and does not require feature selection for effective gene/protein name recognition.

Supporting Evidence

  • The system achieved a balanced F-score of 0.7811 in the BioCreAtIvE competition.
  • Dictionary matching features contributed to improved performance.
  • Training data consisted of 7500 sentences and test data of 2500 sentences.
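The balanced F-score is the harmonic mean of precision P and recall R. Both are computed from true positives (TP), false positives (FP) and false negatives (FN) over the recognized names. The standard definitions are below. This summary does not report the system's separate precision and recall values.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}
```

Because F is a harmonic mean, it is pulled toward the lower of P and R. For hypothetical values P = 0.9 and R = 0.7, F = (2 × 0.9 × 0.7) / (0.9 + 0.7) = 1.26 / 1.6 = 0.7875.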

Takeaway

This study created a computer program that helps find names of genes and proteins in scientific papers, making it easier to understand research.

Methodology

The study used a support vector machine (SVM) algorithm to classify gene/protein names based on various features extracted from training data.
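Below is a minimal sketch of SVM-based name recognition framed as binary token classification: each token is either inside a gene/protein name (+1) or outside it (-1). This is not the authors' implementation. The four orthographic features (`bias`, `has_digit`, `has_upper`, `all_lower`), the toy sentences, and the helper names `token_features`, `train_linear_svm` and `tag` are all hypothetical, because this summary does not list the study's actual features. The linear SVM is trained with dual coordinate descent on the hinge loss.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def token_features(word: str) -> Features:
    """Hypothetical orthographic features for one token (bias term included)."""
    feats: Features = {"bias": 1.0}
    if any(c.isdigit() for c in word):
        feats["has_digit"] = 1.0
    if any(c.isupper() for c in word):
        feats["has_upper"] = 1.0
    if word.isalpha() and word.islower():
        feats["all_lower"] = 1.0
    return feats


def dot(w: Features, x: Features) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[Features, int]], C: float = 1.0,
                     epochs: int = 100) -> Features:
    """Linear SVM (hinge loss) trained by dual coordinate descent.

    Labels are +1 (inside a gene/protein name) or -1 (outside).
    Keeps w = sum_i alpha_i * y_i * x_i with 0 <= alpha_i <= C.
    """
    w: Features = {}
    alpha = [0.0] * len(data)
    for _ in range(epochs):
        for i, (x, y) in enumerate(data):
            q = sum(v * v for v in x.values())          # x_i . x_i (> 0 thanks to bias)
            grad = y * dot(w, x) - 1.0                   # dual gradient w.r.t. alpha_i
            new = min(max(alpha[i] - grad / q, 0.0), C)  # exact step, clipped to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + delta * y * v
    return w


def tag(w: Features, tokens: List[str]) -> List[str]:
    """Label each token GENE if its SVM score is positive, else O."""
    return ["GENE" if dot(w, token_features(t)) > 0 else "O" for t in tokens]


# Toy training data (hypothetical): tokens inside gene/protein names are +1.
train_sentences = [
    (["the", "p53", "protein", "binds", "MDM2"], [-1, 1, -1, -1, 1]),
    (["we", "measured", "BRCA1", "levels"], [-1, -1, 1, -1]),
]
data = [(token_features(t), y)
        for tokens, labels in train_sentences
        for t, y in zip(tokens, labels)]
weights = train_linear_svm(data)
print(tag(weights, ["expression", "of", "KRAS2", "was", "high"]))
# ['O', 'O', 'GENE', 'O', 'O']
```

The unseen token `KRAS2` is tagged as a name because it shares the digit and uppercase features of `MDM2` and `BRCA1`. A realistic system would add far richer features, such as word forms, affixes and dictionary matches, which this toy example leaves out.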

Potential Biases

Bias may arise from reliance on particular dictionaries for name recognition.

Limitations

The system's performance may depend on the coverage and quality of the dictionaries used to match gene/protein names.
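The dictionary dependence noted above can be made concrete with a small dictionary-matching feature extractor. This is a minimal sketch under stated assumptions. The summary does not describe how the study matched dictionary entries, so the greedy left-to-right longest match, the case-insensitive comparison, the B/I/O output tags, and the mini-dictionary `gene_dict` are all hypothetical. The function `dictionary_match_tags` is likewise an illustrative name.

```python
from typing import List, Set, Tuple


def dictionary_match_tags(tokens: List[str], dictionary: Set[Tuple[str, ...]],
                          max_len: int = 5) -> List[str]:
    """Greedy left-to-right longest match; returns a B/I/O tag per token.

    Matching is case-insensitive. The tags could be fed to a classifier
    as per-token features.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                match_len = n
                break
        if match_len:
            tags[i] = "B"
            for j in range(i + 1, i + match_len):
                tags[j] = "I"
            i += match_len
        else:
            i += 1
    return tags


# Hypothetical mini-dictionary of lower-cased, tokenised gene/protein names.
gene_dict = {("p53",), ("tumor", "necrosis", "factor"), ("mdm2",)}

print(dictionary_match_tags(["Tumor", "necrosis", "factor", "activates", "p53"], gene_dict))
# ['B', 'I', 'I', 'O', 'B']
print(dictionary_match_tags(["BRCA1", "binds", "p53"], gene_dict))
# ['O', 'O', 'B']
```

In the second call, `BRCA1` receives no match because it is missing from the dictionary. Any recognition of it would have to come from other features, which is why dictionary coverage can affect results.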

Statistical Information

P-Value

p<0.05

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S8
