Gene/protein name recognition based on support vector machine using dictionary as features

2005

Gene/Protein Name Recognition Using Support Vector Machine

Sample size: 10,000 sentences

Evidence: moderate

Author Information

Author(s): Mitsumori Tomohiro, Fation Sevrani, Murata Masaki, Doi Kouichi, Doi Hirohumi

Primary Institution: Nara Institute of Science and Technology

Can an automated recognition system based on the SVM algorithm effectively identify gene and protein names in biomedical literature?

The SVM algorithm is robust and does not require feature selection for effective gene/protein name recognition.

The system achieved a balanced f-score of 0.7811 in the BioCreAtIvE competition.
Dictionary matching features contributed to improved performance.
Training data consisted of 7500 sentences and test data of 2500 sentences.
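For readers unfamiliar with the metric, the balanced f-score reported above is the harmonic mean of precision and recall. The sketch below shows how such a score could be computed from entity-level counts. The function names and the example counts are illustrative assumptions, not figures from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from entity-level counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def balanced_f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=2)
score = balanced_f_score(p, r)
```

Because the measure is a harmonic mean, a system cannot reach a high score by trading recall away for precision, or the reverse.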

This study built a computer program that automatically finds the names of genes and proteins in scientific papers, which makes the research literature easier to search and analyze.

The study used a support vector machine (SVM) algorithm to classify gene/protein names based on various features extracted from training data.
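As a rough illustration of what per-token features with a dictionary-matching signal might look like, here is a minimal sketch. The feature names, the B/I/O encoding of dictionary matches, the longest-match strategy, and the toy dictionary are all assumptions made for this example. They are not the study's actual feature set, and the SVM training step itself is omitted.

```python
def dictionary_tags(tokens, dictionary):
    """Tag tokens B/I when they fall inside a longest dictionary match, else O.

    `dictionary` is a set of tuples of lowercased tokens (hypothetical format).
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(entry) for entry in dictionary), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens, dictionary):
    """Build one feature dict per token, including the dictionary-match tag."""
    dict_tags = dictionary_tags(tokens, dictionary)
    rows = []
    for i, tok in enumerate(tokens):
        rows.append({
            "word": tok.lower(),
            "has_digit": any(c.isdigit() for c in tok),
            "has_upper": any(c.isupper() for c in tok),
            "suffix3": tok[-3:].lower(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
            "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
            "dict_match": dict_tags[i],
        })
    return rows


# Toy dictionary and sentence, for illustration only.
toy_dictionary = {("tumor", "necrosis", "factor"), ("p53",)}
sentence = "The p53 protein regulates tumor necrosis factor".split()
features = token_features(sentence, toy_dictionary)
```

Each feature dict would then be converted to a numeric vector (for example by one-hot encoding) and passed to an SVM that predicts whether the token begins, continues, or lies outside a gene/protein name.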

Potential bias may arise from the reliance on specific dictionaries for name recognition.

The study's performance may be affected by the quality of the dictionaries used for matching gene/protein names.

p<0.05
