Gene/Protein Name Recognition Using Support Vector Machine
Author Information
Author(s): Mitsumori Tomohiro, Fation Sevrani, Murata Masaki, Doi Kouichi, Doi Hirohumi
Primary Institution: Nara Institute of Science and Technology
Hypothesis
Can an automated recognition system based on the SVM algorithm effectively identify gene and protein names in biomedical literature?
Conclusion
The SVM algorithm is robust: it recognized gene/protein names effectively without requiring feature selection.
Supporting Evidence
- The system achieved a balanced F-score of 0.7811 in the BioCreAtIvE competition.
- Dictionary matching features contributed to improved performance.
- Training data consisted of 7500 sentences and test data of 2500 sentences.
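The balanced F-score cited in the evidence is the harmonic mean of precision and recall. A minimal sketch of that computation follows. The function name and the counts are illustrative assumptions, not figures from the study, and the competition's own scoring script may handle edge cases differently.

```python
def balanced_f_score(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F) computed from entity-level counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or annotated.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical counts for illustration only (not results from the study).
precision, recall, f = balanced_f_score(80, 20, 20)
print(f"precision={precision:.4f} recall={recall:.4f} F={f:.4f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a system cannot reach a high balanced F-score by trading away recall for precision or the reverse.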
Takeaway
This study created a computer program that helps find names of genes and proteins in scientific papers, making it easier to understand research.
Methodology
The study used a support vector machine (SVM) algorithm to classify gene/protein names based on various features extracted from training data.
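To make the idea of per-token features concrete, here is a small sketch of how each token in a sentence might be turned into a feature record, including a dictionary-matching feature, before being passed to a classifier such as an SVM. Everything in it is an assumption for illustration: the feature names, the toy dictionary, and the one-token context window are hypothetical and are not taken from the study, whose exact feature set is not described in this summary.

```python
# Hypothetical toy dictionary; the dictionaries used in the study are not listed here.
GENE_DICTIONARY = {"p53", "brca1", "interleukin-2"}


def token_features(tokens, i, dictionary=GENE_DICTIONARY):
    """Build an illustrative feature record for the token at position i."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        # Orthographic cues that often distinguish gene/protein names.
        "has_digit": any(c.isdigit() for c in tok),
        "has_upper": any(c.isupper() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),
        # Dictionary-matching feature: does the token appear in a name list?
        "in_dictionary": tok.lower() in dictionary,
        # Neighbouring words give the classifier local context.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


sentence = "Mutations in BRCA1 increase cancer risk".split()
features = [token_features(sentence, i) for i in range(len(sentence))]
for tok, feats in zip(sentence, features):
    print(tok, feats["in_dictionary"], feats["has_digit"])
```

In a full pipeline these records would be converted to numeric vectors and paired with per-token labels for training. That step is omitted here because it depends on the SVM implementation chosen.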
Potential Biases
Bias may arise from the reliance on specific dictionaries for name recognition.
Limitations
The study's performance may be affected by the quality of the dictionaries used for matching gene/protein names.
Statistical Information
P-Value
p<0.05
Statistical Significance
p<0.05
Digital Object Identifier (DOI)