Automated Recognition of Malignancy Mentions in Biomedical Literature
Author Information
Author(s): Jin Yang, Ryan T McDonald, Kevin Lerman, Mark A Mandel, Steven Carroll, Mark Y Liberman, Fernando C Pereira, Raymond S Winters, Peter S White
Primary Institution: University of Pennsylvania
Hypothesis
Can a machine-learning approach effectively identify disease concepts in biomedical literature with minimal manual intervention?
Conclusion
The study demonstrates that high accuracy in identifying malignancy mentions in biomedical texts can be achieved with moderate effort.
Supporting Evidence
- MTag achieved 0.85 precision, 0.83 recall, and 0.84 F-measure on the evaluation set.
- MTag identified 580,002 unique mentions of malignancy from MEDLINE abstracts.
- The extractor performed significantly better than a baseline string matching system.
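The three scores in the first bullet can be checked against each other. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which this summary does not state explicitly; the function name `f_measure` is illustrative and not from the study.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Assumption: the reported F-measure is the balanced F1 score.
reported = f_measure(0.85, 0.83)
print(round(reported, 2))  # 0.84
```

Under that assumption, the reported precision and recall reproduce the reported F-measure of 0.84.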
Takeaway
The researchers created a computer program that can find mentions of cancer in medical texts, making it easier for doctors and researchers to gather information.
Methodology
The study used a machine-learning technique called Conditional Random Fields to develop an entity tagger named MTag, which was trained on a corpus of MEDLINE abstracts.
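Sequence taggers built on Conditional Random Fields typically assign one label to each token, and those labels are then grouped into entity mentions. The sketch below illustrates only that grouping step, assuming a simple B/I/O labeling scheme (B begins a mention, I continues it, O is outside). The scheme, the function name `bio_to_mentions`, and the example sentence are illustrative assumptions, not details taken from MTag.

```python
from typing import List, Tuple


def bio_to_mentions(tokens: List[str], labels: List[str]) -> List[Tuple[int, int, str]]:
    """Group per-token B/I/O labels into (start, end, text) mention spans.

    `end` is exclusive. An "I" label with no open mention is treated
    leniently as the start of a new mention.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    mentions: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open mention began, if any
    for i, label in enumerate(labels):
        if label == "O":
            if start is not None:
                mentions.append((start, i, " ".join(tokens[start:i])))
                start = None
        elif label == "B" or start is None:
            # Close any open mention, then open a new one at this token.
            if start is not None:
                mentions.append((start, i, " ".join(tokens[start:i])))
            start = i
        # Otherwise the label is "I" inside an open mention: keep extending it.
    if start is not None:
        mentions.append((start, len(tokens), " ".join(tokens[start:])))
    return mentions


# Hypothetical example sentence with hand-assigned labels.
tokens = ["Patients", "with", "acute", "myeloid", "leukemia", "and", "neuroblastoma", "."]
labels = ["O", "O", "B", "I", "I", "O", "B", "O"]
print(bio_to_mentions(tokens, labels))
# [(2, 5, 'acute myeloid leukemia'), (6, 7, 'neuroblastoma')]
```

In a real tagger the labels would come from the trained model rather than being written by hand; the grouping logic stays the same.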
Potential Biases
Misannotations could arise from the complexity of biomedical literature and the diversity of entity types.
Limitations
At 0.85 precision and 0.83 recall, MTag is effective but may not be accurate enough for tasks that require very high accuracy, such as database population.