Automated Recognition of Malignancy Mentions in Biomedical Literature

Sample size: 1442 publications

Evidence: moderate

Author Information

Author(s): Jin Yang, Ryan T McDonald, Kevin Lerman, Mark A Mandel, Steven Carroll, Mark Y Liberman, Fernando C Pereira, Raymond S Winters, Peter S White

Primary Institution: University of Pennsylvania

Hypothesis

Can a machine-learning approach effectively identify disease concepts in biomedical literature with minimal manual intervention?

Conclusion

The study demonstrates that high accuracy in identifying malignancy mentions in biomedical texts can be achieved with moderate effort.

Supporting Evidence

MTag achieved 0.85 precision, 0.83 recall, and 0.84 F-measure on the evaluation set.
MTag identified 580,002 unique mentions of malignancy from MEDLINE abstracts.
The extractor performed significantly better than a baseline string matching system.
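
The reported scores can be checked against each other with a short calculation. The sketch below assumes the F-measure quoted above is the balanced F1 score (the harmonic mean of precision and recall); the summary does not state the weighting, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score, which is ASSUMED here to be
    the "F-measure" the summary reports.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # no correct predictions at all
    return (1 + beta ** 2) * precision * recall / denominator


# Precision and recall as reported for the evaluation set.
reported = f_measure(0.85, 0.83)
print(round(reported, 2))  # 0.84
```

Under that assumption, 0.85 precision and 0.83 recall round to the reported 0.84, so the three figures are mutually consistent.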

Takeaway

The researchers created a computer program that can find mentions of cancer in medical texts, making it easier for doctors and researchers to gather information.

Methodology

The study used a machine-learning technique called Conditional Random Fields to develop an entity tagger named MTag, which was trained on a corpus of MEDLINE abstracts.
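
To make the tagging step concrete, here is a minimal sketch of how a linear-chain model of the Conditional Random Field kind picks a label sequence by Viterbi decoding. It is not MTag's implementation: the tag names, the two small word lists, and every weight are hypothetical and hand-set for illustration, whereas a real CRF learns its weights from the annotated corpus.

```python
from typing import Dict, List

# Hypothetical BIO tag set: B-MAL opens a malignancy mention, I-MAL continues it.
TAGS = ["O", "B-MAL", "I-MAL"]

# Hypothetical toy lexicons standing in for learned features.
CUES = {"leukemia", "carcinoma", "lymphoma", "neuroblastoma"}
MODIFIERS = {"acute", "chronic", "myeloid"}

FORBIDDEN = float("-inf")


def emission(word: str, tag: str) -> float:
    """Hand-set score for assigning `tag` to `word` (assumed weights)."""
    word = word.lower()
    if word in CUES:
        return -1.0 if tag == "O" else 2.0
    if word in MODIFIERS:
        return 0.0 if tag == "O" else 1.0
    return 1.0 if tag == "O" else 0.0


def transition(prev: str, cur: str) -> float:
    """Hand-set score for moving from tag `prev` to tag `cur`."""
    if cur == "I-MAL":
        # A mention cannot continue unless one has been opened.
        return FORBIDDEN if prev == "O" else 0.5
    return 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for `tokens`."""
    if not tokens:
        return []
    scores: List[Dict[str, float]] = [{} for _ in tokens]
    back: List[Dict[str, str]] = [{} for _ in tokens]
    for tag in TAGS:
        start = FORBIDDEN if tag == "I-MAL" else 0.0
        scores[0][tag] = start + emission(tokens[0], tag)
    for i in range(1, len(tokens)):
        for tag in TAGS:
            best_prev = max(
                TAGS, key=lambda p: scores[i - 1][p] + transition(p, tag)
            )
            scores[i][tag] = (
                scores[i - 1][best_prev]
                + transition(best_prev, tag)
                + emission(tokens[i], tag)
            )
            back[i][tag] = best_prev
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: scores[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


sentence = "Patients with acute myeloid leukemia were enrolled".split()
for token, tag in zip(sentence, viterbi(sentence)):
    print(f"{token}\t{tag}")
```

With these toy weights the decoder labels "acute myeloid leukemia" as one mention and everything else as outside. The point of scoring whole sequences rather than single words is that the transition scores let neighbouring labels constrain each other, which a word-by-word string matcher cannot do.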

Potential Biases

Potential misannotations could arise from the complexity of biomedical literature and the diversity of entity types.

Limitations

The study's precision and recall rates suggest that while MTag is effective, it may not be sufficient for tasks requiring high accuracy, such as database population.

Digital Object Identifier (DOI)

10.1186/1471-2105-7-492
