Extracting Information Automatically from Biological Literature
Author Information
Author(s): Christian Blaschke, Robert Hoffmann, Juan Carlos Oliveros, Alfonso Valencia
Primary Institution: Protein Design Group, CNB-CSIC, Madrid, Spain
Conclusion
The study discusses various methods for automatically extracting information from biological literature to aid in understanding genomics and proteomics data.
Supporting Evidence
- The study highlights the need for linking biological databases with literature information.
- Three main types of systems for information extraction are discussed: statistical methods, computational linguistics methods, and frame-based approaches.
- Geisha and Suiseki are two systems evaluated for their effectiveness in extracting biological information.
Takeaway
Scientists are developing ways to automatically pull useful information out of the large body of biology papers, to help understand genes and proteins better.
Methodology
The study reviews statistical methods, computational linguistics methods, and frame-based approaches for extracting information from biological texts.
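To make the frame-based idea concrete, here is a minimal sketch of a single frame of the form "protein A, interaction verb, protein B" matched against a sentence. It is not the implementation of Suiseki or of any system discussed in the study; the function name `extract_interactions`, the verb list, and the assumption that a lexicon of protein names is already available are all hypothetical choices made for illustration.

```python
import re

# Illustrative assumption: a small, hand-picked list of interaction verbs.
INTERACTION_VERBS = ("binds", "activates", "inhibits",
                     "phosphorylates", "interacts with")


def extract_interactions(sentence, protein_names):
    """Return (protein_a, verb, protein_b) triples matched by one simple frame.

    protein_names is a hypothetical, pre-compiled lexicon of protein names;
    real systems need a dedicated name-detection step.
    """
    if not protein_names:
        return []
    # Try longer names first so "Cdc25" is not shadowed by "Cdc2".
    names = "|".join(re.escape(n)
                     for n in sorted(protein_names, key=len, reverse=True))
    verbs = "|".join(re.escape(v) for v in INTERACTION_VERBS)
    # The frame: <protein> <verb> <protein>, separated only by whitespace.
    frame = re.compile(rf"\b({names})\b\s+({verbs})\s+\b({names})\b")
    return [m.groups() for m in frame.finditer(sentence)]


if __name__ == "__main__":
    lexicon = {"Cdc2", "Wee1"}
    print(extract_interactions("We show that Cdc2 phosphorylates Wee1 in vitro.",
                               lexicon))
```

A single rigid frame like this misses passive constructions, intervening words, and negation, which is one reason such approaches typically rely on many frames and some way of scoring them.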
Potential Biases
The evaluation of the systems may be biased because it relies on sets of known interactions, some of which may not be described in the literature being analyzed.
Limitations
Adapting computational linguistics methods to molecular biology is not guaranteed to succeed.