Identifying and Fixing Protein Errors in Databases
Author Information
Author(s): Nagy Alinda, Hegyi Hédi, Farkas Krisztina, Tordai Hedvig, Kozma Evelin, Bányai László, Patthy László
Primary Institution: Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences
Hypothesis
Can the MisPred approach effectively identify and correct mispredicted proteins in public databases?
Conclusion
The MisPred approach efficiently identifies errors in protein predictions and can significantly improve the quality of gene predictions and associated databases.
Supporting Evidence
- The MisPred approach identified errors in the Swiss-Prot section of UniProtKB, a high-quality protein database.
- The majority of errors detected were due to conflicts with established dogmas about protein structure.
- MisPred was able to correct many identified errors by targeted searches of genomic and EST databases.
Takeaway
This study shows a way to find and fix mistakes in protein data, helping scientists get better information from databases.
Methodology
The MisPred approach uses five routines to identify abnormal, incomplete, or mispredicted protein entries based on known features of protein-coding genes.
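A rule-based screen of this kind can be sketched as a set of independent checks, each returning a reason when an entry conflicts with an expected feature. The summary does not list the five routines, so the three checks below are illustrative assumptions, not the authors' implementation. The `ProteinEntry` record, its fields, and all function names are hypothetical, and the annotations (domain locations, signal peptide, transmembrane segment, exon chromosomes) are assumed to have been computed beforehand by other tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class ProteinEntry:
    """Hypothetical pre-annotated protein record; every field is an assumption."""
    accession: str
    domain_locations: Set[str] = field(default_factory=set)  # e.g. {"extracellular"}
    has_signal_peptide: bool = False
    has_transmembrane_segment: bool = False
    exon_chromosomes: Set[str] = field(default_factory=set)


def check_secretion_signal(p: ProteinEntry) -> Optional[str]:
    # Illustrative rule: an extracellular domain needs a route out of the cell.
    if "extracellular" in p.domain_locations and not (
        p.has_signal_peptide or p.has_transmembrane_segment
    ):
        return "extracellular domain without signal peptide or transmembrane segment"
    return None


def check_membrane_topology(p: ProteinEntry) -> Optional[str]:
    # Illustrative rule: domains on both sides of a membrane imply a membrane span.
    if {"extracellular", "cytoplasmic"} <= p.domain_locations and not p.has_transmembrane_segment:
        return "extracellular and cytoplasmic domains without transmembrane segment"
    return None


def check_single_chromosome(p: ProteinEntry) -> Optional[str]:
    # Illustrative rule: exons of one gene are expected on a single chromosome.
    if len(p.exon_chromosomes) > 1:
        return "exons map to more than one chromosome"
    return None


CHECKS: List[Callable[[ProteinEntry], Optional[str]]] = [
    check_secretion_signal,
    check_membrane_topology,
    check_single_chromosome,
]


def flag_suspect(p: ProteinEntry) -> List[str]:
    """Return the reasons (possibly none) for which an entry looks suspect."""
    reasons = []
    for check in CHECKS:
        reason = check(p)
        if reason is not None:
            reasons.append(reason)
    return reasons


consistent = ProteinEntry(
    "EXAMPLE_OK", {"extracellular"}, has_signal_peptide=True, exon_chromosomes={"1"}
)
conflicting = ProteinEntry(
    "EXAMPLE_BAD", {"extracellular", "cytoplasmic"}, exon_chromosomes={"1", "7"}
)

for entry in (consistent, conflicting):
    print(entry.accession, flag_suspect(entry))
```

Keeping each check as a separate function means a flagged entry carries the specific conflict that triggered it, which is what a later correction step would need to decide where to search for the missing or misassigned sequence.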
Limitations
The study may not capture all errors, because the MisPred routines are limited by the bioinformatic tools they rely on and by the dogmas on which they are based.