Identification and correction of abnormal, incomplete and mispredicted proteins in public databases
2008

Identifying and Fixing Protein Errors in Databases

Publication evidence: high

Author Information

Author(s): Nagy Alinda, Hegyi Hédi, Farkas Krisztina, Tordai Hedvig, Kozma Evelin, Bányai László, Patthy László

Primary Institution: Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences

Hypothesis

Can the MisPred approach effectively identify and correct mispredicted proteins in public databases?

Conclusion

The MisPred approach efficiently identifies errors in protein predictions and can significantly improve the quality of gene predictions and associated databases.

Supporting Evidence

  • The MisPred approach identified errors even in the Swiss-Prot section of UniProtKB, the manually reviewed, high-quality part of the database.
  • The majority of errors detected were due to conflicts with established dogmas about protein structure.
  • Many of the errors MisPred identified could be corrected through targeted searches of genomic and EST (expressed sequence tag) databases.

Takeaway

This study shows a way to find and fix mistakes in protein data, helping scientists get better information from databases.

Methodology

The MisPred approach uses five routines to identify abnormal, incomplete, or mispredicted protein entries based on known features of protein-coding genes.
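The summary does not list the five routines individually. As a minimal sketch, assuming each entry has already been annotated with its domains (and their lengths), a signal peptide prediction, a transmembrane segment count and the chromosomes its exons map to, dogma-based checks of this general kind can be written as simple rules that flag conflicts. The domain sets, typical lengths, deviation threshold and the names `ProteinEntry` and `check_entry` are hypothetical illustrations, not MisPred's actual routines or data.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, hand-picked domain classes for illustration only; a real
# pipeline would rely on curated domain annotations, not these small sets.
EXTRACELLULAR_DOMAINS = {"EGF", "Kringle"}
CYTOPLASMIC_DOMAINS = {"Pkinase", "SH2"}

# Hypothetical typical domain lengths (residues) and allowed relative deviation.
TYPICAL_LENGTH = {"EGF": 32, "Kringle": 80, "Pkinase": 260, "SH2": 80}
MAX_RELATIVE_DEVIATION = 0.5


@dataclass
class ProteinEntry:
    accession: str
    domains: list[tuple[str, int]]  # (domain name, observed length in residues)
    has_signal_peptide: bool
    n_transmembrane: int
    exon_chromosomes: set[str] = field(default_factory=set)


def check_entry(entry: ProteinEntry) -> list[str]:
    """Return the rule conflicts found for one entry (an empty list means none)."""
    names = {name for name, _ in entry.domains}
    has_ec = bool(names & EXTRACELLULAR_DOMAINS)
    has_cyto = bool(names & CYTOPLASMIC_DOMAINS)
    problems = []

    # Extracellular domains are expected to come with a secretory signal
    # peptide or a transmembrane segment that routes them out of the cytoplasm.
    if has_ec and not entry.has_signal_peptide and entry.n_transmembrane == 0:
        problems.append("extracellular domain without signal peptide or TM segment")

    # Extracellular and cytoplasmic domains in one chain imply a membrane-spanning segment.
    if has_ec and has_cyto and entry.n_transmembrane == 0:
        problems.append("extracellular and cytoplasmic domains without TM segment")

    # A domain much shorter or longer than usual hints at truncation or misprediction.
    for name, length in entry.domains:
        typical = TYPICAL_LENGTH.get(name)
        if typical and abs(length - typical) / typical > MAX_RELATIVE_DEVIATION:
            problems.append(f"{name} domain length {length} deviates from typical {typical}")

    # The exons of a single gene are expected to lie on one chromosome.
    if len(entry.exon_chromosomes) > 1:
        problems.append("exons located on different chromosomes")

    return problems


# Usage with a made-up entry that violates several rules at once.
suspect = ProteinEntry("HYPO_0001", [("EGF", 12), ("Pkinase", 260)],
                       has_signal_peptide=False, n_transmembrane=0,
                       exon_chromosomes={"1", "7"})
for issue in check_entry(suspect):
    print(issue)
```

Keeping each rule as an independent check mirrors the idea of separate routines: an entry can be flagged by several of them, and each flag points to a different kind of likely error (missing exon, truncation, chimeric assembly).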

Limitations

The study may not capture all errors, owing to limitations of the bioinformatic tools used and of the dogmas on which the MisPred routines are based.

Digital Object Identifier (DOI)

10.1186/1471-2105-9-353
