The Pfam protein families database: embracing AI/ML
Author Information
Author(s): Paysan-Lafosse Typhaine, Andreeva Antonina, Blum Matthias, Chuguransky Sara Rocio, Grego Tiago, Pinto Beatriz Lazaro, Salazar Gustavo A, Bileschi Maxwell L, Llinares-López Felipe, Meng-Papaxanthos Laetitia, Colwell Lucy J, Grishin Nick V, Schaeffer R Dustin, Clementel Damiano, Tosatto Silvio C E, Sonnhammer Erik, Wood Valerie, Bateman Alex
Primary Institution: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Conclusion
The Pfam database has integrated AI and machine learning to enhance protein family classification and coverage.
Supporting Evidence
- Pfam has been integrated into the InterPro website, which provides a single platform for protein family information.
- AI techniques have led to significant increases in protein family coverage.
- New families have been discovered through large-scale sequence similarity analysis.
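To make the notion of "coverage" in the points above concrete, here is a minimal Python sketch of two common ways to measure it: the fraction of sequences with at least one family match, and the fraction of residues that fall inside a match. The function names, the data layout, and the `FAM_A`/`FAM_B` identifiers are hypothetical and chosen for illustration only. They are not taken from the paper or from any Pfam or InterPro API.

```python
from typing import Dict, List, Tuple

# A match is (family_id, start, end), with 1-based inclusive coordinates.
Match = Tuple[str, int, int]


def sequence_coverage(annotations: Dict[str, List[Match]],
                      lengths: Dict[str, int]) -> float:
    """Fraction of sequences that have at least one family match."""
    if not lengths:
        return 0.0
    covered = sum(1 for seq_id in lengths if annotations.get(seq_id))
    return covered / len(lengths)


def covered_residues(matches: List[Match]) -> int:
    """Count residues inside any match, counting overlaps only once."""
    total = 0
    current_end = 0  # last residue already counted
    for _, start, end in sorted(matches, key=lambda m: m[1]):
        start = max(start, current_end + 1)  # skip residues already counted
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def residue_coverage(annotations: Dict[str, List[Match]],
                     lengths: Dict[str, int]) -> float:
    """Fraction of all residues that fall inside at least one match."""
    total_residues = sum(lengths.values())
    if total_residues == 0:
        return 0.0
    covered = sum(covered_residues(annotations.get(seq_id, []))
                  for seq_id in lengths)
    return covered / total_residues


# Hypothetical toy data: four sequences, two of them annotated.
lengths = {"seq1": 100, "seq2": 200, "seq3": 50, "seq4": 50}
annotations = {
    "seq1": [("FAM_A", 1, 50), ("FAM_B", 40, 100)],  # overlapping matches
    "seq2": [("FAM_A", 11, 60)],
    "seq3": [],                                      # no match
}

print(sequence_coverage(annotations, lengths))  # 0.5
print(residue_coverage(annotations, lengths))   # 0.375
```

In this toy data set, adding a new family that matches `seq3` or `seq4` would raise both figures, which is the sense in which newly built families increase coverage.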
Takeaway
Pfam is a database that helps scientists understand proteins better, and it now uses smart computer programs to find out even more about them.
Methodology
The study involved updating the Pfam database with new protein families and using AI to improve classification.
Limitations
Many protein families remain unclassified, and the transition to the InterPro website may cause confusion among users.
Digital Object Identifier (DOI)