canSAR 2024—an update to the public drug discovery knowledgebase

Sample size: 12,561 | Publication | Evidence: high

Author Information

Author(s): Gingrich Phillip W, Chitsazi Rezvan, Biswas Ansuman, Jiang Chunjie, Zhao Li, Tym Joseph E, Brammer Kevin M, Li Jun, Shu Zhigang, Maxwell David S, Tacy Jeffrey A, Mica Ioan L, Darkoh Michael, di Micco Patrizio, Russell Kaitlyn P, Workman Paul, Al-Lazikani Bissan

Primary Institution: University of Texas MD Anderson Cancer Center

Hypothesis

How can integrating diverse data sources improve cancer drug discovery?

Conclusion

The latest updates to canSAR enhance its capabilities for cancer drug discovery by integrating more data and improving algorithms.

Supporting Evidence

canSAR integrates data from over 25 sources to enhance cancer drug discovery.
Over 4.5 million compounds and 13.3 million bioactivities are included in canSAR.
canSAR has identified nearly 600,000 ligandable pockets across protein chains.

Takeaway

canSAR is like a big library that helps scientists find new medicines for cancer by bringing together lots of different information.

Methodology

canSAR integrates data from over 25 sources, including genomic, chemical, and clinical data, and uses machine learning algorithms for analysis.
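
As a rough illustration of what integrating several data sources can mean in practice, the sketch below merges toy records from three hypothetical sources into one profile per gene. The source names, gene labels, field names, and the `integrate` function are all assumptions made for this example; they do not reflect canSAR's actual schema or pipeline.

```python
from collections import defaultdict

# Hypothetical toy records. Source names, gene labels, and fields are
# illustrative assumptions, not canSAR's real data model.
genomic = [{"gene": "GENE_A", "mutation_count": 3}]
chemical = [
    {"gene": "GENE_A", "n_bioactivities": 12},
    {"gene": "GENE_B", "n_bioactivities": 4},
]
clinical = [{"gene": "GENE_B", "n_trials": 1}]


def integrate(sources):
    """Merge per-source records into one profile per gene."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = record["gene"]
            for field, value in record.items():
                if field != "gene":
                    # Prefix each field with its source so provenance is kept.
                    profiles[key][f"{source_name}.{field}"] = value
    return dict(profiles)


profiles = integrate(
    {"genomic": genomic, "chemical": chemical, "clinical": clinical}
)
for gene, profile in sorted(profiles.items()):
    print(gene, profile)
```

Keeping the source name in each field is one simple way to preserve provenance once records from different origins sit side by side.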

Potential Biases

The previous method of labeling pockets as undruggable may have biased the model against finding novel druggable sites.

Limitations

The majority of protein pockets are still considered undruggable, and the precision of predictions cannot be quantified due to the nature of the data.

Participant Demographics

Data includes 12,561 tumor samples from 12,520 patients and 19,408 RNAseq samples from 10,955 patients.
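
The sample and patient counts above differ, which implies that some patients contributed more than one sample. A minimal sketch of that arithmetic follows, using only the figures quoted above; the cohort labels and variable names are chosen for illustration.

```python
# (samples, patients) as quoted in the summary above.
cohorts = {
    "tumor": (12561, 12520),
    "rnaseq": (19408, 10955),
}

# Average number of samples per patient, rounded to two decimals.
ratios = {name: round(s / p, 2) for name, (s, p) in cohorts.items()}

# Samples beyond one per patient, assuming every patient has at least one.
extra = {name: s - p for name, (s, p) in cohorts.items()}

for name in cohorts:
    print(name, ratios[name], extra[name])
```

The tumor cohort is close to one sample per patient, while the RNAseq cohort averages well above one.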

Digital Object Identifier (DOI)

10.1093/nar/gkae1050
