Daniela Dolciami, Eloy Villasclaras-Fernandez, Christos Kannas, Mirco Meniconi, Bissan Al-Lazikani, Albert A. Antolin
J Cheminform 14, 28 (2022).
We use canSARchem to standardize all the compounds uploaded in canSAR (> 3 million) enabling efficient data integration and the rapid identification of alternative compound forms with useful bioactivity data. Comparison with PubChem and ChEMBL pipelines evidenced comparable performances in compound standardization, but only PubChem and canSAR canonicalize tautomers and canSAR has a slightly lower rejection rate. Our results highlight the importance of compound hierarchies for bioactivity data exploration. We make canSARchem available under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) at https://gitlab.icr.ac.uk/cansar-public/compound-registration-pipeline.