InChI Posts

24 November 2022 / Last updated: 24 November 2022 / InChI OER Admin / Cheminformatics

Tautomerism in large databases

Markus Sitzmann, Wolf-Dietrich Ihlenfeldt, and Marc C. Nicklaus
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/
Journal of Computer-Aided Molecular Design, volume 24, pages 521–551 (2010)
https://link.springer.com/article/10.1007/s10822-010-9346-4#Ack1

Abstract: We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
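The overlap measure described in the abstract (a record counts as overlapping when at least one other record is a different tautomeric representation of the same compound) can be illustrated with a minimal sketch. It assumes each record already carries two precomputed keys: one identifying the exact structure and one identifying its canonical tautomer. The key strings and the function name `tautomeric_overlap_rate` are hypothetical placeholders for illustration; they are not CACTVS hash codes or FICuS identifiers, and this is not the authors' implementation.

```python
from collections import defaultdict


def tautomeric_overlap_rate(records):
    """Fraction of records that share a canonical-tautomer key with at
    least one record whose structure key differs.

    records: iterable of (record_id, structure_key, tautomer_key) tuples.
    All keys are hypothetical, precomputed identifiers.
    """
    rows = list(records)
    if not rows:
        return 0.0

    # Collect the distinct structure keys seen under each tautomer key.
    structures_by_tautomer = defaultdict(set)
    for _, structure_key, tautomer_key in rows:
        structures_by_tautomer[tautomer_key].add(structure_key)

    # A record overlaps if its tautomer group holds more than one structure.
    overlapping = sum(
        1 for _, _, tautomer_key in rows
        if len(structures_by_tautomer[tautomer_key]) > 1
    )
    return overlapping / len(rows)


# Toy data: r1 and r2 are two tautomeric forms of one compound;
# r3 and r4 are exact duplicates, which do not count as tautomeric overlap.
records = [
    ("r1", "S-keto", "T1"),
    ("r2", "S-enol", "T1"),
    ("r3", "S-other", "T2"),
    ("r4", "S-other", "T2"),
]
rate = tautomeric_overlap_rate(records)
```

Grouping by a canonical-tautomer key keeps the comparison linear in the number of records, which matters at the scale of a collection like the one described; exact duplicates are deliberately excluded so that only genuinely different tautomeric representations are counted.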

Information
Content Type	OER
Author(s)	Markus Sitzmann, Wolf-Dietrich Ihlenfeldt, and Marc C. Nicklaus
DOI	10.1007/s10822-010-9346-4
Content Link	https://link.springer.com/article/10.1007/s10822-010-9346-4#Ack1
License	CC BY-NC 2.0
Content Status	publish
Date Published
Content Tags	Cheminformatics, Classroom Material, Content type, Extended Tautomers, Organic, Publication

