InChI Tag: Extended Tautomers

10 posts

NCI CADD Tautomerizer

Tautomerizer – Predict tautomers based on 80+ rules

https://cactus.nci.nih.gov/tautomerizer/

Introduction from Web Service (11/24/2022):

Experimental service that allows you to test a set of tautomeric transforms with your own molecules. The predefined set of transforms comprises both the current 24 standard rules used by the chemoinformatics toolkit CACTVS and 55+ additional rules compiled in the context of the IUPAC project of Redesign of the Handling of Tautomerism in InChI V2.

Please be aware that this is a chemoinformatics tool, i.e. the tautomer generation process is strictly pattern-based and does not take energetics into account in any way. Some of the generated tautomers may therefore be of high energy and not detectable experimentally.
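To make "strictly pattern-based" concrete, here is a minimal sketch of rule-driven enumeration. It is not the Tautomerizer's or CACTVS's implementation: the `Mol` container, the function names, and the single hard-coded keto–enol rule are all assumptions made for illustration, standing in for the service's much larger transform set. Note that nothing in it scores or filters tautomers by energy.

```python
from collections import deque
from typing import NamedTuple


class Mol(NamedTuple):
    """Hypothetical minimal molecule: heavy atoms only, hydrogens as counts."""
    elements: tuple   # element symbol per heavy atom
    hcounts: tuple    # attached hydrogens per heavy atom
    bonds: frozenset  # {(i, j, order)} with i < j


def _orders(mol):
    """Map each atom index to {neighbour index: bond order}."""
    table = {i: {} for i in range(len(mol.elements))}
    for i, j, order in mol.bonds:
        table[i][j] = order
        table[j][i] = order
    return table


def shift_13(mol):
    """Yield each product of one H-X-C=Z -> X=C-Z-H shift, with {X, Z} = {C, O}."""
    orders = _orders(mol)
    for x, x_nbrs in orders.items():
        if mol.hcounts[x] == 0:
            continue  # X must carry a hydrogen to donate
        for y, xy in x_nbrs.items():
            if xy != 1 or mol.elements[y] != "C":
                continue
            for z, yz in orders[y].items():
                if z == x or yz != 2:
                    continue
                if {mol.elements[x], mol.elements[z]} != {"C", "O"}:
                    continue  # keto-enol only: one end carbon, one end oxygen
                h = list(mol.hcounts)
                h[x] -= 1
                h[z] += 1
                bonds = {b for b in mol.bonds
                         if {b[0], b[1]} not in ({x, y}, {y, z})}
                bonds.add((min(x, y), max(x, y), 2))
                bonds.add((min(y, z), max(y, z), 1))
                yield Mol(mol.elements, tuple(h), frozenset(bonds))


def enumerate_tautomers(start):
    """Breadth-first closure of the rule; no energy filter is applied."""
    seen = {start}
    queue = deque([start])
    while queue:
        mol = queue.popleft()
        for product in shift_13(mol):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


# Acetone: CH3-C(=O)-CH3, atom order C0, C1, O2, C3
acetone = Mol(("C", "C", "O", "C"), (3, 0, 0, 3),
              frozenset({(0, 1, 1), (1, 2, 2), (1, 3, 1)}))
tautomers = enumerate_tautomers(acetone)
```

The sketch does no canonicalization, so the two symmetry-equivalent enols of acetone are kept as separate index-labelled structures alongside the keto form.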

Enumeration of Ring–Chain Tautomers Based on SMIRKS Rules

Laura Guasch, Markus Sitzmann, and Marc C. Nicklaus
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4170818/
J Chem Inf Model. 2014 Sep 22; 54(9): 2423–2432.

Abstract: A compound exhibits (prototropic) tautomerism if it can be represented by two or more structures that are related by a formal intramolecular movement of a hydrogen atom from one heavy atom position to another. When the movement of the proton is accompanied by the opening or closing of a ring it is called ring–chain tautomerism. This type of tautomerism is well observed in carbohydrates, but it also occurs in other molecules such as warfarin. In this work, we present an approach that allows for the generation of all ring–chain tautomers of a given chemical structure. Based on Baldwin’s Rules estimating the likelihood of ring closure reactions to occur, we have defined a set of transform rules covering the majority of ring–chain tautomerism cases. The rules automatically detect substructures in a given compound that can undergo a ring–chain tautomeric transformation. Each transformation is encoded in SMIRKS line notation. All work was implemented in the chemoinformatics toolkit CACTVS. We report on the application of our ring–chain tautomerism rules to a large database of commercially available screening samples in order to identify ring–chain tautomers.

Tautomerism of Warfarin: Combined Chemoinformatics, Quantum Chemical, and NMR Investigation

Laura Guasch, Megan L. Peach, and Marc C. Nicklaus
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7724503/
J Org Chem. 2015 Oct 16; 80(20): 9900–9909.

Abstract: Warfarin, an important anticoagulant drug, can exist in solution in 40 distinct tautomeric forms through both prototropic tautomerism and ring–chain tautomerism. We have investigated all warfarin tautomers with computational and NMR approaches. Relative energies calculated at the B3LYP/6-311G++(d,p) level of theory indicate that the 4-hydroxycoumarin cyclic hemiketal tautomer is the most stable tautomer in aqueous solution, followed by the 4-hydroxycoumarin open-chain tautomer. This is in agreement with our NMR experiments where the spectral assignments indicate that warfarin exists mainly as a mixture of cyclic hemiketal diastereomers, with an open-chain tautomer as a minor component. We present a diagram of the interconversion of warfarin created taking into account the calculated equilibrium constants (pKT) for all tautomeric reactions. These findings help with gaining further understanding of proton transfer and ring closure tautomerization processes. We also discuss the results in the context of chemoinformatics rules for handling tautomerism.

Experimental and Chemoinformatics Study of Tautomerism in a Database of Commercially Available Screening Samples

Laura Guasch, Waruna Yapamudiyansel, Megan L. Peach, James A. Kelley, Joseph J. Barchi, Jr., and Marc C. Nicklaus
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5129033/
J Chem Inf Model. 2016 Nov 28; 56(11): 2149–2161.

Abstract: We investigated how many cases of the same chemical sold as different products (at possibly different prices) occurred in a prototypical large aggregated database and simultaneously tested the tautomerism definitions in the chemoinformatics toolkit CACTVS. We applied the standard CACTVS tautomeric transforms plus a set of recently developed ring–chain transforms to the Aldrich Market Select (AMS) database of 6 million screening samples and building blocks. In 30 000 cases, two or more AMS products were found to be just different tautomeric forms of the same compound. We purchased and analyzed 166 such tautomer pairs and triplets by 1H and 13C NMR to determine whether the CACTVS transforms accurately predicted what is the same “stuff in the bottle”. Essentially all prototropic transforms with examples in the AMS were confirmed. Some of the ring–chain transforms were found to be too “aggressive”, i.e. to equate structures with one another that were different compounds.

Tautomerism in large databases

Markus Sitzmann, Wolf-Dietrich Ihlenfeldt, and Marc C. Nicklaus
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/
Journal of Computer-Aided Molecular Design, volume 24, pages 521–551 (2010)
https://link.springer.com/article/10.1007/s10822-010-9346-4#Ack1

Abstract: We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
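As a rough illustration of how an overlap rate of this kind can be computed, the sketch below assumes every record already carries two precomputed keys: a tautomer-sensitive `structure_key` and a tautomer-insensitive `tautomer_key`. Both names, the `overlap_rate` function, and the sample records are hypothetical; the paper's actual identifiers, scoring scheme, and exact overlap definition are not reproduced here.

```python
from collections import defaultdict


def overlap_rate(records, per_database=True):
    """Fraction of records that share a tautomer_key with a record whose
    structure_key differs, i.e. a different tautomeric form of the same compound.

    records: list of (database, structure_key, tautomer_key) tuples.
    per_database: if True, only compare records inside the same database;
                  if False, compare across the whole collection.
    """
    def scope(db, tautomer_key):
        return (db, tautomer_key) if per_database else tautomer_key

    # Collect the distinct tautomer-sensitive keys seen under each scope.
    forms = defaultdict(set)
    for db, structure_key, tautomer_key in records:
        forms[scope(db, tautomer_key)].add(structure_key)

    # A record overlaps when its scope holds more than one distinct form.
    hits = sum(1 for db, _, tautomer_key in records
               if len(forms[scope(db, tautomer_key)]) > 1)
    return hits / len(records)


# Made-up records: two forms of compound T1 in dbA, one of them also in dbB.
records = [
    ("dbA", "keto-1", "T1"),
    ("dbA", "enol-1", "T1"),
    ("dbA", "other", "T2"),
    ("dbB", "keto-1", "T1"),
]
within = overlap_rate(records, per_database=True)
across = overlap_rate(records, per_database=False)
```

Exact duplicates (same `structure_key`) do not count as overlap on their own, which is why both keys are needed rather than the tautomer-insensitive one alone.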

Tautomer Database: A Comprehensive Resource for Tautomerism Analyses

Devendra K. Dhaked, Laura Guasch, and Marc C. Nicklaus
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8456363/

Abstract: We report a database of tautomeric structures that contains 2819 tautomeric tuples extracted from 171 publications. Each tautomeric entry has been annotated with experimental conditions reported in the respective publication, plus bibliographic details, structural identifiers (e.g., NCI/CADD identifiers FICTS, FICuS, uuuuu, and Standard InChI), and chemical information (e.g., SMILES, molecular weight). The majority of tautomeric tuples found were pairs; the remaining 10% were triples, quadruples, or quintuples, amounting to a total number of structures of 5977. The types of tautomerism were mainly prototropic tautomerism (79%), followed by ring–chain (13%) and valence tautomerism (8%). The experimental conditions reported in the publications included about 50 pure solvents and 9 solvent mixtures with 26 unique spectroscopic or nonspectroscopic methods. 1H and 13C NMR were the most frequently used methods. A total of 77 different tautomeric transform rules (SMIRKS) are covered by at least one example tuple in the database. This database is freely available as a spreadsheet at https://cactus.nci.nih.gov/download/tautomer/.

Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2

Devendra K. Dhaked, Wolf-Dietrich Ihlenfeldt, Hitesh Patel, Victorien Delannée, and Marc C. Nicklaus

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8459712/

Abstract: We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring–chain) tautomerism, 21 for ring–chain tautomerism, and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules, covering the most well-known types of tautomerism such as keto–enol tautomerism, were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules’ enumerated tautomer sets by InChI V.1.05, both in InChI’s Standard and a Nonstandard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.

Crowdsourced Evaluation of InChI-based Tautomer Identification

precisionFDA Challenge

https://precision.fda.gov/challenges/29

Challenge Time Period

November 1, 2022 – March 1, 2023

This challenge focuses on the International Chemical Identifier (InChI), which was developed and is maintained under the auspices of the International Union of Pure and Applied Chemistry (IUPAC) and the InChI Trust. The InChI Trust, the IUPAC Working Group on Tautomers, and the U.S. Food and Drug Administration (FDA) call on the scientific community dealing with chemical repositories/data sets and analytics of compounds to test the recently modified InChI algorithm, which was designed for advanced recognition of tautomers. Participants will evaluate this algorithm against real chemical samples in this Crowdsourced Evaluation of InChI-based Tautomer Identification.

Note: You can download a PDF of the Fall 2022 ACS Presentation