InChI in the wild: an assessment of InChIKey searching in Google
Abstract While chemical databases can be queried using the InChI string and InChIKey (IK) the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be […]
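As an illustration of why the InChIKey suits open-web searching, the following minimal Python sketch checks the fixed 27-character shape of a standard InChIKey and wraps it in double quotes, as one might for an exact-phrase query. The function names and the choice of ethanol as the example are assumptions made for illustration and are not taken from the paper.

```python
import re

# Shape of a standard InChIKey: 14 uppercase letters, a hyphen,
# 10 uppercase letters, a hyphen, and 1 final uppercase letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(text: str) -> bool:
    """Return True if text has the shape of an InChIKey (format check only)."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


def exact_search_term(inchikey: str) -> str:
    """Wrap a well-formed InChIKey in double quotes for an exact-phrase search."""
    key = inchikey.strip()
    if not is_inchikey(key):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return f'"{key}"'


# Illustrative input: the standard InChIKey of ethanol.
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
print(exact_search_term(ETHANOL_KEY))
```

The check only confirms the format; it cannot tell whether the key corresponds to a real structure.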
UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers
Abstract UniChem is a low-maintenance, fast and freely available compound identifier mapping service, recently made available on the Internet. Until now, the criterion of molecular equivalence within UniChem has been on the basis of complete identity between Standard InChIs. However, a limitation of this approach is that stereoisomers, isotopes and salts of otherwise identical molecules […]
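The idea of matching compounds at the connectivity level rather than by full identity can be sketched with InChIKeys, whose first 14-letter block is derived from the formula, connectivity and hydrogen layers. This is a rough approximation written for illustration, not UniChem's actual implementation; the helper names are hypothetical, and the example keys should be verified against a database before reuse.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey."""
    return inchikey.split("-")[0]


def group_by_connectivity(records):
    """Group (name, InChIKey) pairs whose keys share the same first block."""
    groups = defaultdict(list)
    for name, key in records:
        groups[connectivity_block(key)].append(name)
    return dict(groups)


# Illustrative records: two stereoisomers of alanine and one unrelated compound.
RECORDS = [
    ("L-alanine", "QNAYBMKLOCPYGJ-REOHSFFZSA-N"),
    ("D-alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]

for block, names in group_by_connectivity(RECORDS).items():
    print(block, names)
```

Under full-key identity the two alanine entries would stay separate; grouping on the first block places them together, which is the kind of relaxed equivalence the abstract describes.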
On InChI and Evaluating the Quality of Cross-reference Links
Abstract Background: There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. […]
IUPAC STANDARDS ONLINE
Abstract IUPAC Standards Online is a database built from IUPAC’s (The International Union of Pure and Applied Chemistry) standards and recommendations, which are extracted from the journal Pure and Applied Chemistry (PAC). The International Union of Pure and Applied Chemistry (IUPAC) is the organization responsible for setting the standards in chemistry that are internationally binding […]
Current Status and Future Development in Relation to IUPAC Activities
Abstract The IUPAC International Chemical Identifier (InChI) is a non-proprietary, machine-readable chemical structure representation format enabling electronic searching, and interlinking and combining, of chemical information from different sources. It was developed from 2001 onwards at the U.S. National Institute of Standards and Technology under the auspices of IUPAC’s Chemical Identifier project. Since 2009, the InChI […]
Applications of the InChI in cheminformatics with the CDK and Bioclipse
Abstract Background The InChI algorithms are written in C++ and not available as a Java library. Integration into software written in Java therefore requires a bridge between C and Java libraries, provided by the Java Native Interface (JNI) technology. Results We here describe how the InChI library is used in the Bioclipse workbench and the Chemistry […]
Application of InChI to curate, index, and query 3-D structures
Abstract The HIV structural database (HIVSDB) is a comprehensive collection of the structures of HIV protease, both of unliganded enzyme and of its inhibitor complexes. It contains abstracts and crystallographic data such as inhibitor and protein coordinates for 248 data sets, of which only 141 are from the Protein Data Bank (PDB). Efficient annotation, indexing, […]
Additive InChI-based optimal descriptors: QSPR modeling of fullerene C60 solubility in organic solvents
Abstract Optimal descriptors calculated with the International Chemical Identifier (InChI) have been used to construct a one-variable model of the solubility of fullerene C60 in organic solvents. Attempts to calculate the model for three splits into training and test sets gave stable results.
IUPAC InChI (Video)
This presentation is part of Google Tech Talks and was added to the GoogleTalksArchive on August 22, 2006. The original presentation took place on November 2, 2006. ABSTRACT (Imported From YouTube Source) The central token of information in Chemistry is a chemical substance, an entity that can often be represented as a well-defined […]
isoenum – a python package to enumerate isotopically resolved InChI
Isotopic (iso) enumerator (enum) – enumerates isotopically resolved InChI (International Chemical Identifier) for metabolites. The isoenum Python package provides a command-line interface that allows you to enumerate the possible isotopically resolved InChI from one of the Chemical Table file (CTfile) formats (i.e. molfile, SDfile) used to describe chemical molecules and reactions, as well as from InChI itself. […]
Capturing mixture composition: an open machine-readable format for representing mixed substances
Alex M. Clark, Leah R. McEwen, Peter Gedeck & Barry A. Bunin, Journal of Cheminformatics volume 11, Article number: 33 (2019). Abstract: We describe a file format that is designed to represent mixtures of compounds in a way that is fully machine readable. This […]
InChILayersExplorer – A Spreadsheet to teach and learn the structure of an InChI
This post consists of a simple spreadsheet that splits an InChI into its layers to facilitate its conceptualisation and teaching. It considers the six layers currently detailed in the InChI Technical FAQ, https://www.inchi-trust.org/wp/technical-faq-2/#4.3. The spreadsheet also facilitates looking up an InChI by entering the molecule name or its SMILES representation.
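The same splitting idea can be sketched in a few lines of Python. This is a minimal illustration and not the spreadsheet's own logic: it breaks an InChI at each "/" and labels every segment by its one-letter prefix. The function name and the plain-English labels are assumptions chosen for readability, and the segments are finer-grained than the six layers of the Technical FAQ (for example, the formula, connections and hydrogens segments together make up the main layer).

```python
# Hypothetical shorthand labels for the one-letter segment prefixes.
SEGMENT_LABELS = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
    "f": "fixed hydrogens",
    "r": "reconnected",
}


def split_inchi(inchi: str):
    """Split an InChI string into a list of (label, value) pairs, in order."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("an InChI string must start with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    segments = [("version", parts[0])]
    for part in parts[1:]:
        if not part:
            continue  # skip empty segments
        if part[0] in SEGMENT_LABELS:
            segments.append((SEGMENT_LABELS[part[0]], part[1:]))
        else:
            # Formulas start with an uppercase element symbol or a digit.
            segments.append(("formula", part))
    return segments


# Illustrative input: the standard InChI of ethanol.
for label, value in split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):
    print(f"{label}: {value}")
```

A list of pairs is used instead of a dictionary because some prefixes can legitimately appear more than once in a single InChI, for instance a second hydrogens segment following the fixed-hydrogens segment.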
RDKit InChI Calculation with Jupyter Notebook
This RDKit InChI Calculation with Jupyter Notebook tutorial is useful for teaching the basics of how to interact with InChI using a cheminformatics toolkit in a Jupyter Notebook. The notebook has the following learning objectives: set up RDKit with a Jupyter Notebook; construct a molecule (RDKit molecular object) from a SMILES string; display molecule images; calculate […]
Batch Chemical IDs Conversion in Spreadsheets
Common tools for conversions, including some spreadsheet-based options on this site, are hard to use for the hundreds or thousands of compounds we may want to use in cheminformatics projects. This resource takes a different approach to the conversion. By using the PubChem Power User Gateway it allows converting hundreds of chemical identifiers on a […]
2012 San Diego ACS presentation: Registration system of mcule: InChI is the key (video)
Mcule provides virtual screening services on the web to help identify novel drug candidates by screening different databases. For these databases, it is essential to have a robust molecule registration system that does not depend on different drawing conventions, tautomeric states, etc. It […]
InChI Student Worksheet
This document contains a brief intro to InChI suitable for undergraduate students and two exercises, with answer keys. The first assignment asks about the information encoded in a sample InChI. The last question in this assignment asks students to use the InChI Key as a search term – this will be a lot easier to […]
QSAR-modeling of toxicity of organometallic compounds by means of the balance of correlations for InChI-based optimal descriptors
Toropov, A. A., Toropova, A. P., & Benfenati, E. (2010). QSAR-modeling of toxicity of organometallic compounds by means of the balance of correlations for InChI-based optimal descriptors. Molecular Diversity, 14(1), 183-192. This paper presents a use of InChI-based molecular descriptors to predict toxicity. Its abstract follows. “Quantitative structure–activity relationships (QSAR) for toxicity toward rats (pLD50) have been […]
InChI-based optimal descriptors: QSAR analysis of fullerene[C60]-based HIV-1 PR inhibitors by correlation balance
The International Chemical Identifier (InChI) has been used to construct InChI-based optimal descriptors to model the binding affinity for fullerene[C60]-based inhibitors of human immunodeficiency virus type 1 aspartic protease (HIV-1 PR). Statistical characteristics of the one-variable model obtained by the balance of correlations are as follows: n = 8, r² = 0.9769, q²(LOO) = 0.9646, s = 0.099, F = 254 (subtraining set); n = 7, r² = 0.7616, s = 0.681, F = 16 (calibration set); n = 5, r² = 0.9724, s = 0.271, F = 106, Rm² = 0.9495 (test set). Predictability of this approach […]
Use of the international chemical identifier for constructing QSPR-model of normal boiling points of acyclic carbonyl substances
Optimal descriptors calculated with the International Chemical Identifier have been used to construct a one-variable model of the normal boiling points of acyclic carbonyl substances. Attempts to calculate the model for three splits into training and test sets gave stable results. Statistical quality of the model is n = 150, r² = 0.9825, s = 4.96 °C, F = 8,312 (training set) and n = 50, r² = 0.9791, s = 4.68 °C, F = […]
The Chemical Translation Service—a web-based tool to improve standardization of metabolomic report
Summary: Metabolomic publications and databases use different database identifiers or even trivial names which disable queries across databases or between studies. The best way to annotate metabolites is by chemical structures, encoded by the International Chemical Identifier code (InChI) or InChIKey. We have implemented a web-based Chemical Translation Service that performs batch conversions of the most […]
Failures of fractional crystallization: ordered co‐crystals of isomers and near isomers
A list of 270 structures of ordered co‐crystals of isomers, near isomers and molecules that are almost the same has been compiled. Searches for structures containing isomers could be automated by the use of IUPAC International Chemical Identifier (InChI™) strings but searches for co‐crystals of very similar molecules were more labor intensive. Compounds in which […]
Simplified molecular input-line entry system and International Chemical Identifier in the QSAR analysis of styrylquinoline derivatives as HIV-1 integrase inhibitors
The simplified molecular input-line entry system (SMILES) and IUPAC International Chemical Identifier (InChI) were examined as representations of the molecular structure for quantitative structure-activity relationships (QSAR), which can be used to predict the inhibitory activity of styrylquinoline derivatives against the human immunodeficiency virus type 1 (HIV-1). Optimal SMILES-based descriptors give a best model with n […]
Representation of chemical structures
Abstract: At the root of applications for substructure and similarity searching, reaction retrieval, synthesis planning, drug discovery, and physicochemical property prediction is the need for a machine‐readable representation of a structure. Systematic nomenclature is unsuitable, and notations and fragment codes have been superseded, except in certain specific applications. Connection tables are widely used, but there […]
InChI: a user’s perspective
Exchange of chemical structures between practicing chemists is essential to chemical communication. The International Chemical Identifier (InChI) provides a means for lossless communication of structures without resort to any proprietary software or databases nor does it require any payment or royalty fees. This perspective describes why the InChI is valuable to all chemists and how […]
InChI As a Research Data Management Tool
Chemistry International, Volume 38, Issue 3-4, Pages 24–26 Abstract Progress in science has always been driven by data as a primary research output. This is especially true of the data-centric fields of molecular sciences. Scholarly journals in chemistry in the 19th century captured a (probably small) proportion of research data in printed journals, books, and compendia. […]
On InChI and evaluating the quality of cross-reference links
Galgonek and Vondrášek Journal of Cheminformatics 2014, 6:15 Abstract Background: There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for […]