InChI Tag: Audience


UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers

Chambers et al. Journal of Cheminformatics 2014, 6:43


UniChem is a low-maintenance, fast and freely available compound identifier mapping service, recently made available on the Internet. Until now, the criterion of molecular equivalence within UniChem has been on the basis of complete identity between Standard InChIs. However, a limitation of this approach is that stereoisomers, isotopes and salts of otherwise identical molecules are not considered as related. Here, we describe how we have exploited the layered structural representation of the Standard InChI to create new functionality within UniChem that integrates these related molecular forms. The service, called ‘Connectivity Search’, allows molecules to be first matched on the basis of complete identity between the connectivity layer of their corresponding Standard InChIs, and the remaining layers then compared to highlight stereochemical and isotopic differences. Parsing of Standard InChI sub-layers permits mixtures and salts to also be included in this integration process. Implementation of these enhancements required simple modifications to the schema, loader and web application, none of which have changed the original UniChem functionality or services. The scope of queries may be varied using a variety of easily configurable options, and the output is annotated to assist the user to filter, sort and understand the difference between query and retrieved structures. A RESTful web service output may be easily processed programmatically to allow developers to present the data in whatever form they believe their users will require, or to define their own level of molecular equivalence for their resource, albeit within the constraint of identical connectivity.
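As a rough illustration of layer-based matching, the Python sketch below splits two Standard InChIs into layers, checks whether the formula, /c and /h layers agree, and lists which of the remaining layers differ. It is not UniChem's implementation: the function names and the choice of "skeleton" layers are assumptions made for this example, and multi-component InChIs (mixtures and salts) are not handled.

```python
# Illustrative only: a simplified layer comparison, not UniChem's code.
# Assumption: "same skeleton" means identical formula, /c and /h layers.
SKELETON_LAYERS = ("formula", "c", "h")


def parse_inchi_layers(inchi):
    """Split a Standard InChI into a {layer_prefix: value} mapping."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # Keep the first occurrence of a prefix; repeated stereo layers
            # after an isotopic layer are ignored in this simplification.
            layers.setdefault(part[0], part[1:])
    return layers


def compare_inchis(query, candidate):
    """Return None if the skeletons differ, else the sorted list of
    non-skeleton layer prefixes whose values differ."""
    q = parse_inchi_layers(query)
    c = parse_inchi_layers(candidate)
    if any(q.get(k) != c.get(k) for k in SKELETON_LAYERS):
        return None
    others = (set(q) | set(c)) - set(SKELETON_LAYERS) - {"version"}
    return sorted(k for k in others if q.get(k) != c.get(k))


if __name__ == "__main__":
    # L- and D-alanine: same connectivity, different /m (stereo) layer.
    l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
    print(compare_inchis(l_ala, d_ala))
```

An empty list means the two strings agree in every parsed layer; a list such as the stereo prefixes signals a stereoisomer pair that an exact InChI match would miss.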

Keywords: UniChem, Standard InChI, InChIKey, Chemical databases, Data integration, Connectivity search

Data Formats for Elementary Gas Phase Kinetics, Part 1: Unique Representations of Species at the Molecular Level

BURGESS, D. R., MANION, J. A. and HAYES, C. J. (2014), Data Formats for Elementary Gas Phase Kinetics, Part 1: Unique Representations of Species at the Molecular Level. Int. J. Chem. Kinet., 46: 640-650. doi:10.1002/kin.20875


Standardized electronic formats for data are needed to efficiently and transparently communicate the results of scientific studies. A format for the unique identification of chemical species is a requirement in the field of chemistry, and the IUPAC International Chemical Identifier (InChI) has been widely adopted for this purpose. The InChI identifier has proved to be very useful. The InChI identifier, however, is currently insufficient to uniquely specify some types of molecular entities at a detailed molecular level needed to fully characterize their chemical nature, to differentiate between chemically distinct conformers, to uniquely identify structures used in quantum chemical calculations, and to completely describe elementary chemical reactions. To address this limitation, we propose an augmented form of InChI, denoted as InChI–ER, which contains additional optional layers that allow the unique and unambiguous identification of molecules at a detailed molecular level. The new layers proposed herein are optional extensions of the existing InChI formalism and, like all other InChI layers, would not interfere with InChI identifiers currently in use. The focus of the present work is the better specification of required molecular entities such as rotational conformations, ring conformations, and electronic states. In companion articles, we propose additional reaction layers using an extended InChI format that will enable the unique identification of elementary chemical reactions, including specification of associated transition states, specification of the changes in bonds that occur during reaction, and classification of reaction types.

Applications of the InChI in cheminformatics with the CDK and Bioclipse

Spjuth et al. Journal of Cheminformatics 2013, 5:14


Background: The InChI algorithms are written in C++ and not available as a Java library. Integration into software written in Java therefore requires a bridge between C and Java libraries, provided by the Java Native Interface (JNI) technology. Results: We here describe how the InChI library is used in the Bioclipse workbench and the Chemistry Development Kit (CDK) cheminformatics library. To make this possible, a JNI bridge to the InChI library was developed, JNI-InChI, allowing Java software to access the InChI algorithms. By using this bridge, the CDK project packages the InChI binaries in a module and offers easy access from Java using the CDK API. The Bioclipse project packages and offers InChI as a dynamic OSGi bundle that can easily be used by any OSGi-compliant software, in addition to the regular Java Archive and Maven bundles. Bioclipse itself uses the InChI as a key component and calculates it on the fly when visualizing and editing chemical structures. We demonstrate the utility of InChI with various applications in CDK and Bioclipse, such as decision support for chemical liability assessment, tautomer generation, and for knowledge aggregation using a linked data approach. Conclusions: These results show that the InChI library can be used in a variety of Java library dependency solutions, making the functionality easily accessible by Java software, such as in the CDK. The applications show various ways the InChI has been used in Bioclipse, to enrich its functionality.

Keywords: InChI, InChIKey, Chemical structures, JNI-InChI, The Chemistry Development Kit, OSGi, Bioclipse, Decision support, Linked data, Tautomers, Databases, Semantic web

CVDHD: a cardiovascular disease herbal database for drug discovery and network pharmacology

Gu et al. Journal of Cheminformatics 2013, 5:51

Background: Cardiovascular disease (CVD) is the leading cause of death and associates with multiple risk factors.
Herbal medicines have long been used to treat CVD in China, and several natural products or derivatives (e.g.,
aspirin and reserpine) are among the most common drugs worldwide. The objective of this work was to construct a
systematic database for drug discovery based on natural products separated from CVD-related medicinal herbs and
to investigate the mechanisms of action of herbal medicines.

Description: The cardiovascular disease herbal database (CVDHD) was designed to be a comprehensive resource for
virtual screening and drug discovery from natural products isolated from medicinal herbs for cardiovascular-related
diseases. CVDHD comprises 35230 distinct molecules and their identification information (chemical name, CAS registry
number, molecular formula, molecular weight, international chemical identifier (InChI) and SMILES), calculated molecular
properties (AlogP, number of hydrogen bond acceptors and donors, etc.), docking results between all molecules and
2395 target proteins, cardiovascular-related diseases, pathways and clinical biomarkers. All 3D structures were optimized
in the MMFF94 force field and can be freely accessed.

Conclusions: CVDHD integrated medicinal herbs, natural products, CVD-related target proteins, docking results, diseases
and clinical biomarkers. By using the methods of virtual screening and network pharmacology, CVDHD will provide a
platform to streamline drug/lead discovery from natural products and explore the action mechanism of medicinal herbs.
CVDHD is freely available at

Keywords: Cardiovascular disease, Drug discovery, Network pharmacology, Molecular docking, Virtual screening, Herbal
formula, Natural products, Medicinal herbs, Traditional Chinese medicine

Matlab InChIKey Scripts

This is a collection of Matlab scripts for working with InChIKeys: IKextract, IKfreqFH, IKstring, and IKmusic.

IKextract, InChIKey Extract, can extract InChIKeys from chemical structure-data files (SDFs). This script was successfully used to extract over 90 million InChIKeys (unique chemical identifiers) from over 5000 PubChem SD files. Users can also extract other data from SDFs by specifying the desired SD tag.
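The Matlab code itself is not shown here. As a hedged illustration of the same idea, the Python sketch below reads an SD file line by line and yields the value that follows a chosen data-item header. The function name is hypothetical, the default tag PUBCHEM_IUPAC_INCHIKEY is assumed to be the one PubChem uses for InChIKeys, and only the first value line of each item is returned, which is enough for single-line values such as InChIKeys.

```python
# Illustrative sketch, not the IKextract script.
# Assumption: SD data items look like "> <TAG>" followed by the value line.
def extract_sd_tag(lines, tag="PUBCHEM_IUPAC_INCHIKEY"):
    """Yield the first value line after every '> <tag>' header in an SD file.

    `lines` can be an open text file or any iterable of lines.
    """
    marker = "<%s>" % tag
    capture = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if capture:
            # The line right after the header holds the value.
            if line:
                yield line.strip()
            capture = False
        elif line.startswith(">") and marker in line:
            capture = True
```

Because the function is a generator, a large file can be streamed without loading it into memory, for example by passing an open file handle and writing each key out as it is produced. Passing a different `tag` extracts other data items in the same way.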

IKfreqFH, InChIKey frequency of first hash block, extracts the first hash block of InChIKeys and sorts them by frequency. Such a method is useful for analyzing the variety of chemical connectivity in large datasets.
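A minimal Python sketch of this kind of analysis follows. It is not the IKfreqFH script: the function name is made up for the example, and it simply takes the 14 characters before the first hyphen of each InChIKey and counts them.

```python
# Illustrative sketch, not the IKfreqFH script.
from collections import Counter


def first_block_frequencies(inchikeys):
    """Count InChIKeys by their first hash block, most frequent first.

    The first block is the text before the first hyphen; keys that share
    it share the same connectivity hash.
    """
    counts = Counter(key.strip().split("-", 1)[0] for key in inchikeys)
    return counts.most_common()
```

Blocks with a count above one point to groups of entries, such as stereoisomers or isotopologues, that share one connectivity hash.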

IKstring, InChIKey String, allows for searching for strings within InChIKeys. I use it to search the more than 90 million InChIKeys in PubChem.

IKmusic, InChIKey music, creates music from InChIKeys. A unique song is created for each InChIKey (i.e. every unique chemical substance has a different song!)

Identifier conversion on an Excel spreadsheet

This resource is a simple spreadsheet in Excel that provides a handy interconversion between different chemical identifiers, namely name, InChI, InChIKey and SMILES. It uses some web services to do translations, i.e. PubChem PUG REST and NCI/CADD Chemical Identifier Resolver.

This workbook does not use macros but makes use of the WEBSERVICE function added to Excel for Windows in Excel 2013.
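The workbook's formulas are not reproduced here. For readers who prefer a script, the Python sketch below builds the kind of request URL such a lookup relies on and fetches the plain-text reply, much as WEBSERVICE does in a cell. The URL patterns are assumptions based on the public PubChem PUG REST and Chemical Identifier Resolver interfaces, and the function names are hypothetical.

```python
# Illustrative sketch; URL patterns are assumed, not taken from the workbook.
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/%s/property/%s/TXT"
CIR = "https://cactus.nci.nih.gov/chemical/structure/%s/%s"


def pug_rest_url(name, prop="InChIKey"):
    """URL asking PUG REST for one property (e.g. InChI, InChIKey) by name."""
    return PUG_REST % (quote(name, safe=""), prop)


def cir_url(identifier, representation="stdinchikey"):
    """URL asking the resolver to convert an identifier (name, SMILES, ...)."""
    return CIR % (quote(identifier, safe=""), representation)


def fetch_text(url, timeout=30):
    """Return the service's plain-text answer (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Percent-encoding the identifier matters because chemical names and SMILES often contain spaces, commas, brackets or `#`, which would otherwise break the URL.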


Many InChIs and quite some feat

A comprehensive 2015 article published in Springer’s Journal of Computer-Aided Molecular Design. Here is the abstract:

The IUPAC International Chemical Identifier (InChI) is a non-proprietary, international standard to represent chemical structures. It was conceived 15 years ago, and has been in use for 10 years. The InChI Trust is developing and improving on the current standard, further enabling the interlinking of chemical structures on the web. This mini-review looks at the widespread adoption of InChI in software and databases.

Breu introducció a la digitalització de la informació química (A brief introduction to the digitization of chemical information)

This is an article in Catalan that provides an introduction to chemical information and describes InChI along with other chemical identifiers. Its abstract reads:

“Chemical information, once managed in books, paradigmatically in Chemical Abstracts and several handbooks, has now migrated to the Internet. Nowadays many large databases, both commercial and freely available, have much more information than we have ever had. But accessing them requires some skills that are not yet taught in the official chemistry degrees. This paper presents a brief introduction to the notations and codes that are currently used to identify the chemical species in computer environments. At the same time, some freely available chemistry databases are presented.”


IUPAC Name2PubChem

This submission shows you how to create a smart spreadsheet with Google Sheets that links an IUPAC name to a chemical’s PubChem landing page. You may click here to get a copy of this sheet. This particular sheet uses the Centre for Molecular Informatics OPSIN (Open Parser for Systematic IUPAC Nomenclature) web service to convert the name to an InChI key, which is then appended to a hyperlink to PubChem. You will note that some of the names do not work; this is because those names in the sample sheet are incorrect. If you paste those names directly into the OPSIN web service, it will tell you where the error in parsing the name occurred.

The following video shows you how to create this Google Sheet, and below it are the instructions and code needed. This application takes advantage of the canonical nature of the InChI and its key, and the fact that the key allows you to communicate over the web.


Step 1: Paste your IUPAC names into a column of your spreadsheet

Step 2: Convert IUPAC name to Standard InChI key
Type the following script into the top cell of the column where you want to place your keys, and hit Enter:


  • the ampersand (&) concatenates the cell content to the URL
  • the ampersand must be surrounded by quotation marks
  • the URL must be in quotation marks

Click on the black box in the bottom right corner of the cell and drag down, converting the entire column of names to keys.
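For readers working outside a spreadsheet, the Python sketch below mirrors Step 2 by turning a column of names into OPSIN request URLs. It is not the sheet's formula: the URL pattern (base address plus the name plus a `.stdinchikey` suffix) is an assumption about the OPSIN web service, and the function names are hypothetical.

```python
# Illustrative sketch of Step 2; not the Google Sheets formula.
# Assumption: OPSIN answers requests of the form <base><name>.stdinchikey
from urllib.parse import quote

OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin/"


def opsin_key_url(name):
    """Build the request URL for one IUPAC name (the concatenation step)."""
    return OPSIN_BASE + quote(name.strip(), safe="") + ".stdinchikey"


def key_urls_for_column(names):
    """Apply the same concatenation to a whole column; blank cells stay blank."""
    return [opsin_key_url(n) if n.strip() else "" for n in names]
```

Each URL can then be requested with any HTTP client; a name that OPSIN cannot parse is expected to produce an error response rather than a key.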

Step 3: Hyperlink the key to PubChem
Type the following script into the top cell of the column where you want to place your links, and hit Enter:


  • the ampersand (&) concatenates the cell content to the URL
  • the ampersand must be surrounded by quotation marks
  • the URL must be in quotation marks
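Step 3 can be sketched in Python as well. The example below checks that a cell holds something shaped like an InChIKey (14 letters, hyphen, 10 letters, hyphen, 1 letter) before appending it to a PubChem address, so that error text returned for an unparseable name is not turned into a dead link. The PubChem URL prefix is an assumption, and the function name is hypothetical.

```python
# Illustrative sketch of Step 3; not the Google Sheets formula.
import re

# Assumption: PubChem resolves compound pages addressed by InChIKey.
PUBCHEM_COMPOUND = "https://pubchem.ncbi.nlm.nih.gov/compound/"
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def pubchem_link(cell_value):
    """Return a PubChem link for a valid-looking InChIKey, else None."""
    key = cell_value.strip()
    if not INCHIKEY_PATTERN.fullmatch(key):
        return None
    return PUBCHEM_COMPOUND + key
```

Returning `None` for anything that is not key-shaped keeps the failure visible, which matches the behaviour described for incorrect names in the sample sheet.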

NOTE: these are dynamic cells and will be recalculated every time you open the page or change the chemical name. If you want them to be static, you can copy the block of cells and paste it to another location as text.

You can also download the sheet as an Excel Spreadsheet, but the downloaded sheet will not be dynamic.  It will be linked, but will not change if you change the IUPAC name.