InChI Tag: biochemistry

7 posts

Comparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2



An important step in the reconstruction of a metabolic network is annotation of metabolites. Metabolites are generally annotated with various database- or structure-based identifiers. Metabolite annotations in metabolic reconstructions may be incorrect or incomplete and thus need to be updated prior to their use.
Genome-scale metabolic reconstructions generally include hundreds to thousands of metabolites. Manually updating annotations is therefore highly laborious. This prompted us to look for open-source software applications that could facilitate automatic updating of annotations by mapping between available metabolite identifiers. We identified three applications developed for the metabolomics and chemical informatics communities as potential solutions. The applications were MetMask, the Chemical Translation System, and UniChem. The first implements a “metabolite masking” strategy for mapping between identifiers, whereas the latter two implement different versions of an InChI-based strategy. Here we evaluated the suitability of these applications for the task of mapping between metabolite identifiers in genome-scale metabolic reconstructions. We applied the best-suited application to updating identifiers in Recon 2, the latest reconstruction of human metabolism.
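The general idea behind an InChI-based mapping strategy can be illustrated with a small sketch. This is not how the Chemical Translation System or UniChem are implemented; it only assumes that each source lists its identifiers alongside an InChIKey, so that the InChIKey can act as a join key. The function names and the identifiers `recon:...` and `dbx:...` are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Group the identifiers of one source by their InChIKey."""
    index = defaultdict(set)
    for identifier, inchikey in records:
        index[inchikey].add(identifier)
    return index


def map_via_inchikey(source_records, target_records):
    """Map each source identifier to the target identifiers sharing its InChIKey.

    Both arguments are iterables of (identifier, inchikey) pairs.
    An empty list means no structural match was found in the target.
    """
    target_index = build_index(target_records)
    return {
        identifier: sorted(target_index.get(inchikey, ()))
        for identifier, inchikey in source_records
    }


# Hypothetical identifiers; the InChIKeys are those of water and ethanol.
source = [
    ("recon:h2o", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
    ("recon:etoh", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]
target = [
    ("dbx:0001", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
]

mapping = map_via_inchikey(source, target)
```

In this toy example the water entry maps to one target identifier, while the ethanol entry maps to an empty list and would need another search strategy or manual curation.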


All three applications enabled partially automatic updating of metabolite identifiers, but significant manual effort was still required to fully update identifiers. We were able to reduce this manual effort by searching for new identifiers using multiple types of information about metabolites. When multiple types of information were combined, the Chemical Translation System enabled us to update over 3,500 metabolite identifiers in Recon 2. All but approximately 200 identifiers were updated automatically.
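One way to picture the combination of multiple types of information is a fallback chain: try the most specific field first, move on when it yields nothing, and flag anything ambiguous or unmatched for manual review. The sketch below is an assumption about how such a chain could look, not the procedure used in the study; the field names, lookup tables, and identifiers are hypothetical.

```python
def find_new_identifier(metabolite, lookups):
    """Search for a new identifier using several kinds of information in turn.

    metabolite: dict of available fields, e.g. {"inchikey": ..., "name": ...}
    lookups:    ordered list of (field, table) pairs, where table maps a
                field value to a list of candidate identifiers.
    """
    for field, table in lookups:
        value = metabolite.get(field)
        if not value:
            continue  # this kind of information is missing for the metabolite
        candidates = table.get(value, [])
        if len(candidates) == 1:
            # Unique hit: safe to update without human intervention.
            return {"identifier": candidates[0], "matched_on": field,
                    "status": "automatic"}
        if len(candidates) > 1:
            # Ambiguous hit: leave the choice to a curator.
            return {"identifier": None, "matched_on": field,
                    "status": "manual", "candidates": candidates}
    # Nothing matched on any field.
    return {"identifier": None, "matched_on": None, "status": "manual"}


# Hypothetical lookup tables and identifiers.
lookups = [
    ("inchikey", {"XLYOFNOQVPJJNP-UHFFFAOYSA-N": ["dbx:0001"]}),
    ("name", {"water": ["dbx:0001"], "glucose": ["dbx:0002", "dbx:0003"]}),
]

by_name = find_new_identifier({"inchikey": None, "name": "water"}, lookups)
ambiguous = find_new_identifier({"name": "glucose"}, lookups)
```

Stopping at the first ambiguous hit, rather than continuing down the chain, is a design choice that favors curator review over a possibly wrong automatic update.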


We found that an InChI-based application such as the Chemical Translation System was better suited to the task of mapping between metabolite identifiers in genome-scale metabolic reconstructions. We identified several features, however, that could be added to such an application in order to tailor it to this task.

On InChI and Evaluating the Quality of Cross-reference Links


Background: There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. Another approach is to use the manually curated links stored directly in databases. This study employs well-established InChI identifiers to measure the consistency and completeness of the manually curated links by comparing them with the automatically generated ones.

Results: We used two different tools to generate InChI identifiers and observed some ambiguities in their outputs. In part, these ambiguities were caused by indistinctness in interpretation of the structural data used. InChI identifiers were used successfully to find duplicate entries in databases. We found that the InChI inconsistencies in the manually curated links are very high (28.85% in the worst case). Even using a weaker definition of consistency, the measured values were very high in general. The completeness of the manually curated links was also very poor (only 93.8% in the best case) compared with that of the automatically generated links.

Conclusions: We observed several problems with the InChI tools and the files used as their inputs. There are large gaps in the consistency and completeness of manually curated links if they are measured using InChI identifiers. However, inconsistency can be caused both by errors in manually curated links and the inherent limitations of the InChI method.
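The comparison described in this abstract can be sketched as follows. The abstract does not give the exact definitions used in the study, so the sketch assumes simple ones: a curated link is inconsistent if its two entries do not share the same InChI, and completeness is the share of InChI-generated links that also appear among the curated links. Entry identifiers such as `A1` and `B1` are hypothetical; the InChI strings are those of water, methane, ethanol, and dimethyl ether.

```python
def inchi_links(db_a, db_b):
    """Generate all (a_id, b_id) pairs whose entries carry the same InChI."""
    by_inchi = {}
    for b_id, inchi in db_b.items():
        by_inchi.setdefault(inchi, set()).add(b_id)
    return {
        (a_id, b_id)
        for a_id, inchi in db_a.items()
        for b_id in by_inchi.get(inchi, ())
    }


def link_quality(curated, db_a, db_b):
    """Return (inconsistency, completeness) of curated links, both in [0, 1]."""
    generated = inchi_links(db_a, db_b)
    curated = set(curated)
    # Curated links not supported by an identical InChI (assumed definition).
    inconsistency = len(curated - generated) / len(curated) if curated else 0.0
    # Share of InChI-generated links that curators also recorded.
    completeness = len(curated & generated) / len(generated) if generated else 1.0
    return inconsistency, completeness


db_a = {
    "A1": "InChI=1S/H2O/h1H2",                   # water
    "A2": "InChI=1S/CH4/h1H4",                   # methane
    "A3": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",   # ethanol
}
db_b = {
    "B1": "InChI=1S/H2O/h1H2",                   # water
    "B2": "InChI=1S/CH4/h1H4",                   # methane
    "B3": "InChI=1S/C2H6O/c1-3-2/h1-2H3",        # dimethyl ether
}
curated = [("A1", "B1"), ("A3", "B3")]

inconsistency, completeness = link_quality(curated, db_a, db_b)
```

Here the curated ethanol-to-dimethyl-ether link counts as inconsistent, and the missing methane link lowers completeness. As the conclusions note, a mismatch flagged this way may reflect a curation error or a limitation of the InChI method itself.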



IUPAC Standards Online is a database built from the standards and recommendations of IUPAC (the International Union of Pure and Applied Chemistry), which are extracted from the journal Pure and Applied Chemistry (PAC).

The International Union of Pure and Applied Chemistry (IUPAC) is the organization responsible for setting the standards in chemistry that are internationally binding for scientists in industry and academia, patent lawyers, toxicologists, environmental scientists, legislators, etc. “Standards” are definitions of terms, standard values, procedures, rules for naming compounds and materials, names and properties of elements in the periodic table, and many more.

The database will be the only product that provides for the quick and easy search and retrieval of IUPAC’s standards and recommendations which until now have remained unsorted within the huge Pure and Applied Chemistry archive.

Covered topics:

Analytical Chemistry
Chemical Safety
Data Management
Environmental Chemistry
Inorganic Chemistry
Medicinal Chemistry
Nomenclature and Terminology
Nuclear Chemistry
Organic Chemistry
Physical Chemistry
Theoretical & Computational Chemistry

Current Status and Future Development in Relation to IUPAC Activities


The IUPAC International Chemical Identifier (InChI) is a non-proprietary, machine-readable chemical structure representation format enabling electronic searching, and interlinking and combining, of chemical information from different sources. It was developed from 2001 onwards at the U.S. National Institute of Standards and Technology under the auspices of IUPAC’s Chemical Identifier project. Since 2009, the InChI Trust, a consortium of (mostly) publishers and software developers, has taken over responsibility for funding and oversight of InChI maintenance and development. Funding and responsibility for scientific aspects of InChI development remain with the IUPAC Division VIII (Chemical Nomenclature and Structure Representation) and InChI Subcommittee.
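Because an InChI is a layered, machine-readable string, software can take it apart with ordinary string handling. The minimal sketch below splits an InChI into its version prefix, formula, and prefixed layers. It is an illustration only, not part of the official InChI software; the function name `split_inchi` is hypothetical, and the sketch performs no chemical validation.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula, and prefixed layers.

    Layers are returned as a list of (prefix, body) pairs rather than a
    dict, because a prefix can occur more than once in one identifier.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *parts = inchi[len("InChI="):].split("/")

    formula = None
    # The formula layer has no lowercase prefix letter; it may be absent.
    if parts and parts[0] and not parts[0][0].islower():
        formula = parts.pop(0)

    layers = [(part[0], part[1:]) for part in parts if part]
    return {
        "version": version,
        "standard": version.endswith("S"),  # "1S" marks a standard InChI
        "formula": formula,
        "layers": layers,
    }


# InChI of ethanol.
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

For ethanol this yields the formula `C2H6O`, a connectivity layer (`c`), and a hydrogen layer (`h`).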

Additive InChI-based optimal descriptors: QSPR modeling of fullerene C60 solubility in organic solvents


Optimal descriptors calculated with the International Chemical Identifier (InChI) have been used to construct a one-variable model of the solubility of fullerene C60 in organic solvents. Attempts to calculate the model for three splits into training and test sets gave stable results.

IUPAC InChI (Video)

This presentation is part of Google Tech Talks and was added to the GoogleTalksArchive on August 22, 2012. The original presentation took place on November 2, 2006.

ABSTRACT (Imported From YouTube Source)

The central token of information in chemistry is a chemical substance, an entity that can often be represented as a well-defined chemical structure. With InChI we have a means of representing this entity as a unique string of characters, which is otherwise represented by a variety of 2-D and 3-D chemical drawings, ‘connection tables’ and synonyms. InChI therefore represents a discrete physical entity, with which is associated an array of chemical properties and data. NIST has long been involved in disseminating chemical reference data associated with such discrete substances. An InChI is therefore the key index to this data. Many other types of data and information are also naturally tied to it, including biological information, commercial availability, toxicity, drug effectiveness and so forth. Because of the diversity of properties and interactions of a chemical substance, effective location of chemical information generally requires further qualifiers, which may be represented coarsely as a key word, but more precisely using a controlled vocabulary. There are no simple separations between information sought by different disciplines and for different objectives. However, reference data may be organized according to the disciplines most directly involved in making the measurements:

- isolated substance – mass, infrared, and NMR spectra; physical properties
- substance in the context of others – solubility, affinity, ..
- properties of a mixture containing the substance

The desired data can be a number, vector or image, usually associated with dimensions and links to source information. In some cases, this information is typically converted to a curve or diagram for use by an expert and may be further processed by specialized software. In other cases, a single numerical value is the target. Also, some complexities of structure that must be dealt with in practical search are represented in InChI, but must be decoded for use in searching.