On InChI and Evaluating the Quality of Cross-reference Links

Abstract

Background: There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. Another approach is to use the manually curated links stored directly in databases. This study employs well-established InChI identifiers to measure the consistency and completeness of the manually curated links by comparing them with the automatically generated ones.

Results: We used two different tools to generate InChI identifiers and observed some ambiguities in their outputs. In part, these ambiguities were caused by unclear interpretation of the structural data used as input. InChI identifiers were used successfully to find duplicate entries in databases. We found that the rate of InChI inconsistency among the manually curated links was very high (28.85% in the worst case). Even under a weaker definition of consistency, the measured inconsistency values were generally very high. The completeness of the manually curated links was also very poor (only 93.8% in the best case) compared with that of the automatically generated links.

Conclusions: We observed several problems with the InChI tools and the files used as their inputs. There are large gaps in the consistency and completeness of manually curated links when they are measured using InChI identifiers. However, inconsistency can be caused both by errors in the manually curated links and by the inherent limitations of the InChI method.
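
The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how consistency and completeness were computed, so the definitions below are assumptions chosen for illustration. The entry IDs (A1, B1, ...), the two toy databases, and the curated link set are hypothetical; only the InChI strings are real identifiers (ethanol, methane, water, benzene).

```python
from collections import defaultdict

# Toy data (hypothetical entry IDs; real standard InChI strings).
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANE = "InChI=1S/CH4/h1H4"
WATER = "InChI=1S/H2O/h1H2"
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"

db_a = {"A1": ETHANOL, "A2": METHANE, "A3": ETHANOL, "A4": WATER}
db_b = {"B1": ETHANOL, "B2": METHANE, "B3": BENZENE, "B4": WATER}

# Hypothetical manually curated cross-reference links (entry in A, entry in B).
curated = {("A1", "B1"), ("A2", "B3"), ("A4", "B4")}


def find_duplicates(db):
    """Group entry IDs of one database that share the same InChI string."""
    by_inchi = defaultdict(list)
    for entry_id, inchi in db.items():
        by_inchi[inchi].append(entry_id)
    return {inchi: sorted(ids) for inchi, ids in by_inchi.items() if len(ids) > 1}


def inchi_links(a, b):
    """Automatically generated links: all cross-database pairs with equal InChI."""
    index_b = defaultdict(list)
    for entry_id, inchi in b.items():
        index_b[inchi].append(entry_id)
    return {(ea, eb) for ea, inchi in a.items() for eb in index_b.get(inchi, [])}


def inconsistency(links, a, b):
    """Assumed definition: share of curated links whose endpoints differ in InChI."""
    checkable = [(ea, eb) for ea, eb in links if ea in a and eb in b]
    if not checkable:
        return 0.0
    bad = sum(1 for ea, eb in checkable if a[ea] != b[eb])
    return bad / len(checkable)


def completeness(links, a, b):
    """Assumed definition: share of InChI-derived links also present in curated links."""
    auto = inchi_links(a, b)
    if not auto:
        return 1.0
    return len(auto & set(links)) / len(auto)


duplicates = find_duplicates(db_a)
inconsistency_rate = inconsistency(curated, db_a, db_b)
completeness_rate = completeness(curated, db_a, db_b)
```

On this toy data, A1 and A3 are reported as duplicates, one of the three curated links is inconsistent (methane linked to benzene), and two of the four InChI-derived links appear among the curated ones. Exact string equality is the strictest possible comparison; a weaker notion of consistency, as mentioned in the abstract, would compare only selected parts of the identifiers.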

Information
Content Type	OER
Author(s)	Jakub Galgonek, Jiří Vondrášek
DOI	https://doi.org/10.1186/1758-2946-6-15
Content Link	https://jcheminf.biomedcentral.com/track/pdf/10.1186/1758-2946-6-15
License	Open Access
Date Published	April 17, 2014
Content Tags	Cheminformatics, Classroom Material, Content type, Publication