Miscellaneous
PubChem chemical structure standardization
Abstract Background: PubChem is a chemical information repository, consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored into Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes them […]
Chemical Entity Semantic Specification: Knowledge representation for efficient semantic cheminformatics and facile data integration
Abstract Background: Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts […]
Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on
Abstract Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and […]
International chemical identifier for reactions (RInChI)
Abstract The Reaction InChI (RInChI) extends the idea of the InChI, which provides a unique descriptor of molecular structures, towards reactions. Prototype versions of the RInChI have been available since 2011. The first official release (RInChIV1.00), funded by the InChI Trust, is now available for download (https://www.inchi-trust.org/wp/downloads/). This release defines the format and generates hashed […]
Comparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2
Abstract Background: An important step in the reconstruction of a metabolic network is annotation of metabolites. Metabolites are generally annotated with various database or structure based identifiers. Metabolite annotations in metabolic reconstructions may be incorrect or incomplete and thus need to be updated prior to their use. Genome-scale metabolic reconstructions generally include hundreds of metabolites. […]
Towards a Universal SMILES representation – A standard method to generate canonical SMILES based on the InChI
Abstract Background: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to […]
Enhancement of the chemical semantic web through the use of InChI identifiers
Abstract Molecules, as defined by connectivity specified via the International Chemical Identifier (InChI), are precisely indexed by major web search engines so that Internet tools can be transparently used for unique structure searches.
Detection of IUPAC and IUPAC-like chemical names
Abstract Motivation: Chemical compounds like small signal molecules or other biological active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals like SMILES, InChI, IUPAC or trivial names exist. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names […]
yaInChI: Modified InChI string scheme for line notation of chemical structures
Abstract A modified InChI (International Chemical Identifier) string scheme, yaInChI (yet another InChI), is suggested as a method for including the structural information of a given molecule, making it straightforward and more easily readable. The yaInChI theme is applicable for checking the structural identity with higher sensitivity and generating three-dimensional (3-D) structures from the one-dimensional […]
InChI in the wild: an assessment of InChIKey searching in Google
Abstract While chemical databases can be queried using the InChI string and InChIKey (IK), the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be […]
UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers
Abstract UniChem is a low-maintenance, fast and freely available compound identifier mapping service, recently made available on the Internet. Until now, the criterion of molecular equivalence within UniChem has been on the basis of complete identity between Standard InChIs. However, a limitation of this approach is that stereoisomers, isotopes and salts of otherwise identical molecules […]
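To illustrate the difference between complete-identity matching and a looser, connectivity-level match, here is a minimal Python sketch. It is not UniChem's implementation. It assumes the input is a list of Standard InChIKeys and relies on the fact that the first 14-character block of an InChIKey is a hash of the molecular skeleton, so keys sharing that block differ only in layers such as stereochemistry or isotopes. The function names are hypothetical.

```python
from collections import defaultdict


def connectivity_block(inchikey):
    """Return the first InChIKey block, which hashes the molecular skeleton."""
    return inchikey.split("-")[0]


def group_by_connectivity(inchikeys):
    """Group InChIKeys that share the same first (connectivity) block."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[connectivity_block(key)].append(key)
    return dict(groups)


# Illustrative input: L-alanine, alanine without stereochemistry, and water.
keys = [
    "QNAYBMKLOCPYGJ-REOHSZJWSA-N",
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",
    "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
]
groups = group_by_connectivity(keys)
```

Under complete-identity matching the two alanine keys count as different compounds; grouping on the first block places them together while water stays separate.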
QSAR-modeling of toxicity of organometallic compounds by means of the balance of correlations for InChI-based optimal descriptors
Toropov, A. A., Toropova, A. P., & Benfenati, E. (2010). QSAR-modeling of toxicity of organometallic compounds by means of the balance of correlations for InChI-based optimal descriptors. Molecular Diversity, 14(1), 183-192. This paper presents a use of InChI-based molecular descriptors to predict toxicity. Its abstract follows. “Quantitative structure–activity relationships (QSAR) for toxicity toward rats (pLD50) have been […]
Failures of fractional crystallization: ordered co‐crystals of isomers and near isomers
A list of 270 structures of ordered co‐crystals of isomers, near isomers and molecules that are almost the same has been compiled. Searches for structures containing isomers could be automated by the use of IUPAC International Chemical Identifier (InChI™) strings but searches for co‐crystals of very similar molecules were more labor intensive. Compounds in which […]
Matlab InChIKey Scripts
This is a collection of Matlab scripts for working with InChIKeys: IKextract, IKfreqFH, IKstring, and IKmusic. IKextract (InChIKey Extract) can extract InChIKeys from chemical structure-data files (SDFs). This script was successfully used to extract over 90 million InChIKeys (unique chemical identifiers) from over 5000 PubChem SD files. Users can also extract other data from SDFs […]
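The kind of extraction described for IKextract can be sketched in a few lines of Python. This is not the Matlab script itself, only a rough illustration under stated assumptions: records in an SD file are separated by `$$$$`, each data item is introduced by a `> <TAG>` header line with its value on the next line, and PubChem stores the InChIKey under the tag `PUBCHEM_IUPAC_INCHIKEY`. The function name `extract_inchikeys` is hypothetical.

```python
import re

# An InChIKey is 14 letters, a hyphen, 10 letters, a hyphen, and 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def extract_inchikeys(sdf_text, tag="PUBCHEM_IUPAC_INCHIKEY"):
    """Collect the InChIKey stored under `tag` in each record of an SD file."""
    keys = []
    for record in sdf_text.split("$$$$"):
        lines = record.splitlines()
        for i, line in enumerate(lines):
            # A data header looks like "> <TAG>"; the value is on the next line.
            if line.startswith(">") and f"<{tag}>" in line and i + 1 < len(lines):
                value = lines[i + 1].strip()
                if INCHIKEY_RE.match(value):
                    keys.append(value)
    return keys


# Two stub records (water and ethanol) with the structure block abbreviated.
sample = (
    "962\n  stub\n\nM  END\n"
    "> <PUBCHEM_IUPAC_INCHIKEY>\nXLYOFNOQVPJJNP-UHFFFAOYSA-N\n\n$$$$\n"
    "702\n  stub\n\nM  END\n"
    "> <PUBCHEM_IUPAC_INCHIKEY>\nLFQSCWFLJHTTHZ-UHFFFAOYSA-N\n\n$$$$\n"
)
found = extract_inchikeys(sample)
```

For collections on the scale mentioned above, reading each file as a stream rather than loading it whole would likely be preferable; the sketch keeps everything in memory for brevity.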
Many InChIs and quite some feat
Comprehensive 2015 article published in Springer’s Journal of Computer-Aided Molecular Design. Here is the abstract: The IUPAC International Chemical Identifier (InChI) is a non-proprietary, international standard to represent chemical structures. It was conceived 15 years ago, and has been in use for 10 years. The InChI Trust is developing and improving on the current standard, […]