RECORDED CDD WEBINAR: CAPTURING MIXTURES — BRINGING INFORMATICS TO THE WORLD OF PRACTICAL CHEMISTRY
Hosted and presented by the Collaborative Drug Discovery (CDD) Vault. Watch our webinar featuring Dr. Chris Jakober (Johns Hopkins), Leah McEwen (Cornell), and Dr. Alex Clark (CDD) to hear about our work toward new data structures for capturing chemical mixtures in a machine-readable format, as well as the potential impact this will have on all […]
Chemistry Programming with Python – Web Scraping Wikipedia For Chemical Identifiers (Tutorial)
Andrew P. Cornell, Robert E. Belford Chemistry Department, University of Arkansas at Little Rock, Little Rock, Arkansas 72204 Abstract Many individual chemicals have a specific page on Wikipedia that will give information about the use, manufacture and properties of that chemical. The properties that are displayed off to the side include the relevant chemical […]
Chemistry Programming with Python – Retrieving InChI From PubChem (Tutorial)
Andrew P. Cornell, Robert E. Belford Chemistry Department, University of Arkansas at Little Rock, Little Rock, Arkansas 72204 Abstract In this tutorial, a program written in Python will take a user specified chemical name and retrieve the associated chemical identifier or basic property using an online chemical database. This program can be used as […]
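The kind of name-to-identifier lookup this abstract describes can be sketched as follows. This is not the tutorial's own program, which is not reproduced here; it is a minimal illustration that assumes PubChem's PUG REST interface is the database endpoint. The function names `build_property_url` and `fetch_property` are hypothetical, and only the standard library is used.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG REST service (assumed here as the lookup backend).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(name: str, prop: str = "InChI") -> str:
    """Build a PUG REST URL asking for one property of a compound, by name.

    The name is percent-encoded so that spaces and slashes are safe in the path.
    """
    encoded_name = quote(name, safe="")
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{prop}/TXT"


def fetch_property(name: str, prop: str = "InChI", timeout: float = 10.0) -> str:
    """Fetch the property as plain text (hypothetical helper, needs network access)."""
    with urlopen(build_property_url(name, prop), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Prompt for a chemical name, as the abstract describes, then print its InChI.
    chemical_name = input("Chemical name: ")
    print(fetch_property(chemical_name, "InChI"))
```

Passing a different property name, such as `InChIKey` or `MolecularFormula`, would request that property instead; a name that PubChem cannot resolve makes `urlopen` raise an HTTP error, which a fuller program would want to catch.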
Chemistry Programming with Python – Convert a SMILE String to InChI Using ChemSpider (Tutorial)
Andrew P. Cornell, Robert E. Belford Chemistry Department, University of Arkansas at Little Rock, Little Rock, Arkansas 72204 Abstract ChemSpider offers many methods for accessing online data through web API (Application Programming Interface) interactions [1]. This tutorial will explain how to write a few simple lines of code in Python that will allow […]
PubChem chemical structure standardization
Abstract Background: PubChem is a chemical information repository, consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored into Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes them […]
Chemical Entity Semantic Specification: Knowledge representation for efficient semantic cheminformatics and facile data integration
Abstract Background: Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts […]
Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on
Abstract Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and […]
UniChem: a unified chemical structure cross-referencing and identifier tracking system
Abstract UniChem is a freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources. In the past, the creation and maintenance of such links at EMBL-EBI, where several chemistry-based resources exist, has required independent efforts by each of the […]
International chemical identifier for reactions (RInChI)
Abstract The Reaction InChI (RInChI) extends the idea of the InChI, which provides a unique descriptor of molecular structures, towards reactions. Prototype versions of the RInChI have been available since 2011. The first official release (RInChIV1.00), funded by the InChI Trust, is now available for download (https://www.inchi-trust.org/wp/downloads/). This release defines the format and generates hashed […]
Comparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2
Abstract Background: An important step in the reconstruction of a metabolic network is annotation of metabolites. Metabolites are generally annotated with various database or structure based identifiers. Metabolite annotations in metabolic reconstructions may be incorrect or incomplete and thus need to be updated prior to their use. Genome-scale metabolic reconstructions generally include hundreds of metabolites. […]
Consistency of systematic chemical identifiers within and between small-molecule databases
Abstract Background: Correctness of structures and associated metadata within public and commercial chemical databases greatly impacts drug discovery research activities such as quantitative structure–property relationships modelling and compound novelty checking. MOL files, SMILES notations, IUPAC names, and InChI strings are ubiquitous file formats and systematic identifiers for chemical structures. While interchangeable for many cheminformatics purposes […]
Towards a Universal SMILES representation – A standard method to generate canonical SMILES based on the InChI
Abstract Background: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to […]
Enhancement of the chemical semantic web through the use of InChI identifiers
Abstract Molecules, as defined by connectivity specified via the International Chemical Identifier (InChI), are precisely indexed by major web search engines so that Internet tools can be transparently used for unique structure searches.
Detection of IUPAC and IUPAC-like chemical names
Abstract Motivation: Chemical compounds like small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals like SMILES, InChI, IUPAC or trivial names exist. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names […]
QSPR modeling of octanol water partition coefficient of platinum complexes by InChI-based optimal descriptors
Abstract Comparison of the quantitative structure–property relationships (QSPR) based on optimal descriptors calculated with the International Chemical Identifier (InChI) and QSPR based on optimal descriptors calculated with the simplified molecular input line entry system (SMILES) has shown that the InChI-based optimal descriptors give more accurate prediction for the logarithm of octanol/water partition coefficient of platinum complexes.
Tautomer Identification and Tautomer Structure Generation Based on the InChI Code
Abstract An algorithm is introduced that enables a fast generation of all possible prototropic tautomers resulting from the mobile H atoms and associated heteroatoms as defined in the InChI code. The InChI-derived set of possible tautomers comprises (1,3)-shifts for open-chain molecules and (1,n)-shifts (with n being an odd number >3) for ring systems. In addition, our algorithm […]
yaInChI: Modified InChI string scheme for line notation of chemical structures
Abstract A modified InChI (International Chemical Identifier) string scheme, yaInChI (yet another InChI), is suggested as a method for including the structural information of a given molecule, making it straightforward and more easily readable. The yaInChI theme is applicable for checking the structural identity with higher sensitivity and generating three-dimensional (3-D) structures from the one-dimensional […]
InChIKey collision resistance: an experimental testing
Abstract InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be […]
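The block layout described in this abstract can be illustrated with a short parser. This is a sketch under stated assumptions rather than code from the paper: the abstract only mentions the 27-character length and the skeleton and isomerism blocks, while the finer split used here (a standard/non-standard flag, a version character, and a final protonation character) follows the commonly documented InChIKey layout. The names `parse_inchikey` and `InChIKeyParts` are hypothetical. The code checks the format only; it does not recompute the hash or confirm that the key belongs to a real structure.

```python
import re
from typing import NamedTuple

# Assumed layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    skeleton_block: str    # first block: hash of the molecular skeleton
    isomerism_block: str   # hashed part of the second block (stereo, isotopes, etc.)
    is_standard: bool      # True when the flag character is 'S' (standard InChI)
    version_char: str      # InChI version character ('A' for version 1)
    protonation_char: str  # final character, 'N' for a neutral species


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks, raising ValueError if the format is wrong."""
    match = INCHIKEY_PATTERN.fullmatch(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, isomerism, flag, version, protonation = match.groups()
    return InChIKeyParts(skeleton, isomerism, flag == "S", version, protonation)


if __name__ == "__main__":
    # Standard InChIKey of water.
    print(parse_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```

Two keys that share the same first block describe the same skeleton and differ only in isomerism or protonation, which is why comparing `skeleton_block` alone is a common way to group related structures.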
InChI: connecting and navigating chemistry
Abstract The International Chemical Identifier (InChI) has had a dramatic impact on providing a means by which to deduplicate, validate and link together chemical compounds and related information across databases. Its influence has been especially valuable as the internet has exploded in terms of the amount of chemistry related information available online. This thematic issue […]