InChI Tag: Resolvers


Comparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2

Abstract

Background:

An important step in the reconstruction of a metabolic network is annotation of metabolites. Metabolites are generally annotated with various database or structure based identifiers. Metabolite annotations in metabolic reconstructions may be incorrect or incomplete and thus need to be updated prior to their use.
Genome-scale metabolic reconstructions generally include hundreds of metabolites. Manually updating annotations is therefore highly laborious. This prompted us to look for open-source software applications that could facilitate automatic updating of annotations by mapping between available metabolite identifiers. We identified three applications developed for the metabolomics and chemical informatics communities as potential solutions. The applications were MetMask, the Chemical Translation System, and UniChem. The first implements a “metabolite masking” strategy for mapping between identifiers whereas the latter two implement different versions of an InChI based strategy. Here we evaluated the suitability of these applications for the task of mapping between metabolite identifiers in genome-scale metabolic reconstructions. We applied the best suited application to updating identifiers in Recon 2, the latest reconstruction of human metabolism.

Results:

All three applications enabled partially automatic updating of metabolite identifiers, but significant manual effort was still required to fully update identifiers. We were able to reduce this manual effort by searching for new identifiers using multiple types of information about metabolites. When multiple types of information were combined, the Chemical Translation System enabled us to update over 3,500 metabolite identifiers in Recon 2. All but approximately 200 identifiers were updated automatically.

Conclusions:

We found that an InChI based application such as the Chemical Translation System was better suited to the task of mapping between metabolite identifiers in genome-scale metabolic reconstructions. We identified several features, however, that could be added to such an application in order to tailor it to this task.

InChI in the wild: an assessment of InChIKey searching in Google

Abstract

While chemical databases can be queried using the InChI string and InChIKey (IK) the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be less familiar with the advantages of using the IK as explored in this review. As an example, the IK for atorvastatin retrieves ~200 low-redundancy links from a Google search in 0.3 of a second. These include most major databases and a very low false-positive rate. Results encompass less familiar but potentially useful sources and can be extended to isomer capture by using just the skeleton layer of the IK. Google Advanced Search can be used to filter large result sets. Image searching with the IK is also effective and complementary to open-web queries. Results can be particularly useful for less-common structures as exemplified by a major metabolite of atorvastatin giving only three hits. Testing also demonstrated document-to-document and document-to-database joins via structure matching. The necessary generation of an IK from chemical names can be accomplished using open tools and resources for patents, papers, abstracts or other text sources. Active global sharing of local IK-linked information can be accomplished via surfacing in open laboratory notebooks, blogs, Twitter, figshare and other routes. While information-rich chemistry (e.g. approved drugs) can exhibit swamping and redundancy effects, the much smaller IK result sets for link-poor structures become a transformative first-pass option. The IK indexing has therefore turned Google into a de-facto open global chemical information hub by merging links to most significant sources, including over 50 million PubChem and ChemSpider records. The simplicity, specificity and speed of matching make it a useful option for biologists or others less familiar with chemical searching. 
However, compared to rigorously maintained major databases, users need to be circumspect about the consistency of Google results and provenance of retrieved links. In addition, community engagement may be necessary to ameliorate possible future degradation of utility.
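
The skeleton-layer search mentioned in the abstract can be illustrated with a short Python sketch. It assumes the standard InChIKey layout (a 14-letter skeleton block, a 10-letter second block, and a one-letter final block). The function names are hypothetical, and the key shown is aspirin's, used purely as an illustration rather than the atorvastatin key from the review.

```python
import re
from urllib.parse import quote_plus

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key):
    """Return (skeleton block, second block, final letter) of an InChIKey."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.groups()


def skeleton_search_url(key):
    """Build a Google query URL that uses only the skeleton block,
    so that results are not restricted to a single isomer."""
    skeleton, _, _ = split_inchikey(key)
    return "https://www.google.com/search?q=" + quote_plus(skeleton)


if __name__ == "__main__":
    aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # illustrative key (aspirin)
    print(split_inchikey(aspirin))
    print(skeleton_search_url(aspirin))
```

Searching on the full key targets one exact structure, whereas searching on the first block alone broadens the match to structures that share the same skeleton.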

Batch Chemical IDs Conversion in Spreadsheets

Common conversion tools, including some of the spreadsheet-based options on this site, are hard to use for the hundreds or thousands of compounds we may want to handle in cheminformatics projects. This resource takes a different approach to the conversion. By using the PubChem Power User Gateway, it allows hundreds of chemical identifiers to be converted in a single call to a web service.

Two files are included in this OER: an Excel file that contains two user-defined functions (UDFs) for doing the conversions, along with documentation and examples; and a VBA module that can be imported into any existing Excel file to add these functions to it.
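
For readers who do not use Excel, the idea of converting many identifiers in one call can be sketched in Python. This is not the VBA code shipped with the resource. It is a minimal sketch that assumes PubChem's PUG REST interface and its `InChIKey` and `IUPACName` properties, with hypothetical function names. The CIDs in the example (2244 for aspirin, 2519 for caffeine) are only illustrations.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_batch_request(cids, properties=("InChIKey", "IUPACName")):
    """Return (url, form_body) for one PUG REST call covering all CIDs.

    The CID list travels in the POST body, so a long list does not
    run into URL length limits.
    """
    url = f"{PUG_REST}/compound/cid/property/{','.join(properties)}/CSV"
    body = urlencode({"cid": ",".join(str(cid) for cid in cids)})
    return url, body


def fetch_properties(cids, properties=("InChIKey", "IUPACName")):
    """Send the batch request and return the CSV text (needs network access)."""
    url, body = build_batch_request(cids, properties)
    request = Request(
        url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_batch_request([2244, 2519]))
```

Calling `fetch_properties([2244, 2519])` would return one CSV row per compound. Only `build_batch_request` is exercised above, so the sketch can be checked offline.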

 

IUPAC Name2PubChem

This submission shows you how to create a smart spreadsheet with Google Sheets that links an IUPAC name to a chemical’s PubChem landing page. You may click here to get a copy of this sheet. This particular sheet uses the Centre for Molecular Informatics OPSIN (Open Parser for Systematic IUPAC Nomenclature) web service to convert the name to an InChIKey, which is then appended to a hyperlink to PubChem. You will note that some of the names do not work; this is because those names in the sample sheet are incorrect. If you paste those names directly into the OPSIN web service, it will tell you where the error in parsing the name occurred.

The following video shows you how to create this Google Sheet, and below it are the instructions and code needed. This application takes advantage of the canonical nature of the InChI and its key, and the fact that the key allows you to communicate over the web.

 

Step 1: Paste your IUPAC names into a column of your spreadsheet

Step 2: Convert IUPAC name to Standard InChI key
Type the following formula into the top cell of the column you want to place your keys into, and hit Enter:

=IMPORTDATA("http://opsin.ch.cam.ac.uk/opsin/"&[SPREADSHEET CELL WITH IUPAC NAME]&".stdinchikey")

  • the ampersand (&) concatenates the cell content to the URL
  • each ampersand sits outside the quotation marks, between a quoted piece of text and the cell reference
  • each piece of the URL must be in straight quotation marks

Click on the small square in the bottom right corner of the cell and drag down to convert the entire column of names to keys.
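
The same lookup can be sketched outside the spreadsheet. The Python below builds the URL that the formula above assembles. The function names are hypothetical, and percent-encoding the name is an added assumption: the spreadsheet formula passes the name through as typed.

```python
from urllib.parse import quote
from urllib.request import urlopen

OPSIN_BASE = "http://opsin.ch.cam.ac.uk/opsin/"


def opsin_stdinchikey_url(iupac_name):
    """Build the OPSIN URL: base + name + '.stdinchikey'.

    Spaces, commas and brackets in the name are percent-encoded so
    that the result is a valid URL.
    """
    return OPSIN_BASE + quote(iupac_name.strip(), safe="") + ".stdinchikey"


def name_to_inchikey(iupac_name):
    """Fetch the key for one name (needs network access).

    urlopen raises an HTTPError if the service answers with an error
    status, for example when a name cannot be parsed.
    """
    with urlopen(opsin_stdinchikey_url(iupac_name), timeout=30) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    print(opsin_stdinchikey_url("2-acetyloxybenzoic acid"))
```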

Step 3: Hyperlink the key to PubChem
Type the following formula into the top cell of the column you want to place your links into, and hit Enter:

=HYPERLINK("https://pubchem.ncbi.nlm.nih.gov/compound/"&[SPREADSHEET CELL WITH INCHIKEY]&"")

  • the ampersand (&) concatenates the cell content to the URL
  • each ampersand sits outside the quotation marks, between a quoted piece of text and the cell reference
  • each piece of the URL must be in straight quotation marks
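
As a rough equivalent of Step 3 outside the spreadsheet, the Python sketch below appends a key to the PubChem compound URL used in the formula above. The function name is hypothetical, and returning `None` for an empty key (for instance, when a name could not be converted) is an assumption of this sketch, not something the sheet does.

```python
PUBCHEM_COMPOUND = "https://pubchem.ncbi.nlm.nih.gov/compound/"


def pubchem_link(inchikey):
    """Append an InChIKey to the PubChem compound URL.

    Returns None when the key is missing, so that a failed name
    conversion does not produce a link to the bare base URL.
    """
    key = (inchikey or "").strip()
    if not key:
        return None
    return PUBCHEM_COMPOUND + key


if __name__ == "__main__":
    # Illustrative key (aspirin); an empty string stands for a failed conversion.
    for key in ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", ""]:
        print(pubchem_link(key))
```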

NOTE: these are dynamic cells. They will be recalculated every time you open the page or change the chemical name. If you want them to be static, you can copy the block of cells and paste it to another location as text.

You can also download the sheet as an Excel spreadsheet, but the downloaded sheet will not be dynamic. The links will still work, but they will not change if you change the IUPAC name.