InChI Tag: Curricular Material

43 posts

Top level tag

Matlab InChIKey Scripts

This is a collection of Matlab scripts for working with InChIKeys: IKextract, IKfreqFH, IKstring, and IKmusic

IKextract, InChIKey Extract, can extract InChIKeys from chemical Structure data files (SDFs). This script was successfully used to extract over 90 million InChIKeys (unique chemical identifiers) from over 5000 PubChem SD files. Users can also extract other data from SDFs by specifying the desired SD tag.

IKfreqFH, InChIKey frequency of first hash block, extracts the first hash block of InChIKeys and sorts them by frequency. Such a method is useful for analyzing the variety of chemical connectivity in large datasets.

IKstring, InChIKey String, allows for searching for strings within InChIKeys. I use it to search the > 90 million InChIKeys in PubChem.

IKmusic, InChIKey music, creates music from InChIKeys. A unique song is created for each InChIKey (i.e. every unique chemical substance has a different song!)

IUPAC Name2PubChem

This submission shows you how to create a smart spreadsheet with Google Sheets that links an IUPAC name to a chemical’s PubChem landing page. You may click here to get a copy of this sheet.  This particular sheet uses the Centre for Molecular Informatics OPSIN (Open Parser for Systematic IUPAC nomenclature) web service to convert the name to an InChI key, which is then appended to a hyperlink to PubChem.   You will note that some of the names do not work and this is because those names in the sample sheet are incorrect names.  If you paste those names directly into the OPSIN web service, it will tell you were an error in parsing the name occurred.

The following video shows you how to create this  Google Sheet and below it is the instructions and code needed. This application takes advantage of the canonical nature of the InChI and its key, and the fact that the key allows you to communicate over the web.

 

Step 1: Paste your IUPAC names into a column of your spread sheet

Step 2: Convert IUPAC name to Standard InChI key
type the following script into the top cell of the column you want to place your keys into, and hit enter”

=IMPORTDATA(“http://opsin.ch.cam.ac.uk/opsin/”&[SPREADSHEET CELL WITH IUPAC NAME]&“.stdinchikey”)

  • the ampersand(&)concatenates the cell content to the URL
  • the ampersand must be surrounded by quotation marks
  • the URL must be in quotation marks

Click on the black box in the bottom right corner of cell and drag down, converting the entire column of names to keys.

Step 3: Hyperlink the key to PubChem
Type the following script into the top cell of the column you want to place your links into, and hit enter”n

=HYPERLINK(“https://pubchem.ncbi.nlm.nih.gov/compound/”&[SPREADSHEET CELL WITH INCHIKEY]&“”)

  • the ampersand (&) concatenates the cell content to the URL
  • the ampersand must be surrounded by quotation marks
  • the URL must be in quotation marks

NOTE, these are dynamic cells – And will be recalculated everytime you open the page, or change the chemical name.  If you want them to be static, you can copy the block of cells, and paste to another location as text.

You can also download the sheet as an Excel Spreadsheet, but the downloaded sheet will not be dynamic.  It will be linked, but will not change if you change the IUPAC name.