Reconstruction of lossless molecular representations

Umit V. Ucak, Islambek Ashyrmamatov, and Juyong Lee
ChemRxiv. Cambridge: Cambridge Open Engage; 2022
Published: March 2022
URL: https://chemrxiv.org/engage/chemrxiv/article-details/62273eb250b6211bf1ed8132
Abstract:

SMILES is the dominant molecular representation in AI-based chemical applications, but its internal structure is also responsible for certain issues. Here, we exploit the idea that structural fingerprints can serve as efficient alternatives to unique molecular representations. To this end, we assessed how efficiently fingerprints can be converted back into molecules. Using a neural machine translation (NMT) approach, we reconstructed molecules from fingerprints with a high level of accuracy. By restoring the connectivity information that is lost during the fingerprint transformation, our approach brings structural fingerprints into play as strong representational tools in chemical NLP applications. This comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings should improve the efficiency of models in generative and translation applications.
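To make the fingerprint-to-molecule translation setup more concrete, here is a minimal sketch of how source/target pairs for an NMT model could be prepared. The source side is the set of active fingerprint bits written as tokens, and the target side is the tokenized SMILES string. This is an illustrative assumption, not the authors' pipeline: the `fp` token prefix, the sorting of bit indices, and the helper names (`tokenize_smiles`, `fingerprint_to_tokens`, `make_pair`) are hypothetical. The regex follows a SMILES tokenization pattern commonly used in chemical NMT work, not anything stated in this abstract.

```python
import re
from typing import Iterable, List, Tuple

# SMILES tokenization regex in the style commonly used for chemical NMT.
# Assumption: this pattern is illustrative and not taken from the paper.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom/bond/ring-closure tokens."""
    tokens = SMILES_REGEX.findall(smiles)
    # If any character was skipped by the regex, the tokens will not
    # reassemble into the input, so reject the string.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def fingerprint_to_tokens(on_bits: Iterable[int]) -> List[str]:
    """Turn active fingerprint bit indices into source tokens.

    Hypothetical encoding: duplicates are dropped and indices are sorted,
    because a fingerprint is an unordered set of features.
    """
    return [f"fp{bit}" for bit in sorted(set(on_bits))]


def make_pair(on_bits: Iterable[int], smiles: str) -> Tuple[str, str]:
    """Build one space-separated (source, target) training pair."""
    source = " ".join(fingerprint_to_tokens(on_bits))
    target = " ".join(tokenize_smiles(smiles))
    return source, target


if __name__ == "__main__":
    # Placeholder bit indices, NOT a real fingerprint of ethanol.
    print(make_pair([42, 7], "CCO"))
```

Sorting the bit indices gives each fingerprint a single canonical token sequence, which reflects that the fingerprint carries no explicit atom ordering or connectivity. Recovering that connectivity is left to the trained decoder. In practice the on-bit indices would come from a cheminformatics toolkit (for example, Morgan/ECFP bits computed with RDKit). The indices used above are placeholders.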
