MolMiner: You only look once for chemical structure recognition

Y Xu, J Xiao, CH Chou, J Zhang, J Zhu, Q Hu, H Li, Ningsheng Han, Bingyu Liu, Shuaipeng Zhang, Jinyu Han, Zhen Zhang, Shuhao Zhang, Weilin Zhang, Luhua Lai, Jianfeng Pei
arXiv:2205.11016v1
Published: May 2022
DOI: https://doi.org/10.48550/arXiv.2205.11016
Abstract:

Molecular structures are usually depicted in 2D printed form in scientific documents such as journal papers and patents. However, these 2D depictions are not machine-readable. Because of a backlog spanning decades and a growing volume of such printed literature, there is high demand for translating printed depictions into machine-readable formats, a task known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades follow a rule-based approach, in which the key step, vectorization of the depiction, relies on interpreting vectors and nodes as bonds and atoms. Here, we present MolMiner, a practical software tool.