TY - JOUR
T1 - The message passing neural networks for chemical property prediction on SMILES
AU - Jo, Jeonghee
AU - Kwak, Bumju
AU - Choi, Hyun Soo
AU - Yoon, Sungroh
N1 - Publisher Copyright: © 2020 The Authors
PY - 2020/7/1
Y1 - 2020/7/1
N2 - Drug metabolism is determined by the biochemical and physiological properties of the drug molecule. To improve the performance of a drug property prediction model, it is important to extract complex molecular dynamics from limited data. Recent machine learning and deep learning models have employed atom- and bond-type information, as well as structural information, to predict drug properties. However, many of these methods can be used only with graph representations. Message passing neural networks (MPNNs) (Gilmer et al., 2017) provide a framework for learning both local and global features from irregularly formed data and are invariant to permutations. Such a network performs an iterative message passing (MP) operation on each object and its neighbors, and obtains the final output from all messages regardless of their order. In this study, we applied the MP-based attention network (Nikolentzos et al., 2019), originally developed for text learning, to chemical classification tasks. Before training, we tokenized the characters and obtained embeddings of each molecular sequence. We conducted various experiments to maximize the predictive performance of the model, and we trained and evaluated it on various chemical classification benchmark tasks. Our results are comparable to or better than those of previous state-of-the-art and baseline models. To the best of our knowledge, this is the first attempt to learn chemical strings using an MP-based algorithm. In the future, we will extend our work to more complex tasks such as regression and generation.
AB - Drug metabolism is determined by the biochemical and physiological properties of the drug molecule. To improve the performance of a drug property prediction model, it is important to extract complex molecular dynamics from limited data. Recent machine learning and deep learning models have employed atom- and bond-type information, as well as structural information, to predict drug properties. However, many of these methods can be used only with graph representations. Message passing neural networks (MPNNs) (Gilmer et al., 2017) provide a framework for learning both local and global features from irregularly formed data and are invariant to permutations. Such a network performs an iterative message passing (MP) operation on each object and its neighbors, and obtains the final output from all messages regardless of their order. In this study, we applied the MP-based attention network (Nikolentzos et al., 2019), originally developed for text learning, to chemical classification tasks. Before training, we tokenized the characters and obtained embeddings of each molecular sequence. We conducted various experiments to maximize the predictive performance of the model, and we trained and evaluated it on various chemical classification benchmark tasks. Our results are comparable to or better than those of previous state-of-the-art and baseline models. To the best of our knowledge, this is the first attempt to learn chemical strings using an MP-based algorithm. In the future, we will extend our work to more complex tasks such as regression and generation.
KW - Deep learning
KW - Drug classification
KW - Message passing
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=85085170304&partnerID=8YFLogxK
U2 - 10.1016/j.ymeth.2020.05.009
DO - 10.1016/j.ymeth.2020.05.009
M3 - Article
C2 - 32445695
AN - SCOPUS:85085170304
SN - 1046-2023
VL - 179
SP - 65
EP - 72
JO - Methods
JF - Methods
ER -