Polymer-Unit Graph: Advancing Interpretability in Graph Neural Network Machine Learning for Organic Polymer Semiconductor Materials
Abstract
Graph representations of complex materials play a crucial role in data-centric investigations of inorganic and organic materials, such as those using graph neural networks (GNNs). However, prevalent GNN models are primarily applied to periodic crystals and organic small molecules, and they still face challenges in interpretability and computational efficiency when applied to polymer monomers and organic macromolecules. Graph representations of organic polymers and macromolecules that are specifically tailored for GNN models to explore structural characteristics are still lacking. The Polymer-unit Graph, a novel coarse-grained graph representation method introduced in this study, is dedicated to expressing and analyzing polymers and macromolecules. By incorporating the Polymer-unit Graph into GNN models and analyzing the organic semiconductor (OSC) materials database, it becomes possible to uncover intricate structure–property relationships involving branched-chain engineering, fluoridation substitution, and donor–acceptor combination effects on the elementary structure of OSC polymers. Furthermore, the Polymer-unit Graph enables visualizing the relationship between target properties and polymer units while reducing training time by 98% and minimizing molecular graph representation models. In conclusion, the Polymer-unit Graph successfully integrates the concept of the polymer unit into the field of GNNs, enabling more accurate analysis and understanding of organic polymers and macromolecules.
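The coarse-graining idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `coarse_grain`, the toy bond list, and the atom-to-unit assignment are all hypothetical, and the sketch assumes that the partition of atoms into polymer units is already known. It only shows how an atom-level graph might be contracted into a graph whose nodes are polymer units.

```python
from collections import defaultdict


def coarse_grain(atom_edges, atom_to_unit):
    """Contract an atom-level graph into a unit-level graph.

    atom_edges:   iterable of (i, j) atom-index pairs, one per bond.
    atom_to_unit: dict mapping atom index -> unit label
                  (hypothetical assignment, assumed to be given).
    Returns (unit_nodes, unit_edges): unit_nodes maps each unit label to
    its sorted atom indices; unit_edges is a sorted list of label pairs.
    """
    unit_nodes = defaultdict(list)
    for atom, unit in atom_to_unit.items():
        unit_nodes[unit].append(atom)

    unit_edges = set()
    for i, j in atom_edges:
        u, v = atom_to_unit[i], atom_to_unit[j]
        if u != v:
            # Only bonds that cross a unit boundary become unit-level edges.
            unit_edges.add(tuple(sorted((u, v))))

    return {u: sorted(a) for u, a in unit_nodes.items()}, sorted(unit_edges)


# Toy example (hypothetical): a six-atom chain split into three units.
atom_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
atom_to_unit = {
    0: "donor", 1: "donor",
    2: "acceptor", 3: "acceptor",
    4: "side_chain", 5: "side_chain",
}
unit_nodes, unit_edges = coarse_grain(atom_edges, atom_to_unit)
print(unit_nodes)
print(unit_edges)
```

In this toy case, six atoms and five bonds collapse to three nodes and two edges. A smaller graph of this kind gives a message-passing model fewer nodes and edges to process, which is one plausible reason a unit-level representation could shorten training.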
Copyright and License
© 2024 American Chemical Society.
Acknowledgement
Financial support was provided by the National Key R&D Program of China (2022YFA1203400), the National Natural Science Foundation of China (92163212), the Guangdong Basic and Applied Basic Research Foundation (2022A1515110628), and the Guangdong Provincial Key Laboratory of Computational Science and Material Design (2019B030301001). Zhang acknowledges support from the Guangdong Innovation Research Team Project (2017ZT07C062), and Goddard was supported by the US NSF (CBET-2311117). Computing resources were provided by the Center for Computational Science and Engineering at Southern University of Science and Technology.
Contributions
C.Y., W.Z., and W.A.G. formulated this project. X.Z. performed data collection, program coding, and ML analysis. Y.S., X.L., and J.Y. provided helpful discussion for program coding. X.Z. and C.Y. cowrote the manuscript. W.Z. and W.A.G. revised the manuscript. W.Z., W.A.G., and C.Y. secured the funding.
Data Availability
- Comparison of accuracy between gn-exp and PU-gn-exp; transform gn-exp into a classification model; hyperparameter adjustment of gn-exp and PU-gn-exp; Polymer-unit Graph enhancing the operational efficiency of the MPNN; details for PU-MPNN and mol_MPNN; and MSE of mol-MPNN and PU-MPNN on OSC data sets (PDF)
- Polymer_OSC_data sets (PDF)
- Structure of polymer-units (PDF)
- Baseline model and PU-gn-exp visualization results for all OSC data (PDF)
- Polymer_unit_SMILES and graph embeddings feature (XLSX)
Conflict of Interest
The authors declare no competing financial interest.
Files
Name | Size |
---|---|
md5:35e9992a358005abf9c0609fd111ab33 | 27.2 MB |
md5:21aba5a456342a78ae53ab6445fe41f4 | 282.1 kB |
md5:49722da719fc831ccb97c6d85681f1d9 | 1.2 MB |
md5:5db8b71da2e4dee8a0ca1e8aa988c44f | 627.7 kB |
md5:acf4fe5f0d69e14a838211ac14a39fd1 | 4.9 MB |
Additional details
- ISSN: 1549-9626
- Ministry of Science and Technology of the People's Republic of China: 2022YFA1203400
- National Natural Science Foundation of China: 92163212
- Guangdong Science and Technology Department: 2022A1515110628
- Guangdong Provincial Key Laboratory of Computational Science and Material Design: 2019B030301001
- Guangdong Innovation Research Team Project: 2017ZT07C062
- National Science Foundation: CBET-2311117