Machine learning-based discovery of molecular descriptors that control polymer gas permeation
Abstract
While machine learning has found increasing use in predicting the properties of polymeric materials with only a knowledge of chain architecture, determining the molecular factors underpinning properties (“interpretable AI”) has remained less well explored. We show that encoding chain chemistry in commonly employed formats, e.g., binary-valued fingerprints, leads to uniqueness issues, because the hashing step used to save storage space can map several chemical moieties onto the same bit. These issues carry over into the ML algorithms, especially for “inverse” design and interpretable AI, and cannot be avoided by changing the length of the fingerprint. Using MACCS key featurizations of monomer repeats resolves some of these issues, and we show that a few substructures consistently appear among the top features for maximizing permeability across several gases and ML models. These are carbon–carbon double bonds (as in polyacetylenes), especially when they are associated with methyl groups (found in branching architectures). These results, derived from a limited data set of ∼500 polymers with experimental gas permeation data, agree with physical insight and thus provide a robust foundation that could enable further study of these material classes through detailed experiments and simulations.
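The bit-collision problem described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' code and does not reproduce the hashing used by any cheminformatics library. It assumes a hypothetical list of fragment labels (`FRAGMENTS`) and uses SHA-256 folded modulo the fingerprint length as a stand-in hash. It contrasts that with a fixed key dictionary, the idea behind MACCS-style keys, in which every predefined substructure owns its own position.

```python
import hashlib
from collections import defaultdict

# Hypothetical fragment labels (SMILES-like strings), for illustration only.
FRAGMENTS = [
    "C=C", "C(C)(C)C", "c1ccccc1", "C#N", "C-O-C",
    "C(=O)O", "C(F)(F)F", "[Si](C)(C)C", "S(=O)(=O)", "C(=O)N",
]


def hashed_bit(fragment: str, n_bits: int) -> int:
    """Fold a deterministic hash of the fragment into n_bits positions.

    Stand-in hash (assumption): SHA-256, not a real fingerprint algorithm.
    """
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def bit_to_fragments(fragments, n_bits):
    """Map each bit position to every fragment that sets it."""
    table = defaultdict(list)
    for frag in fragments:
        table[hashed_bit(frag, n_bits)].append(frag)
    return dict(table)


def ambiguous_bits(fragments, n_bits):
    """Bits shared by two or more fragments; these cannot be interpreted uniquely."""
    return {
        bit: frags
        for bit, frags in bit_to_fragments(fragments, n_bits).items()
        if len(frags) > 1
    }


def keyed_index(fragments):
    """Fixed key dictionary: one position per predefined fragment, no sharing."""
    return {frag: position for position, frag in enumerate(fragments)}


if __name__ == "__main__":
    # Ten fragments folded into eight bits must collide at least once.
    for bit, frags in sorted(ambiguous_bits(FRAGMENTS, 8).items()):
        print(f"bit {bit} is set by: {frags}")
    print("keyed positions:", keyed_index(FRAGMENTS))
```

When a model assigns high importance to a shared bit, the importance cannot be attributed to a single moiety, which is the interpretability problem the abstract raises. This toy only demonstrates the collision mechanism; it does not attempt to reproduce the reported finding that changing the fingerprint length fails to remove the ambiguity.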
Copyright and License (English)
© 2024 Elsevier.
Acknowledgement (English)
Tejus Shastry acknowledges support from the Department of Energy, USA through grant DE-SC-0008772. Sanat Kumar was funded by a grant from the King Abdullah University of Science and Technology, Saudi Arabia. This material is also based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award Number DE-SC-0012704, and by Brookhaven National Laboratory, Laboratory Directed Research and Development grant no. 24-004.
Contributions (English)
Tejus Shastry: Data curation, Investigation, Methodology, Software, Visualization, Writing – original draft. Yasemin Basdogan: Data curation, Writing – review & editing. Zhen-Gang Wang: Writing – review & editing. Sanat K. Kumar: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft. Matthew R. Carbone: Conceptualization, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft.
Data Availability (English)
Data available at https://doi.org/10.5061/dryad.5x69p8dbm or available from authors upon request.
Conflict of Interest (English)
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional details
- United States Department of Energy: DE-SC-0008772
- King Abdullah University of Science and Technology
- United States Department of Energy: DE-SC-0012704
- Brookhaven National Laboratory: 24-004