Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published February 12, 2024 | v2
Journal Article Open

State-specific protein–ligand complex structure prediction with a multiscale deep generative model

  • 1. ROR icon California Institute of Technology

Abstract

The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life. Despite recent advancements in protein structure prediction, existing algorithms are so far unable to systematically predict the binding ligand structures along with their regulatory effects on protein folding. To address this discrepancy, we present NeuralPLexer, a computational approach that can directly predict protein–ligand complex structures solely using protein sequence and ligand molecular graph inputs. NeuralPLexer adopts a deep generative model to sample the three-dimensional structures of the binding complex and their conformational changes at an atomistic resolution. The model is based on a diffusion process that incorporates essential biophysical constraints and a multiscale geometric deep learning system to iteratively sample residue-level contact maps and all heavy-atom coordinates in a hierarchical manner. NeuralPLexer achieves state-of-the-art performance compared with all existing methods on benchmarks for both protein–ligand blind docking and flexible binding-site structure recovery. Moreover, owing to its specificity in sampling both ligand-free-state and ligand-bound-state ensembles, NeuralPLexer consistently outperforms AlphaFold2 in terms of global protein structure accuracy on both representative structure pairs with large conformational changes and recently determined ligand-binding proteins. NeuralPLexer predictions align with structure determination experiments for important targets in enzyme engineering and drug discovery, suggesting its potential for accelerating the design of functional proteins and small molecules at the proteome scale.

Copyright and License

© The Author(s), under exclusive licence to Springer Nature Limited 2024.

Acknowledgement

Z.Q. acknowledges graduate research funding from Caltech and partial support from the Amazon-Caltech AI4Science fellowship. T.F.M. acknowledges partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. We thank M. Welborn, F. R. Manby, C. Zhang and V. Bhethanabotla for discussions on the work and for comments on the manuscript. We thank A. Meller and J. Borowsky for sharing the PocketMiner dataset.

Contributions

Z.Q., W.N., A.V., T.F.M. and A.A. conceived and designed the experiments. Z.Q. performed the experiments. Z.Q., W.N., A.V., T.F.M. and A.A. analysed the data. Z.Q. contributed analysis tools. Z.Q. and A.A. wrote the paper.

Data Availability

All datasets and predictions used to generate the reported results are available on Code Ocean86 and also on Zenodo at https://doi.org/10.5281/zenodo.10373581.

Code Availability

The code, scripts and interactive data analysis notebooks are available on Code Ocean86 and also on GitHub at https://github.com/zrqiao/NeuralPLexer.

Conflict of Interest

Z.Q. and T.F.M. are currently employees of Iambic Therapeutics or its affiliates. A provisional patent application related to this work has been filed (US Patent App. provisional 63/496,899). The remaining authors declare no competing interests.

Files

42256_2024_792_MOESM1_ESM.pdf
Files (1.7 MB)
Name Size Download all
md5:f9e0f3203ad592f517c101a912c194c0
261.2 kB Preview Download
md5:54091b9cbbd6704a070150e807454853
1.5 MB Preview Download

Additional details

Created:
February 14, 2024
Modified:
February 14, 2024