Liu, Shengchao and Zhu, Yutao and Lu, Jiarui and Xu, Zhao and Nie, Weili and Gitter, Anthony and Xiao, Chaowei and Tang, Jian and Guo, Hongyu and Anandkumar, Anima (2023) A Text-guided Protein Design Framework. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20230316-153746362
PDF (Submitted Version, 3MB). See Usage Policy.
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20230316-153746362
Abstract
Current AI-assisted protein design mainly utilizes protein sequence and structure information. Meanwhile, a tremendous amount of human-curated knowledge describing proteins' high-level properties exists in text form. Yet whether incorporating such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator, which generates the protein representation from the text modality; and a decoder, which generates protein sequences from that representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text-protein pairs. We empirically verify the effectiveness of ProteinDT from three aspects: (1) consistently superior performance on four out of six protein property prediction benchmarks; (2) over 90% accuracy for text-guided protein generation; and (3) promising results for zero-shot text-guided protein editing.
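The abstract says only that ProteinCLAP "aligns the representation of two modalities." As a hedged illustration, assuming the alignment uses a CLIP-style symmetric contrastive (InfoNCE) objective over a batch of paired text and protein embeddings, the step could be sketched as follows. The function names, the temperature of 0.1, and the toy 2-D embeddings are hypothetical; the text and protein encoders that would produce real embeddings are omitted, and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_logits(text_emb: List[Vector], prot_emb: List[Vector],
                      temperature: float) -> List[List[float]]:
    # logits[i][j] = cos(text_i, protein_j) / temperature
    t = [l2_normalize(v) for v in text_emb]
    p = [l2_normalize(v) for v in prot_emb]
    return [[sum(a * b for a, b in zip(ti, pj)) / temperature for pj in p]
            for ti in t]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    # Mean of -log softmax(row_i)[i]: the i-th pair is the positive,
    # every other item in the batch acts as a negative.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_emb: List[Vector], prot_emb: List[Vector],
                               temperature: float = 0.1) -> float:
    # Average the text-to-protein and protein-to-text directions.
    logits = similarity_logits(text_emb, prot_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) +
                  diagonal_cross_entropy(transposed))


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # each protein matches its text
    shuffled = [[0.0, 1.0], [1.0, 0.0]]  # pairs swapped
    print(symmetric_contrastive_loss(text, aligned))
    print(symmetric_contrastive_loss(text, shuffled))
```

With matched pairs the loss is close to zero, while swapping the pairs drives it to about 10 at this temperature; minimizing such a loss pulls each text embedding toward its paired protein embedding and away from the others in the batch.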
| Item Type: | Report or Paper (Discussion Paper) |
|---|---|
| Additional Information: | This project was partly done during Shengchao Liu’s internship at Nvidia, and was supported in part by the Natural Sciences and Engineering Research Council (NSERC) Discovery Grant, the Canada CIFAR AI Chair Program, collaboration grants between Microsoft Research and Mila, Samsung Electronics Co., Ltd., Amazon Faculty Research Award, Tencent AI Lab Rhino-Bird Gift Fund, two NRC Collaborative R&D Projects (AI4D-CORE-06, AI4D-CORE-08), IVADO Fundamental Research Project grant PRF-2019-3583139727, and NSF award CHE 2226451. |
| Record Number: | CaltechAUTHORS:20230316-153746362 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20230316-153746362 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 120087 |
| Collection: | CaltechAUTHORS |
| Deposited By: | George Porter |
| Deposited On: | 16 Mar 2023 19:30 |
| Last Modified: | 16 Mar 2023 19:30 |