Published August 18, 2023 | v1
Journal Article | Open Access

DeCOIL: Optimization of Degenerate Codon Libraries for Machine Learning-Assisted Protein Engineering

Abstract

With advances in machine learning (ML)-assisted protein engineering, models based on data, biophysics, and natural evolution are being used to propose informed libraries of protein variants to explore. Synthesizing these libraries for experimental screens is a major bottleneck, as the cost of obtaining large numbers of exact gene sequences is often prohibitive. Degenerate codon (DC) libraries are a cost-effective alternative for generating combinatorial mutagenesis libraries where mutations are targeted to a handful of amino acid sites. However, existing computational methods to optimize DC libraries to include desired protein variants are not well suited to design libraries for ML-assisted protein engineering. To address these drawbacks, we present DEgenerate Codon Optimization for Informed Libraries (DeCOIL), a generalized method that directly optimizes DC libraries to be useful for protein engineering: to sample protein variants that are likely to have both high fitness and high diversity in the sequence search space. Using computational simulations and wet-lab experiments, we demonstrate that DeCOIL is effective across two specific case studies, with the potential to be applied to many other use cases. DeCOIL offers several advantages over existing methods, as it is direct, easy to use, generalizable, and scalable. With accompanying software (https://github.com/jsunn-y/DeCOIL), DeCOIL can be readily implemented to generate desired informed libraries.

Copyright and License

© 2023 American Chemical Society.

Acknowledgement

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award Number DE-SC0022218. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. J.Y. and F.-Z.L. are supported by the National Science Foundation Graduate Research Fellowship. The authors thank Bruce Wittmann and Jennifer Sun for helpful discussions and Sabine Brinkmann-Chen for critical reading of the manuscript.

Contributions

J.Y.: conceptualization, methodology, software, validation, investigation, writing – original draft, writing – review and editing, visualization, and funding acquisition. J.D.: data collection, investigation, writing – original draft, writing – review and editing, and funding acquisition. K.E.J.: methodology, data collection, software, investigation, writing – original draft, writing – review and editing, and visualization. F.-Z.L.: data collection, software, writing – review and editing, and funding acquisition. Y.Y.: methodology, resources, writing – original draft, writing – review and editing, and funding acquisition. F.H.A.: resources, writing – original draft, writing – review and editing, and funding acquisition.

Data Availability

Supporting Information (PDF): additional analyses of optimized DC libraries, visual explanations for mathematical concepts, and details on the primers and DNA sequences used in this work.

Conflict of Interest

The authors declare no competing financial interest.

Files

sb3c00301_si_001.pdf (1.2 MB)
md5:68e89600b40c7e51968c56a3817c18b7

Additional details

Created: April 8, 2024
Modified: April 8, 2024