CaltechAUTHORS
  A Caltech Library Service

evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library

Wittmann, Bruce J. and Johnston, Kadina E. and Almhjell, Patrick J. and Arnold, Frances H. (2022) evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library. ACS Synthetic Biology, 11 (3). pp. 1313-1324. ISSN 2161-5063. doi:10.1021/acssynbio.1c00592. https://resolver.caltech.edu/CaltechAUTHORS:20211122-174115386

PDF - Submitted Version (1MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

PDF - Supplemental Material (748kB): Cost comparison between sequencing methods and additional information on TrpB and RmaNOD sequencing libraries, additional details on the design of evSeq primers and barcodes, detailed protocols for the implementation of evSeq, and DNA sequences used...
See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20211122-174115386

Abstract

Widespread availability of protein sequence-fitness data would revolutionize both our biochemical understanding of proteins and our ability to engineer them. Unfortunately, even though thousands of protein variants are generated and evaluated for fitness during a typical protein engineering campaign, most are never sequenced, leaving a wealth of potential sequence-fitness information untapped. Primarily, this is because sequencing is unnecessary for many protein engineering strategies; the added cost and effort of sequencing are thus unjustified. It also results from the fact that, even though many lower-cost sequencing strategies have been developed, they often require at least some access to and experience with sequencing or computational resources, both of which can be barriers to access. Here, we present every variant sequencing (evSeq), a method and collection of tools/standardized components for sequencing a variable region within every variant gene produced during a protein engineering campaign at a cost of cents per variant. evSeq was designed to democratize low-cost sequencing for protein engineers and, indeed, anyone interested in engineering biological systems. Execution of its wet-lab component is simple, requires no sequencing experience to perform, relies only on resources and services typically available to biology labs, and slots neatly into existing protein engineering workflows. Analysis of evSeq data is likewise made simple by its accompanying software (found at github.com/fhalab/evSeq, documentation at fhalab.github.io/evSeq), which can be run on a personal laptop and was designed to be accessible to users with no computational experience. Low-cost and easy-to-use, evSeq makes the collection of extensive protein variant sequence-fitness data practical.


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1021/acssynbio.1c00592 | DOI | Article
https://doi.org/10.1101/2021.11.18.469179 | DOI | Discussion Paper
https://github.com/fhalab/evSeq/ | Related Item | evSeq software
https://fhalab.github.io/evSeq | Related Item | evSeq documentation
https://doi.org/10.22002/D1.2140 | DOI | All raw and processed data
ORCID:
Author | ORCID
Wittmann, Bruce J. | 0000-0001-8144-9157
Johnston, Kadina E. | 0000-0002-2214-3534
Almhjell, Patrick J. | 0000-0003-0977-841X
Arnold, Frances H. | 0000-0002-4027-364X
Additional Information: © 2022 American Chemical Society. Received: November 24, 2021; Published: February 17, 2022. The authors thank Shan Li, Adrienne Rollie, and Eric Brustad at Illumina, Inc.: Shan Li and Adrienne Rollie for helping us troubleshoot the evSeq method and Eric Brustad for critical reading of the manuscript. The authors also thank fellow Arnold laboratory members Nathaniel Goldberg and Nicholas Porter for implementing evSeq (which pointed us to necessary improvements), Anders Knight for suggesting and prototyping evSeq software features, Ella Watkins-Dulaney for assistance in building the TrpB libraries, and Sabine Brinkmann-Chen for critical reading of the manuscript. This work was supported by an Amgen Chem-Bio-Engineering Award (CBEA) and by the NSF Division of Chemical, Bioengineering, Environmental, and Transport Systems (CBET 1937902). This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under award number DE-SC0022218. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
Author Contributions: Contributions are listed using the CRediT taxonomy. B.J.W.: conceptualization, methodology, software, validation, investigation, writing – original draft, writing – review and editing, and funding acquisition. K.E.J.: methodology, data collection, software, investigation, writing – original draft, writing – review and editing, visualization, and funding acquisition. P.J.A.: methodology, data collection, software, validation, investigation, writing – original draft, writing – review and editing, and visualization. F.H.A.: resources, writing – original draft, writing – review and editing, and funding acquisition.
Data Availability: All raw and processed data generated by this study can be found at CaltechData (DOI: 10.22002/D1.2140). The software version used to analyze all data in this study is tagged as v1.0.0 in the associated GitHub repository. The authors declare no competing financial interest.
Funders:
Funding Agency | Grant Number
Amgen | UNSPECIFIED
NSF | CBET-1937902
Department of Energy (DOE) | DE-SC0022218
Subject Keywords: directed evolution; protein engineering; machine learning; next-generation sequencing
Issue or Number: 3
DOI: 10.1021/acssynbio.1c00592
Record Number: CaltechAUTHORS:20211122-174115386
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20211122-174115386
Official Citation: evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library. Bruce J. Wittmann, Kadina E. Johnston, Patrick J. Almhjell, and Frances H. Arnold. ACS Synthetic Biology 2022 11 (3), 1313-1324; DOI: 10.1021/acssynbio.1c00592
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 111972
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 22 Nov 2021 18:40
Last Modified: 06 Apr 2022 18:32
