
evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library

Wittmann, Bruce J. and Johnston, Kadina E. and Almhjell, Patrick J. and Arnold, Frances H. (2022) evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library. ACS Synthetic Biology, 11 (3). pp. 1313-1324. ISSN 2161-5063. doi:10.1021/acssynbio.1c00592.

PDF - Submitted Version
Creative Commons Attribution Non-commercial No Derivatives.

PDF (Cost comparison between sequencing methods and additional information on TrpB and RmaNOD sequencing libraries, additional details on the design of evSeq primers and barcodes, detailed protocols for the implementation of evSeq, and DNA sequences used...) - Supplemental Material
See Usage Policy.




Widespread availability of protein sequence-fitness data would revolutionize both our biochemical understanding of proteins and our ability to engineer them. Unfortunately, even though thousands of protein variants are generated and evaluated for fitness during a typical protein engineering campaign, most are never sequenced, leaving a wealth of potential sequence-fitness information untapped. Primarily, this is because sequencing is unnecessary for many protein engineering strategies; the added cost and effort of sequencing are thus unjustified. It also results from the fact that, even though many lower-cost sequencing strategies have been developed, they often require at least some access to and experience with sequencing or computational resources, both of which can be barriers to access. Here, we present every variant sequencing (evSeq), a method and collection of tools/standardized components for sequencing a variable region within every variant gene produced during a protein engineering campaign at a cost of cents per variant. evSeq was designed to democratize low-cost sequencing for protein engineers and, indeed, anyone interested in engineering biological systems. Execution of its wet-lab component is simple, requires no sequencing experience to perform, relies only on resources and services typically available to biology labs, and slots neatly into existing protein engineering workflows. Analysis of evSeq data is likewise made simple by its accompanying software and documentation; the software can be run on a personal laptop and was designed to be accessible to users with no computational experience. Low-cost and easy-to-use, evSeq makes the collection of extensive protein variant sequence-fitness data practical.

Item Type: Article
Related URLs (by description): Paper; evSeq software (URL Type: Item); evSeq documentation (URL Type: Item); raw and processed data
Wittmann, Bruce J. (ORCID 0000-0001-8144-9157)
Johnston, Kadina E. (ORCID 0000-0002-2214-3534)
Almhjell, Patrick J. (ORCID 0000-0003-0977-841X)
Arnold, Frances H. (ORCID 0000-0002-4027-364X)
Additional Information: © 2022 American Chemical Society. Received: November 24, 2021; Published: February 17, 2022. The authors thank Shan Li, Adrienne Rollie, and Eric Brustad at Illumina, Inc.: Shan Li and Adrienne Rollie for helping troubleshoot the evSeq method, and Eric Brustad for critical reading of the manuscript. The authors also thank fellow Arnold laboratory members Nathaniel Goldberg and Nicholas Porter for implementing evSeq (which pointed us to necessary improvements), Anders Knight for suggesting and prototyping evSeq software features, Ella Watkins-Dulaney for assistance in building the TrpB libraries, and Sabine Brinkmann-Chen for critical reading of the manuscript. This work was supported by an Amgen Chem-Bio-Engineering Award (CBEA). This work was supported by the NSF Division of Chemical, Bioengineering, Environmental, and Transport Systems (CBET 1937902). This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under award number DE-SC0022218. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. Author Contributions.
Author contributions are provided using the CRediT taxonomy: B.J.W.: conceptualization, methodology, software, validation, investigation, writing – original draft, writing – review and editing, and funding acquisition. K.E.J.: methodology, data collection, software, investigation, writing – original draft, writing – review and editing, visualization, and funding acquisition. P.J.A.: methodology, data collection, software, validation, investigation, writing – original draft, writing – review and editing, and visualization. F.H.A.: resources, writing – original draft, writing – review and editing, and funding acquisition. Data Availability: All raw and processed data generated by this study can be found at CaltechData (DOI: 10.22002/D1.2140). The software version used to analyze all data in this study is tagged as v1.0.0 on the associated GitHub repository. The authors declare no competing financial interest.
Funding Agency: Department of Energy (DOE); Grant Number: DE-SC0022218
Subject Keywords: directed evolution; protein engineering; machine learning; next-generation sequencing
Issue or Number: 3
Record Number: CaltechAUTHORS:20211122-174115386
Persistent URL:
Official Citation: evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library. Bruce J. Wittmann, Kadina E. Johnston, Patrick J. Almhjell, and Frances H. Arnold. ACS Synthetic Biology 2022 11 (3), 1313-1324; DOI: 10.1021/acssynbio.1c00592
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 111972
Deposited By: George Porter
Deposited On: 22 Nov 2021 18:40
Last Modified: 06 Apr 2022 18:32
