Creating custom synthetic genomes in Escherichia coli with REXER and GENESIS
Wesley E. Robertson¹,³, Louise F. H. Funke¹,³, Daniel de la Torre¹,³, Julius Fredens¹,³, Kaihang Wang²✉ and Jason W. Chin¹✉
We previously developed REXER (Replicon EXcision Enhanced Recombination); this method enables the replacement of >100 kb of the Escherichia coli genome with synthetic DNA in a single step and allows the rapid identification of non-viable or otherwise problematic sequences with nucleotide resolution. Iterative repetition of REXER (GENESIS, GENomE Stepwise Interchange Synthesis) enables stepwise replacement of longer contiguous sections of genomic DNA with synthetic DNA, and even the replacement of the entire E. coli genome with synthetic DNA. Here we detail protocols for REXER and GENESIS. A standard REXER protocol typically takes 7–10 days to complete. Our description encompasses (i) synthetic DNA design, (ii) assembly of synthetic DNA constructs, (iii) utilization of CRISPR–Cas9 coupled to lambda-red recombination and positive/negative selection to enable the high-fidelity replacement of genomic DNA with synthetic DNA (or insertion of synthetic DNA), (iv) evaluation of the success of the integration and replacement and (v) identification of non-tolerated synthetic DNA sequences with nucleotide resolution. This protocol provides a set of precise genome engineering methods to create custom synthetic E. coli genomes.
Introduction
Genome synthesis is a powerful technology for addressing fundamental biological questions and enables the creation of organisms with useful properties. Viable genomes have been synthesized for two organisms: Mycoplasma, in which a 1 Mb genome was generated and genome minimization investigated¹,², and Escherichia coli, for which a 4 Mb recoded genome was synthesized. The latter study removed over 18,000 synonymous codons, creating a cell that uses a compressed genetic code with just 61 codons to enable sense codon reassignment for non-canonical amino acid (ncAA) incorporation³. Central to the creation of this synthetic E. coli genome was the development of methods for replacing large (Mb) sections of the genome with synthetic DNA, through the seamless iteration of methods that enable more than 100 kb to be replaced in each step. Crucially, since the viability of synthetic DNA sequences is not commonly known in advance, these methods also provide feedback on precisely where in its sequence a synthetic DNA design fails. Ongoing efforts aim to synthesize other genomes⁴,⁵.
Sequence-specific recombinases can be introduced into E. coli and used to direct recombination at their cognate target sequences, but these target sequences must first be introduced into the genome. Replacements of up to 72 kb and insertions of approximately 50 kb have been achieved with this approach⁶–⁹. These approaches have not been used to replace genomic DNA on the Mb scale, and do not provide feedback on precisely where a synthetic DNA design fails.
Lambda-red recombination¹⁰ and related bacteriophage recombination systems¹¹,¹² are widely used to replace genomic DNA with linear double-stranded DNA (dsDNA) that is electroporated into E. coli. This approach does not require prior modification of the host genome, but relies on placing ~50 bp homologies with the genome at the ends of the introduced DNA. While this approach is useful for manipulating single genes, the efficiency of the process falls off rapidly with the length of DNA introduced; lambda-red recombination is prohibitively inefficient for introducing more than a few kb of DNA into E. coli. In Salmonella typhimurium, rolling circle amplification and cleavage methods have been used to generate large quantities of linear double-stranded DNA for lambda-red recombination, and iteration of this approach, using appropriate selection markers, has enabled 200 kb of the S. typhimurium genome to be replaced with synthetic DNA¹³.
¹Medical Research Council Laboratory of Molecular Biology, Cambridge, England, UK. ²Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA. ³These authors contributed equally: Wesley E. Robertson, Louise F. H. Funke, Daniel de la Torre, Julius Fredens. ✉e-mail: kaihangwang@caltech.edu; chin@mrc-lmb.cam.ac.uk
NATURE PROTOCOLS | VOL 16 | MAY 2021 | 2345–2380 | www.nature.com/nprot
PROTOCOL
https://doi.org/10.1038/s41596-020-00464-3
[Fig. 1 graphic: panel a, ~100-kb insertion at locus 0 by REXER 2 (excision of dsDNA by CRISPR–Cas9 from BAC only) or REXER 4 (excision from BAC and genome), followed by recombination; panel b, ~100-kb replacement between locus 0 and locus 1 by REXER 2 or REXER 4; panel c, iterations of REXER (GENESIS) converting the wt genome, via partially synthesized genomes carrying alternating -1/+1 and -2/+2 cassettes, into a de novo designed and synthesized genome.]
To address the challenges of E. coli genome synthesis we developed Replicon EXcision Enhanced Recombination (REXER), which enables the highly efficient and accurate integration of large synthetic DNA fragments at a programmed genomic locus in E. coli¹⁴. In REXER, synthetic DNA that has been assembled in an episome (bacterial artificial chromosome, BAC) is transformed into E. coli. In REXER2, the synthetic DNA is specifically excised from the episome using two CRISPR cuts. The resulting in vivo–generated synthetic linear double-stranded DNA is then integrated into the genome by lambda-red recombination, using ~50 bp regions of homology between the ends of the synthetic DNA and genomic DNA. REXER4 introduces two additional CRISPR cuts in the genome that flank the targeted region for replacement, resulting in higher recombination efficiencies¹⁴. Genetic markers are then used to select cells in which synthetic DNA has replaced genomic DNA.
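The role of the ~50 bp homology regions can be illustrated with a minimal sketch. It assumes, purely for illustration, that the genome is held as a plain linear string and that the target region is given in 0-based, end-exclusive coordinates; the function name `homology_arms` and its parameters are hypothetical and are not part of the protocol. Under those assumptions, the two homology regions are simply the genomic sequence immediately upstream and downstream of the region to be replaced.

```python
def homology_arms(genome: str, start: int, end: int, arm_len: int = 50):
    """Return (HR1, HR2) for a target region genome[start:end].

    HR1 is the arm_len bases immediately upstream of the region and HR2 the
    arm_len bases immediately downstream. This is a simplification: the genome
    is treated as linear, so regions near the sequence ends are rejected.
    """
    if not (arm_len <= start <= end <= len(genome) - arm_len):
        raise ValueError("region too close to the sequence ends for this arm length")
    hr1 = genome[start - arm_len:start]
    hr2 = genome[end:end + arm_len]
    return hr1, hr2


# Toy example: a 100 bp target region flanked by 60 bp on each side.
toy_genome = "A" * 60 + "C" * 100 + "G" * 60
hr1, hr2 = homology_arms(toy_genome, 60, 160)
```

In a real design the arms would be taken from the sequenced genome of the strain in use, and any cassette or Cas9 target site placed at the boundaries would need to be accounted for separately.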
Importantly, REXER and GENESIS result in high-fidelity replacement of genomic DNA with synthetic DNA; we observed only eight non-programmed mutations when replacing the 4 Mb genome of E. coli with recoded synthetic DNA using REXER/GENESIS, a frequency of 2 × 10⁻⁶ errors per bp³.
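The quoted error frequency follows directly from the two numbers in the sentence above. As a quick check, using the rounded genome size of 4 Mb given in the text (the helper name `error_frequency` is hypothetical):

```python
def error_frequency(mutations: int, replaced_bp: int) -> float:
    """Non-programmed mutations per base pair of replaced DNA."""
    if replaced_bp <= 0:
        raise ValueError("replaced_bp must be positive")
    return mutations / replaced_bp


# Eight mutations over ~4 Mb of replaced genome (rounded figure from the text).
rate = error_frequency(8, 4_000_000)
print(f"{rate:.0e} errors per bp")
```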
REXER can be used to insert synthetic DNA (Fig. 1a) or to replace a specific genomic sequence with a corresponding synthetic sequence (Fig. 1b); single-step insertions of up to 90 kb and single-step replacements of up to 120 kb have been demonstrated³,¹⁴. Notably, REXER can be rapidly iterated in a process termed GENESIS (GENomE Stepwise Interchange Synthesis) to introduce longer contiguous synthetic sections into the genome (Fig. 1c); this approach has been used to replace up to 1 Mb of the genome with synthetic DNA in a single strain, and genome sections from distinct strains have been combined in a single strain to enable the synthesis of a functional fully synthetic genome with a compressed genetic code³.
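The iteration logic of GENESIS can be sketched as a simple planning routine. This is only an illustration under stated assumptions: the region is cut into equal ~100 kb fragments, coordinates are 0-based, and the function name `genesis_plan` and the dictionary keys are hypothetical. The one feature taken from the text and Fig. 1c is that the incoming BAC cassette alternates between sacB-cat (-2/+2) and rpsL-kanR (-1/+1), starting with sacB-cat when the genome initially carries rpsL-kanR.

```python
def genesis_plan(region_len_bp: int, fragment_bp: int = 100_000):
    """Split a region into consecutive fragments, one REXER step each,
    alternating the double selection cassette delivered by the BAC."""
    if region_len_bp <= 0 or fragment_bp <= 0:
        raise ValueError("lengths must be positive")
    cassettes = ["sacB-cat (-2/+2)", "rpsL-kanR (-1/+1)"]
    plan = []
    start = 0
    step = 0
    while start < region_len_bp:
        end = min(start + fragment_bp, region_len_bp)
        plan.append({
            "step": step + 1,
            "start": start,
            "end": end,
            "incoming_cassette": cassettes[step % 2],
        })
        start = end
        step += 1
    return plan


# A 250 kb region needs three REXER steps in this simplified scheme.
plan = genesis_plan(250_000)
```

Real fragment boundaries would be chosen by design constraints (for example, where a cassette and Cas9 target site can be placed) rather than by fixed lengths.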
As we develop our nascent ability to design synthetic genomes that will support life, this feedback
cycle for evaluating DNA insertions and replacements will be critical, both for creating synthetic
genomes and for developing an understanding of genome construction rules. It may also be possible
to extend the REXER and GENESIS approach to create synthetic genomes in additional species.
REXER workflow
Four key components are necessary to implement REXER: (i) a BAC that harbors the synthetic DNA of interest prior to genomic integration (Fig. 2); (ii) CRISPR–Cas9 components (Fig. 3a) that are programmed to specifically excise the synthetic DNA from the BAC (REXER 2 and REXER 4) and cut the genome at the integration sites (REXER 4) (Fig. 3b); (iii) the lambda-red machinery that directs the replacement of genomic DNA with synthetic DNA via ~50 bp regions of homology (Fig. 3a,c); and (iv) positive and negative selection markers that are used to select clones that have undergone the desired exchange of genomic DNA for synthetic DNA (Figs. 1 and 4).
The workflow starts with assembly of a BAC containing the designed synthetic DNA of interest (approximately 100 kb) in yeast³,¹⁵. Multiple 10 kb stretches of sequence-verified synthetic DNA, the BAC backbone, yeast artificial chromosome (YAC) components, and corresponding selection cassettes are transformed into yeast spheroplasts. The homologous recombination machinery of the host cell is used to assemble the desired BAC; this process uses ~80 bp overlaps between the DNA stretches to direct the correct assembly (Fig. 2a). Yeast clones with correctly assembled BACs are identified by colony PCRs flanking all junctions. The BAC is then purified from yeast and electroporated into E. coli cells that have been prepared for REXER (as described below) (Fig. 2b), or a standard cloning strain. The BAC is then amplified in E. coli and its sequence verified by next-generation sequencing (NGS). If the BAC was initially transformed into a standard cloning strain it is
Fig. 1 | Schematic of REXER and GENESIS. a, Insertion of synthetic DNA by REXER. The E. coli genome (light grey) contains an rpsL-kanR double selection cassette (-1/+1, yellow and blue, respectively) at locus 0; this cassette is introduced by lambda-red mediated recombination. A BAC containing >100 kb of synthetic DNA (pink), coupled to a sacB-cat double selection cassette (-2/+2, magenta and green respectively) is shown. In REXER2 (top), Cas9 creates two double-stranded cuts in the BAC, flanking the synthetic DNA and double selection marker (blue and orange triangles and bars). Lambda-red recombination between regions of homology (dashed lines) mediates the integration of the synthetic insert in the genome (presumably by a strand invasion process, see Fig. 3c). Selection for the gain of cat (+2) and the loss of rpsL (-1) identifies successful replacements. REXER4 (bottom) proceeds analogously to REXER2, but Cas9 generates two additional cuts in the genome (purple and red triangles and bars), excising the rpsL-kanR marker and exposing linear homologous ends in the genome. Lambda-red-mediated recombination proceeds (presumably by strand annealing, see Fig. 3c). b, Replacement of genomic DNA with synthetic DNA by REXER. The color scheme and markers are as in (a). Rather than being at the target locus for insertion, here the rpsL-kanR marker is located at the beginning of the region to be replaced. As in (a), Cas9 generates two cuts on the BAC (REXER2, top) and two additional cuts on the genome (REXER4, bottom), exposing linear homologous ends at the beginning and end of the region to be replaced. Genetic selection as in (a) identifies successful replacements. c, GENESIS is the iteration of REXER. The result of a REXER step is a partially replaced genome containing a new selection cassette at the end of the replaced region. This genome is a direct template for the next REXER step. Alternate use of the two selection cassettes -1/+1 and -2/+2 allows for iteration of REXER in adjacent sections of the genome.
subsequently transferred by either electroporation (Fig. 2b) or conjugation (Fig. 2c) to a strain that has been prepared for REXER.
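The overlap-directed assembly described above can be sketched in code. The example below is a minimal illustration, assuming a designed sequence held as a plain string, fixed 10 kb stretches and fixed 80 bp overlaps; in practice stretch boundaries and overlap lengths are design choices, and the function names `split_with_overlaps` and `overlaps_ok` are hypothetical. It shows how adjacent stretches share terminal sequence, which is what allows the yeast recombination machinery to join them in order.

```python
def split_with_overlaps(seq: str, stretch_len: int = 10_000, overlap: int = 80):
    """Cut seq into stretches of up to stretch_len bases in which each
    stretch starts with the last `overlap` bases of the previous one."""
    if not 0 < overlap < stretch_len:
        raise ValueError("overlap must be positive and shorter than stretch_len")
    step = stretch_len - overlap
    pieces = []
    start = 0
    while True:
        end = min(start + stretch_len, len(seq))
        pieces.append(seq[start:end])
        if end == len(seq):
            break
        start += step
    return pieces


def overlaps_ok(pieces, overlap: int = 80) -> bool:
    """Check that every adjacent pair of stretches shares the expected overlap."""
    return all(a[-overlap:] == b[:overlap] for a, b in zip(pieces, pieces[1:]))


# Deterministic toy sequence of 25 kb (not a real design).
toy = "".join("ACGT"[(i * i + i // 7) % 4] for i in range(25_000))
pieces = split_with_overlaps(toy)
reassembled = pieces[0] + "".join(p[80:] for p in pieces[1:])
```

A real design would additionally check that each overlap is unique within the construct, so that recombination cannot join stretches in the wrong order.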
The E. coli cells that will receive the BAC are prepared for REXER by introducing, via electroporation, the helper plasmid¹⁴, which contains lambda-red components (Red alpha, beta, and gamma) and the Cas9 gene and tracrRNA of the CRISPR–Cas9 system (Fig. 3). The K43R mutation is
[Fig. 2 graphic: panel a, assembly in S. cerevisiae of ~10 kb syn. DNA stretches with the BAC ori, URA3 (yeast promoter), oriT, YAC ori, HR1, HR2 and the -2/+2 and -1 markers into a BAC carrying ~100 kb of syn. DNA; panel b, electroporation of the BAC into E. coli with a wt genome carrying -1/+1 (maintenance of +1, gain of +2); panel c, conjugation of the BAC from a donor E. coli carrying the conjugation plasmid into a recipient E. coli with a wt genome carrying -1/+1 (maintenance of +1, gain of +2).]
also introduced into the chromosomal rpsL to allow the use of wild-type rpsL as a negative selection marker, and a positive and negative double selection cassette is inserted at a specific sequence in the genome. Ideally, the sequence of the BAC should be verified by NGS in the strain that will be used for REXER.
The arrays of spacers that are specific for target sequences in the BAC (REXER 2 and REXER 4) and in the genome (REXER 4) are created by in vitro assembly methods (Boxes 1–3); the spacer sequences differ between REXER experiments. These spacers are encoded on either a circular plasmid or a linear dsDNA array with no replication origin. Electroporation of the spacers into cells that have been prepared for REXER and contain the relevant BAC initiates transcription of CRISPR RNAs (crRNAs) and thereby the REXER reaction (Fig. 4a).
Following transformation of the spacers, the cells are recovered for several hours. This step allows time for the synthetic DNA to be introduced into the genome and for the protein product of the negative selection marker that was initially present in the genome to be diluted by cell division and proteolysis. Serial dilutions of cells are then spread on LB agar plates supplemented with antibiotics that select for the product of REXER and incubated at 37 °C overnight. Individual colonies are resuspended in water and re-streaked onto fresh selection plates to guarantee isolation of individual clones (Fig. 4a).
Individual clones are screened by phenotyping and genotyping. Multiple post-REXER clones are arrayed onto LB agar plates with different combinations of antibiotics, using the pre-REXER clone (with pre-REXER BAC and helper plasmid, from before the transformation of spacer arrays) as a control (Fig. 4b). PCRs using primers flanking both ends of the newly introduced synthetic DNA with the downstream double selection cassette are performed (Fig. 4c). Close to 100% of all REXER clones show the correct phenotype and genotype in a typical REXER reaction with fully functional REXER components³,¹⁴. Rarely, mutations in the promoter and/or open reading frame (ORF) of the negative and/or positive selection markers can occur in pre-REXER cells; in these rare cases a substantial number of false positives might be observed amongst the re-streaked post-REXER clones in phenotyping and/or genotyping assays. As a part of the integrated workflow, all post-REXER clones are routinely verified by phenotyping and genotyping, and only clones with the correct phenotype and genotype proceed to the NGS and sequence analysis steps (Fig. 4d). With assembled REXER components in hand, the standard REXER protocol typically takes 7–10 days to complete.
Identifying and fixing synthetic sequences that are not tolerated
When inserting synthetic DNA into the genome it is often possible to conserve the functions encoded in both the synthetic DNA and the original genome. Dominant lethal synthetic DNA is also unlikely to be inserted as it must first be tolerated when present on the BAC as an extra-genomic copy. However, when attempting to replace genomic DNA with synthetic DNA it can be unclear whether the synthetic sequence can substitute for the original sequence. It is important to be able to identify (with nucleotide resolution) and fix any regions of synthetic DNA that are not tolerated in the genome.
If there are marked homologies between the synthetic sequence and the targeted genomic sequence, chimeras between the synthetic sequence and the original genomic sequence might be observed in the post-REXER clones; these chimeras are presumed to result from recombinational crossover (Fig. 5a,b). When the entire synthetic sequence is well tolerated we commonly observe complete replacement of the genomic sequence by the synthetic sequence with few (or no) chimeras; in these cases, most post-REXER clones contain the entire synthetic sequence (Fig. 5a). When a region of the synthetic sequence is not tolerated we find that post-REXER clones have sequences that
Fig. 2 | BAC assembly and delivery. a, Assembly of a BAC containing synthetic DNA by homologous recombination in S. cerevisiae. Synthetic DNA stretches of up to 10 kb (pink), each containing 80–200 bp of homology to their adjacent stretches, are transformed into S. cerevisiae spheroplasts together with: (I) a BAC vector fragment (with 50–80 bp homology to the 5′ end of the synthetic DNA insert) containing BAC replication and segregation components, an origin of transfer (oriT, grey) and a URA3 auxotrophy marker under control of a yeast promoter; and (II) a selection construct fragment (with 50–80 bp homology to the 3′ end of the synthetic DNA insert) containing a sacB-cat double selection cassette (-2/+2, magenta and green respectively), an rpsL negative selection marker (-1, yellow) and a CEN6 centromere linked to an autonomously replicating region (YAC ori). The resulting assembly contains HR1 and HR2 (homology regions for REXER) flanking the synthetic DNA and sacB-cat double selection cassette. b, Transformation of a BAC into a target cell for REXER. Purified BAC DNA is electroporated into a target E. coli cell containing an rpsL-kanR double selection cassette in its genome for REXER. Selection for chloramphenicol resistance (conferred by +2) yields cells that contain the BAC. c, Conjugation of a BAC from a shuttle E. coli cell into a target cell for REXER. The donor cell contains the BAC and a non-transferable fertility plasmid (conjugation plasmid pJF146, MK809154.1). The recipient cell has already been prepared for REXER by integration of the rpsL-kanR double selection marker. Selection for the maintenance of the recipient kanR marker (+1) together with the gain of the BAC cat (+2) allows the identification of successful conjugants.
are chimeras between the synthetic sequence and the original genomic sequence. In the individual
post-REXER chimeras, the boundaries vary between the synthetic sequence and the original genomic
sequence; this presumably results from the different sites of recombinational crossover in different
[Fig. 3 graphic: panel a, helper plasmid (tracrRNA, Cas9, Red α, Red β and Red γ under endogenous and pBAD promoters; constant parts) and spacer plasmid (leader seq., direct repeats and spacers under an endogenous promoter, for REXER 2 or REXER 4; variable parts), yielding crRNAs that pair with tracrRNA and Cas9; panel b, Cas9 target sites flanking HR1 (~50 bp) and HR2 (~50 bp) in the BAC (syn. DNA, sacB, cat, rpsL) and in the genome (rpsL, KanR); panel c, strand invasion (for REXER 2) and strand annealing (for REXER 4) between syn. DNA and genome at HR2 (~50 bp).]
clones. However, in these cases all post-REXER clones will contain the wild-type sequence in place of the deleterious section of the synthetic sequence. Typically, the sequences of 6–8 post-REXER clones are analyzed. Plotting the percentage of clones in which the original sequence is converted to the synthetic sequence at each position allows the identification of nucleotide positions where the synthetic sequence is not tolerated; the original nucleotide sequence is always found at these positions (Fig. 5b).
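The compiled landscape described above is a per-position average over clones. The sketch below assumes that each sequenced clone has already been reduced to a list of 0/1 calls (1 = synthetic sequence, 0 = original sequence) at the same ordered set of designed positions; that encoding and the function names `recoding_landscape` and `never_synthetic` are hypothetical conveniences, not part of the protocol's analysis pipeline.

```python
def recoding_landscape(clones):
    """Percentage of clones carrying the synthetic sequence at each position.

    clones: list of equal-length lists of 0/1 calls, one list per clone.
    """
    if not clones:
        raise ValueError("at least one clone is required")
    n_pos = len(clones[0])
    if any(len(c) != n_pos for c in clones):
        raise ValueError("all clones must cover the same positions")
    n = len(clones)
    return [100.0 * sum(c[i] for c in clones) / n for i in range(n_pos)]


def never_synthetic(landscape):
    """Indices of positions at which no clone carries the synthetic sequence."""
    return [i for i, pct in enumerate(landscape) if pct == 0.0]


# Three toy chimeric clones with different crossover points; position 2 is
# wild type in every clone and is therefore a candidate non-tolerated site.
toy_clones = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
landscape = recoding_landscape(toy_clones)
candidates = never_synthetic(landscape)
```

With only 6–8 clones per experiment, a position at 0% is a candidate rather than proof; positions adjacent to a truly deleterious change can also remain wild type simply because no crossover happened to separate them.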
After the deleterious synthetic sequence has been identified, the synthetic sequence can be fixed in the BAC. A re-designed synthetic dsDNA fragment can be introduced during the yeast assembly stage or by a two-step lambda-red recombination approach with the BAC in E. coli. For the two-step lambda-red recombination approach, first a double selection cassette (e.g., rpsL-kanR) is integrated in place of the synthetic design flaw in the BAC in the presence of a positive selection agent (e.g., kanamycin). Subsequently, the double selection cassette is replaced with a corrected synthetic sequence provided as dsDNA in the presence of a negative selection agent (e.g., streptomycin). The fixed BAC can then be re-introduced into the relevant E. coli clone and used for REXER (Fig. 6a). This approach is likely to be successful in generating clones in which the new synthetic sequence replaces the original sequence (Fig. 4d). The ability to quickly and reliably identify and fix deleterious synthetic sequences is crucial; it allows REXER to be used to debug and improve the original design of the synthetic sequence, leading to the identification of a fully functional synthetic sequence (Fig. 6b).
Materials
Biological materials
● E. coli NEB 10-beta cells, a derivative of DH10B, for conventional cloning steps and plasmid propagation (New England BioLabs, cat. no. C3020K).
● E. coli cells of interest in which REXER is to be performed. Any strain of E. coli with a sequenced genome can be used; the strain will vary based upon one's specific experimental goals. Using the rpsL-kanR double selection cassette requires a strain that contains the rpsL K43R mutation (e.g., E. coli NEB 10-beta). The mutation can be introduced in any strain by recombineering¹⁴.
● Saccharomyces cerevisiae BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) for assembling DNA constructs by homologous recombination, available from ATCC (no. 201388).
Reagents
Plasmids
● Helper plasmid (contains lambda-red recombination components and Cas9 with tracrRNA, accession number MN927219, available from the authors).
● pMB1 spacer plasmid (contains spacer arrays and template sequence in accession number MK809152.1, available from the authors).
● Plasmid pJF146 (optional for experiments involving conjugation, accession number MK809154.1, available from the authors).
● Template for BAC vector (contains BAC replication and segregation components, an origin of transfer oriT and a URA3 selectable marker, accession number MK809150.1, available from the authors).
● pSC101_YAC-ori template plasmids bearing negative markers: sacB (accession number MN927220, also contains double selection marker rpsL-kanR, available from the authors) and rpsL (accession number MN927221, also contains double selection marker sacB-cat, available from the authors).
Fig. 3 | Components for REXER. a, Components for REXER. The helper plasmid (tetR) contains a tracrRNA under control of the wild-type S. pyogenes CRISPR promoter, as well as the components for lambda-red recombination under an arabinose-inducible pBAD promoter. The spacer plasmid (AmpR) contains two (REXER2) or four (REXER4) 30 nt spacer sequences, interspersed between CRISPR direct repeats, all under control of the endogenous S. pyogenes CRISPR array promoter. Transcription and processing of these repeats yields crRNAs, which anneal to the tracrRNA for loading into Cas9. The helper plasmid encodes the common components of different REXER experiments, whereas the contents of the spacer plasmid are variable as the crRNAs are designed to cleave specific, user-defined, sequences. b, Schematic of Cas9 cleavage of the BAC and genome in a REXER4 experiment. The synthetic DNA and sacB-cat insert in the BAC (top) is flanked by CCTAGG sequences; this sequence is artificially introduced during BAC assembly, and the AGG serves as a PAM for Cas9 cleavage. In the genome (bottom), a CTAAGG sequence is artificially introduced just upstream of the rpsL-kanR double selection marker (see Steps 1–8). The 3′ end of the genomic region to be replaced is defined by a CCNNNN sequence. The AGG and NGG serve as PAMs for Cas9 cleavage. The latter two cuts do not feature in REXER2. c, Hypothetical mechanisms for homologous DNA recombination in REXER2 and REXER4. REXER2 generates a linear double-stranded end for one of the substrates for recombination (the BAC); recession of one strand of one of the linear double-stranded ends followed by strand invasion during chromosomal replication might facilitate recombination¹⁹. In REXER4, linear double-stranded ends are generated for both substrates for recombination (BAC and genome). This might allow for recession of one strand from each substrate followed by strand annealing.
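The relationship between a 30 nt spacer and its NGG PAM can be illustrated with a short search routine. This is a simplified sketch and not the spacer design procedure of the protocol (which is given in Boxes 1–3): it scans only the strand supplied, takes the 30 nt immediately 5′ of each NGG as the candidate spacer, and performs no off-target or uniqueness checks. The function name `candidate_spacers` and the returned field names are hypothetical.

```python
import re


def candidate_spacers(seq: str, spacer_len: int = 30):
    """List candidate (spacer, PAM) pairs on the given strand.

    A candidate is the spacer_len bases immediately 5' of any NGG motif.
    A lookahead is used so that overlapping NGG motifs are all reported.
    """
    seq = seq.upper()
    hits = []
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            hits.append({
                "spacer": seq[pam_start - spacer_len:pam_start],
                "pam": match.group(1),
                "pam_start": pam_start,
            })
    return hits


# Toy target: 30 nt followed by an AGG PAM, as in the CCTAGG/CTAAGG sites
# described in the caption.
toy_site = "A" * 30 + "AGG" + "TTTT"
hits = candidate_spacers(toy_site)
```

To cover PAMs on the opposite strand (seen as CCN on the strand supplied, as in the CCNNNN site), the same function could be run on the reverse complement of the sequence.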
[Fig. 4 graphic: panel a, experimental workflow from electroporation of the spacer plasmid or spacer array into induced E. coli cells (carrying genome, BAC and helper plasmid), through recovery in 4 mL SOB, growth in 100 mL LB + antibiotics, serial dilution and plating on LB agar + antibiotics, to picked and restreaked colonies used for stocking, genotyping and phenotyping; panel b, phenotyping of WT, pre-REXER and post-REXER clones 1–4 on Kan (+1), Strep (-1), Cm (+2) and Suc (-2); panel c, genotyping PCRs flanking locus 0 (rpsL-KanR) and locus 1 (sacB-cat) with a 500–3,000 bp ladder; panel d, decision workflow (REXER, phenotyping & genotyping, NGS, sequence analysis; full replacement identified leads to GENESIS, deleterious sequence identified leads to fixing and a new NGS check, incorrect clones are discarded), annotated with Steps 193–205.]
Fig. 4 | REXER workflow. a, Experimental workflow for REXER. Linear or plasmid-based spacer arrays are electroporated into electrocompetent E. coli cells that contain an rpsL-kanR construct (-1/+1, yellow and blue, respectively) integrated at the beginning of the region to be replaced (locus 0), and a BAC containing a synthetic insert of interest and a sacB-cat (-2/+2, magenta and green, respectively) selection cassette with homology to the end of the region to be replaced (locus 1). Cells are recovered from electroporation in 4 mL of SOB, then transferred to 100 mL of LB with antibiotics to select for the helper plasmid (tetracycline) and cat (chloramphenicol). Cells are harvested and spread on LB agar plates with selection for cat and against rpsL (streptomycin). Colonies are then streaked on selection plates, and the streaked colonies are picked for downstream analysis. b, Expected results for phenotypic validation before and after REXER. Cells are picked and stamped on LB agar plates supplemented with different antibiotics, and the growth in each condition assessed. The results shown here are spliced together from separate plates containing separate antibiotics. Pre-REXER cells containing an episomal BAC should be resistant to kanamycin (Kan) and sensitive to streptomycin (Strep) by virtue of the rpsL-kanR cassette in the genome. The BAC contains a sacB-cat cassette, which should confer resistance to chloramphenicol (Cm), but the cells should survive on sucrose (Suc) alone by means of losing the BAC. Selection with both sucrose and chloramphenicol enforces the maintenance of the BAC and results in cell death. After a successful replacement, loss of kanamycin resistance and loss of sensitivity to streptomycin are expected. The newly acquired genomic sacB-cat cassette should maintain resistance for chloramphenicol, but confer sensitivity to sucrose alone. c, Expected results for colony PCR at locus 0 and locus 1 before and after REXER. PCR with primers flanking locus 0 should yield a larger product in pre-REXER than in post-REXER cells, owing to the loss of the rpsL-kanR cassette. Conversely, PCR at locus 1 should yield a longer product after REXER due to the gain of the sacB-cat cassette. d, Workflow for identifying successful post-REXER clones that can proceed to GENESIS. A number of post-REXER clones are assessed by their growth phenotypes in selective medium and their genotypes at locus 0 and locus 1. Promising clones resulting from this analysis are subjected to next-generation sequencing (NGS) to identify clones in which the replacement is complete. Clones that do not pass each checkpoint are discarded. Systematic incomplete replacement in all clones resulting from an experiment may indicate a problematic synthetic sequence. The problematic sequence is identified and fixed. An altered synthetic sequence is generated and used for a new REXER experiment.
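The expected growth phenotypes in panel b can be written down as a small lookup table, which makes it easy to score stamped clones consistently. The sketch below encodes only the four single-agent conditions stated in the caption (True = growth expected); the dictionary layout, the condition keys and the function name `classify_clone` are hypothetical, and real plates would of course be scored by eye against the pre-REXER control.

```python
# Expected growth on each selective condition, as stated in the Fig. 4b caption.
EXPECTED = {
    "pre-REXER": {"Kan": True, "Strep": False, "Cm": True, "Suc": True},
    "post-REXER": {"Kan": False, "Strep": True, "Cm": True, "Suc": False},
}


def classify_clone(growth: dict) -> str:
    """Return the label whose expected pattern matches the observed growth,
    or 'unexpected' if the clone matches neither pattern."""
    for label, pattern in EXPECTED.items():
        if all(growth.get(cond) == grows for cond, grows in pattern.items()):
            return label
    return "unexpected"


observed = {"Kan": False, "Strep": True, "Cm": True, "Suc": False}
result = classify_clone(observed)
```

Clones scored as "unexpected" would be candidates for the false positives discussed in the main text, for example those arising from mutated selection markers.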
[Fig. 5 graphic: panel a, recoding attempt A; post-REXER clones A1 and A2 (full replacement, syn. seq. ~120 kb) and A3 (partial replacement) plotted as 1 = syn. seq. / 0 = wt seq. against genomic positions between locus 0 and locus 1 (0–100,000), with compiled recoding landscape A (% of population with syn. seq.); panel b, recoding attempt B; post-REXER clones B1–B3 (partial replacement) plotted on the same axes, with compiled recoding landscape B marking the position of the deleterious seq. in the syn. seq. design (0% syn. seq.).]
Additional reagents
●
Milli-Q H
2
O
●
PrimeSTAR HS High Fidelity Polymerase (Takara, cat. no. R010A)
●
PrimeSTAR GXL Polymerase (Takara, cat. no. R050A)
●
OneTaq 2× PCR Master Mix with Loading Buffer (New England BioLabs, cat. no. M0482S)
●
AccI Restriction Enzyme (New England BioLabs, cat. no. R0161S)
●
BsaI restriction enzyme (New England BioLabs, cat. no. R0535S)
●
DpnI Restriction Enzyme (New England BioLabs, cat. no. R0176S)
●
EcoRI Restriction enzyme (New England BioLabs, cat. no. R0101S)
●
Mung Bean nuclease (New England BioLabs, cat. no. M0250S)
●
SpeI restriction enzyme (New England BioLabs, cat. no. R0133S)
●
6× Gel Loading Dye (New England BioLabs, B7024S)
●
Agarose Hi-Pure Low EEO (BioGene, cat. no. 300-200)
●
SYBR Safe DNA stain (Life Technologies, cat. no. S33102)
●
2× NEBuilder HiFi Assembly Mix (New England BioLabs, cat. no. E2621S)
● L(+)-arabinose (ACROS Organics, cat. no. 15289285)
● Ampicillin sodium salt (Sigma-Aldrich, cat. no. A9518-5G)
● Kanamycin sulfate (Sigma-Aldrich, cat. no. K1377)
● Streptomycin sulfate salt (Sigma-Aldrich, cat. no. S9137-25G)
● Sucrose (Fisher Scientific, cat. no. S/8560/63)
● Chloramphenicol (MP Biomedicals, cat. no. 0219032190)
● Hygromycin B (50 mg/mL) (Thermo Fisher Scientific, cat. no. 10687010)
● p-Cl-Phe (4-chloro-DL-phenylalanine) (Sigma-Aldrich, cat. no. C6506-25G)
● TE Buffer 10× (G-Biosciences, cat. no. 786-033)
● Glycerol (Sigma-Aldrich, cat. no. G9012-1L)
● Glucose (Gibco, cat. no. 15023021)
● NaOH 4 mol/L (4 N) in aqueous solution (VWR Chemicals, cat. no. 191373MP)
● β-mercaptoethanol (Sigma-Aldrich, cat. no. M6250)
● D-Sorbitol (Sigma-Aldrich, cat. no. 85529)
● Yeast extract (Thermo Fisher Scientific, cat. no. 211929)
● Zymolyase, 20T from Arthrobacter luteus, 1 g (MP Biomedicals, cat. no. 320921)
● Bacto Tryptone (Thermo Fisher Scientific, cat. no. 211705)
● Bacto Agar Solidifying Agent (BD Biosciences, cat. no. 214010)
● Sodium dodecyl sulfate (SDS), 20% (American Bioanalytical, cat. no. AB01922)
● PEG 8000 (Sigma-Aldrich, cat. no. 89510)
● Adenine (Sigma-Aldrich, cat. no. A8626)
● Bacto peptone (BD Biosciences, cat. no. 211677)
● EDTA (ethylenediaminetetraacetic acid) (Sigma-Aldrich, cat. no. EDS)
● Na₃PO₄ (Sigma-Aldrich, cat. no. 342483)
● NaCl (Sigma-Aldrich, cat. no. S9888)
● CaCl₂ (Sigma-Aldrich, cat. no. C1016)
● (NH₄)₂SO₄ (Sigma-Aldrich, cat. no. A4915)
● Tris-HCl (Trizma hydrochloride) (Sigma-Aldrich, cat. no. T5941)
● Difco casamino acids, vitamin assay (Thermo Fisher, cat. no. 228820)
Fig. 5 | Identifying deleterious sequences in synthetic DNA. a, REXER with a tolerated synthetic sequence. REXER leads to replacement of the original genomic DNA with synthetic DNA between genomic locus 0 and locus 1 (colors as in Fig. 1). Three representative post-REXER clones (A1–3) that phenotype and genotype correctly are shown. NGS (gray box) reveals the extent to which individual post-REXER clones are chimeras between the original (wt) genomic sequence and the synthetic (syn.) sequence. Clones A1 and A2 exhibit complete replacement, but clone A3 is a chimera that contains a short region of the original genomic sequence. The compiled recoding landscape, which plots the percentage of clones that are recoded at each potentially variable nucleotide across the locus, shows that the synthetic sequence is tolerated. If dips in the compiled recoding landscape persist upon compiling data from many clones, this can indicate that the corresponding region of synthetic sequence is tolerated but suboptimal. b, REXER with synthetic DNA harboring a deleterious sequence (colors as in Fig. 1). Three representative post-REXER clones (B1–3) that phenotype and genotype correctly are shown. NGS (gray box) reveals that all post-REXER clones are chimeras between the original genomic sequence and the synthetic sequence. The compiled recoding landscape narrows down the sequence that is derived from the original genome in all chimeras, and thereby identifies the position of the deleterious sequence.
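The compiled recoding landscape described in Fig. 5 can be illustrated with a minimal Python sketch. It assumes that each post-REXER clone has already been reduced to a list of 0/1 calls (1 = syn. seq., 0 = wt seq.) at a shared set of potentially variable positions. The function names, the clone calls and the coordinates below are hypothetical and chosen only for illustration; this is not the iSeq package or the authors' pipeline.

```python
from typing import Dict, List, Tuple


def compile_landscape(clone_calls: Dict[str, List[int]]) -> List[float]:
    """Percentage of clones carrying the synthetic sequence at each position.

    clone_calls maps a clone name to 0/1 calls (1 = syn. seq., 0 = wt seq.),
    one call per potentially variable position, in the same order for all clones.
    """
    clones = list(clone_calls.values())
    if not clones:
        raise ValueError("no clones supplied")
    n_pos = len(clones[0])
    if any(len(calls) != n_pos for calls in clones):
        raise ValueError("all clones must be called at the same positions")
    return [
        100.0 * sum(calls[i] for calls in clones) / len(clones)
        for i in range(n_pos)
    ]


def zero_percent_runs(
    positions: List[int], landscape: List[float]
) -> List[Tuple[int, int]]:
    """Return (first, last) coordinates of runs where no clone is synthetic."""
    runs: List[Tuple[int, int]] = []
    start = None
    last = None
    for pos, pct in zip(positions, landscape):
        if pct == 0.0:
            if start is None:
                start = pos
            last = pos
        elif start is not None:
            runs.append((start, last))
            start = None
    if start is not None:
        runs.append((start, last))
    return runs


# Hypothetical data: six variable positions between locus 0 and locus 1,
# and three chimeric clones that all retain wt sequence at position 60,000.
positions = [1000, 20000, 40000, 60000, 80000, 99000]
clone_calls = {
    "B1": [1, 1, 1, 0, 0, 1],
    "B2": [1, 1, 0, 0, 1, 1],
    "B3": [1, 0, 0, 0, 1, 1],
}

landscape = compile_landscape(clone_calls)
candidate_regions = zero_percent_runs(positions, landscape)
print([round(pct, 1) for pct in landscape])
print(candidate_regions)
```

In this sketch a position is flagged only when every clone retains the original sequence there, which mirrors the 0% syn. seq. region in panel b. Dips that stay above 0% are left unflagged, because the caption treats them as tolerated but possibly suboptimal rather than deleterious.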
● BD Difco yeast nitrogen base without amino acids (Thermo Fisher, cat. no. 11753573)
● L-Tyrosine (Sigma-Aldrich, cat. no. T3754)
● Nextera XT DNA Library Preparation Kit (Illumina, cat. no. FC-131-1204)
● MiSeq Reagent Kit v3, 150-cycle or 600-cycle (Illumina, cat. no. MS-102-3001 or MS-102-3003)
Equipment
● Qiagen QIAquick PCR Purification Kit (cat. no. 28104)
● Qiagen QIAprep Spin Miniprep Kit (cat. no. 27106)
● Qiagen MinElute Gel Extraction Kit (cat. no. 28604)
● Qiagen DNeasy Blood & Tissue Kit (cat. no. 69504)
● Qiagen Gentra Puregene Yeast/Bact. Kit (cat. no. 158567)
● Zymo Research DNA Clean & Concentrator Kit (cat. no. D4029)
● Mupid-One Advance Gel Electrophoresis System (Mupid, VWR cat. no. 102407-972)
● Metal blades (Azpack, cat. no. 11904325)
● Thermocycler (Biometra, cat. no. 846-x-070-720)
● Eppendorf F1.5 ThermoMixer (Eppendorf, cat. no. 5384000039)
● Eppendorf Eporator (electroporator) (Eppendorf, cat. no. 4309000035)
● Electroporation cuvettes with 2 mm gap (Flowgen, cat. no. FBR-202)
● 1.5 mL microcentrifuge tubes (Sarstedt, cat. no. 72.690.001)
● Centrifuge (Eppendorf, cat. no. 5811000827)
● Benchtop microcentrifuge (Eppendorf, cat. no. 5424)
● 25 cm × 25 cm plates (Nunc Sterile Bioassay Dishes) (Sigma-Aldrich, cat. no. D4803)
● Petri dishes (Sarstedt, cat. no. 82.1473.001)
● 50 mL Falcon Tubes (Sarstedt, cat. no. 62.547.254)
● 250 mL PP Centrifuge Tubes (Corning, cat. no. 430776)
● 2 L glass conical flask (Fisher Scientific, cat. no. 15459103)
● NanoDrop Spectrophotometer (ThermoFisher, cat. no. ND-2000)
● PCR tubes (Axygen, cat. no. PCR-0208-C)
● Shaking incubator (ThermoFisher, cat. no. SHKE6000)
● Sanyo MIR-154 incubator (ThermoFisher, cat. no. 12856746)
● Spectrophotometer for cell cultures (Eppendorf, cat. no. 6133000028)
● Cuvettes for spectrophotometer (BrandTech Scientific, cat. no. 759086D)
● Molecular Imager Gel Doc XR+ System with Image Lab Software (Bio-Rad, cat. no. 1708195)
● MiSeq System (Illumina, cat. no. SY-410-1003)
Software
● SnapGene (GraphPad Software, https://www.snapgene.com)
● Integrative Genomics Viewer (IGV, https://igv.org/)
● iSeq package (https://github.com/TiongSun/iSeq)
[Figure 6. Left: compiled recoding landscape with the original synthetic sequence design, showing 0% syn. seq. at the position of the deleterious sequence. Middle: the BAC with the corrected synthetic sequence is re-introduced, a second REXER is performed, and post-REXER clones are analyzed by NGS. Right: compiled recoding landscape of the corrected synthetic sequence design. Axes: genomic positions between locus 0 and locus 1 (0–100,000) versus % of population with syn. seq.]
Fig. 6 | Testing potential fixes for deleterious synthetic sequences. The recoding landscape is used to identify a deleterious sequence (Fig. 5b). A potential fix for the deleterious sequence is designed (orange bar). A new BAC containing the corrected synthetic sequence is assembled and introduced into the recipient cell, and REXER is performed. Multiple post-REXER clones are sequenced, and the compiled recoding landscapes are used to determine whether the altered synthetic DNA is functional. The initial design flaw is fixed if the new synthetic DNA can replace the original genomic DNA.
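One way to read the outcome of the second REXER in Fig. 6 is sketched below in Python. The sketch assumes per-clone 0/1 calls (1 = syn. seq., 0 = wt seq.) for clones obtained with the corrected design, together with the coordinate window that was flagged in the original landscape. The function name `assess_fix`, the window, the clone data and the acceptance criterion (at least one fully replaced clone) are illustrative assumptions, not thresholds stated in the protocol.

```python
from typing import Dict, List, Tuple


def assess_fix(
    positions: List[int],
    clone_calls: Dict[str, List[int]],
    flagged_window: Tuple[int, int],
) -> Dict[str, object]:
    """Summarize whether a corrected design replaced the flagged region.

    positions: coordinates of the potentially variable nucleotides.
    clone_calls: clone name -> 0/1 calls (1 = syn. seq.), aligned to positions.
    flagged_window: (start, end) coordinates flagged with the original design.
    """
    if not clone_calls:
        raise ValueError("no clones supplied")
    if any(len(calls) != len(positions) for calls in clone_calls.values()):
        raise ValueError("each clone needs one call per position")

    lo, hi = flagged_window
    in_window = [i for i, pos in enumerate(positions) if lo <= pos <= hi]
    if not in_window:
        raise ValueError("no variable positions fall inside the flagged window")

    n_clones = len(clone_calls)
    # Clones in which every variable position carries the synthetic sequence.
    fully_replaced = sorted(
        name for name, calls in clone_calls.items() if all(calls)
    )
    # Landscape values restricted to the previously flagged window.
    window_pct = [
        100.0 * sum(calls[i] for calls in clone_calls.values()) / n_clones
        for i in in_window
    ]
    return {
        "fully_replaced_clones": fully_replaced,
        "min_percent_in_window": min(window_pct),
        # Assumed criterion: the fix counts as tolerated if any clone is
        # completely replaced by the corrected synthetic sequence.
        "fix_tolerated": bool(fully_replaced),
    }


# Hypothetical clones from a second REXER with the corrected design.
positions = [1000, 20000, 40000, 60000, 80000, 99000]
corrected_calls = {
    "C1": [1, 1, 1, 1, 1, 1],
    "C2": [1, 1, 1, 1, 1, 1],
    "C3": [1, 1, 0, 1, 1, 1],
}

report = assess_fix(positions, corrected_calls, flagged_window=(55000, 65000))
print(report)
```

Reporting the minimum landscape value inside the flagged window alongside the list of fully replaced clones keeps two questions separate: whether the previously blocked region can now be recoded, and whether the complete synthetic section can replace the genomic DNA.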