Nucleic Acids Research, 2020, Vol. 48, No. 12, 6726–6739
Published online 25 May 2020
doi: 10.1093/nar/gkaa418
Sequence-dependent dynamics of synthetic and endogenous RSSs in V(D)J recombination
Soichi Hirokawa¹, Griffin Chure², Nathan M. Belliveau², Geoffrey A. Lovely³, Michael Anaya², David G. Schatz⁴,*, David Baltimore² and Rob Phillips²,⁵,*
¹ Department of Applied Physics, California Institute of Technology, Pasadena, CA 91125, USA
² Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
³ National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA
⁴ Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
⁵ Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
Received January 31, 2020; Revised April 20, 2020; Editorial Decision May 05, 2020; Accepted May 07, 2020
ABSTRACT
Developing lymphocytes of jawed vertebrates cleave and combine distinct gene segments to assemble antigen–receptor genes. This process, called V(D)J recombination, involves the RAG recombinase binding and cutting recombination signal sequences (RSSs) composed of conserved heptamer and nonamer sequences flanking less well-conserved 12- or 23-bp spacers. Little quantitative information is available on the contributions of individual RSS positions over the course of the RAG–RSS interaction. We employ a single-molecule method known as tethered particle motion to track the formation, lifetime and cleavage of individual RAG–12RSS–23RSS paired complexes (PCs) for numerous synthetic and endogenous 12RSSs. We reveal that single-bp changes, including in the 12RSS spacer, can significantly and selectively alter PC formation or the probability of RAG-mediated cleavage in the PC. We find that the rare usage of some endogenous gene segments can be traced directly to poor RAG binding at their adjacent 12RSSs. Finally, we find that while abrogating RSS nicking with Ca2+ leads to substantially shorter PC lifetimes, analysis of the complete lifetime distributions of any 12RSS, even in this reduced system, reveals that the process of exiting the PC involves unidentified molecular details whose role in RAG–RSS dynamics is crucial for quantitatively capturing the kinetics of V(D)J recombination.
INTRODUCTION
Jawed vertebrates call upon developing lymphocytes to undergo a genomic cut-and-paste process known as V(D)J recombination, where disparate gene segments that do not individually code for an antigen–receptor protein are systematically combined to assemble a complete, antigen receptor-encoding gene (1). V(D)J recombination supports the production of a vast repertoire of antibodies and T-cell receptors that protect the host organism from a broad array of pathogens. However, gene segment combinations are not made in equal proportions; some gene segment combinations are produced more frequently than others (2–5). Although V(D)J recombination requires careful orchestration of many enzymatic and regulatory processes to ensure functional antigen–receptor genes whose products do not harm the host, we strip away these factors and focus on the initial stages of V(D)J recombination. Specifically, we investigate how the dynamics between the enzyme that carries out the cutting process and its corresponding DNA-binding sites adjacent to the gene segments influence the initial stages of recombination for an array of synthetic and endogenous binding site sequences.
The process of V(D)J recombination (schematized in Figure 1) is initiated with the interaction between the recombination-activating gene (RAG) protein complex and two short sequences of DNA neighboring the gene segments, one that is 28 bp and another that is 39 bp in length. These recombination signal sequences (RSSs) are composed of a well-conserved heptamer region immediately adjacent to the gene segment, a more variable 12- (for the 12RSS) or 23-bp (for the 23RSS) spacer sequence and a well-conserved nonamer region. For gene rearrangement to begin, RAG must bind to both the 12- and the 23RSS to form the paired complex (PC) state (Figure 1B). Throughout the binding interaction between RAG and either RSS, RAG has an opportunity to nick the DNA (enlargement in Figure 1B) (6). RAG must nick both RSSs before it cleaves the DNA adjacent to the heptamers to expose the gene segments and to create DNA hairpin ends (Figure 1C). DNA repair proteins complete the reaction by joining the gene segments to each other and the RSSs to one another (Figure 1D).

* To whom correspondence should be addressed. Tel: +1 626 395 6337; Email: phillips@pboc.caltech.edu
Correspondence may also be addressed to David G. Schatz. Tel: +1 203 737 2255; Email: david.schatz@yale.edu
Present address: Nathan M. Belliveau, Howard Hughes Medical Institute and Department of Biology, University of Washington, Seattle, WA 98195, USA.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from https://academic.oup.com/nar/article-abstract/48/12/6726/5843817 by California Institute of Technology user on 07 July 2020

Figure 1. Schematic focusing on the initial steps of V(D)J recombination. (A) The RAG complex composed of RAG1 (purple) and RAG2 (green) binds to the 12- and 23RSSs (dark purple and orange triangles, respectively) neighboring gene segments (shown as red and yellow boxes on the DNA), (B) forming the paired complex (PC). At any point when it is bound to an RSS, RAG can introduce a nick in the DNA between the heptamer and gene segment (shown with the magnified 12RSS) and must do so at both sites before (C) it cleaves the DNA to expose the gene segments. As indicated by the magnified gene segment end, the exposed DNA strands of the gene segment are connected to form a DNA hairpin. (D) Additional proteins join these segments together. In this work, the stages subsequent to DNA cleavage are not monitored.
RSS sequence-conservation studies across many organisms have shown a vast diversity of 12- and 23RSS sequences, found mainly through heterogeneity in the spacer region (7). Bulk assays reveal that changing an RSS sequence can significantly influence the RAG–RSS interaction and ultimately the success rate of completing recombination (8–12). Recent structural results provide evidence that RAG binding is sensitive to both base-specific contacts and the local flexibility or rigidity of the 12- and 23RSS (13–15). Despite this extensive characterization of the interaction, little is known about how a given RSS sequence affects each step of the RAG–RSS reaction. In this work, we provide one of the most comprehensive studies of how RSS sequences govern the initial steps of V(D)J recombination and provide a quantitative measure of their effects on the formation frequency, lifetime and cleavage probability of the PC.
We employ a single-molecule technique known as tethered particle motion (TPM) in which an engineered strand of DNA containing a 12RSS and 23RSS is attached to a glass coverslip at one end and to a polystyrene bead at the other (Figure 2A). Using brightfield microscopy, we collect the root-mean-squared displacement (RMSD) of the bead over time to identify the state of the RAG–RSS interaction. As illustrated in Figure 2B, when RAG forms the PC with the RSSs, the shortened DNA tether constrains the motion of the bead, reducing the RMSD. When RAG cleaves the PC, the bead is released and diffuses away from the tether site (Figure 2C). TPM has been applied to track the dynamic behavior of various protein–DNA systems, including RAG and RSS (16–21). It is the temporal resolution provided by TPM that allows us to track the full progression of individual RAG–RSS interactions from PC formation to cleavage.

Figure 2. Sample data output of TPM. By tracking the root-mean-squared displacement (RMSD) of the tethered bead position undergoing restrained Brownian motion, we discern when the DNA tether is (A) in the unlooped state, (B) in the PC (looped) state and (C) cleaved. The dashed horizontal lines distinguish the unlooped (red) and looped (green) states of the DNA, and are drawn before examining the bead trajectories. The RMSD values of these lines are based on the length of the DNA tether; the distance between the RSSs along the strand; the extent to which HMGB1, a protein that binds nonspecifically to DNA and helps facilitate RAG binding, kinks the DNA; and a set of calibration experiments relating the range of motion of the bead to the length of its tether. As depicted with the magnified DNA strand in (A), the 12RSS and 23RSS are positioned 1200 bp apart throughout the study.
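The state calls in Figure 2 amount to comparing each smoothed RMSD value with the two reference lines. The sketch below assumes, purely for illustration, that a frame is called looped when its RMSD falls below the midpoint of the two reference values, and that frames in which the bead is no longer tracked (recorded here as NaN) mark cleavage. The function name, the integer state codes and the midpoint rule are hypothetical; the actual PC-identification criteria are those described in the Supplementary Data Text.

```python
import numpy as np

# Hypothetical integer codes for the three tether states
UNLOOPED, LOOPED, CLEAVED = 0, 1, 2


def classify_states(rmsd, unlooped_nm, looped_nm):
    """Assign each frame of a smoothed RMSD trace to a tether state.

    Assumption: a frame is 'looped' if its RMSD lies below the midpoint
    between the unlooped and looped reference RMSD values; NaN frames
    (bead released and no longer tracked) are 'cleaved'.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    threshold = 0.5 * (unlooped_nm + looped_nm)
    states = np.where(rmsd < threshold, LOOPED, UNLOOPED)
    states[np.isnan(rmsd)] = CLEAVED
    return states


# Illustrative trace with invented reference values of 160 nm and 105 nm
trace = [160, 158, 110, 105, 108, 159, 104, float("nan"), float("nan")]
print(classify_states(trace, 160, 105).tolist())  # [0, 0, 1, 1, 1, 0, 1, 2, 2]
```

A single midpoint threshold is the simplest possible rule. Real traces also need the bead-selection and drift corrections described in the Supplementary Data Text before any such call is made.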
We were interested in using TPM to determine the extent to which endogenous RSSs dictate the usage frequency of their neighboring gene segments and, for those RSS positions that do seem to influence gene segment usage, to identify, by extracting kinetic rates, the steps in the RAG–RSS reaction at which the RSSs help or hurt the selection of their gene segment. We first examine single bp changes to a designated reference 12RSS, thereby establishing a mechanistic understanding of the contribution of individual nucleotide positions to RAG–RSS dynamics. With the synthetic 12RSSs providing context, we study a set of endogenous 12RSSs, each of whose sequences can be directly related to the reference sequence and a subset of the characterized synthetic 12RSSs. These 12RSSs were also chosen from repertoires in which the usage frequencies of their gene segments are known. Finally, taking advantage of the depth of insight offered by the waiting time distributions generated by the TPM assay, and in an attempt to provide some of the first measurements of various RAG–RSS kinetics, we show through our analysis of the PC lifetime distributions that, regardless of the choice of 12RSS or divalent cation, our TPM data consistently disagree with a single-rate model. We discuss the consequences of this finding in the context of our understanding of the molecular details of the RAG–RSS reaction. As this study resulted in a wealth of data on a large number of RSS sequences, we have developed an interactive online resource for visualizing the dataset in its entirety.
MATERIALS AND METHODS
Protein purification
The two RAG components, core RAG1 and core RAG2 (RAG1/2), were purified together as outlined in Ref. (20). Maltose binding protein-tagged murine core RAG1/core RAG2 were co-expressed by transfection in HEK293-6E suspension cells in a 9:11 w/w ratio for 48 h before purification using amylose resin. HMGB1 was purified as outlined in (20). His-tagged HMGB1 was expressed in isopropyl β-D-1-thiogalactopyranoside-induced BL21 cells for 4 h at 30 °C before purification. For more details, see the Supplementary Data Text.
Flow cell assembly
TPM flow cells were assembled by drilling four holes along
each length of a glass slide before cleaning the slides and
coverslips. The slides and coverslips were functionalized
with an epoxidizing solution for at least an hour and a half
so that anti-digoxigenin, to which the digoxigenin ends of
the DNA tethers attach, could adhere to the glass. Upon
completion of the treatment, flow cells are assembled by
cutting four channels into double-sided tape to connect the
drilled holes at opposite ends of the glass slide before ad-
hering to the coverslip on one side and the glass slide on the
other. Short connective tubes are inserted into each of the
holes to serve as inputs and outputs for fluids and sealed us-
ing 5-min epoxidizing solution. The constructed flow cells
are baked on the hot plate to allow the epoxy and double-
sidedtapetoset.
Tethered bead assembly
Tethered beads are assembled as in Supplementary Figure
S1. Flow cell channels are incubated with anti-digoxigenin
for 2 h to allow for adhering DNA to the glass sur-
faces. After washing away excess anti-digoxigenin in a
buffer solution containing Tris-HCl, KCl, MgCl
2
,DTT,
EDTA, acetylated BSA and casein, engineered strands of
2900 bp-long DNA containing a 12RSS and a 23RSS lo-
cated 1200 bp apart and tagged with digoxigenin on one
end and biotin at the other end are injected into the flow
cells to attach the digoxigenin end of the DNA to the anti-
digoxigenin-scattered surfaces. After excess DNA is washed
out, streptavidin-coated polystyrene beads 490 nm in diam-
eter are added to the channels and incubated for
≈
3 min to
bind the biotin-labeled end of the DNA. Excess beads are
washed away and the TPM assembly buffer is replaced with
a RAG reaction buffer containing Tris-HCl, KCl, glyer-
col, DTT, potassium acetate, MgCl
2
, DMSO and acetylated
BSA. For Ca
2+
studies, CaCl
2
is used in place of MgCl
2
in
the RAG reaction buffer and in the same concentration. See
Supplementary Data Text for a schematic of the TPM as-
sembly process.
TPM experiment
TPM experiments involve the simultaneous acquisition of
bead trajectories from two different channels on separate
microscopes. One of the channels contains tethered DNA
with a 12RSS and a 23RSS oriented toward each other
(nonamer regions on both RSSs closest to each other).
Properly tethered beads are filtered using various methods
to ensure proper spacing from neighboring beads and that
individual beads are tethered by a single strand of DNA.
The trajectories of the selected beads are then examined in
the absence of RAG and HMGB1 for 10 min before flow-
ing in 9.6 nM murine core RAG1
/
core RAG2 and 80 nM
full-length HMGB1 and acquiring bead trajectories for at
least 1 h. Root-mean-squared displacements (RMSDs) of
the bead trajectories as shown in Figure
2
are calculated by
Gaussian filtering with an 8-s standard deviation. Bead se-
lection criteria, corrections and smoothing of trajectories,
and identification of PCs are provided in the Supplemen-
tary Data Text. Example dataset of all analyzed bead tra-
jectories from one replicate is presented in Supplementary
Figure S2.
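The following is a minimal sketch of a Gaussian-filtered RMSD. It assumes x and y bead positions (in nm) sampled at a constant, hypothetical frame rate. The slowly varying anchor position is estimated with the same Gaussian filter and subtracted, and the squared displacement is then Gaussian-averaged with an 8-s standard deviation before the square root is taken. The function names and this particular drift-subtraction step are our assumptions; the corrections actually applied are described in the Supplementary Data Text.

```python
import numpy as np


def gaussian_smooth(signal, sigma_frames):
    """Gaussian-weighted moving average with edge renormalization."""
    signal = np.asarray(signal, dtype=float)
    # Truncate the kernel at 4 sigma, and never let it exceed the signal length
    half_width = min(int(np.ceil(4 * sigma_frames)), (signal.size - 1) // 2)
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_frames) ** 2)
    weighted = np.convolve(signal, kernel, mode="same")
    # Dividing by the summed kernel weight avoids bias where the kernel overhangs an edge
    norm = np.convolve(np.ones_like(signal), kernel, mode="same")
    return weighted / norm


def rmsd_trace(x, y, frame_rate_hz, sigma_s=8.0):
    """RMSD of a tethered bead about its smoothed (drift-tracking) anchor point."""
    sigma_frames = sigma_s * frame_rate_hz
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - gaussian_smooth(x, sigma_frames)
    dy = y - gaussian_smooth(y, sigma_frames)
    return np.sqrt(gaussian_smooth(dx**2 + dy**2, sigma_frames))


# Simulated isotropic bead motion; the 30-Hz frame rate is hypothetical
rng = np.random.default_rng(1)
x, y = rng.normal(0.0, 50.0, size=(2, 18_000))  # 10 min of positions
r = rmsd_trace(x, y, frame_rate_hz=30.0)
print(np.median(r))  # close to 50 * sqrt(2), i.e. about 70.7 nm
```

Subtracting a Gaussian-smoothed mean before squaring keeps slow stage drift from inflating the RMSD. The 8-s width trades time resolution against noise in the state calls.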
Statistical inference
We used Bayesian and Frequentist methods in this work to
calculate parametric and nonparametric quantities, respec-
tively. The PC formation frequencies were assigned con-
fidence intervals via bootstrapping. Briefly, the observed
beads and their reported PC formation counts were sam-
pled with replacement to generate a simulated data set of the
same length as the number of observations. The looping fre-
quency was then calculated as the total loops formed among
the generated dataset divided by the number of beads and
the distribution was resampled again. This procedure was
performed 10
6
times and we report various percentiles of
these bootstrap replicates, as shown both in the main text
and on the paper website. A more detailed explanation is
provided in the Supplementary Data Text.
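The bootstrap described above can be sketched as follows. Because the looping frequency is the mean number of PCs per bead, resampling beads with replacement is equivalent to drawing multinomial counts over the distinct per-bead loop counts. This keeps 10⁶ replicates cheap in memory. The shortcut, the function name and the default percentiles are our assumptions rather than the published code, which is available on the paper website.

```python
import numpy as np


def bootstrap_looping_frequency(loops_per_bead, n_reps=10**6,
                                percentiles=(2.5, 97.5), seed=0):
    """Point estimate and bootstrap percentiles of the looping frequency.

    loops_per_bead: number of distinct PCs observed on each monitored bead.
    """
    loops = np.asarray(loops_per_bead)
    n_beads = loops.size
    estimate = loops.sum() / n_beads  # total PCs / total beads
    # Resampling beads with replacement == a multinomial draw over distinct counts
    values, counts = np.unique(loops, return_counts=True)
    rng = np.random.default_rng(seed)
    draws = rng.multinomial(n_beads, counts / n_beads, size=n_reps)
    replicate_freqs = draws @ values / n_beads
    return estimate, np.percentile(replicate_freqs, percentiles)


# Invented counts: 100 beads, 80 never loop, 15 loop once, 5 loop twice
est, ci = bootstrap_looping_frequency([0] * 80 + [1] * 15 + [2] * 5)
print(est)  # 0.25
```

Because one bead can loop several times, the estimate is a rate per bead rather than a fraction, which is why it is not bounded by 1.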
To compute the cleavage probability and the PC leaving rate k_leave, we used a Bayesian definition of probability and constructed a posterior distribution for each, as explicitly laid out in the Supplementary Data Text. The displayed posterior distributions for the cleavage probability were generated by numerically evaluating the posterior distribution over cleavage probabilities ranging from 0 to 1. The reported values for the cleavage probability and its uncertainty were computed analytically, as derived in the Supplementary Data Text.
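For a Bernoulli model of PC fate, the posterior for p_cut given n_cuts cuts in n_loops loops is proportional to p_cut^n_cuts (1 − p_cut)^(n_loops − n_cuts) times the prior. The sketch below assumes a uniform prior on [0, 1]; the prior actually used is specified in the Supplementary Data Text. It summarizes the posterior by its mode and by a Gaussian (Laplace) approximation to its width, and it also evaluates the normalized posterior on a grid, analogous to the displayed distributions. The function names are hypothetical.

```python
import numpy as np


def pcut_summary(n_loops, n_cuts):
    """Posterior mode and Laplace-approximation SD of p_cut (uniform prior assumed)."""
    p_mode = n_cuts / n_loops
    # Curvature of the log posterior at the mode gives variance p(1 - p) / n_loops
    sd = np.sqrt(p_mode * (1.0 - p_mode) / n_loops)
    return p_mode, sd


def pcut_posterior_grid(n_loops, n_cuts, n_grid=1001):
    """Normalized posterior density of p_cut on an evenly spaced grid over [0, 1]."""
    p = np.linspace(0.0, 1.0, n_grid)
    with np.errstate(divide="ignore"):
        log_p, log_q = np.log(p), np.log1p(-p)
    n_uncut = n_loops - n_cuts
    # Treat 0 * log(0) as 0 so that zero cuts (or zero uncut loops) is handled
    cut_term = n_cuts * log_p if n_cuts > 0 else np.zeros_like(p)
    uncut_term = n_uncut * log_q if n_uncut > 0 else np.zeros_like(p)
    log_post = cut_term + uncut_term
    post = np.exp(log_post - log_post.max())
    density = post / (post.sum() * (p[1] - p[0]))
    return p, density


# Counts quoted for Figure 5A: 152 loops, 70 cuts
p_mode, sd = pcut_summary(152, 70)
print(f"{p_mode:.3f} ± {sd:.3f}")  # 0.461 ± 0.040
```

Note that the Laplace width collapses to zero when no cuts (or only cuts) are observed. In those cases, the grid posterior is the more informative summary.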
To estimate k_leave, we again constructed a posterior distribution. Here, we chose an exponential form for the likelihood and assumed an inverse Gamma distribution as a prior on the leaving rate. This posterior was then sampled using Markov chain Monte Carlo as implemented in the Stan probabilistic programming language (22). A more detailed derivation of the posterior distribution is provided in the Supplementary Data Text. All models and code for this inference are available on the paper website.
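The posterior for k_leave can be written down directly. With an exponential likelihood for n dwell times t_i and an inverse Gamma(α, β) prior on the rate k, the log posterior is (n − α − 1) ln k − k Σt_i − β/k up to a constant. Rather than sampling it with MCMC in Stan as done in the study, the sketch below simply evaluates it on a dense grid, which is adequate for a single parameter. The hyperparameter values for α and β and the grid bounds are hypothetical placeholders, not the priors used in the paper.

```python
import numpy as np


def k_leave_posterior(dwell_times, alpha=2.0, beta=1.0, n_grid=20_000):
    """Grid-evaluated posterior of the PC leaving rate k (per unit of dwell_times).

    Likelihood: prod_i k * exp(-k * t_i); prior: InverseGamma(alpha, beta) on k.
    alpha and beta are placeholder hyperparameters.
    """
    t = np.asarray(dwell_times, dtype=float)
    n, total = t.size, t.sum()
    k_max = 10.0 * n / total  # generous upper bound relative to the MLE n / total
    k = np.linspace(k_max / n_grid, k_max, n_grid)
    log_post = (n - alpha - 1.0) * np.log(k) - k * total - beta / k
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()  # discrete probabilities on the grid
    return k, weights


# Simulated single-rate dwell times with a true k of 0.5 per min
rng = np.random.default_rng(2)
times = rng.exponential(scale=2.0, size=500)
k, w = k_leave_posterior(times)
print((k * w).sum())  # posterior mean, close to 0.5
```

A grid is transparent for one parameter. MCMC, as used in the study, scales to the multi-rate models the PC lifetime data call for.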
Significance testing was performed for the looping frequency, median PC lifetime and fraction of cutting events. Our null hypothesis for each metric was that the measured value for the altered 12RSS was drawn from the same distribution as that of the V4-57-1 (reference) 12RSS, with p-values ≤ 0.05 deemed statistically significant. All p-values for each of these metrics and details about their calculation are provided in the Supplementary Data Text.
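The exact tests behind the reported p-values are given in the Supplementary Data Text. Purely as an illustration of testing the null hypothesis that an altered 12RSS and the reference share the same distribution, the sketch below runs a generic two-sided permutation test on the difference in median PC lifetime. The choice of statistic, the number of permutations and the function name are our assumptions, and this is not necessarily the procedure used in the study.

```python
import numpy as np


def permutation_test_median(sample, reference, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in medians.

    Under the null hypothesis both samples come from one distribution, so
    group labels can be shuffled freely.
    """
    a = np.asarray(sample, dtype=float)
    b = np.asarray(reference, dtype=float)
    observed = abs(np.median(a) - np.median(b))
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(np.median(perm[:a.size]) - np.median(perm[a.size:]))
        count += int(diff >= observed)
    # The +1 terms count the observed labeling itself and keep p > 0
    return (count + 1) / (n_perm + 1)
```

With the threshold used here, a result of p ≤ 0.05 would flag the altered 12RSS as differing from the reference for that metric.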
Interactive figures
All results presented in this manuscript are visually com-
plemented with interactive figures on the paper website at
https://www.rpgroup.caltech.edu/vdj
recombination/
.The
Cutting Probability Model Explorer shows how the pos-
terior distribution for the cutting probability changes de-
pending upon the number of loops and number of cuts ob-
served, both of which can be adjusted with their respective
scroll bars. The Synthetic RSS Explorer page displays data
for synthetic RSSs. Clicking on individual cells in the paired
complex formation frequency, paired complex dwell time or
paired complex cleavage probability heatmaps reveals plots
of the looping frequencies with different confidence interval
percentages from 10
6
bootstrap replicates; empirical cumu-
lative distribution functions (ECDFs) of the PC lifetimes
that revert to an unlooped configuration, are cut, or a com-
bination of the two fates; and full posterior distributions
of the probability of cutting for the synthetic RSS in blue
and the reference RSS in gray. Number of beads, loops and
cuts observed for the synthetic RSS are displayed by hov-
ering over the cells of the heatmaps. The Endogenous RSS
Explorer page displays these same plots but allows for com-
parison between any two endogenous RSSs studied through
dropdown menus, with the data for one RSS displayed in
gray, including observation counts, and those for the other
RSS shown in blue. The Synthetic-Endogenous RSS Com-
parison tool provides a means for selecting a particular en-
dogenous RSS by a dropdown menu and directly compar-
ing data for the endogenous RSS (gray) and the individ-
ual synthetic RSS that constitutes the sequence difference
between the endogenous RSS and the V4-57-1 (reference)
RSS, as revealed in the endogenous sequence with high-
lighted letters where the endogenous and reference RSSs
differ.
RESULTS
Synthetic RSSs
We chose a 12RSS flanking the immunoglobulin κ variable (IgκV) gene segment V4-57-1 as the reference sequence due to its use in a previous TPM study on RAG–RSS interactions (20). This sequence has also been used in structural studies of RAG–RSS complexes (13,15), allowing us to compare our results with known information on the RAG–RSS structure. To explore how RAG–RSS interactions are affected by single bp changes, we examined 40 synthetic RSSs consisting of single bp changes across 21 positions of the V4-57-1 12RSS, with a particular focus on altering the 12-bp spacer, the least well-understood element in the RSS. We also studied changes made to positions 3–7 of the heptamer and various positions of the nonamer. The first three positions of the heptamer are perfectly conserved (7), likely to support DNA distortions needed for both nicking and base-specific interactions with the cleavage domain on RAG1 after nicking (13–15), while heptamer positions 4–7 also mediate base-specific interactions with RAG (13). The nonamer is bound by a nonamer-specific binding domain on RAG1 (13,23). Throughout our synthetic and endogenous RSS study, we used the same concentration of the two co-expressed and co-purified RAG components (RAG1 and RAG2), and the same concentration of the high mobility group box 1 (HMGB1) protein, which binds nonspecifically to DNA and helps facilitate RAG binding to the RSSs (12). We also fixed the distance between the two binding sites at 1200 bp, thereby constraining our study to the influence of binding site sequence alone on RAG–RSS dynamics. In addition, all of the 12RSSs in this study are partnered with a well-characterized 23RSS (13,15,20) adjacent to the frequently used Jκ1 gene segment from the mouse Igκ locus on chromosome 6 (5). The sequence of this RSS is provided in Supplementary Table S1. All primer sequences and bead, loop and cut counts for each synthetic 12RSS are provided in Supplementary Table S2.
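Positions in this section are numbered within each RSS element: heptamer positions 1–7, spacer positions 1–12 and nonamer positions 1–9, which together make up the 28 bp of a 12RSS. The helper below maps a 28-bp 12RSS onto that numbering and enumerates every possible single bp substitution, from which a subset such as the 40 studied here could be chosen. The example sequence combines the consensus heptamer CACAGTG and nonamer ACAAAAACC with an invented spacer. It is not the V4-57-1 reference sequence, and the function names are hypothetical.

```python
REGIONS_12RSS = (("heptamer", 7), ("spacer", 12), ("nonamer", 9))
BASES = "ACGT"


def label_positions(rss):
    """Return (region, 1-based position within region, base) for each base of a 12RSS."""
    if len(rss) != sum(length for _, length in REGIONS_12RSS):
        raise ValueError("a 12RSS is expected to be 7 + 12 + 9 = 28 bp long")
    labels, i = [], 0
    for region, length in REGIONS_12RSS:
        for pos in range(1, length + 1):
            labels.append((region, pos, rss[i]))
            i += 1
    return labels


def single_bp_variants(rss):
    """Yield (region, position, original base, new base, variant sequence)."""
    for i, (region, pos, base) in enumerate(label_positions(rss)):
        for new in BASES:
            if new != base:
                yield region, pos, base, new, rss[:i] + new + rss[i + 1:]


# Hypothetical 12RSS: consensus heptamer + invented spacer + consensus nonamer
example = "CACAGTG" + "ATACTGCAGTCA" + "ACAAAAACC"
variants = list(single_bp_variants(example))
print(len(variants))  # 84 = 28 positions x 3 alternative bases
```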
We pooled the relevant data across experimental replicates to characterize synthetic RSSs by three empirical properties, namely the frequency of entering the PC (looping frequency), the quartiles of the PC lifetime (dwell time) distribution and the probability of exiting the PC through DNA cleavage (cutting probability). We define the looping frequency as the ratio of distinct PCs observed to the total number of beads monitored over the course of the experiment. Because a single DNA tether can loop and unloop multiple times over the course of the experiment, the looping frequency can in principle range from 0 to ∞. The measured looping frequency and the 95% confidence intervals from bootstrapping the looping frequency (as demonstrated in Supplementary Figure S3) are shown for all of the synthetic RSSs in Figure 3.
As demonstrated in Figure 4A, the dwell times were obtained by measuring the lifetime of each PC state, irrespective of whether the PC was cleaved or reverted to an unlooped state. For each synthetic RSS, all of the PC lifetimes are pooled to generate a histogram of dwell times such as that in Figure 4B, from which the median, shown as a white circle with an N for nucleotide, and the first and third quartiles, shown as the furthest extents of the blue error bar, are used to compare the synthetic RSSs in Figure 4C.
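Given a per-frame state trace (0 = unlooped, 1 = looped, 2 = cleaved; these codes are our own), the dwell times and their quartiles can be extracted as sketched below. Each uninterrupted looped run is recorded with its fate. A loop still open when acquisition stops is right-censored and is simply dropped in this sketch; that choice is an assumption, not the study's documented treatment. The frame rate is a hypothetical parameter.

```python
import numpy as np

UNLOOPED, LOOPED, CLEAVED = 0, 1, 2  # hypothetical state codes


def loop_dwell_times(states, frame_rate_hz):
    """Return (duration_s, fate) for each completed looped run in a state trace."""
    events, start = [], None
    for i, s in enumerate(np.asarray(states)):
        if s == LOOPED and start is None:
            start = i  # a new PC begins
        elif s != LOOPED and start is not None:
            fate = "cut" if s == CLEAVED else "unlooped"
            events.append(((i - start) / frame_rate_hz, fate))
            start = None
            if s == CLEAVED:
                break  # the bead is released; nothing further to track
    # A loop still open at the end of the trace is right-censored and dropped here
    return events


def dwell_quartiles(events):
    """First quartile, median and third quartile of pooled PC lifetimes."""
    durations = np.array([d for d, _ in events])
    return np.percentile(durations, [25, 50, 75])


events = loop_dwell_times([0, 0, 1, 1, 1, 0, 1, 1, 2, 2], frame_rate_hz=1.0)
print(events)                   # [(3.0, 'unlooped'), (2.0, 'cut')]
print(dwell_quartiles(events))  # [2.25 2.5  2.75]
```

Pooling both fates, as in Figure 4B, matches the dwell-time definition above. Splitting by fate gives the separate ECDFs shown on the website.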
Finally, to compute the cutting probability, we considered the fate of each PC as a Bernoulli trial with cleavage probability p_cut. This treatment allows us to construct the full probability distribution of p_cut for a PC containing the RSS of interest, defined explicitly in Figure 5A and fully detailed in the ‘Materials and Methods’ section and Supplementary Data Text. The measured numbers of loops n_loops and cuts n_cuts collected from experiments (in the case of Figure 5A, 152 loops and 70 cuts) are parameters inserted into the equation to yield a distribution such as that in Figure 5B. For each synthetic RSS, we computed the most likely p_cut and one standard deviation, shown in Figure 5B by the white circle with the N and the blue error bars, respectively, and compiled them in Figure 5C. The Cutting Probability Model Explorer interactive figure visualizes how the probability distribution depends on the empirically collected numbers of loops and cuts. Detailed discussions of the choice of metrics and the corresponding error estimates are provided in the ‘Materials and Methods’ section and Supplementary Data Text. We also show in Supplementary Figure S4 and Supplementary Data Text that our definitions of the looping frequency and cutting probability decouple the PC-forming and cleavage steps in the RAG–RSS reaction, thereby clarifying which step is the limiting factor in completing the cleavage phase of V(D)J recombination. We complement the condensed synthetic RSS results presented here with an interactive figure on the website that provides a more complete visualization of each RSS studied. The Synthetic RSS Explorer interactive figure includes heatmaps to qualitatively illustrate how the synthetic RSSs differ in the three defined metrics. Clicking on a particular cell in any of the heatmaps displays the measured looping frequency of the synthetic RSS containing the corresponding bp change, with several confidence interval percentages from the bootstrapping. Hovering over a cell also brings up a window showing the numbers of beads, loops and cuts observed for the synthetic RSS. In addition, the webpage shows empirical cumulative distribution functions (ECDFs) of PC lifetimes in three groups: PCs that are cleaved, PCs that are unlooped and both together. This webpage also includes the complete posterior probability distribution of p_cut for each synthetic RSS.

Figure 3. Looping frequency for single bp changes introduced at various positions of the reference 12RSS. Looping frequencies are shown with the 95% confidence interval of the distribution of possible looping frequencies from 10⁶ bootstrap replicates. The dotted black line is set at the reference looping frequency, 0.22, with the shaded area denoting the extent of the 95% confidence interval for the reference. Alternating vertical stripe colors and the reference sequence written along the x-axis demarcate the position where the change was made and the original nucleotide. The introduced nucleotide is indicated in the figure by its letter and color-coded (red for A, green for C, light blue for T and purple for G). Heptamer, spacer and nonamer regions are also separated by vertical lines in the sequences. Asterisks at the top of certain positions are color-coded to specify the nucleotide whose resultant looping frequency differs from the reference sequence with p-value ≤ 0.05. All p-values for each 12RSS used are reported in Supplementary Figure S5.
Figures 3, 4C and 5C illustrate the substantial effect that a single bp change to an RSS can have on the formation, stability and cleavage of the PC, respectively, reaffirming that RSS sequence plays a role in regulating the initial steps of recombination. Of interest is the observed difference in phenomena between changes made to the third position and those made to the last four bases of the heptamer region. Bulk assays have shown that deviating from the consensus C at heptamer position 3 essentially eliminates recombination (8,10), yet we found that changing the C to G or T did not inhibit PC formation (Figure 3). In fact, these alterations show looping frequencies and PC lifetimes similar to those of the reference sequence (Figures 3 and 4C). Instead, both the C-to-G and C-to-T alterations to heptamer position 3 almost completely suppress cleavage (Figure 5C). We provide the full probability distribution for the estimate of p_cut for these two RSSs in Figure 5D. Nearly all of the probability density is concentrated below 10%, showing that cutting the PC is exceedingly rare. Thus, although deviating from a C at heptamer position 3 does not prevent RAG from forming the PC, the alteration impedes DNA cleavage.
Among the changes made to the last four bases of the heptamer of the reference sequence, those at the fifth and sixth positions showed the most striking reductions in PC formation (Figure 3). Of >240 DNA tethers with the 12RSS containing a T-to-A change at heptamer position 6, only two PCs formed, one of which subsequently led to cleavage. This result is consistent with recent findings that the consensus TG dinucleotide at the last two positions of the heptamer supports a kink in the DNA and may be critical for RAG binding (14). We notice that some changes, such as the one at heptamer position 4 (A to T), increase the median time spent in the PC (Figure 4C). This RSS also had one of the widest dwell time distributions of all of the synthetic RSSs studied. While some alterations to the last four heptamer positions yielded little change in cleavage propensity compared to the reference, others showed a reduction in p_cut. The single bp change with the greatest effect, located at heptamer position 6 (T to C), led to cleavage in only 2 of 24 PCs.
Although we observed only modest differences in the median dwell times when we altered the reference sequence in the spacer region, some alterations substantially affected the looping frequency and cutting probability. The C-to-T change at spacer position 4 doubled the frequency of observing the PC, while a T-to-G change at the ninth position reduced PC formation nearly as much as changes made at heptamer position 6 (Figure 3). These two spacer changes represent the observed extremes of spacer sequence effects on the looping frequency. While many of the changes in the spacer region do not alter the cutting probability, we can still find spacer-altered RSSs that improve or inhibit cleavage. Figure 5D shows that changing the fourth position from C to G shifts the probability distribution of p_cut to lower values, while altering the tenth position of the spacer from G to T shifts the distribution toward an increased cleavage probability. RAG1 makes contacts along the entire length of the 12RSS spacer (14), helping to explain our finding that changes to the spacer can substantially alter the probability of PC formation and cutting; the spacer thus plays more of a role than simply separating the heptamer and nonamer sequences.

Figure 4. Dwell time quartiles for single bp changes introduced at various positions of the reference 12RSS. (A) Example bead trajectory data (blue) and the dwell times of the two loops that are formed (brackets). As in Figure 2, the red dashed line corresponds to the unlooped DNA tether state while the green dashed line denotes the predicted looped state. (B) Histogram of all dwell times collected for a given RSS. Note that all loops involving the RSS of interest are included in the histogram, regardless of whether the loop precedes cutting or a return to the unbound state. The median is shown as the circle containing N (for nucleotide) with lines extending to the first and third quartiles. The method for obtaining the circle and error bars shown in (B) is then applied to each synthetic 12RSS dataset and presented in (C), where the letters denote the replacement nucleotide. The dotted black line in (C) denotes the reference 12RSS median dwell time, 2.1 min, with the black bar at the left denoting the first and third quartiles of the distribution. Vertical stripes; x-axis labeling; heptamer, spacer and nonamer distinction; and color-coding of nucleotide changes (red for A, green for C, light blue for T and purple for G) are the same as in Figure 3. Asterisks at the top of certain positions are color-coded to specify the nucleotide whose resultant dwell time differs from the reference sequence with p-value ≤ 0.05. All p-values for each 12RSS used are reported in Supplementary Figure S5.
Similar to spacer changes, most nonamer changes show strongly overlapping dwell time distributions, with median PC dwell times differing from that of the reference sequence by <1 min (Figure 4C). However, unlike spacer-modified RSSs, most nonamer-altered RSSs reduced the frequency of PC formation. Disruptions to the poly-A sequence in the center of the nonamer cause a substantial reduction in looping frequency, most notably the near complete inhibition of PC formation with the A-to-C change at nonamer position 3 (Figure 3). This detrimental effect of deviating from the poly-A tract agrees with previous work demonstrating numerous protein–DNA interactions in this region and with the proposal that the rigidity produced by the string of A nucleotides is a critical feature for RAG1 to bind the nonamer (14,23). Furthermore, this reduction in looping frequency can extend to changes made toward the end of the nonamer, depending upon the nucleotide, as shown by the significant reduction for the C-to-T mutation at nonamer position 8 (Figure 3). The sequence deviations in the nonamer region, however, do not significantly affect cleavage once the PC has formed, as evidenced in Figure 5D by the overlap between the posterior distributions of the reference sequence and of its nonamer variant showing the greatest reduction in cleavage probability (position 4, A to C). Overall, nonamer deviations from the reference RSS have negative effects on PC formation with minimal effects on subsequent DNA cleavage, consistent with extensive biochemical and structural evidence that the primary function of the nonamer is to facilitate RAG–DNA binding (23).
Figure 5. Cutting probabilities for single bp changes introduced at various positions of the reference 12RSS. (A) For a given RSS, the total number of distinct loops in the assay, $n_{\mathrm{loops}}$ (in this case, 152 loops), and the subset of those loops that RAG cleaves, $n_{\mathrm{cuts}}$ (70), are applied to the function shown, which identifies the full distribution of the cutting probability $p_{\mathrm{cut}}$ of the PC for a 12RSS of interest. (B) Example distribution for a particular RSS, with the most likely cutting probability μ shown as a circle containing N (for 'nucleotide') and the standard deviation σ shown as blue error bars. (C) μ and σ are shown for each synthetic RSS, with the dotted black line denoting the most probable $p_{\mathrm{cut}}$ for the reference sequence, roughly 0.46, and the gray shaded region marking one standard deviation. Vertical stripes; x-axis labeling; heptamer, spacer and nonamer distinction; and color-coding of nucleotide changes (red for A, green for C, light blue for T and purple for G) are the same as in Figure 3. (D) Ridgeline plot of posterior distributions of the cutting probability, given the number of loops observed and loops that cut (see Supplementary Data), for a subset of the synthetic RSSs (labeled and colored along the zero-line of the respective ridgeline plot). The height of each curve above the horizontal line of the same color corresponds to the posterior probability density. See the Cutting Probability Model Explorer interactive webpage to explore how the posterior distribution depends on the number of loops and cuts observed. Asterisks at the top of certain positions are color-coded to specify the nucleotide whose resultant cutting probability differs from the reference sequence with p-value ≤ 0.05. All p-values for each 12RSS used are reported in Supplementary Figure S5.

Endogenous RSSs
To build on our study of single bp effects on RAG–RSS dynamics, we selected a subset of endogenous RSSs from the mouse Vκ locus on chromosome 6, based on existing gene usage frequency data collected by Aoki-Ota et al. (5) and because the sequence differences between these RSSs and the reference RSS are individually examined in the synthetic RSS results. We studied a variety of frequently used (>5% frequency of usage) gene segments (V1-135, V9-120, V10-96, V19-93, V6-15 and V6-17), two moderately used (>1% and <3% frequency) gene segments (V4-55 and V5-43) and two rarely used (<0.5% frequency) gene segments (V4-57-1 and V8-18) (5). We note that the V4-57-1 12RSS is identical to the reference 12RSS in the synthetic study. Furthermore, we use the same Jκ1 23RSS in the endogenous RSS study as in the synthetic study. In addition, we examined DFL16.1, the most frequently used D gene segment from the murine immunoglobulin heavy chain (Igh) locus on chromosome 12 (4,24). Unlike the Vκ gene segments, which only need to combine with one gene segment, D gene segments must combine with two other gene segments to encode a complete protein. As a result, DFL16.1 is flanked on both its 5′ and 3′ sides by distinct 12RSS sequences, denoted DFL16.1-5′ and DFL16.1-3′, respectively, both of which are examined in this study. The sequences of all endogenous RSSs studied here, as well as the numbers of beads, loops and cuts observed, are provided in Supplementary Tables S1 and S3. We apply TPM to these sequences to determine whether their behavior in the RAG–RSS reaction could both provide insight into the usage frequency of their flanking gene segments and be predicted from the activity profile of the synthetic RSSs.
To develop a better sense of how RAG interacts with these RSSs in their endogenous context, we chose the 6 bp coding flank sequence adjacent to the heptamer of every RSS except V4-57-1 to be the natural flank provided by the endogenous gene segment. RAG interacts with the coding flank during DNA binding and PC formation (13–15), and coding flank sequence can influence recombination efficiency, particularly the two bp immediately adjacent to the
heptamer (25–27). Two T nucleotides, and in many cases even a single T, immediately 5′ of the heptamer inhibit the nicking step of cleavage and thus reduce recombination efficiency (25–27). We did not extensively analyze the contribution of coding flank sequence in this study; among the studied RSSs, only the V6-15 RSS would be predicted to interact poorly with RAG because of the T flanking its heptamer, while all other coding flanks have combinations of A and C as the two terminal coding flank bases. We kept the same coding flank for the V4-57-1 RSS as in a previous study (20) to facilitate closer comparison with the results of the synthetic RSSs. We do not expect much difference between the endogenous coding flank sequence (5′-CACTCA, in which CA are the two nucleotides closest to the heptamer) and the coding flank used here (5′-GTCGAC, in which AC are the two nucleotides closest to the heptamer), because these two terminal coding flank bases are similar to those of all but the V6-15 RSS, and for reasons discussed in the 'Discussion' section and Supplementary Data Text. The coding flank sequences for all studied endogenous RSSs are included in Supplementary Table S1. We present the results of the RAG–endogenous RSS interaction in Figure 6 and provide an interactive tool for exploring these data on the paper website. The Endogenous RSS Explorer includes an interactive feature with which the looping frequency, ECDFs of looping lifetimes and posterior distributions of the cleavage probability of any two endogenous RSSs can be directly compared.
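The rule of thumb above for the two coding flank bases closest to the heptamer can be written as a small check. This is a hypothetical helper, not a tool from this study; it assumes the flank is written 5′→3′ so that its last base abuts the heptamer, and it encodes only the two cases stated above (TT, or a single T immediately 5′ of the heptamer). The two flanks named in the text, the endogenous V4-57-1 flank 5′-CACTCA and the 5′-GTCGAC flank used here, both pass.

```python
def terminal_flank_bases(flank_5_to_3: str) -> str:
    """Return the two coding flank bases closest to the heptamer.

    Assumes the flank is written 5'->3' with its last base adjacent to the heptamer.
    """
    flank = flank_5_to_3.upper()
    if len(flank) < 2 or set(flank) - set("ACGT"):
        raise ValueError(f"expected at least 2 bases of A/C/G/T, got {flank_5_to_3!r}")
    return flank[-2:]


def flank_nicking_flag(flank_5_to_3: str) -> str:
    """Heuristic flag for coding flanks expected to hinder RAG nicking."""
    terminal = terminal_flank_bases(flank_5_to_3)
    if terminal == "TT":
        return "TT adjacent to heptamer: nicking expected to be inhibited"
    if terminal[-1] == "T":
        return "single T adjacent to heptamer: nicking often inhibited"
    return "no T adjacent to heptamer"


# Flanks named in the text: endogenous V4-57-1 flank and the flank used in this study.
for name, flank in [("V4-57-1 endogenous", "CACTCA"), ("flank used here", "GTCGAC")]:
    print(name, terminal_flank_bases(flank), flank_nicking_flag(flank))
```

The check deliberately ignores the other four flank positions, since the text attributes the inhibitory effect to the bases immediately 5′ of the heptamer.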
The variable nature of all three metrics [looping frequency (Figure 6A), dwell time (Figure 6B) and cutting probability (Figure 6C; full posterior distributions for all endogenous 12RSSs studied here are shown in Supplementary Figure S6)] across RSSs highlights how, similar to the synthetic RSSs, endogenous sequences influence the formation, stability and cleavage of the PC differently. Of particular interest is the behavior of DFL16.1-3′, which shows the highest propensity for PC formation but some of the shortest PC lifetimes. Despite this short median dwell time, the probability of the PC successfully proceeding to DNA cleavage is high, ≈0.5. Notably, the frequency of PC formation and the probability of cleavage are both greatly reduced for DFL16.1-5′ compared with DFL16.1-3′, although their median PC dwell times and the widths of their dwell time distributions are approximately equal. The reduced function of DFL16.1-5′ relative to DFL16.1-3′ is consistent with prior studies (24,28,29) and is addressed further in the 'Discussion' section.
Figure 6. Observed dynamics between RAG and endogenous RSS sequences. (A) Frequency of PC formation (looping frequency) with 95% confidence interval. (B) Median PC lifetime, with the lower error bar extending to the first quartile and the upper error bar extending to the third quartile. (C) Probability of DNA cleavage by RAG (cutting probability), with error bars showing one standard deviation. For discussion of the errors in Figure 6A and C, see the Supplementary Data Text. DFL16.1-3′ and DFL16.1-5′ flank the same gene segment but in different orientations on the Igh chromosome. As shown in the graphic above Figure 6A, the Vκ gene segments listed are ordered by their position along the chromosome, with linear distance from the Jκ gene segments decreasing from left to right. Numbers in parentheses next to each Vκ gene segment denote its percentage of usage in the repertoire (5). The V4-57-1 12RSS has a filled-in circle to denote that it is the reference sequence in the synthetic RSS study. Asterisks at the top of subfigures denote endogenous RSSs whose measured quantity differs from the V4-57-1 (reference) 12RSS with p-value ≤ 0.05. All p-values for each 12RSS used are reported in Supplementary Figure S5.

The endogenous RSSs of the Vκ gene segments show varying efficiencies of PC formation and cleavage. Many of the endogenous RSSs studied here, including those of gene segments used frequently in vivo (V1-135, V9-120, V10-96, V19-93, V6-17 and V6-15), demonstrate looping frequencies between 15 and 30 events per 100 beads. Gene segments V4-57-1 and V4-55 are used with almost 0% and roughly 2.5% frequency, respectively (5), yet in our experiments they enter the PC with comparable frequency (∼20–30 loops per 100 beads). In general, we find these two sequences to behave almost identically in our experimental system, illustrating that other biological phenomena, such as higher-order DNA structure, govern segment usage in vivo (4,30). The endogenous V8-18 12RSS exhibits infrequent PC formation and cleavage and short median PC lifetimes, much like the DFL16.1-5′ 12RSS. Using the V8-18 12RSS, only 5 looping events were detected from 146 DNA tethers, and cleavage was never observed. Despite the similarities in reaction parameters for the V8-18 and DFL16.1-5′ RSSs, DFL16.1 is the most frequently used D gene segment in the repertoire (4), while V8-18 is never used (5). A likely explanation for the exclusion of V8-18 from the repertoire is the 'A' at heptamer position 6 of the 12RSS (see 'Discussion' section). In contrast, DFL16.1 is substantially utilized in the Igh repertoire despite the poor contribution of its 5′ 12RSS to PC formation and cleavage, most likely because this RSS does not participate in recombination until after its gene segment has undergone D-to-J recombination with its more efficient 3′ 12RSS, thus moving the gene segment into the RAG-rich environment of the 'recombination center.' This relocation is thought to facilitate RAG binding to the 5′ RSS of the committed D gene segment (31,32).
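The V8-18 counts above show how weakly a handful of loops constrains the cutting probability. The sketch below reuses an illustrative uniform-prior Beta posterior for $p_{\mathrm{cut}}$ (an assumption, not necessarily the exact model used in this work) to compare V8-18 (5 loops on 146 tethers, 0 cuts) with the reference 12RSS (152 loops, 70 cuts, Figure 5A). The helper name and the Monte Carlo comparison are hypothetical choices for illustration; looping frequency is computed simply as loops per 100 tethered beads.

```python
import random


def sample_pcut(n_loops, n_cuts, n_samples, rng):
    """Draw p_cut samples from an assumed Beta(n_cuts + 1, n_loops - n_cuts + 1) posterior."""
    return [rng.betavariate(n_cuts + 1, n_loops - n_cuts + 1) for _ in range(n_samples)]


rng = random.Random(2020)  # fixed seed for reproducibility
ref = sample_pcut(n_loops=152, n_cuts=70, n_samples=20_000, rng=rng)
v8_18 = sample_pcut(n_loops=5, n_cuts=0, n_samples=20_000, rng=rng)

loops_per_100_beads = 100 * 5 / 146
# Posterior probability that, once paired, V8-18 is cut less often than the reference.
prob_lower = sum(v < r for v, r in zip(v8_18, ref)) / len(ref)
print(f"V8-18 looping frequency: {loops_per_100_beads:.1f} loops per 100 beads")
print(f"P(p_cut[V8-18] < p_cut[reference]) ~ {prob_lower:.2f}")
```

Under the same assumption, 5 loops with no cuts give a Beta(1, 6) posterior, for which $P(p_{\mathrm{cut}} > 0.3) = 0.7^6 \approx 0.12$, so with so few loops the absence of observed cleavage constrains $p_{\mathrm{cut}}$ only loosely.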
Figure 6B demonstrates that, with the exception of the V10-96 RSS, PC lifetimes are similarly distributed across the endogenous RSSs examined in this work. Most RSSs have median dwell times between 1 and 3 min, with the V8-18 12RSS displaying the shortest median dwell time of roughly 40–50 s. While most endogenous RSSs here have a similar range between the first and third quartiles (see the Endogenous RSS Explorer interactive figure on the paper website), the V10-96 12RSS distribution is noticeably wider: its first quartile is longer than the median lifetime of most endogenous RSS distributions, and its third quartile extends to over 16 min. These observations suggest that, once RAG manages to bind simultaneously to both the 12RSS and the 23RSS, the PC has similar stability for all but the V10-96 RSS.
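The summary statistics plotted in Figure 6B (median with first and third quartiles) can be computed directly from a list of dwell times. As a reference point for the kinetic modeling that follows, a single exponential with rate $k$ has quartiles $\ln(4/3)/k$, $\ln 2/k$ and $\ln 4/k$, so its third quartile is always about 4.8 times its first. In the sketch below the dwell times are hypothetical, and the 'inclusive' quartile convention is an assumption; the convention used for the figure is not specified here.

```python
import math
import statistics


def dwell_summary(dwell_times):
    """First quartile, median and third quartile of PC dwell times."""
    q1, median, q3 = statistics.quantiles(dwell_times, n=4, method="inclusive")
    return q1, median, q3


def exponential_quartiles(k):
    """Quartiles of a single exponential distribution with rate k."""
    return math.log(4 / 3) / k, math.log(2) / k, math.log(4) / k


# Hypothetical dwell times in minutes (not data from this study).
dwell = [0.4, 0.7, 1.1, 1.6, 2.1, 2.9, 4.5, 9.0, 16.5]
q1, med, q3 = dwell_summary(dwell)
print(f"Q1 = {q1:.2f} min, median = {med:.2f} min, Q3 = {q3:.2f} min")

# Exponential with the same median, for comparison of the quartile spread.
e_q1, e_med, e_q3 = exponential_quartiles(k=math.log(2) / med)
print(f"exponential with same median: Q1 = {e_q1:.2f} min, Q3 = {e_q3:.2f} min")
```

Comparing the observed interquartile range with the exponential prediction at the same median is a quick, model-free hint of whether a single-rate description is plausible before any fitting.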
Figure 6C indicates that six endogenous RSS sequences, from V1-135 to V4-55, have comparable cutting probabilities ranging from 0.4 to 0.5. Considering that the less frequently used V4-57-1 and V4-55 gene segments have 12RSSs that show similar cutting probabilities and looping frequencies to the 12RSSs of more frequently selected gene segments, other factors appear to prevent their efficient use. The low probability of cutting (0.05; Figure 6C) with the V6-15 12RSS is particularly noteworthy, indicating that RAG tends to break the looped state easily rather than commit to cleavage. However, this low cutting probability might be attributed to the T in the coding flank immediately adjacent to the heptamer. Other features of the system must dictate the high-frequency usage of V6-15 in vivo (5).
Kinetic modeling of the PC lifetime distribution
Figures 4C and 6B show that the vast majority of median looping lifetimes ranged between 1 and 3 min, with rare exceptions, suggesting similar dwell time distributions for many of the RSS variants. However, many of these synthetic and endogenous RSSs have different probabilities of DNA cleavage, suggesting that at the very least the rate of cutting changes. These similarities in the lifetime distributions, despite differences in outcomes, invited a thorough dissection of the data to extract key quantitative insights into the changes in the kinetics between 12RSS constructs. As TPM has been used to extract kinetic parameters for various other protein–DNA systems (17,18,33,34), we used the distributions of the PC lifetimes in an attempt to establish the rates of unlooping and cutting for each RSS and discern a deeper connection between RSS sequence and the fate of the PC. We developed a simple model in which a PC state can have two possible fates: either simple unlooping of the DNA tether or cleavage of the DNA by RAG. We characterized each of these outcomes as independent yet competing processes with rates $k_{\mathrm{unloop}}$ and $k_{\mathrm{cut}}$ for unlooping and DNA cleavage, respectively. If the waiting time $t_{\mathrm{unloop}}$ or $t_{\mathrm{cut}}$ for each process could be measured independently, with only one of the two outcomes permitted to occur, one would expect the probability densities of these waiting times given the appropriate rate to be single exponential distributions of the form
$$P(t_{\mathrm{unloop}} \mid k_{\mathrm{unloop}}) = k_{\mathrm{unloop}}\, e^{-k_{\mathrm{unloop}} t_{\mathrm{unloop}}} \tag{1}$$
for the unlooping process and
$$P(t_{\mathrm{cut}} \mid k_{\mathrm{cut}}) = k_{\mathrm{cut}}\, e^{-k_{\mathrm{cut}} t_{\mathrm{cut}}} \tag{2}$$
for DNA cleavage. However, as these two Poisson processes are competing, we cannot estimate $k_{\mathrm{cut}}$ solely from the waiting time distribution of paired complex states that led to DNA cleavage, nor $k_{\mathrm{unloop}}$ using only the states that simply unlooped. As each individual cutting or unlooping event is assumed to be independent of all other cutting and unlooping events, the distribution of the dwell time $t$ before the PC either unloops or undergoes cleavage can be modeled as an exponential distribution parameterized by the sum of the two rates,
$$P(t \mid k_{\mathrm{leave}}) = k_{\mathrm{leave}}\, e^{-k_{\mathrm{leave}} t}, \tag{3}$$
where $k_{\mathrm{leave}} = k_{\mathrm{unloop}} + k_{\mathrm{cut}}$.
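The competing-process picture behind Equations (1)–(3) can be checked with a short simulation. The sketch below draws independent exponential waiting times for unlooping and cutting, keeps whichever occurs first, and confirms that the resulting dwell times have rate $k_{\mathrm{unloop}} + k_{\mathrm{cut}}$. The rate values and the function name are hypothetical, chosen only for illustration. A direct consequence of this null model, visible in the output, is that the fraction of PCs ending in cleavage is $k_{\mathrm{cut}}/(k_{\mathrm{unloop}} + k_{\mathrm{cut}})$.

```python
import random
import statistics


def simulate_pc_fates(k_unloop, k_cut, n_pcs, rng):
    """Simulate PCs whose unlooping and cutting are competing Poisson processes.

    Returns the observed dwell times and the fraction of PCs that were cut.
    """
    dwell_times, n_cut = [], 0
    for _ in range(n_pcs):
        t_unloop = rng.expovariate(k_unloop)  # waiting time as in Equation (1)
        t_cut = rng.expovariate(k_cut)  # waiting time as in Equation (2)
        dwell_times.append(min(t_unloop, t_cut))  # the PC leaves at the first event
        n_cut += t_cut < t_unloop
    return dwell_times, n_cut / n_pcs


rng = random.Random(7)
k_unloop, k_cut = 0.30, 0.25  # hypothetical rates, per minute
dwell, frac_cut = simulate_pc_fates(k_unloop, k_cut, n_pcs=100_000, rng=rng)

# For an exponential, the maximum-likelihood rate is 1 / (mean dwell time).
print(f"estimated k_leave = {1 / statistics.fmean(dwell):.3f}, "
      f"k_unloop + k_cut = {k_unloop + k_cut:.3f}")
print(f"fraction cut = {frac_cut:.3f}, "
      f"k_cut / (k_unloop + k_cut) = {k_cut / (k_unloop + k_cut):.3f}")
```

For competing exponential processes, the time of the first event is independent of which process wins, so cleaved and unlooped PCs share the same dwell time distribution with rate $k_{\mathrm{leave}}$; this is why $k_{\mathrm{cut}}$ cannot be read off the cleaved PCs alone.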
Given the waiting times measured for each RSS, we estimated the values of $k_{\mathrm{leave}}$ that best describe the data. We find that the observed dwell times are not exponentially distributed for any 12RSS sequence analyzed, either endogenous or synthetic. Examples of these waiting time distributions, along with exponential distributions parameterized by the 95% credible region for $k_{\mathrm{leave}}$, can be seen for twelve of the RSS variants in Figure 7. In general, the observed dwell times are underdispersed relative to a simple exponential distribution, with an overabundance of short-lived PCs. We also find that the observed dwell time distributions are heavy-tailed, with exceptionally long dwell times occurring more frequently than expected for an exponential distribution.
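The comparison described above, fitting a single $k_{\mathrm{leave}}$ and checking the fitted exponential against the observed dwell times, can be sketched as follows. Assumptions for illustration: the posterior for $k_{\mathrm{leave}}$ uses a flat prior, so it is a Gamma distribution with shape $n+1$ and rate $\sum_i t_i$, and the "observed" dwell times are synthetic draws from a hypothetical two-population mixture, not data from this study. The helper names are hypothetical. The point is only that such data show an excess of both short and very long dwell times relative to the best single exponential.

```python
import math
import random


def k_leave_posterior(dwell_times, rng, n_draws=20_000):
    """MLE and 95% credible interval for k_leave under a single-exponential model.

    Assumes a flat prior on k_leave, giving a Gamma(n + 1, rate = sum(t)) posterior.
    """
    n, total = len(dwell_times), sum(dwell_times)
    # random.gammavariate takes a shape and a *scale*, so pass 1 / rate.
    draws = sorted(rng.gammavariate(n + 1, 1 / total) for _ in range(n_draws))
    return n / total, (draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)])


def fraction_below(dwell_times, t):
    """Empirical CDF of the dwell times at time t."""
    return sum(x <= t for x in dwell_times) / len(dwell_times)


rng = random.Random(42)
# Synthetic dwell times (min): 60% short-lived (rate 2/min), 40% long-lived (rate 0.2/min).
dwell = [rng.expovariate(2.0 if rng.random() < 0.6 else 0.2) for _ in range(5_000)]

k_hat, (k_lo, k_hi) = k_leave_posterior(dwell, rng)
print(f"k_leave = {k_hat:.3f} per min, 95% credible region [{k_lo:.3f}, {k_hi:.3f}]")
for t in (0.5, 10.0):
    print(f"t = {t:>4} min: observed CDF {fraction_below(dwell, t):.3f}, "
          f"exponential CDF {1 - math.exp(-k_hat * t):.3f}")
```

At t = 0.5 min the empirical CDF of these synthetic data lies well above the fitted exponential (too many short-lived PCs), while at t = 10 min it lies below it (too many long-lived PCs), which is the kind of mismatch a single $k_{\mathrm{leave}}$ cannot absorb.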
The ubiquity of this disagreement between the simplest kinetic model and the observed data across all of the examined RSSs indicates that leaving the PC state, either by reverting to the unlooped state or by committing to the cleaved state, is not a one-step process. This suggests that the waiting time for at least one of the two fates of the PC state is not single-exponentially distributed, as assumed in our null model of the dynamics.
One hypothesis for the disagreement between the model given in Equation (3) and the data is that other processes, such as nicking of the DNA by RAG, create effects in the tethered bead trajectories that are too subtle to be detected in the TPM assays. Nicking creates a more stable RAG–single RSS complex (though this effect on PC stability had not been previously quantified) (13,35) and can occur at any time after RAG binds to the RSS (6), making it exceedingly difficult to determine whether a given PC has one, both or neither of the RSSs nicked. As a result, we may not be able to model the combined kinetics of unlooping and cleavage without also identifying when RAG nicks the RSSs to which it is bound.
Substitution of Ca²⁺ for Mg²⁺ in the reaction buffer allows RAG to bind the RSSs but blocks both nicking and cleavage (36), leaving unlooping as the only possible fate of a PC. To determine whether unlooping could be modeled as a simple Poisson process, we measured the PC dwell time